LNAI 13395
De-Shuang Huang · Kang-Hyun Jo · Junfeng Jing · Prashan Premaratne · Vitoantonio Bevilacqua · Abir Hussain (Eds.)
Intelligent Computing Methodologies 18th International Conference, ICIC 2022 Xi’an, China, August 7–11, 2022 Proceedings, Part III
Lecture Notes in Artificial Intelligence
Subseries of Lecture Notes in Computer Science

Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Wolfgang Wahlster, DFKI, Berlin, Germany
Zhi-Hua Zhou, Nanjing University, Nanjing, China

Founding Editor
Jörg Siekmann, DFKI and Saarland University, Saarbrücken, Germany
More information about this subseries at https://link.springer.com/bookseries/1244
Editors
De-Shuang Huang, Tongji University, Shanghai, China
Kang-Hyun Jo, University of Ulsan, Ulsan, Korea (Republic of)
Junfeng Jing, Xi'an Polytechnic University, Xi'an, China
Prashan Premaratne, The University of Wollongong, North Wollongong, NSW, Australia
Vitoantonio Bevilacqua, Polytechnic University of Bari, Bari, Italy
Abir Hussain, Liverpool John Moores University, Liverpool, UK
ISSN 0302-9743; ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-031-13831-7; ISBN 978-3-031-13832-4 (eBook)
https://doi.org/10.1007/978-3-031-13832-4
LNCS Sublibrary: SL7 – Artificial Intelligence

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022, corrected publication 2022

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Preface
The International Conference on Intelligent Computing (ICIC) was started to provide an annual forum dedicated to emerging and challenging topics in artificial intelligence, machine learning, pattern recognition, bioinformatics, and computational biology. It aims to bring together researchers and practitioners from both academia and industry to share ideas, problems, and solutions related to the multifaceted aspects of intelligent computing.

ICIC 2022, held in Xi'an, China, during August 7–11, 2022, constituted the 18th International Conference on Intelligent Computing. It built upon the success of the previous ICIC events held at various locations in China (2005–2008, 2010–2016, 2018–2019, 2021) and in Ulsan, South Korea (2009), Liverpool, UK (2017), and Bari, Italy (2020).

This year, the conference concentrated mainly on the theories, methodologies, and emerging applications of intelligent computing. Its aim was to unify the picture of contemporary intelligent computing techniques as an integral concept that highlights the trends in advanced computational intelligence and bridges theoretical research with applications. The theme for this conference was therefore "Advanced Intelligent Computing Technology and Applications", and papers addressing theories, methodologies, and applications in science and technology under this theme were solicited.

ICIC 2022 received 449 submissions from authors in 21 countries and regions. All papers went through a rigorous peer-review procedure, and each paper received at least three review reports. Based on these reports, the Program Committee finally selected 209 high-quality papers for presentation at ICIC 2022; they are included in three volumes of proceedings published by Springer: two volumes of Lecture Notes in Computer Science (LNCS) and one volume of Lecture Notes in Artificial Intelligence (LNAI). Among the 449 submissions were 57 submissions to the six special sessions and nine workshops featured at ICIC this year; these were reviewed by members of the main Program Committee, and 22 high-quality papers were selected for presentation and included in the proceedings according to topic. This volume of Lecture Notes in Artificial Intelligence (LNAI) includes 71 papers.

The organizers of ICIC 2022, including the EIT Institute for Advanced Study, Xi'an Polytechnic University, Shenzhen University, and the Guangxi Academy of Sciences, made an enormous effort to ensure the success of the conference. We hereby thank the members of the Program Committee and the referees for their collective effort in reviewing and soliciting the papers. In particular, we thank all the authors for contributing their papers; without their high-quality submissions, the success of the conference would not have been possible. Finally, we are especially grateful to the International Neural Network Society and the National Science Foundation of China for their sponsorship.

June 2022
De-Shuang Huang Kang-Hyun Jo Junfeng Jing Prashan Premaratne Vitoantonio Bevilacqua Abir Hussain
Organization
General Co-chairs
De-Shuang Huang, Tongji University, China
Haiyan Wang, Xi'an Polytechnic University, China
Program Committee Co-chairs
Kang-Hyun Jo, University of Ulsan, South Korea
Junfeng Jing, Xi'an Polytechnic University, China
Prashan Premaratne, University of Wollongong, Australia
Vitoantonio Bevilacqua, Polytechnic University of Bari, Italy
Abir Hussain, Liverpool John Moores University, UK
Organizing Committee Co-chairs
Pengfei Li, Xi'an Polytechnic University, China
Kaibing Zhang, Xi'an Polytechnic University, China
Lei Zhang, Xi'an Polytechnic University, China
Organizing Committee
Hongwei Zhang, Xi'an Polytechnic University, China
Minqi Li, Xi'an Polytechnic University, China
Zhaoliang Meng, Xi'an Polytechnic University, China
Peng Song, Xi'an Polytechnic University, China
Award Committee Co-chairs
Kyungsook Han, Inha University, South Korea
Valeriya Gribova, Far Eastern Branch of the Russian Academy of Sciences, Russia
Tutorial Co-chairs
Ling Wang, Tsinghua University, China
M. Michael Gromiha, Indian Institute of Technology Madras, India
Publication Co-chairs
Michal Choras, Bydgoszcz University of Science and Technology, Poland
Hong-Hee Lee, University of Ulsan, South Korea
Laurent Heutte, Université de Rouen Normandie, France
Special Session Co-chairs
Yu-Dong Zhang, University of Leicester, UK
Vitoantonio Bevilacqua, Polytechnic University of Bari, Italy
Hee-Jun Kang, University of Ulsan, South Korea
Special Issue Co-chairs
Yoshinori Kuno, Saitama University, Japan
Phalguni Gupta, Indian Institute of Technology Kanpur, India
International Liaison Co-chair
Prashan Premaratne, University of Wollongong, Australia
Workshop Co-chairs
Jair Cervantes Canales, Autonomous University of Mexico State, Mexico
Chenxi Huang, Xiamen University, China
Dhiya Al-Jumeily, Liverpool John Moores University, UK
Publicity Co-chairs
Chun-Hou Zheng, Anhui University, China
Dhiya Al-Jumeily, Liverpool John Moores University, UK
Jair Cervantes Canales, Autonomous University of Mexico State, Mexico
Sponsors and Exhibits Chair
Qinghu Zhang, Tongji University, China
Program Committee
Abir Hussain, Liverpool John Moores University, UK
Angelo Ciaramella, Parthenope University of Naples, Italy
Antonino Staiano, Parthenope University of Naples, Italy
Antonio Brunetti, Polytechnic University of Bari, Italy
Bai Xue, Institute of Software, CAS, China
Baitong Chen, Xuzhou No. 1 Peoples Hospital, China
Ben Niu, Shenzhen University, China
Bin Liu, Beijing Institute of Technology, China
Bin Qian, Kunming University of Science and Technology, China
Bin Wang, Anhui University of Technology, China
Bin Yang, Zaozhuang University, China
Bingqiang Liu, Shandong University, China
Binhua Tang, Hohai University, China
Bo Li, Wuhan University of Science and Technology, China
Bo Liu, Academy of Mathematics and Systems Science, CAS, China
Bohua Zhan, Institute of Software, CAS, China
Changqing Shen, Soochow University, China
Chao Song, Harbin Medical University, China
Chenxi Huang, Xiamen University, China
Chin-Chih Chang, Chung Hua University, Taiwan, China
Chunhou Zheng, Anhui University, China
Chunmei Liu, Howard University, USA
Chunquan Li, Harbin Medical University, China
Dah-Jing Jwo, National Taiwan Ocean University, Taiwan, China
Dakshina Ranjan Kisku, National Institute of Technology Durgapur, India
Daowen Qiu, Sun Yat-sen University, China
Dhiya Al-Jumeily, Liverpool John Moores University, UK
Domenico Buongiorno, Politecnico di Bari, Italy
Dong Wang, University of Jinan, China
Dong-Joong Kang, Pusan National University, South Korea
Dunwei Gong, China University of Mining and Technology, China
Eros Gian Pasero, Politecnico di Torino, Italy
Evi Sjukur, Monash University, Australia
Fa Zhang, Institute of Computing Technology, CAS, China
Fabio Stroppa, Stanford University, USA
Fei Han, Jiangsu University, China
Fei Guo, Central South University, China
Fei Luo, Wuhan University, China
Fengfeng Zhou, Jilin University, China
Gai-Ge Wang, Ocean University of China, China
Giovanni Dimauro, University of Bari, Italy
Guojun Dai, Hangzhou Dianzi University, China
Haibin Liu, Beijing University of Technology, China
Han Zhang, Nankai University, China
Hao Lin, University of Electronic Science and Technology of China, China
Haodi Feng, Shandong University, China
Ho-Jin Choi, Korea Advanced Institute of Science and Technology, South Korea
Hong-Hee Lee, University of Ulsan, South Korea
Hongjie Wu, Suzhou University of Science and Technology, China
Hongmin Cai, South China University of Technology, China
Jair Cervantes, Autonomous University of Mexico State, Mexico
Jian Huang, University of Electronic Science and Technology of China, China
Jian Wang, China University of Petroleum (East China), China
Jiangning Song, Monash University, Australia
Jiawei Luo, Hunan University, China
Jieren Cheng, Hainan University, China
Jing Hu, Wuhan University of Science and Technology, China
Jing-Yan Wang, Abu Dhabi Department of Community Development, UAE
Jinwen Ma, Peking University, China
Jin-Xing Liu, Qufu Normal University, China
Ji-Xiang Du, Huaqiao University, China
Joaquin Torres-Sospedra, Universidade do Minho, Portugal
Juan Liu, Wuhan University, China
Junfeng Man, Hunan First Normal University, China
Junfeng Xia, Anhui University, China
Jungang Lou, Huzhou University, China
Junqi Zhang, Tongji University, China
Ka-Chun Wong, City University of Hong Kong, Hong Kong, China
Kanghyun Jo, University of Ulsan, South Korea
Kyungsook Han, Inha University, South Korea
Lejun Gong, Nanjing University of Posts and Telecommunications, China
Laurent Heutte, Université de Rouen Normandie, France
Le Zhang, Sichuan University, China
Lin Wang, University of Jinan, China
Ling Wang, Tsinghua University, China
Li-Wei Ko, National Yang Ming Chiao Tung University, Taiwan, China
Marzio Pennisi, University of Eastern Piedmont, Italy
Michael Gromiha, Indian Institute of Technology Madras, India
Michal Choras, Bydgoszcz University of Science and Technology, Poland
Mine Sarac, Stanford University, USA, and Kadir Has University, Turkey
Mohd Helmy Abd Wahab, Universiti Tun Hussein Onn Malaysia, Malaysia
Na Zhang, Xuzhou Medical University, China
Nicholas Caporusso, Northern Kentucky University, USA
Nicola Altini, Polytechnic University of Bari, Italy
Peng Chen, Anhui University, China
Pengjiang Qian, Jiangnan University, China
Phalguni Gupta, GLA University, India
Ping Guo, Beijing Normal University, China
Prashan Premaratne, University of Wollongong, Australia
Pu-Feng Du, Tianjin University, China
Qi Zhao, University of Science and Technology Liaoning, China
Qingfeng Chen, Guangxi University, China
Qinghua Jiang, Harbin Institute of Technology, China
Quan Zou, University of Electronic Science and Technology of China, China
Rui Wang, National University of Defense Technology, China
Ruiping Wang, Institute of Computing Technology, CAS, China
Saiful Islam, Aligarh Muslim University, India
Seeja K. R., Indira Gandhi Delhi Technical University for Women, India
Shanfeng Zhu, Fudan University, China
Shanwen Wang, Xijing University, China
Shen Yin, Harbin Institute of Technology, China
Shihua Zhang, Academy of Mathematics and Systems Science, CAS, China
Shihua Zhang, Wuhan University of Science and Technology, China
Shikui Tu, Shanghai Jiao Tong University, China
Shitong Wang, Jiangnan University, China
Shixiong Zhang, Xidian University, China
Shunren Xia, Zhejiang University, China
Sungshin Kim, Pusan National University, South Korea
Surya Prakash, Indian Institute of Technology Indore, India
Takashi Kuremoto, Nippon Institute of Technology, Japan
Tao Zeng, Guangzhou Laboratory, China
Tatsuya Akutsu, Kyoto University, Japan
Tieshan Li, University of Electronic Science and Technology of China, China
Valeriya Gribova, Institute of Automation and Control Processes, Far Eastern Branch of the Russian Academy of Sciences, Russia
Vincenzo Randazzo, Politecnico di Torino, Italy
Waqas Haider Bangyal, University of Gujrat, Pakistan
Wei Chen, Chengdu University of Traditional Chinese Medicine, China
Wei Jiang, Nanjing University of Aeronautics and Astronautics, China
Wei Peng, Kunming University of Science and Technology, China
Wei Wei, Tencent Technology, Norway
Wei-Chiang Hong, Asia Eastern University of Science and Technology, Taiwan, China
Weidong Chen, Shanghai Jiao Tong University, China
Weihong Deng, Beijing University of Posts and Telecommunications, China
Weixiang Liu, Shenzhen University, China
Wen Zhang, Huazhong Agricultural University, China
Wenbin Liu, Guangzhou University, China
Wen-Sheng Chen, Shenzhen University, China
Wenzheng Bao, Xuzhou University of Technology, China
Xiangtao Li, Jilin University, China
Xiaodi Li, Shandong Normal University, China
Xiaofeng Wang, Hefei University, China
Xiao-Hua Yu, California Polytechnic State University, USA
Xiaoke Ma, Xidian University, China
Xiaolei Zhu, Anhui Agricultural University, China
Xiaoli Lin, Wuhan University of Science and Technology, China
Xiaoqi Zheng, Shanghai Normal University, China
Xin Yin, Laxco Inc., USA
Xin Zhang, Jiangnan University, China
Xinguo Lu, Hunan University, China
Xingwen Liu, Southwest Minzu University, China
Xiujuan Lei, Shaanxi Normal University, China
Xiwei Liu, Tongji University, China
Xiyuan Chen, Southeast University, China
Xuequn Shang, Northwestern Polytechnical University, China
Xuesong Wang, China University of Mining and Technology, China
Xuesong Yan, China University of Geosciences, China
Xu-Qing Tang, Jiangnan University, China
Yan-Rui Ding, Jiangnan University, China
Yansen Su, Anhui University, China
Yi Gu, Jiangnan University, China
Yi Xiong, Shanghai Jiao Tong University, China
Yizhang Jiang, Jiangnan University, China
Yong-Quan Zhou, Guangxi University for Nationalities, China
Yonggang Lu, Lanzhou University, China
Yoshinori Kuno, Saitama University, Japan
Yu Xue, Huazhong University of Science and Technology, China
Yuan-Nong Ye, Guizhou Medical University, China
Yu-Dong Zhang, University of Leicester, UK
Yue Ming, Beijing University of Posts and Telecommunications, China
Yunhai Wang, Shandong University, China
Yupei Zhang, Northwestern Polytechnical University, China
Yushan Qiu, Shenzhen University, China
Zhanheng Chen, Shenzhen University, China
Zhan-Li Sun, Anhui University, China
Zhen Lei, Institute of Automation, CAS, China
Zhendong Liu, Shandong Jianzhu University, China
Zhenran Jiang, East China Normal University, China
Zhenyu Xuan, University of Texas at Dallas, USA
Zhi-Hong Guan, Huazhong University of Science and Technology, China
Zhi-Ping Liu, Shandong University, China
Zhiqiang Geng, Beijing University of Chemical Technology, China
Zhongqiu Zhao, Hefei University of Technology, China
Zhu-Hong You, Northwestern Polytechnical University, China
Zhuo Wang, Hangzhou Dianzi University, China
Zuguo Yu, Xiangtan University, China
Contents – Part III
Fuzzy Theory and Algorithms

An Incremental Approach Based on Hierarchical Classification in Multikernel Fuzzy Rough Sets Under the Variation of Object Set
Wei Fan, Chunlin He, Anping Zeng, and Ke Lin

A Clustering Method Based on Improved Density Estimation and Shared Nearest Neighbors
Ying Guan, Yaru Li, Bin Li, and Yonggang Lu

Bagging-AdaTSK: An Ensemble Fuzzy Classifier for High-Dimensional Data
Guangdong Xue, Bingjie Zhang, Xiaoling Gong, and Jian Wang

Some Results on the Dominance Relation Between Conjunctions and Disjunctions
Lizhu Zhang and Gang Li

Robust Virtual Sensors Design for Linear Systems
Alexey Zhirabok, Alexander Zuev, Vladimir Filaretov, Changan Yuan, Alexander Protcenko, and Kim Chung Il

Clustering Analysis in the Student Academic Activities on COVID-19 Pandemic in Mexico
G. Miranda-Piña, R. Alejo, E. Rendón, E. E. Granda-Gutíerrez, R. M. Valdovinos, and F. del Razo-López

Application of Stewart Platform as a Haptic Device for Teleoperation of a Mobile Robot
Duc-Vinh Le and CheolKeun Ha

Geometric Parameters Calibration Method for Multilink Manipulators
Anton Gubankov, Dmitry Yukhimets, Vladimir Filaretov, and Changan Yuan

A Kind of PWM DC Motor Speed Regulation System Based on STM32 with Fuzzy-PID Dual Closed-Loop Control
Wang Lu, Zhang Zaitian, Cheng Xuwei, Ren Haoyu, Chen Jianzhou, Qiu Fengqi, Yan Zitong, Zhang Xin, and Zhang Li
Machine Learning and Data Mining

Research on Exchange and Management Platform of Enterprise Power Data Unification Summit
Yangming Yu, Zhiyong Zha, Bo Jin, Geng Wu, and Chenxi Dong

Application of Deep Learning Autoencoders as Features Extractor of Diabetic Foot Ulcer Images
Abbas Saad Alatrany, Abir Hussain, Saad S. J. Alatrany, and Dhiya Al-Jumaily

MPCNN with Knowledge Augmentation: A Model for Chinese Text Classification
Xiaozeng Zhang and Ailian Fang

An Improved Mobilenetv2 for Rust Detection of Angle Steel Tower Bolts Based on Small Sample Transfer Learning
Zhiyu Cheng, Jun Liu, and Jinfeng Zhang

Generate Judge-View of Online Dispute Resolution Based on Pretrained-Model Method
Qinhua Huang and Weimin Ouyang

An Effective Chinese Text Classification Method with Contextualized Weak Supervision for Review Autograding
Yupei Zhang, Md Shahedul Islam Khan, Yaya Zhou, Min Xiao, and Xuequn Shang

Comparison of Subjective and Physiological Stress Levels in Home and Office Work Environments
Matthew Harper, Fawaz Ghali, and Wasiq Khan

Cross Distance Minimization for Solving the Nearest Point Problem Based on Scaled Convex Hull
Qiangkui Leng, Erjie Jiao, Yuqing Liu, Jiamei Guo, and Ying Chen

Nut Recognition and Positioning Based on YOLOv5 and RealSense
JinFeng Zhang, TianZhong Zhang, Jun Liu, Zhiwen Gong, and Lei Sun

Gait Identification Using Hip Joint Movement and Deep Machine Learning
Luke Topham, Wasiq Khan, Dhiya Al-Jumeily, Atif Waraich, and Abir Hussain
Study on Path Planning of Multi-storey Parking Lot Based on Combined Loss Function
Zhongtian Hu, Jun Yan, Yuli Wang, Changsong Yang, Qiming Fu, Weizhong Lu, and Hongjie Wu

A Systematic Review of Distributed Deep Learning Frameworks for Big Data
Francesco Berloco, Vitoantonio Bevilacqua, and Simona Colucci

Efficient Post Event Analysis and Cyber Incident Response in IoT and E-commerce Through Innovative Graphs and Cyberthreat Intelligence Employment
Rafał Kozik, Marek Pawlicki, Mateusz Szczepański, Rafał Renk, and Michał Choraś

Federated Sparse Gaussian Processes
Xiangyang Guo, Daqing Wu, and Jinwen Ma

Classification of Spoken English Accents Using Deep Learning and Speech Analysis
Zaid Al-Jumaili, Tarek Bassiouny, Ahmad Alanezi, Wasiq Khan, Dhiya Al-Jumeily, and Abir Jaafar Hussain

A Stable Community Detection Approach for Large-Scale Complex Networks Based on Improved Label Propagation Algorithm
Xiangtao Chen and Meijie Zhao

An Effective Method for Yemeni License Plate Recognition Based on Deep Neural Networks
Hamdan Taleb, Zhipeng Li, Changan Yuan, Hongjie Wu, Xingming Zhao, and Fahd A. Ghanem

Topic Analysis of Public Welfare Microblogs in the Early Period of the COVID-19 Epidemic Based on LDA Model
Ji Li and Yujun Liang

Intelligent Computing in Computer Vision

Object Detection Networks and Mixed Reality for Cable Harnesses Identification in Assembly Environment
Yixiong Wei, Hongqi Zhang, Hongqiao Zhou, Qianhao Wu, and Zihan Niu

Improved YOLOv5 Network with Attention and Context for Small Object Detection
Tian-Yu Zhang, Jun Li, Jie Chai, Zhong-Qiu Zhao, and Wei-Dong Tian
Inverse Sparse Object Tracking via Adaptive Representation
Jian-Xun Mi, Yun Gao, and Renjie Li

A Sub-captions Semantic-Guided Network for Image Captioning
Wei-Dong Tian, Jun-jun Zhu, Shuang Wu, Zhong-Qiu Zhao, Yu-Zheng Zhang, and Tian-yu Zhang

A Novel Gaze Detection Method Based on Local Feature Fusion
Juan Li, Yahui Dong, Hui Xu, Hui Sun, and Miao Qi

Vehicle Detection, Classification and Counting on Highways - Accuracy Enhancements
Prashan Premaratne, Rhys Blacklidge, and Mark Lee

Image Dehazing Based on Deep Multiscale Fusion Network and Continuous Memory Mechanism
Qiang Li, Zhihua Xie, Sha Zong, and Guodong Liu

Improved YOLOv5s Model for Vehicle Detection and Recognition
Xingmin Lu and Wei Song

Garbage Classification Detection Model Based on YOLOv4 with Lightweight Neural Network Feature Fusion
Xiao-Feng Wang, Jian-Tao Wang, Li-Xiang Xu, Ming Tan, Jing Yang, and Yuan-yan Tang

Detection of Personal Protective Equipment in Factories: A Survey and Benchmark Dataset
Zhiyang Liu, Thomas Weise, and Zhize Wu

Intelligent Control and Automation

A Novel IoMT System for Pathological Diagnosis Based on Intelligent Mobile Scanner and Whole Slide Image Stitching Method
Peng Jiang, Juan Liu, Di Xiao, Baochuan Pang, Zongjie Hao, and Dehua Cao

Deep Reinforcement Learning Algorithm for Permutation Flow Shop Scheduling Problem
Yuanyuan Yang, Bin Qian, Rong Hu, and Dacheng Zhang

Model Predictive Control for Voltage Regulation in Bidirectional Boost Converter
Duy-Long Nguyen, Huu-Cong Vu, Quoc-Hoan Tran, and Hong-Hee Lee
Fed-MT-ISAC: Federated Multi-task Inverse Soft Actor-Critic for Human-Like NPCs in the Metaverse Games
Fangze Lin, Wei Ning, and Zhengrong Zou

Development of AUV Two-Loop Sliding Control System with Considering of Thruster Dynamic
Filaretov Vladimir, Yukhimets Dmitry, and Changan Yuan

An Advanced Terminal Sliding Mode Controller for Robot Manipulators in Position Tracking Problem
Anh Tuan Vo, Thanh Nguyen Truong, Hee-Jun Kang, and Tien Dung Le

An Observer-Based Fixed Time Sliding Mode Controller for a Class of Second-Order Nonlinear Systems and Its Application to Robot Manipulators
Thanh Nguyen Truong, Anh Tuan Vo, Hee-Jun Kang, and Tien Dung Le

A Robust Position Tracking Strategy for Robot Manipulators Using Adaptive Second Order Sliding Mode Algorithm and Nonsingular Sliding Mode Control
Tan Van Nguyen, Cheolkeun Ha, Huy Q. Tran, Dinh Hai Lam, and Nguyen Thi Hoa Cuc

Intelligent Data Analysis and Prediction

A Hybrid Daily Carbon Emission Prediction Model Combining CEEMD, WD and LSTM
Xing Zhang and Wensong Zhang

A Hybrid Carbon Price Prediction Model Based on VMD and ELM Optimized by WOA
Xing Zhang and Wensong Zhang

A Comparative Study of Autoregressive and Neural Network Models: Forecasting the GARCH Process
Firuz Kamalov, Ikhlaas Gurrib, Sherif Moussa, and Amril Nazir

A Novel DCT-Based Video Steganography Algorithm for HEVC
Si Liu, Yunxia Liu, Cong Feng, and Hongguo Zhao

Dynamic Recurrent Embedding for Temporal Interaction Networks
Qilin Liu, Xiaobo Zhu, Changgan Yuan, Hongje Wu, and Xinming Zhao
Deep Spatio-Temporal Attention Network for Click-Through Rate Prediction
Xin-Lu Li, Peng Gao, Yuan-Yuan Lei, Le-Xuan Zhang, and Liang-Kuan Fang

A Unified Graph Attention Network Based Framework for Inferring circRNA-Disease Associations
Cun-Mei Ji, Zhi-Hao Liu, Li-Juan Qiao, Yu-Tian Wang, and Chun-Hou Zheng

Research on the Application of Blockchain Technology in the Evaluation of the "Five Simultaneous Development" Education System
Xian-hong Xu, Feng-yang Sun, and Yu-qing Zheng

Blockchain Adoption in University Archives Data Management
Cong Feng and Si Liu

A Novel Two-Dimensional Histogram Shifting Video Steganography Algorithm for Video Protection in H.265/HEVC
Hongguo Zhao, Yunxia Liu, and Yonghao Wang

Problems and Countermeasures in the Construction of Intelligent Government Under the Background of Big Data
ZhaoBin Pei and Ying Wang

Application of Auto-encoder and Attention Mechanism in Raman Spectroscopy
Yunyi Bai, Mang Xu, and Pengjiang Qian

Remaining Useful Life Prediction Based on Improved LSTM Hybrid Attention Neural Network
Mang Xu, Yunyi Bai, and Pengjiang Qian

Medical Image Registration Method Based on Simulated CT
Xuqing Wang, Yanan Su, Ruoyu Liu, Qianhui Qu, Hao Liu, and Yi Gu

Research on Quantitative Optimization Method Based on Incremental Optimization
Ying Chen, Youjun Huang, and Lichao Gao

An Improved Waste Detection and Classification Model Based on YOLOV5
Fan Hu, Pengjiang Qian, Yizhang Jiang, and Jian Yao
An Image Compression Method Based on Compressive Sensing and Convolution Neural Network for Massive Imaging Flow Cytometry Data
Long Cheng and Yi Gu

Intelligent Computing and Optimization

Optimization Improvement and Clustering Application Based on Moth-Flame Algorithm
Lvyang Ye, Huajuan Huang, and Xiuxi Wei

Application of Improved Fruit Fly Optimization Algorithm in Three Bar Truss
Dao Tao, Xiuxi Wei, and Huajuan Huang

Adaptive Clustering by Fast Search and Find of Density Peaks
Yuanyuan Chen, Lina Ge, Guifen Zhang, and Yongquan Zhou

A "Push-Pull" Workshop Logistics Distribution Under Single Piece and Small-Lot Production Mode
Mengxia Xu, Hao Zhang, Xue Wang, and Jianfeng Lu

Greedy Squirrel Search Algorithm for Large-Scale Traveling Salesman Problems
Chenghao Shi, Zhonghua Tang, Yongquan Zhou, and Qifang Luo

Multiple Populations-Based Whale Optimization Algorithm for Solving Multicarrier NOMA Power Allocation Strategy Problem
Zhiwei Liang, Qifang Luo, and Yongquan Zhou

Complex-Valued Crow Search Algorithm for 0–1 KP Problem
Yan Shi, Yongquan Zhou, Qifang Luo, and Huajuan Huang

Discrete Artificial Electric Field Optimization Algorithm for Graph Coloring Problem
Yixuan Yu, Yongquan Zhou, Qifang Luo, and Xiuxi Wei

Automatic Shape Matching Using Improved Whale Optimization Algorithm with Atomic Potential Function
Yuanfei Wei, Ying Ling, Qifang Luo, and Yongquan Zhou

Correction to: Bagging-AdaTSK: An Ensemble Fuzzy Classifier for High-Dimensional Data
Guangdong Xue, Bingjie Zhang, Xiaoling Gong, and Jian Wang

Author Index
Fuzzy Theory and Algorithms
An Incremental Approach Based on Hierarchical Classification in Multikernel Fuzzy Rough Sets Under the Variation of Object Set

Wei Fan 1,2, Chunlin He 1, Anping Zeng 2(B), and Ke Lin 1

1 China West Normal University, Nanchong 637000, Sichuan, China
2 Yibin University, Yibin 644007, Sichuan, China
[email protected]
Abstract. In the era of big data, hybrid data are multimodal, comprising numerical values, images, audio, and so on, and attribute values may even be unknown. Multikernel fuzzy rough sets can effectively handle large-scale multimodal attributes. At the same time, decision attribute values may stand in hierarchical relationships to one another, and multikernel fuzzy rough sets based on hierarchical classification can capture these relationships. In practice, data often change dynamically. This article discusses how to update the upper and lower approximations in a multimodality incomplete decision system based on hierarchical classification when a single object changes, following the corresponding changes in the tree-based hierarchical class structure. The approach is illustrated through worked examples.

Keywords: Multikernel fuzzy rough sets · Hierarchical classification · Multimodality incomplete decision system · Variation of object set
1 Introduction

Fuzzy rough sets skillfully combine rough sets, which handle classification uncertainty, with fuzzy sets, which handle boundary uncertainty, in order to deal with various data types [1]. Subsequently, many scholars extended and developed fuzzy rough sets [2–7]. To deal with multimodal attributes, Hu proposed multikernel fuzzy rough sets [8], in which different kernel functions are used to process different attribute modalities. In fact, semantic hierarchies exist among most data types [9]. Chen and others constructed a decision tree among classes with a tree-like structure [10]. Inspired by fuzzy rough set theory, Wang proposed deep fuzzy trees [11]. Zhao embedded the hierarchical structure into fuzzy rough sets [12]. Qiu proposed a fuzzy rough set method for hierarchical feature selection based on the Hausdorff distance of the sample set [13].

The expansion and updating of data can also cause data loss. Therefore, in a multimodality incomplete information system, incremental algorithms for updating the approximations are particularly important. Zeng proposed incremental updating algorithms for the variation of the object set [14] and for the variation of attribute values based on a hybrid distance [6]. Dong proposed an incremental attribute-reduction algorithm for the case where samples and attributes increase simultaneously [15]. Huang proposed a multi-source hybrid rough set model (MCRS) and studied a matrix-based incremental mechanism under variation of objects, attributes, and attribute values [16]. However, an incremental algorithm for multikernel fuzzy rough sets based on hierarchical classification has not yet been considered.

This paper is organized as follows. Section 2 introduces the necessary background. Section 3 brings the hierarchical structure among decision attribute values into multikernel fuzzy rough sets, introduces the tree-based hierarchical class structure, and defines the upper and lower approximations of all nodes. Section 4 presents an incremental algorithm for updating the upper and lower approximations when a single object immigrates into or emigrates from the system, with worked examples. Section 5 concludes with topics for further research.
2 An Introduction to Fuzzy Rough Sets

In this section, some related content of fuzzy rough sets is briefly introduced.

Definition 1 [17]. Given a fuzzy approximation space {U, R} and ∀x, y, z ∈ U, if R satisfies reflexivity: R(x, x) = 1; symmetry: R(x, y) = R(y, x); and min-max transitivity: min(R(x, y), R(y, z)) ≤ R(x, z), then R is said to be a fuzzy equivalence relation on U.

Definition 2 [4]. Given a fuzzy approximation space {U, R} with R a fuzzy equivalence relation on U, the fuzzy lower and upper approximations of a fuzzy set X are defined respectively as

\underline{R}_S X(x) = \inf_{y \in U} S\bigl(N(R(x, y)), X(y)\bigr), \qquad
\overline{R}_T X(x) = \sup_{y \in U} T\bigl(R(x, y), X(y)\bigr). \tag{1}

Here T and S are a triangular norm and a triangular conorm, respectively, and N is a monotonically decreasing mapping.
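Definition 2 leaves T, S, and N generic. As a minimal illustration, the sketch below instantiates them with the standard choices T = min, S = max, and N(a) = 1 − a (these particular operators are our assumption, not fixed by the paper) on a two-element universe:

```python
# Eq. (1) with T = min, S = max, N(a) = 1 - a on a toy universe.
U = ["a", "b"]
R = {("a", "a"): 1.0, ("a", "b"): 0.3,
     ("b", "a"): 0.3, ("b", "b"): 1.0}   # a fuzzy equivalence relation
X = {"a": 0.9, "b": 0.2}                 # a fuzzy subset of U

lower = {x: min(max(1 - R[(x, y)], X[y]) for y in U) for x in U}
upper = {x: max(min(R[(x, y)], X[y]) for y in U) for x in U}
print(lower)  # roughly {'a': 0.7, 'b': 0.2}
print(upper)  # roughly {'a': 0.9, 'b': 0.3}
```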
In a multimodality information system, the attributes of samples are multimodal, and multikernel learning is an effective approach: different kernel functions are used to extract information from different attributes [8]. At the same time, some attribute values may be unknown. In this paper, unknown attribute values are considered as part of the multimodality information system.

Definition 3 [6]. Given a multimodality incomplete information system {U, MC}, where MC is a set of multimodal conditional attributes possibly containing unknown values, let M(x) and M(y) denote the values of attribute M ∈ MC on x, y ∈ U, with unknown values marked as "?". The similarity relationship on such an attribute is extracted with a matching kernel:

K(M(x), M(y)) =
\begin{cases}
1, & M(x) = M(y) \text{ or } M(x) = \text{?} \text{ or } M(y) = \text{?}, \\
0, & \text{otherwise.}
\end{cases} \tag{2}
It is easy to prove that this kernel satisfies Definition 1; the similarity relation it induces is therefore a fuzzy equivalence relation.

Definition 4 [8]. Given {U, MC}, the attribute set MC divides the multimodality information system into p single-attribute subsets, written MC/U = {M1, M2, ..., Mp}, and Ki is the fuzzy similarity relation computed from the single attribute Mi ∈ MC. For all x, y ∈ U, the fuzzy similarity relation based on combination kernels is defined as

K_{T_{\cos}}(x, y) = \max\Bigl(\prod_{i=1}^{p} K_i(x, y) - \prod_{i=1}^{p} \sqrt{1 - K_i(x, y)^2},\; 0\Bigr). \tag{3}
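To make the combination step concrete, here is a minimal sketch. Note that Eq. (3) is reconstructed here in product form (our reading of the garbled source, which reproduces the worked value 0.888 below), and the matching kernel follows Definition 3; the per-attribute kernel values are taken from the example that follows.

```python
import math

def matching_kernel(a, b):
    # Definition 3: an unknown value "?" matches anything with similarity 1.
    if a == "?" or b == "?":
        return 1.0
    return 1.0 if a == b else 0.0

def combine(kernel_values):
    # Eq. (3) as read here: max( prod_i K_i - prod_i sqrt(1 - K_i^2), 0 ).
    prod_k = math.prod(kernel_values)
    prod_s = math.prod(math.sqrt(max(0.0, 1.0 - k * k)) for k in kernel_values)
    return max(prod_k - prod_s, 0.0)

# Per-attribute kernels for the pair (x1, x2) in Example 1 below.
print(round(combine([0.990, 0.930, 0.965, 1.0]), 3))  # 0.888
print(matching_kernel("Yes", "?"))                    # 1.0, as for K4(x1, x2)
```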
Example 1. Given {U, MC, D} with eight objects, MC = {K1, K2, K3, K4} and D = {d1, d2, d3, d4, d5}; the details are shown in Table 1.

Table 1. Dataset used in the example

X    K1   K2    K3    K4   D
x1   26   1.56  102   Yes  d4
x2   25   1.83  121   ?    d2
x3   27   1.80  87    No   d3
x4   22   1.78  89.8  No   d1
x5   26   1.85  105   ?    d5
x6   27   1.93  102   Yes  d1
x7   52   1.69  87    Yes  d4
x8   25   1.6   105   Yes  d4
In Table 1 there are three types of conditional attribute values: numerical, categorical, and unknown. The table can therefore be regarded as a multimodality incomplete decision system, and different kernel functions are used to extract the fuzzy similarity relationships of the different attributes. Taking x1 and x2 as an example, with K1(x1, x2) = 0.990, K2(x1, x2) = 0.930, K3(x1, x2) = 0.965, and K4(x1, x2) = 1, the fuzzy similarity relationship based on combination kernels is

K_{T_{\cos}}(x_1, x_2) = \max\Bigl(\prod_{i=1}^{4} K_i(x_1, x_2) - \prod_{i=1}^{4} \sqrt{1 - K_i(x_1, x_2)^2},\; 0\Bigr) = 0.888.
In the same way, the fuzzy similarity relationships based on combination kernels among the other objects can be obtained, giving the similarity matrix

K_{T_{\cos}} =
\begin{pmatrix}
1 & 0.888 & 0 & 0 & 0.918 & 0.863 & 0.001 & 0.973 \\
0.888 & 1 & 0.855 & 0.824 & 0.965 & 0.918 & 0.001 & 0.956 \\
0 & 0.855 & 1 & 0.779 & 0.956 & 0 & 0 & 0 \\
0 & 0.824 & 0.779 & 1 & 0.827 & 0 & 0 & 0 \\
0.918 & 0.965 & 0.956 & 0.827 & 1 & 0.983 & 0.001 & 0.965 \\
0.863 & 0.918 & 0 & 0 & 0.983 & 1 & 0.002 & 0.906 \\
0.001 & 0.001 & 0 & 0 & 0.001 & 0.002 & 1 & 0.001 \\
0.973 & 0.956 & 0 & 0 & 0.965 & 0.906 & 0.001 & 1
\end{pmatrix}
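As a quick sanity check, the sketch below verifies that this matrix is reflexive and symmetric in the sense of Definition 1 (the entries are the rounded values printed above, so the check is purely illustrative):

```python
K = [
    [1, 0.888, 0, 0, 0.918, 0.863, 0.001, 0.973],
    [0.888, 1, 0.855, 0.824, 0.965, 0.918, 0.001, 0.956],
    [0, 0.855, 1, 0.779, 0.956, 0, 0, 0],
    [0, 0.824, 0.779, 1, 0.827, 0, 0, 0],
    [0.918, 0.965, 0.956, 0.827, 1, 0.983, 0.001, 0.965],
    [0.863, 0.918, 0, 0, 0.983, 1, 0.002, 0.906],
    [0.001, 0.001, 0, 0, 0.001, 0.002, 1, 0.001],
    [0.973, 0.956, 0, 0, 0.965, 0.906, 0.001, 1],
]
n = len(K)
assert all(K[i][i] == 1 for i in range(n))                          # reflexivity
assert all(K[i][j] == K[j][i] for i in range(n) for j in range(n))  # symmetry
```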
3 Multikernel Fuzzy Rough Sets Based on Hierarchical Classification

In a multimodality incomplete decision system, besides the multimodal attributes of the objects, there may be hierarchical relationships among the decision attribute values. Hierarchical classification is mostly based on a tree structure. Multikernel fuzzy rough sets based on hierarchical classification take the hierarchical relationships among decision attribute values into account within the fuzzy rough set model.

Definition 5 [4]. Given {U, MC}, let K_{T_{\cos}} be a fuzzy equivalence relation based on combination kernels and X a fuzzy subset of U. The approximations are defined respectively as

\underline{K}_T X(x) = \inf_{y \in U} S\bigl(N(K_{T_{\cos}}(x, y)), X(y)\bigr), \qquad
\overline{K}_T X(x) = \sup_{y \in U} T\bigl(K_{T_{\cos}}(x, y), X(y)\bigr). \tag{4}
Definition 6 [12]. Let {U, MC, D_{Tree}} be a multimodality decision system based on hierarchical classification, where D_{Tree} is the hierarchically organized decision attribute dividing U into q subsets, written U/D_{Tree} = {d_1, ..., d_q}, and sib(d_i) denotes the set of sibling nodes of d_i. For all x ∈ U, if x ∈ sib(d_i) then d_i(x) = 0, else d_i(x) = 1. The approximations of a decision class d_i are defined as

\underline{K}_{T_{sibling}} d_i(x) = \inf_{y \in sib(d_i)} \sqrt{1 - K_{T_{\cos}}^2(x, y)}, \qquad
\overline{K}_{T_{sibling}} d_i(x) = \sup_{y \in d_i} K_{T_{\cos}}(x, y). \tag{5}
Proposition 1. This paper extends the approximation operators of Definition 6 from leaf classes to arbitrary nodes. When a decision class is a non-leaf node, its upper approximation is the least upper bound of the upper approximations of its child nodes. Let leaf(d) denote the set of leaf classes and let the child nodes of d_i be d_i^{ch} = {d_{i1}, ..., d_{ik}}, where k is the number of child nodes. Then

\underline{K}_{T_{sibling}} d_i(x) =
\begin{cases}
\inf_{y \in sib(d_i)} \sqrt{1 - K_{T_{\cos}}^2(x, y)}, & sib(d_i) \neq \emptyset, \\
0, & \text{else;}
\end{cases} \tag{6}

\overline{K}_{T_{sibling}} d_i(x) =
\begin{cases}
\sup_{y \in d_i} K_{T_{\cos}}(x, y), & d_i \in leaf(d), \\
\sup_{j} \overline{K}_{T_{sibling}} d_{ij}(x), & \text{else.}
\end{cases} \tag{7}
Given {U, MC, D_{Tree}}, the procedure for computing the lower and upper approximations is given in Algorithm 1.
Algorithm 1: Computing the lower and upper approximations based on hierarchical classification

Input: {U, MC, D_Tree} and the combination kernel K_Tcos(x, y)
Output: the approximations K̲_T_sibling and K̄_T_sibling

for each d in U/D_Tree do
    if d ∈ leaf(d) then
        for each x in U do
            K̄_T_sibling d(x) = max_{y ∈ d} K_Tcos(x, y)
        end
        // Upper approximations of leaf nodes.
    else
        K̄_T_sibling d(x) = max over child classes d_ch of K̄_T_sibling d_ch(x)
        // Upper approximations of non-leaf nodes.
    end
    for each x in U do
        K̲_T_sibling d(x) = inf_{y ∈ sib(d)} sqrt(1 − K_Tcos(x, y)^2)
        // Lower approximations.
    end
end
Example 2. Building on Example 1, the decision attribute values are divided into five subsets d1, d2, d3, d4, d5. From Fig. 1, d1 = {x4, x6}, and from Table 1, sib(d1) = d4 = {x1, x8}. According to Proposition 1, the lower approximation of d1 at x2 is

\underline{K}_{T_{sibling}} d_1(x_2) = \inf_{y \in \{x_1, x_8\}} \sqrt{1 - K_{T_{\cos}}^2(x_2, y)} = \min\{0.475, 0.293\} = 0.293.

Similarly, the lower approximation values at the other objects are 0 for x1, x3, x4, and x8, 0.262 for x5, 0.423 for x6, and 1 for x7.

Because d1 is a non-leaf node, Proposition 1 gives

\overline{K}_{T_{sibling}} d_1(x_1) = \sup \overline{K}_{T_{sibling}} d_3(x_1) = \sup_{y \in d_3} K_{T_{\cos}}(x_1, y) = 0,

and similarly the upper approximation values are 0.855 for x2, 1 for x3, 0.778 for x4, 0.956 for x5, and 0 for x1, x6, x7, and x8. So the lower and upper approximations of d1 are

\underline{K}_{T_{sibling}} d_1 = \{0.293/x_2,\ 0.262/x_5,\ 0.423/x_6,\ 1/x_7\},
\overline{K}_{T_{sibling}} d_1 = \{0.855/x_2,\ 1/x_3,\ 0.778/x_4,\ 0.956/x_5\}.

On the basis of Table 1, the tree-based hierarchical class structure is shown in Fig. 1.
[Figure: the tree of decision classes, rooted at d0 with classes d1–d5 below it; in particular d1 and d4 are siblings and d3 is a child of d1, as used in the examples.]

Fig. 1. The tree of decision class
4 Incremental Updating of the Lower and Upper Approximations Under the Variation of a Single Object

Let {U, MC, D_{Tree}} be given at time t, with \underline{K}_{T_{sibling}}^{(t)} X and \overline{K}_{T_{sibling}}^{(t)} X denoting the lower and upper approximations. Let {U', MC, D_{Tree}} denote the multimodality decision system based on hierarchical classification at time t + 1, where x^+ and x^- denote the immigration and emigration of one object, respectively. The fuzzy lower and upper approximations at time t + 1 are denoted \underline{K}_{T_{sibling}}^{(t+1)} X and \overline{K}_{T_{sibling}}^{(t+1)} X.
4.1 Immigration of a Single Object

Suppose x^+ immigrates into {U', MC, D_{Tree}} at time t + 1, where U' = U ∪ {x^+}. If no new decision class is generated at time t + 1, the tree-based hierarchical class structure need not be updated; otherwise the tree is updated. The incremental updating is discussed below according to whether a new decision class is generated.
Proposition 2. For all d_i ∈ U'/D_{Tree} and x ∈ U', suppose x^+ generates a new decision class, marked d_{n+1}^+, which is inserted into the tree-based hierarchical class structure. Then the approximations of a decision class are updated by

\underline{K}_{T_{sibling}}^{(t+1)} d_i(x) =
\begin{cases}
\min\bigl\{\sqrt{1 - K_{T_{\cos}}^2(x, x^+)},\ \underline{K}_{T_{sibling}}^{(t)} d_i(x)\bigr\}, & x \neq x^+,\ x^+ \in sib(d_i)\ \text{and}\ sib(d_i) \neq \{d_{n+1}^+\}, \\
\underline{K}_{T_{sibling}}^{(t)} d_i(x), & x \neq x^+\ \text{and}\ x^+ \notin sib(d_i), \\
\inf_{y \in sib(d_i)} \sqrt{1 - K_{T_{\cos}}^2(x, y)}, & \text{else;}
\end{cases} \tag{8}

\overline{K}_{T_{sibling}}^{(t+1)} d_i(x) =
\begin{cases}
\overline{K}_{T_{sibling}}^{(t)} d_i(x), & x \neq x^+\ \text{and}\ d_i \in leaf(d), \\
\sup_{y \in d_i} K_{T_{\cos}}(x, y), & x = x^+\ \text{and}\ d_i \in leaf(d), \\
\sup_{k} \overline{K}_{T_{sibling}}^{(t+1)} d_{ik}(x), & \text{else.}
\end{cases} \tag{9}
Proof. For the lower approximation of d_i, the value at time t + 1 is determined by the objects belonging to the sibling classes of d_i. When x ≠ x^+ and x^+ ∉ sib(d_i), the lower approximation is the same as at time t. When x = x^+, or the sibling nodes of d_i at time t + 1 consist only of the newly generated class, the lower approximation is computed directly from Proposition 1. When x^+ ∈ sib(d_i) and no sibling of d_i is the new class, then for all x ≠ x^+:

\underline{K}_{T_{sibling}}^{(t+1)} d_i(x)
= \inf_{y \in sib(d_i)} \sqrt{1 - K_{T_{\cos}}^2(x, y)}
= \inf_{y \in (sib(d_i) \setminus \{x^+\}) \cup \{x^+\}} \sqrt{1 - K_{T_{\cos}}^2(x, y)}
= \underline{K}_{T_{sibling}}^{(t)} d_i(x) \wedge \sqrt{1 - K_{T_{\cos}}^2(x, x^+)}
= \min\bigl\{\underline{K}_{T_{sibling}}^{(t)} d_i(x),\ \sqrt{1 - K_{T_{\cos}}^2(x, x^+)}\bigr\}.
For the upper approximation of d_i, the value is determined by the objects belonging to d_i. When d_i is a leaf node and x ≠ x^+ at time t + 1 (so the newly generated decision subset is not d_i), its upper approximation is the same as at time t. When d_i is a leaf and x = x^+, the approximation is computed directly from Proposition 1. Since in the tree-based hierarchical class structure the decision classes sharing a parent node belong to one major class, d_i and its children d_i^{ch} belong to the same major class; hence when d_i is not a leaf node, the update is obtained as the least upper bound over the child nodes of d_i.
Proposition 3. For all d_i ∈ U'/D_{Tree} and x ∈ U', suppose x^+ does not generate a new decision class, so the tree-based hierarchical class structure need not be updated. Then the approximations are updated by

\underline{K}_{T_{sibling}}^{(t+1)} d_i(x) =
\begin{cases}
\min\bigl\{\sqrt{1 - K_{T_{\cos}}^2(x, x^+)},\ \underline{K}_{T_{sibling}}^{(t)} d_i(x)\bigr\}, & x \neq x^+\ \text{and}\ x^+ \in sib(d_i), \\
\underline{K}_{T_{sibling}}^{(t)} d_i(x), & x \neq x^+\ \text{and}\ x^+ \notin sib(d_i), \\
\inf_{y \in sib(d_i)} \sqrt{1 - K_{T_{\cos}}^2(x, y)}, & \text{else;}
\end{cases} \tag{10}

\overline{K}_{T_{sibling}}^{(t+1)} d_i(x) =
\begin{cases}
\overline{K}_{T_{sibling}}^{(t)} d_i(x), & x \neq x^+,\ x^+ \notin d_i\ \text{and}\ d_i \in leaf(d), \\
\max\bigl\{K_{T_{\cos}}(x, x^+),\ \overline{K}_{T_{sibling}}^{(t)} d_i(x)\bigr\}, & x \neq x^+,\ x^+ \in d_i\ \text{and}\ d_i \in leaf(d), \\
\sup_{y \in d_i} K_{T_{\cos}}(x, y), & x = x^+\ \text{and}\ d_i \in leaf(d), \\
\sup_{k} \overline{K}_{T_{sibling}}^{(t+1)} d_{ik}(x), & \text{else.}
\end{cases} \tag{11}
Proof. The proof for the lower approximation is similar to that of Proposition 2. For the upper approximation at time t + 1: when d_i is a non-leaf node, the least upper bound over the children of d_i gives the update directly; when d_i is a leaf node, the upper approximation is determined by the objects belonging to d_i. When x = x^+, it is computed directly from Proposition 1. When x ≠ x^+ and x^+ ∉ d_i, it is the same as at time t. When x ≠ x^+ and x^+ ∈ d_i:

\overline{K}_{T_{sibling}}^{(t+1)} d_i(x)
= \sup_{y \in d_i} K_{T_{\cos}}(x, y)
= \sup_{y \in (d_i \setminus \{x^+\}) \cup \{x^+\}} K_{T_{\cos}}(x, y)
= \max\bigl\{\overline{K}_{T_{sibling}}^{(t)} d_i(x),\ K_{T_{\cos}}(x, x^+)\bigr\}.
Given {U', MC, D_{Tree}}, the procedure for updating the lower and upper approximations when one object immigrates is given in Algorithm 2.
An Incremental Approach Based on Hierarchical Classification
Algorithm 2: Incremental Algorithm Based on Hierarchical Classification For the Immigration of single Object
{
}
(t )
) Input: U ,MC,D tree , K T(tsibling and K Tsibling . //Already obtained in Algorithm 1.
( t +1)
+1) Output: The approximations K T( tsibling and K Tsibling
KTcos ( x, y ) ⇐ KTcos ( x + , y ) for each d ∈ U / DTree ψdo for each x ∈ U ψdo
{
if d
+ n +1
}
≠ sib ( d ) and x + ∈ sib ( d ) or { x ≠ x + and x + ∈ sib ( d )} then
{ 1− K
K T( sibling) d ( x ) = inf t +1
y=x
+
2 Tcos
( x, y ) ,K T(t )
sibling
}
d ( x)
end if x ≠ x + and x + ∉ sib ( d ) then ) K T( sibling) d ( x ) = K T( sibling d ( x) t +1
t
end else K T( sibling) d ( x ) = inf t +1
y∈{sib ( d )}
{ 1− K
2 Tcos
( x, y )}
end //Low approximations. +
if d n +1 then if d ∈ leaf and x = x+ then ( t +1)
K Tsibling d ( x ) = sup K T
( x, y )
cos
y∈d
end if d ∈ leaf and x ≠ x+ then ( t +1)
(t )
K Tsibling d ( x ) = K Tsibling d ( x )
end else
{
( t +1)
(t )
}
K Tsibling di ( x ) = sup K Tsibling dick ( x )
end //Upper approximations when new decision classes are generated. end else if d ∈ leaf and x = x+ then ( t +1)
K Tsibling d ( x ) = sup K T y∈d
cos
( x, y )
11
12
W. Fan et al.
end if d ∈ leaf , x+ ∉ d and x = x+ then ( t +1)
(t )
K Tsibling d ( x ) = K Tsibling d ( x )
end if d ∈ leaf , x+ ∈ d and x = x+ then ( t +1)
{
(t )
K Tsibling d ( x ) = sup K T y=x
+
end else
{
( t +1)
cos
( x, y ) ,K T
sibling
}
d ( x)
}
(t )
K Tsibling di ( x ) = sup K Tsibling dick ( x )
end end //Upper approximations when no new decision classes are generated end end
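The core of Algorithm 2 is that an arriving object can only tighten the lower approximations of the classes it becomes a sibling of, and can only raise the upper approximation of the leaf class it joins. A hedged sketch of that inner update follows (the helper name and signature are hypothetical; `k_new[x]` stands for K_Tcos(x, x+)):

```python
import math

def immigrate_update(low_t, upp_t, k_new, in_sib, in_class, is_leaf):
    """One Proposition 2/3 step for a single class d (no new-class bookkeeping)."""
    low, upp = dict(low_t), dict(upp_t)
    for x, k in k_new.items():
        if in_sib:
            # Eq. (8)/(10): the new sibling object can only lower the infimum.
            low[x] = min(low[x], math.sqrt(1 - k ** 2))
        if is_leaf and in_class:
            # Eq. (11): the new member object can only raise the supremum.
            upp[x] = max(upp[x], k)
    return low, upp
```

The point of the min/max forms is that each existing object needs only one comparison against the newcomer, rather than a full recomputation of the infimum or supremum.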
Table 2. Information about the immigration of a single object

X     K1   K2    K3   K4   D
x9+   17   1.58  50   no   Normal_Weight
Example 3. On the basis of Table 1, one object x9+ immigrates into the system; its information is shown in Table 2, and a new decision class d6 is generated. First, the fuzzy similarity relationships with the other objects, based on combination kernels, are calculated according to Definition 4:

K_{T_{\cos}}(x_9^+, \cdot) = (0,\ 0.299,\ 0.306,\ 0.643,\ 0.306,\ 0,\ 0,\ 0,\ 1).

On the basis of Fig. 1, the newly generated decision class d6 is inserted into the tree, and the tree is updated as shown in Fig. 2.
[Figure: the tree of decision classes after inserting the new class d6 alongside d1 and d4.]

Fig. 2. The tree of decision class
As shown in Fig. 2, d6 ∈ leaf(d) and sib(d1) = {d4, d6} = {x1, x8, x9+}. According to Proposition 2,

\underline{K}_{T_{sibling}}^{(t+1)} d_1(x_1) = \min\bigl\{\sqrt{1 - K_{T_{\cos}}^2(x_1, x_9^+)},\ \underline{K}_{T_{sibling}}^{(t)} d_1(x_1)\bigr\} = \min\{1, 0\} = 0.

Similarly, the lower approximation values at time t + 1 are 0.293 for x2, 0.262 for x5, and 0 for x1, x3, x4, x6, x7, x8, and x9+.

Because d1 is a non-leaf node, Proposition 2 gives

\overline{K}_{T_{sibling}}^{(t+1)} d_1(x_1) = \sup \overline{K}_{T_{sibling}}^{(t+1)} d_3(x_1) = \sup_{y \in d_3} K_{T_{\cos}}(x_1, y) = 0,

and similarly the upper approximation values are 0.855 for x2, 1 for x3, 0.778 for x4, 0.956 for x5, 0.306 for x9+, and 0 for x1, x6, x7, and x8. So the lower and upper approximations of d1 are

\underline{K}_{T_{sibling}}^{(t+1)} d_1 = \{0.293/x_2,\ 0.262/x_5\},
\overline{K}_{T_{sibling}}^{(t+1)} d_1 = \{0.855/x_2,\ 1/x_3,\ 0.778/x_4,\ 0.956/x_5,\ 0.306/x_9^+\}.

Example 3 shows that when one object immigrates into the multimodality decision system based on hierarchical classification, the upper and lower approximations of the other decision classes are affected, but only a small number of update operations are needed according to Propositions 2 and 3, which greatly reduces the amount of computation and the time cost.
4.2 Emigration of a Single Object

Suppose x^- emigrates from {U', MC, D_{Tree}} at time t + 1, where U' = U − {x^-}. If no decision class is removed at time t + 1, the tree-based hierarchical class structure need not be updated; otherwise the tree is updated. The incremental updating is discussed below according to whether a decision class is removed.

Proposition 4. For all d_i ∈ U'/D_{Tree} and x ∈ U', suppose the emigration of x^- causes a decision class, marked d^l, to be removed; the tree-based hierarchical class structure is then updated by deleting d^l from the tree. The approximations are updated by

\underline{K}_{T_{sibling}}^{(t+1)} d_i(x) =
\begin{cases}
\min\bigl\{\inf_{y \in sib(d_i)} \sqrt{1 - K_{T_{\cos}}^2(x, y)},\ \underline{K}_{T_{sibling}}^{(t)} d_i(x)\bigr\}, & x^- \in sib(d_i)\ \text{and}\ sib(d_i) \neq \emptyset, \\
\underline{K}_{T_{sibling}}^{(t)} d_i(x), & x^- \notin sib(d_i), \\
0, & \text{else;}
\end{cases} \tag{12}

\overline{K}_{T_{sibling}}^{(t+1)} d_i(x) = \overline{K}_{T_{sibling}}^{(t)} d_i(x). \tag{13}

Proof. At time t + 1 the removed decision class no longer has upper and lower approximations, but x^- can affect the approximations of the other decision classes. If x^- was the object of the sibling classes closest to x, the lower approximation must be recalculated; otherwise the lower approximation is unchanged. If the sibling set of d_i is empty after x^- emigrates, the lower approximation is 0. The upper approximation of a decision class depends only on the objects belonging to that class, and at time t + 1 we have x^- ∉ d_i, so the upper approximation remains the same.

Proposition 5. For all d_i ∈ U'/D_{Tree} and x ∈ U', suppose the emigration of x^- does not cause any decision class to be removed, so the tree-based hierarchical class structure need not be updated. Then the approximations are updated by

\underline{K}_{T_{sibling}}^{(t+1)} d_i(x) =
\begin{cases}
\min\bigl\{\inf_{y \in sib(d_i)} \sqrt{1 - K_{T_{\cos}}^2(x, y)},\ \underline{K}_{T_{sibling}}^{(t)} d_i(x)\bigr\}, & x^- \in sib(d_i), \\
\underline{K}_{T_{sibling}}^{(t)} d_i(x), & \text{else;}
\end{cases} \tag{14}

\overline{K}_{T_{sibling}}^{(t+1)} d_i(x) =
\begin{cases}
\sup_{y \in d_i} K_{T_{\cos}}(x, y), & x^- \in d_i\ \text{and}\ d_i \in leaf(d), \\
\overline{K}_{T_{sibling}}^{(t)} d_i(x), & x^- \notin d_i\ \text{and}\ d_i \in leaf(d), \\
\sup_{k} \overline{K}_{T_{sibling}}^{(t+1)} d_{ik}(x), & \text{else.}
\end{cases} \tag{15}
The proof process is similar. Given a U , MC, Dtree , when one object emmigrates from system, the algorithm for the lower and upper approximations is designed in Algorithm 3.
Algorithm 3: Incremental Algorithm Based on Hierarchical Classification for the Emigration of a Single Object

Input: $U'$, $MC$, $D_{tree}$, $\underline{K_{T_{sibling}}}^{(t)}$ and $\overline{K_{T_{sibling}}}^{(t)}$. // Already obtained by Algorithm 1.
Output: The approximations $\underline{K_{T_{sibling}}}^{(t+1)}$ and $\overline{K_{T_{sibling}}}^{(t+1)}$.

for each $d \in U'/D_{tree}$ do
  for each $x \in U'$ do
    if a decision class $d_l$ is removed then
      if $x^- \in sib(d)$ and $sib(d) \neq \emptyset$ then
        $\underline{K_{T_{sibling}}}^{(t+1)} d(x) = \inf_{y \in sib(d)} \{1 - K_{T_{cos}}^2(x, y),\ \underline{K_{T_{sibling}}}^{(t)} d(x)\}$
      else if $x^- \notin sib(d)$ then
        $\underline{K_{T_{sibling}}}^{(t+1)} d(x) = \underline{K_{T_{sibling}}}^{(t)} d(x)$
      else
        $\underline{K_{T_{sibling}}}^{(t+1)} d(x) = 0$
      end // Lower approximations with decision class removal.
      $\overline{K_{T_{sibling}}}^{(t+1)} d(x) = \overline{K_{T_{sibling}}}^{(t)} d(x)$
    else
      if $x^- \in sib(d)$ then
        $\underline{K_{T_{sibling}}}^{(t+1)} d(x) = \inf_{y \in sib(d)} \{1 - K_{T_{cos}}^2(x, y),\ \underline{K_{T_{sibling}}}^{(t)} d(x)\}$
      else
        $\underline{K_{T_{sibling}}}^{(t+1)} d(x) = \underline{K_{T_{sibling}}}^{(t)} d(x)$
      end // Lower approximations without decision class removal.
      if $x^- \in d$ and $d \in leaf(d)$ then
        $\overline{K_{T_{sibling}}}^{(t+1)} d(x) = \sup_{y \neq x^-} \{K_{T_{cos}}(x, y),\ \overline{K_{T_{sibling}}}^{(t)} d(x)\}$
      else if $x^- \notin d$ and $d \in leaf(d)$ then
        $\overline{K_{T_{sibling}}}^{(t+1)} d(x) = \overline{K_{T_{sibling}}}^{(t)} d(x)$
      else
        $\overline{K_{T_{sibling}}}^{(t+1)} d(x) = \overline{K_{T_{sibling}}}^{(t)} d(x)$
      end
    end
  end
end
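To illustrate Propositions 4 and 5, the following is a minimal Python sketch of the incremental update performed by Algorithm 3. The data structures (dictionaries of membership grades, a `sib_old` function returning the sibling objects of a class, a `members` table, and a precomputed cosine-kernel matrix `K_cos`) are illustrative assumptions, not the authors' implementation.

```python
def emigrate_update(U_new, classes, members, lower, upper, K_cos, sib_old,
                    leaf, x_minus, removed_class=None):
    """One incremental step of Algorithm 3 (a hedged sketch).

    U_new          -- object set after x_minus has left
    members[d]     -- objects belonging to class d at time t
    lower / upper  -- dicts: lower[d][x] = approximation grade at time t
    K_cos[x][y]    -- cosine-kernel similarity in [0, 1]
    sib_old(d)     -- set of objects in the sibling classes of d at time t
    leaf           -- set of leaf classes after the update
    """
    # The removed class keeps no approximations (Proposition 4).
    new_lower = {d: dict(lower[d]) for d in classes if d != removed_class}
    new_upper = {d: dict(upper[d]) for d in classes if d != removed_class}
    for d in new_lower:
        sib_d = sib_old(d) - {x_minus}
        for x in U_new:
            # Lower approximation, Eqs. (12)/(14): recompute only when the
            # emigrated object was among the siblings of d.
            if x_minus in sib_old(d):
                if sib_d:
                    inf_val = min(1.0 - K_cos[x][y] ** 2 for y in sib_d)
                    new_lower[d][x] = min(inf_val, lower[d][x])
                elif removed_class is not None:
                    new_lower[d][x] = 0.0   # sibling node disappeared
            # Upper approximation, Eqs. (13)/(15): unchanged when a class is
            # removed; otherwise recompute for leaf classes that lost x_minus.
            if removed_class is None and d in leaf and x_minus in members[d]:
                sup_val = max(K_cos[x][y] for y in members[d] if y != x_minus)
                new_upper[d][x] = max(sup_val, upper[d][x])
    return new_lower, new_upper
```

Only the grades touched by the branches above are recomputed; all other grades are copied unchanged, which mirrors the small update cost claimed for the incremental approach.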
Example 4. The object x3 emigrates from the system on the basis of Table 1, which causes the decision class d3 to be removed, and the tree is updated as follows:
Fig. 3. The new tree for the emigration of one object x3
It can be seen from Fig. 3 that the child node $d_3$ of $d_1$ has been removed, and $d_1$ becomes a leaf node. According to Proposition 4, the lower approximation grades of $d_1$ are obtained:

$\underline{K_{T_{sibling}}}^{(t+1)} d_1(x_1) = 0$, $\underline{K_{T_{sibling}}}^{(t+1)} d_1(x_2) = 0.293$, $\underline{K_{T_{sibling}}}^{(t+1)} d_1(x_3) = 0$, $\underline{K_{T_{sibling}}}^{(t+1)} d_1(x_4) = 0$, $\underline{K_{T_{sibling}}}^{(t+1)} d_1(x_5) = 0.262$, $\underline{K_{T_{sibling}}}^{(t+1)} d_1(x_6) = 0.423$, $\underline{K_{T_{sibling}}}^{(t+1)} d_1(x_7) = 1$, $\underline{K_{T_{sibling}}}^{(t+1)} d_1(x_8) = 0$.

Because $d_1$ becomes a leaf node, the upper approximation grades are:

$\overline{K_{T_{sibling}}}^{(t+1)} d_1(x_1) = 0$, $\overline{K_{T_{sibling}}}^{(t+1)} d_1(x_2) = 0.855$, $\overline{K_{T_{sibling}}}^{(t+1)} d_1(x_3) = 0$, $\overline{K_{T_{sibling}}}^{(t+1)} d_1(x_5) = 0.956$, $\overline{K_{T_{sibling}}}^{(t+1)} d_1(x_6) = 0$, $\overline{K_{T_{sibling}}}^{(t+1)} d_1(x_7) = 0$, $\overline{K_{T_{sibling}}}^{(t+1)} d_1(x_8) = 0$.

So the lower and upper approximations of $d_1$ are:

$\underline{K_{T_{sibling}}}^{(t+1)} d_1 = \{0.293/x_2,\ 0.262/x_5,\ 0.423/x_6,\ 1/x_7\}$

$\overline{K_{T_{sibling}}}^{(t+1)} d_1 = \{0.885/x_2,\ 0.778/x_4,\ 0.956/x_5\}$

From Example 4, it can be seen that when one object emigrates from the multimodality decision system based on hierarchical classification, the emigration affects the upper and lower approximations of the other decision classes, and only a small number of update operations need to be performed according to Propositions 4 and 5, which greatly reduces the calculation and time cost.
5 Conclusions and Further Research

In the multimodality decision system based on hierarchical classification, the attributes of samples are multimodal, the decision attribute values often have a hierarchical structure, and the data change frequently. This paper takes into account the fact that attribute values may be unknown in the multimodality information system based on hierarchical classification, and proposes an incremental updating algorithm for multikernel fuzzy rough sets based on hierarchical classification. The specific process of the algorithm is demonstrated through relevant examples, and the algorithm can effectively reduce the time cost caused by object set changes. In future work, we will focus on updating the approximations when multiple objects change in multikernel fuzzy rough sets based on hierarchical classification, and on testing the performance of the algorithm on UCI datasets.
A Clustering Method Based on Improved Density Estimation and Shared Nearest Neighbors Ying Guan1 , Yaru Li1 , Bin Li2 , and Yonggang Lu1(B) 1 School of Information Science and Engineering, Lanzhou University, Lanzhou
730000, Gansu, China [email protected] 2 Gansu New Vispower Technology Co. Ltd., No. 1689 Yanbei Road, Lanzhou 730000, Gansu, China
Abstract. Density-based clustering methods can detect clusters of arbitrary shapes. Most traditional clustering methods need the number of clusters to be given as a parameter, but this information is usually not available. And some density-based clustering methods cannot estimate local density accurately. When estimating the density of a given point, each neighbor of the point should have different importance. To solve these problems, based on the K-nearest neighbor density estimation and shared nearest neighbors, a new density-based clustering method is proposed, which assigns different weights to k-nearest neighbors of the given point and redefines the local density. In addition, a new clustering process is introduced: the number of shared nearest neighbors between the given point and the higher-density points is calculated first, the cluster that the given point belongs to can be identified, and the remaining points are allocated according to the distance between them and the nearest higher-density point. Using this clustering process, the proposed method can automatically discover the number of clusters. Experimental results on synthetic and real-world datasets show that the proposed method has the best performance compared with K-means, DBSCAN, CSPV, DPC, and SNN-DPC. Keywords: Clustering · Shared nearest neighbors · Density estimation method · Local density
1 Introduction

Clustering [1–4] is an important process in pattern recognition, machine learning, and other fields. It can be used as an independent tool for exploring data distributions and can be applied in image processing [3–6], data mining [3], intrusion detection [7, 8], and bioinformatics [9]. Without any prior knowledge, clustering methods assign points to different clusters according to their similarity, such that points in the same cluster are similar to each other while points in different clusters have low similarity. Clustering methods are
divided into different categories [10]: density-based, centroid-based, model-based, and grid-based clustering methods.

Common centroid-based clustering methods include K-means [11] and K-medoids [12]. This type of method performs clustering by judging the distance between each point and the cluster centers. Therefore, these methods can only identify spherical or spherical-like clusters and need the number of clusters as a prior [13]. Density-based clustering methods can identify clusters of arbitrary shapes and are not sensitive to noise [2]. Common and representative density-based clustering methods include DBSCAN [14], OPTICS [15], and DPC [16]. DBSCAN defines a density threshold by using a neighborhood radius Eps and a minimum number of points Minpts, and on this basis it distinguishes core points from noisy points. As an effective extension of DBSCAN, OPTICS only needs the value of Minpts and generates an augmented cluster ordering that represents the density-based clustering structure of each point. DPC, proposed by Rodriguez and Laio [16], is based on two assumptions: the cluster center is surrounded by neighbors with lower local density, and the distance between a cluster center and points with higher local density is relatively large. It can effectively identify high-density centers.

At present, there are still drawbacks in most density-based clustering methods. DBSCAN-like methods can produce good clustering results, but they depend on the distance threshold [17]. To avoid this, ARKNN-DBSCAN [18] and RNN-DBSCAN [19] redefine the local density of points by using the number of reverse nearest neighbors. Hou et al. [20] proposed a cluster-center recognition criterion based on relative density relationships, which is less affected by the density kernel and density differences. IDDC [21] uses the relative density based on the K-nearest neighbors to estimate the local density of points. CSPV [22] is a potential-based clustering method; it replaces density with the potential energy calculated from the distribution of all points. The one-step clustering process in some methods may lead to continuity errors, that is, once a point is incorrectly assigned, more points may be assigned incorrectly [23]. To solve this problem, Yu et al. [24] proposed a method that assigns the non-grouped points to suitable clusters according to evidence theory and the information of the K-nearest neighbors, improving the accuracy of clustering. Liu et al. [25] proposed a fast density peak clustering algorithm based on shared nearest neighbors (SNN-DPC), which improves the clustering process and reduces, to a certain extent, the impact of the density peaks and the allocation process on the clustering results. However, the location and number of cluster centers still need to be manually selected from the decision graph. These methods solve the problems of the existing approaches to a certain extent, but they do not consider the importance of individual points, which may result in inaccurate density calculation.

This paper attempts to solve the inaccurate definition of local density and the errors caused by the one-step clustering process. Therefore, a new density-based clustering method is proposed. Based on the K-nearest neighbor density estimation [26, 27] and the shared nearest neighbors [25, 28], we redefine the K-nearest neighbor density estimation to calculate the local density, which assigns different importance to each neighbor of the given point.
A new clustering process is also proposed: the number of shared nearest neighbors between the given point and the higher-density points is calculated first, so that the cluster to which the given point belongs can be identified, and
the remaining points are allocated according to the distance between them and the nearest higher-density point. To some extent, this avoids the continuity error caused by directly assigning points to the cluster of the nearest higher-density neighbor. After the local densities are calculated, all points are sorted in descending order, and the cluster centers are selected from the points whose density is higher than that of the given point. Through this process, the method can automatically discover both the cluster centers and the number of clusters.

The rest of this paper is organized as follows. Section 2 introduces the relevant definitions and the new clustering method we propose. In Sect. 3, we discuss the experimental results on synthetic and real-world datasets and compare the proposed method with other classical clustering methods according to several evaluation metrics. Section 4 summarizes the paper and discusses future work. Table 1 illustrates the symbols and notations used in this paper.

Table 1. Symbols and notations

$x_i$         A point in the dataset $X$
$d$           The dimension of the feature vectors in the dataset
$N$           The number of points in the dataset
$D$           The distance matrix of the dataset
$K$           The number of nearest neighbors
$r_k(i)$      The radius from $x_i$ to its $K$-th nearest neighbor
$K_i$         The number of weighted points according to the shared nearest neighbors of $x_i$
$k_1$         A coefficient used to calculate the parameter $K$
$KNN(i)$      The $K$-nearest neighbor set of point $x_i$
$SNN(i, j)$   The number of shared nearest neighbors between $x_i$ and $x_j$
$R_i$         The radius from $x_i$ to its $K$-th shared nearest neighbor
$V_i$         The volume of the high-dimensional sphere with radius $R_i$
$\omega$      The weight coefficient
$\rho_k(i)$   The estimated density of $x_i$
2 Method

A new clustering method is proposed based on a new density estimation method and a new allocation strategy in the clustering process. The new density estimation method builds on the K-nearest neighbor density estimation, the nonparametric density estimator proposed by Fix and Hodges [26]. K-nearest neighbor density estimation [26, 27] is a well-known and simple density estimation method based on the following idea: the density function at a point can be estimated using the number of neighbors observed in a small region near the point. In the dataset
$X^{[N]} = \{x_i\}_{i=1}^{N}$, the estimation of the density function is based on the distance from $x_i$ to its $K$-th nearest neighbor. For points in different density regions, the neighborhood size determined by the $K$ nearest neighbors is adaptive, ensuring the resolution of high-density regions and the continuity of low-density regions. For each $x_i \in R^d$, the estimated density of $x_i$ is:

$$\rho_k(i) = \frac{K}{N \cdot V_d \cdot r_k(i)^d} \tag{1}$$

where $V_d$ is the volume of the unit sphere in $R^d$, $K$ is the number of neighbors, and $r_k(i)$ represents the distance from $x_i$ to its $K$-th nearest neighbor in the dataset $X^{[N]}$.

2.1 The New Density Estimation Method

Based on the K-nearest neighbor density estimation, the new density estimation method is proposed. According to the number of shared neighbors between the given point and the others, the volume of a region containing the $K$ shared nearest neighbors of the given point is used to estimate the local density. Generally, the parameter $K = k_1 \times \sqrt{N}$, where $k_1$ is a coefficient. The local density estimation is redefined as:

$$\rho_k(i) = \frac{K_i}{N \cdot V_i} \tag{2}$$

where $N$ is the number of points in the dataset. For point $x_i$, $K_i$ is the number of weighted points according to the shared nearest neighbors, and $V_i$ is the volume of the high-dimensional sphere with radius $R_i$; all the spheres considered in the experiments are closed Euclidean spheres.

The $K$-nearest neighbors [27, 29] of each point are the $K$ points selected according to the distances between points. For points $x_i$ and $x_j$ in the dataset, the $K$-nearest neighbor sets of $x_i$ and $x_j$ are denoted $KNN(i)$ and $KNN(j)$. Based on the $K$-nearest neighbors, the shared nearest neighbors [25, 28] of $x_i$ and $x_j$ are their common $K$-nearest neighbors, expressed as:

$$SNN(i, j) = KNN(i) \cap KNN(j) \tag{3}$$

That is, the matrix $SNN$ stores the numbers of shared nearest neighbors between points. The points are sorted in descending order according to the number of shared neighbors. If point $x_j$ is the $K$-th shared nearest neighbor of the given point $x_i$, the neighborhood radius $R_i$ of point $x_i$ is defined as:

$$R_i = D(i, j) \tag{4}$$

where $D$ is the distance matrix of the dataset and the distance is the Euclidean distance. Given the radius $R_i$ of point $x_i$, the volume of its neighborhood can be calculated by:

$$V_i = R_i^{\,d} \tag{5}$$

where $d$ is the dimension of the feature vectors in the dataset. In general, the neighbors of point $x_i$ differ in importance, so their contributions to the density estimate of $x_i$ should also differ. In our definition, this contribution is related to the number of shared nearest neighbors between $x_i$ and each of its neighbors. According to the number of shared neighbors between points $x_i$ and $x_j$, the weight coefficient is defined to assign different weights to the $K$-nearest neighbors of any point:

$$\omega(i, j) = \frac{|SNN(i, j)|}{K} \tag{6}$$

where $|SNN(i, j)|$ is the number of shared neighbors between point $x_i$ and point $x_j$. $K_i$ is redefined by adding the different weights of the $K$-nearest neighbors of point $x_i$, as shown in Eq. 7:

$$K_i = \sum_{j=1}^{K} \omega(i, j) \tag{7}$$

As a neighbor of $x_i$, the more shared nearest neighbors $x_j$ has with $x_i$ (that is, the bigger the weight of $x_j$), the more $x_j$ contributes to the local density of point $x_i$. Using Eq. 6 and Eq. 7, Eq. 2 can be expressed in the form of Eq. 8:

$$\rho_k(i) = \frac{\sum_{j=1}^{K} \frac{|SNN(i,j)|}{K}}{N \cdot V_i} \tag{8}$$
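To make the computation concrete, the following is a minimal NumPy sketch of the weighted density estimate of Eqs. (2)–(8); the function and variable names are our own, and the brute-force neighbor search is for illustration only, not an efficient or authoritative implementation.

```python
import numpy as np

def snn_weighted_density(X, k):
    """Estimate the local density of Eq. (8) for every point in X.

    X : (N, d) array of points;  k : number of nearest neighbors K.
    Returns a length-N array of density estimates.
    """
    N, d = X.shape
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # distance matrix
    order = np.argsort(D, axis=1)
    knn = order[:, 1:k + 1]                      # K-nearest neighbors (self excluded)
    knn_sets = [set(row) for row in knn]
    rho = np.empty(N)
    for i in range(N):
        # |SNN(i, j)| for every j, Eq. (3)
        snn_counts = np.array([len(knn_sets[i] & knn_sets[j]) for j in range(N)])
        snn_counts[i] = -1                       # exclude the point itself
        # The K-th shared nearest neighbor defines the radius R_i, Eq. (4)
        by_snn = np.argsort(-snn_counts)         # descending shared-neighbor count
        j_star = by_snn[k - 1]
        V_i = max(D[i, j_star], 1e-12) ** d      # Eq. (5), guarded against R_i = 0
        K_i = snn_counts[knn[i]].sum() / k       # Eqs. (6)-(7): sum of weights
        rho[i] = K_i / (N * V_i)                 # Eq. (8)
    return rho
```

A neighbor that shares many of its K-nearest neighbors with $x_i$ contributes close to 1 to $K_i$, while an incidental neighbor contributes close to 0, which is exactly the weighting idea behind Eq. (6).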
In summary, when calculating the local density of $x_i$, different weights are added to the $K$ points falling in its neighborhood according to the numbers of shared neighbors. The more shared nearest neighbors a point has with $x_i$, the greater its contribution to the local density estimation of $x_i$.

2.2 A New Allocation Strategy in the Clustering Process

The allocation process of some clustering methods has poor fault tolerance. When one point is assigned incorrectly, more subsequent points will be affected, which has a severe negative impact on the clustering results [23, 24]. Therefore, a new clustering process is proposed to make the allocation more reasonable and to avoid, to a certain extent, the continuity error caused by direct allocation.

In the proposed clustering method, all points are sorted in descending order of local density. The sorted indices are stored in the array sortedIdx[1...N]. Then the points in the sorted queue are visited one by one, from the highest local density to the lowest. The first point in the queue has the highest local density and automatically becomes the center of the first cluster. For each subsequent point sortedIdx[i] in the queue, two special points are identified: parent1 is the point nearest to sortedIdx[i] among the visited points, and parent2 is the visited point that shares the largest number of nearest neighbors with sortedIdx[i]. The number of shared nearest neighbors between sortedIdx[i] and parent2 is compared with K/2: if it is at least K/2, sortedIdx[i] is assigned to the cluster to which parent2 belongs.
Otherwise, the distance between sortedIdx[i] and parent1 is compared with the given distance bandwidth parameter B. If the distance is greater than B, sortedIdx[i] becomes the center of a new cluster; if not, it is assigned to the cluster to which parent1 belongs. This process continues until all points have been visited and assigned to the proper clusters. The details of the proposed method are shown in Algorithm 1.
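A compact Python sketch of this allocation process follows; it is our own rendering of the procedure described above under the stated rules, not the authors' exact Algorithm 1, and all names are illustrative.

```python
import numpy as np

def allocate(D, snn_counts, rho, K, B):
    """Assign every point to a cluster (a sketch of the described strategy).

    D          -- (N, N) distance matrix
    snn_counts -- (N, N) matrix of shared-nearest-neighbor counts
    rho        -- length-N array of local densities
    K, B       -- neighbor count and distance bandwidth
    Returns a length-N list of cluster labels.
    """
    order = np.argsort(-rho)                 # descending density queue
    labels = [-1] * len(rho)
    labels[order[0]] = 0                     # densest point starts cluster 0
    n_clusters, visited = 1, [order[0]]
    for idx in order[1:]:
        parent1 = min(visited, key=lambda v: D[idx, v])          # nearest visited
        parent2 = max(visited, key=lambda v: snn_counts[idx, v]) # most shared NNs
        if snn_counts[idx, parent2] >= K / 2:
            labels[idx] = labels[parent2]    # strong shared-neighbor evidence
        elif D[idx, parent1] > B:
            labels[idx] = n_clusters         # too far from everything: new center
            n_clusters += 1
        else:
            labels[idx] = labels[parent1]    # fall back to the nearest cluster
        visited.append(idx)
    return labels
```

Because new centers are created only when a point is far from all visited points, the number of clusters emerges from the data rather than being supplied as a parameter.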
3 Experiments

In this section, we use classical synthetic datasets and real-world datasets to test the performance of the proposed method, taking K-means, DBSCAN, CSPV, DPC, and SNN-DPC as the control group. According to several evaluation metrics, the performance of the proposed method is compared with these five classical clustering methods.

3.1 Datasets and Processing

To verify the performance of the proposed method, we select real-world and synthetic datasets with different sizes, dimensions, and numbers of clusters. The synthetic datasets include Flame, R15, D31, S2, and A3. The real-world datasets include Iris, Wine, Seeds, Breast, Wireless, Banknote, and Thyroid. The characteristics of the datasets used in the experiments are presented in Table 2. The evaluation metrics used in the experiments are as follows: Normalized Mutual Information (NMI) [30], adjusted Rand index (ARI) [30], and Fowlkes-Mallows index (FMI) [31]. Their upper bound is 1, and larger values indicate better clustering results.

Table 2. Characteristics of datasets

Dataset   Points  Dimensions  Clusters  Type
Iris      150     4           3         Real-world
Wine      178     13          3         Real-world
Seeds     210     7           3         Real-world
Breast    699     9           2         Real-world
Wireless  2000    7           4         Real-world
Banknote  1372    4           2         Real-world
Thyroid   215     5           3         Real-world
Flame     240     2           2         Synthetic
R15       600     2           15        Synthetic
D31       3100    2           31        Synthetic
S2        5000    2           15        Synthetic
A3        7500    2           50        Synthetic
3.2 Parameters Selection

We set the parameters of each method so that their best performance is compared; the parameters corresponding to the optimal results of the different methods are chosen. The real number of clusters is given to K-means, DPC, and SNN-DPC.
The proposed method needs two key parameters: the number of nearest neighbors $K$ and the distance bandwidth $B$. The selection of the parameter $B$ [22] is derived from the distance matrix $D$:

$$MinD(i) = \min_{j=1,\ldots,N,\ j \neq i} D(i, j) \tag{9}$$

$$B = \max_{i=1,\ldots,N} \big( MinD(i) \big) \tag{10}$$

The parameter $K$ is selected by the formula $K = k_1 \times \sqrt{N}$, which ties $K$ to $N$, where $k_1$ is a coefficient related to the size of the dataset and of the clusters. In the proposed method, $k_1$ is limited to $(0, 9]$ to adapt to different datasets. Figure 1 shows the FMI indices of some representative datasets with different $k_1$ values. It can be seen that for the datasets S2 and R15, the FMI index is not sensitive to $k_1$ when $k_1$ is within the region (0, 1.5), and for the Wine dataset, the FMI index is not sensitive to $k_1$ within the whole region.
Fig. 1. Results on different datasets with different $k_1$
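As a quick illustration of Eqs. (9)–(10) and the $K = k_1\sqrt{N}$ rule, here is a small NumPy sketch; the function name and the rounding of $K$ are our own choices.

```python
import numpy as np

def select_parameters(D, k1):
    """Bandwidth B of Eqs. (9)-(10) and neighbor count K = k1 * sqrt(N)."""
    D = np.array(D, dtype=float)
    N = D.shape[0]
    np.fill_diagonal(D, np.inf)          # exclude j = i from the minimum
    min_d = D.min(axis=1)                # MinD(i), Eq. (9)
    B = min_d.max()                      # Eq. (10): largest nearest-neighbor gap
    K = max(1, int(round(k1 * np.sqrt(N))))
    return B, K
```

Intuitively, $B$ is the largest nearest-neighbor distance in the dataset, so any point farther than $B$ from all visited points is plausibly the seed of a new cluster.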
3.3 Experimental Results

We conduct comparison experiments on 12 datasets and evaluate the clustering results with different evaluation metrics. In the following experiments, we first verify the effect of the new density estimation method proposed in this paper. Then, to test the effectiveness of the automatically discovered number of clusters, the proposed method is compared with the other methods. Finally, the whole proposed method is compared with the other five commonly used methods.

The New Density Estimation Method. Based on the original K-nearest neighbor density estimation [26], the new density estimation method described in Sect. 2.1 is proposed. To check whether the new method improves the accuracy of the local density calculation, a comparison experiment is conducted between the original method and
the new method. Firstly, the original and the new density estimation methods are used to estimate the local densities of the points. Secondly, after the local densities are calculated and sorted in descending order, the same clustering process, proposed in [22], is used to assign points. Finally, the clustering results on the datasets are evaluated by different metrics, as shown in Fig. 2 and Fig. 3. Compared with the original method, the new method is superior on most real-world datasets but slightly worse on Seeds. On the synthetic datasets R15, D31, S2, and A3, the new method produces good clustering results, not much different from those of the original method. In summary, the new method shows an advantage over the original density estimation method.
Fig. 2. Comparison of density estimation with the original method on FMI.
Fig. 3. Comparison of density estimation with the original method on NMI.
The Effectiveness of the Automatically Discovered Number of Clusters. The comparison experiments are conducted among DBSCAN, CSPV, and the proposed method
to verify the validity of the automatically discovered number of clusters. The proposed method is not compared with K-means, DPC, and SNN-DPC here because these methods are given the real number of clusters. The experimental results are shown in Table 3. The accuracies of the number of clusters discovered by DBSCAN, CSPV, and the proposed method are 42%, 42%, and 83%, respectively. On the 5 synthetic datasets, the proposed method correctly discovers the number of clusters, and it outperforms DBSCAN and CSPV on the real-world datasets Iris, Seeds, Breast, and Wireless. In summary, the proposed method is better than DBSCAN and CSPV at automatically discovering the number of clusters.

Table 3. The number of clusters discovered by different methods

Datasets  Real number of clusters  DBSCAN  CSPV  Ours
Iris      3                        4       3     3
Wine      3                        1       5     2
Seeds     3                        2       7     3
Breast    2                        2       3     2
Wireless  4                        4       1     4
Banknote  2                        10      6     3
Thyroid   3                        6       2     2
Flame     2                        2       2     2
R15       15                       15      18    15
D31       31                       31      31    31
S2        15                       13      17    15
A3        50                       48      50    50
Experiments on the Different Datasets. The experiments are conducted on the different datasets, and the results are presented in Table 4. From Table 4, the proposed method achieves better clustering results than the other clustering methods on the real-world datasets Iris, Seeds, Breast, Wireless, and Banknote. On Wine, the result of the proposed method is also the best. For the Thyroid dataset, the proposed method performs better than DPC and SNN-DPC.
Table 4. Comparison of clustering results on datasets with three measures

Iris
Method   NMI     ARI     FMI     parm
K-means  0.7582  0.7166  0.8106  3
DBSCAN   0.7196  0.7063  0.7972  1.7/46
CSPV     0.7355  0.5638  0.7635
DPC      0.8705  0.8858  0.9234  0.6
SNN-DPC  0.8851  0.9038  0.9355  5
Ours     0.9011  0.9222  0.9478  2.1

Seeds
Method   NMI     ARI     FMI     parm
K-means  0.6949  0.7302  0.8208  3
DBSCAN   0.5651  0.5975  0.7327  0.4/3
CSPV     0.6452  0.6057  0.7287
DPC      0.6744  0.7170  0.8106  0.8
SNN-DPC  0.7566  0.7549  0.8364  6
Ours     0.7539  0.7666  0.8441  1.9

Banknote
Method   NMI     ARI     FMI     parm
K-means  0.0303  0.0485  0.5518  2
DBSCAN   0.6798  0.6653  0.8175  1.8/9
CSPV     0.1543  0.0969  0.6457
DPC      –       –       –
SNN-DPC  0.6145  0.6309  0.8173  23
Ours     0.6719  0.7101  0.8449  6.5

Breast
Method   NMI     ARI      FMI     parm
K-means  0.0111  −0.0128  0.7192  2
DBSCAN   0.7304  0.8250   0.9187  4.3/11
CSPV     0.0122  −0.0129  0.7191
DPC      0.0111  −0.0128  0.7191  1
SNN-DPC  0.0915  −0.0516  0.6586  30
Ours     0.7318  0.8336   0.9257  8.4

Wine
Method   NMI     ARI     FMI     parm
K-means  0.4288  0.3711  0.5835  3
DBSCAN   –       –       0.5813  0.1/2
CSPV     0.3830  0.2697  0.5011
DPC      0.3913  0.4070  0.6069  0.5
SNN-DPC  0.4316  0.4485  0.6361  36
Ours     0.4307  0.4025  0.67    4.5

Wireless
Method   NMI     ARI     FMI     parm
K-means  0.8904  0.8885  0.9165  4
DBSCAN   0.7891  0.7897  0.8407  8/49
CSPV     –       –       0.4996
DPC      0.8674  0.8541  0.8909
SNN-DPC  0.8997  0.8838  0.9130  33
Ours     0.9308  0.9491  0.9618  2.8

Thyroid
Method   NMI     ARI     FMI     parm
K-means  0.4946  0.5791  0.8063  3
DBSCAN   0.5407  0.6986  0.8731  3.7/2
CSPV     0.0893  0.0314  0.7318
DPC      0.2128  0.1815  0.6241  0.1
SNN-DPC  0.3628  0.4402  0.7596  5
Ours     0.3612  0.2665  0.7703  1.6

Flame
Method   NMI     ARI     FMI     parm
K-means  0.4534  0.3989  0.7364  2
DBSCAN   0.8097  0.8970  0.9790  1.3/28
CSPV     0.4669  0.4256  0.7207
DPC      1       1       1       2.7
SNN-DPC  0.8883  0.9337  0.9696  6
Ours     1       1       1       3.8

R15
Method   NMI     ARI     FMI     parm
K-means  0.9942  0.9928  0.9932  15
DBSCAN   0.9922  0.9893  0.9900  0.7/29
CSPV     0.9727  0.9451  0.9494
DPC      0.9942  0.9928  0.9932  0.1
SNN-DPC  0.9942  0.9928  0.9932  11
Ours     0.9942  0.9928  0.9932  0.6

D31
Method   NMI     ARI     FMI     parm
K-means  0.9523  0.9061  0.9029  31
DBSCAN   0.9174  0.8501  0.8551  1.1/48
CSPV     0.9537  0.9278  0.9301
DPC      0.9579  0.9370  0.9390  1
SNN-DPC  0.9660  0.9509  0.9525  41
Ours     0.9648  0.9484  0.9501  0.7

S2
Method   NMI     ARI     FMI     parm
K-means  0.9463  0.9378  0.9420  15
DBSCAN   0.8866  0.7631  0.7886  4.5/45
CSPV     0.9313  0.9103  0.9163
DPC      0.9454  0.9370  0.9412  0.7
SNN-DPC  0.9402  0.9280  0.9328  35
Ours     0.9481  0.9399  0.9439  0.7

A3
Method   NMI     ARI     FMI     parm
K-means  0.9760  0.9175  0.9199  50
DBSCAN   0.9454  0.8693  0.8728  1.8/50
CSPV     0.9860  0.9770  0.9774
DPC      0.9880  0.9809  0.9813  0.3
SNN-DPC  0.9860  0.9772  0.9776  25
Ours     0.9912  0.9862  0.9865  0.9
The clustering results of the proposed method on the 5 synthetic datasets are shown in Fig. 4. Among them, on the datasets Flame, S2, and A3, the proposed method obtains the best clustering results; in particular, on Flame, the result is identical to the original data labels. On the dataset D31, the proposed method is slightly worse than the best. The proposed method generates the same results as K-means, DPC, and SNN-DPC on the dataset R15, but it can discover the number of clusters automatically. On the synthetic datasets, the results of the proposed method are similar to, but slightly better than, those of SNN-DPC, and the proposed method outperforms the other five methods on most real-world datasets. In summary, in most cases the proposed method is more advantageous than the other methods in terms of the effectiveness of the clustering results. These results show that our redefinition of the local density and the new clustering process are effective.
Fig. 4. Clustering results of the proposed method on synthetic datasets
4 Conclusion

In this paper, a new clustering method is proposed based on the K-nearest neighbor density estimation and shared nearest neighbors. When calculating the local density, both the number of points in the neighborhood and their different contributions are considered, which improves the accuracy of the local density calculation to a certain extent. This paper demonstrates that the proposed method can adapt to most datasets and that the improved local density estimation can improve the clustering performance. The proposed method has a parameter $K$; the formula $K = k_1 \times \sqrt{N}$ is used to determine the relationship between $K$ and $N$, where $k_1$ is the coefficient. Although $k_1$ is limited to a reasonable range, it has a considerable influence on the clustering results for some datasets. As a possible direction for future work, we will explore reducing the influence of the parameter $K$ on the clustering results.

Acknowledgment. This work was partially supported by the Gansu Provincial Science and Technology Major Special Innovation Consortium Project (Project No. 1); the name of the innovation consortium is Gansu Province Green and Smart Highway Transportation Innovation Consortium, and the project name is Gansu Province Green and Smart Highway Key Technology Research and Demonstration.
References

1. Omran, M., Engelbrecht, A.P., Salman, A.: An overview of clustering methods. Intell. Data Anal. 11(6), 583–605 (2007)
2. Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Amsterdam (2011)
3. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. (CSUR) 31(3), 264–323 (1999)
4. Zhang, C., Wang, P.: A new method of color image segmentation based on intensity and hue clustering. In: Proceedings 15th International Conference on Pattern Recognition, ICPR-2000, vol. 3, pp. 613–616. IEEE (2000)
5. Reddy, S., Parker, A., Hyman, J., Burke, J., Estrin, D., Hansen, M.: Image browsing, processing, and clustering for participatory sensing: lessons from a dietsense prototype. In: Proceedings of the 4th Workshop on Embedded Networked Sensors, pp. 13–17 (2007)
6. Khan, Z., Ni, J., Fan, X., Shi, P.: An improved k-means clustering algorithm based on an adaptive initial parameter estimation procedure for image segmentation. Int. J. Innov. Comput. Inf. Control 13(5), 1509–1525 (2017)
7. Portnoy, L.: Intrusion detection with unlabeled data using clustering. Ph.D. thesis, Columbia University (2000)
8. Guan, Y., Ghorbani, A.A., Belacel, N.: Y-means: a clustering method for intrusion detection. In: CCECE 2003-Canadian Conference on Electrical and Computer Engineering. Toward a Caring and Humane Technology (Cat. No. 03CH37436), vol. 2, pp. 1083–1086. IEEE (2003)
9. Frank, E., Hall, M., Trigg, L., Holmes, G., Witten, I.H.: Data mining in bioinformatics using Weka. Bioinformatics 20(15), 2479–2481 (2004)
10. Rui, X., Wunsch, D.I.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)
11. Macqueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of Berkeley Symposium on Mathematical Statistics Probability (1965)
12. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, Hoboken (2005)
13. Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31(8), 651–666 (2010)
14. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: Density-based spatial clustering of applications with noise. In: International Conference on Knowledge Discovery and Data Mining, vol. 240, p. 6 (1996)
15. Ankerst, M., Breunig, M.M., Kriegel, H.P., Sander, J.: OPTICS: ordering points to identify the clustering structure. In: SIGMOD 1999, Proceedings ACM SIGMOD International Conference on Management of Data, Philadelphia, Pennsylvania, USA, 1–3 June 1999 (1999)
16. Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344(6191), 1492–1496 (2014)
17. Li, H., Liu, X., Li, T., Gan, R.: A novel density-based clustering algorithm using nearest neighbor graph. Pattern Recogn. 102, 107206 (2020)
18. Pei, P., Zhang, D., Guo, F.: A density-based clustering algorithm using adaptive parameter k-reverse nearest neighbor. In: 2019 IEEE International Conference on Power, Intelligent Computing and Systems (ICPICS), pp. 455–458. IEEE (2019)
19. Bryant, A., Cios, K.: RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates. IEEE Trans. Knowl. Data Eng. 30(6), 1109–1121 (2017)
20. Hou, J., Zhang, A., Qi, N.: Density peak clustering based on relative density relationship. Pattern Recogn. 108(8), 107554 (2020)
21. Wang, Y., Yang, Y.: Relative density-based clustering algorithm for identifying diverse density clusters effectively. Neural Comput. Appl. 33(16), 10141–10157 (2021). https://doi.org/10.1007/s00521-021-05777-2
22. Lu, Y., Wan, Y.: Clustering by sorting potential values (CSPV): a novel potential-based clustering method. Pattern Recogn. 45(9), 3512–3522 (2012)
23. Jiang, J., Chen, Y., Meng, X., Wang, L., Li, K.: A novel density peaks clustering algorithm based on k nearest neighbors for improving assignment process. Phys. A 523, 702–713 (2019)
24. Yu, H., Chen, L., Yao, J.: A three-way density peak clustering method based on evidence theory. Knowl.-Based Syst. 211, 106532 (2021)
25. Liu, R., Wang, H., Yu, X.: Shared-nearest-neighbor-based clustering by fast search and find of density peaks. Inf. Sci. 450, 200–226 (2018)
26. Fukunaga, K., Hostetler, L.: Optimization of k nearest neighbor density estimates. IEEE Trans. Inf. Theory 19(3), 320–326 (1973)
27. Dasgupta, S., Kpotufe, S.: Optimal rates for k-NN density and mode estimation. In: Advances in Neural Information Processing Systems, vol. 27 (2014)
28. Ertöz, L., Steinbach, M., Kumar, V.: Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 2003 SIAM International Conference on Data Mining, pp. 47–58. SIAM (2003)
29. Qaddoura, R., Faris, H., Aljarah, I.: An efficient clustering algorithm based on the k-nearest neighbors with an indexing ratio. Int. J. Mach. Learn. Cybern. 11(3), 675–714 (2020)
30. Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11, 2837–2854 (2010)
31. Fowlkes, E.B., Mallows, C.L.: A method for comparing two hierarchical clusterings. J. Am. Stat. Assoc. 78(383), 553–569 (1983)
Bagging-AdaTSK: An Ensemble Fuzzy Classifier for High-Dimensional Data Guangdong Xue1 , Bingjie Zhang2 , Xiaoling Gong1 , and Jian Wang3(B) 1 College of Control Science and Engineering, China University of Petroleum (East China),
Qingdao 266580, China 2 School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China 3 College of Science, China University of Petroleum (East China), Qingdao 266580, China
[email protected]

Abstract. Using fuzzy systems to deal with high-dimensional data is still challenging work, even though our recently proposed adaptive Takagi-Sugeno-Kang (AdaTSK) model equipped with Ada-softmin can be effectively employed to solve high-dimensional classification problems. Facing high-dimensional data, AdaTSK is prone to overfitting, which results in poor performance, while ensemble learning is an effective technique to help base learners improve the final performance and avoid overfitting. Therefore, in this paper, we propose an ensemble fuzzy classifier integrating an improved bagging strategy and the AdaTSK model to handle high-dimensional classification problems, named Bagging-AdaTSK. First, an improved bagging strategy is introduced and the original dataset is split into multiple subsets containing fewer samples and features. These subsets overlap each other and cover all the samples and features to guarantee satisfactory accuracy. Then, on each subset, an AdaTSK model is trained as a base learner. Finally, the trained AdaTSK models are aggregated to perform the task, which results in the so-called Bagging-AdaTSK. The experimental results on high-dimensional datasets demonstrate that Bagging-AdaTSK has competitive performance.

Keywords: Ensemble learning · Ada-softmin · Adaptive Takagi-Sugeno-Kang (AdaTSK) · Classification · High-dimensional datasets
1 Introduction

Fuzzy systems are an effective technique for addressing nonlinear problems and have been successfully employed in classification, regression, and function approximation [3–5]. The Takagi-Sugeno-Kang (TSK) fuzzy classifier, with its interpretable rules, has attracted much research interest and achieved significant success [13, 21, 22]. In fuzzy systems, a triangular norm (T-norm) is used to compute the firing strengths of the fuzzy rules, where the product and the minimum are two popularly employed
ones [9]. When solving high-dimensional problems, the former most likely causes a numeric underflow problem, in which the result is too close to 0 to be represented by the computer [20], while the latter is not differentiable, which brings big difficulties to the optimization process. Therefore, using fuzzy systems to solve high-dimensional problems is still a challenging task [15]. Although many approaches of dimensionality reduction have been introduced into the design of fuzzy systems [8, 18], this cannot tackle the challenge fundamentally. The approximator of the minimum T-norm, called softmin, is often used to replace it in fuzzy systems since softmin is differentiable [2, 6, 11]. Based on softmin, we proposed an adaptive softmin (Ada-softmin) operator to compute the firing strengths in [20]. Then, the Ada-softmin based TSK (AdaTSK) model was developed, which can be effectively used on high-dimensional datasets without any dimensionality reduction method. Nonetheless, it is prone to overfitting when dealing with high-dimensional problems.

Ensemble learning is an effective technique to avoid overfitting, which combines several base learners to perform the given task [23]. It is well known that the ensemble model outperforms a single base learner even when the base learners are weak [12]. Bagging is a representative ensemble method that has been widely used in many real-world tasks [7, 16, 19], in which the base learners are built on bootstrap replicas of the training set. Specifically, a given number of samples are randomly drawn, with replacement, from the original sample set, and this is repeated several times to obtain some training subsets. A classifier is trained as a base learner on each training subset. These trained base learners are integrated together using a combination method to classify new points. Obviously, it is possible that some original samples are not selected for any subset, which means some information is not used in the classification task. On the other hand, ensemble diversity, that is, the difference among the base learners, is one of the fundamental points of ensemble learning [1, 14]. However, different bootstrap replicas generated by the aforementioned method may contain the same sample, which limits the diversity among the base learners.

In order to enhance the performance of the AdaTSK model in dealing with classification problems, we propose an improved bagging strategy and develop the Bagging-AdaTSK classifier by applying the proposed bagging strategy to AdaTSK. The main contributions are summarized as follows:

– Based on both sample and feature splits, an improved bagging strategy is introduced. The subsets partitioned by this improved bagging strategy are capable of covering all the samples and features. On the other hand, each subset contains different samples, which guarantees that the diversity of the base learners is satisfactory.
– We adopt the improved bagging strategy on our recently proposed AdaTSK model and develop an ensemble classifier, Bagging-AdaTSK, which is able to effectively solve high-dimensional datasets. Compared with the original AdaTSK, Bagging-AdaTSK achieves a definite improvement in accuracy.
– The proposed Bagging-AdaTSK model demonstrates superior performance on 7 high-dimensional datasets with feature dimensions varying from 1024 to 7129.
The remainder of this paper is structured as follows. The AdaTSK classifier is reviewed in the first subsection of Sect. 2. Subsections 2.2 and 2.3 introduce the improved bagging strategy and the proposed Bagging-AdaTSK classifier, respectively. Subsection 2.4 analyses the computational complexity of Bagging-AdaTSK. The performance comparison and sensitivity analysis are described in Sect. 3. Section 4 concludes this study.
2 Methodology

In this section, we first review the AdaTSK model for classification problems. Secondly, the improved bagging strategy is elaborated. At last, the proposed Bagging-AdaTSK model is introduced.

2.1 AdaTSK Classifier

Consider a classification problem involving $D$ features and $C$ classes. Let a specific sample or data point be represented by $x = (x_1, x_2, \cdots, x_D) \in R^D$. The number of fuzzy sets defined on each feature is denoted by $S$. In this investigation, we adopt the so-called compactly combined fuzzy rule base (CoCo-FRB) [20] to construct the fuzzy system. As a result, the number of rules, $R$, is equal to $S$. In general, the $r$th ($r = 1, 2, \cdots, R$) fuzzy rule of the first-order TSK model with $C$-dimensional output is described as below:

$$\text{Rule}_r:\ \text{IF}\ x_1\ \text{is}\ A_{r,1}\ \text{and}\ \cdots\ \text{and}\ x_D\ \text{is}\ A_{r,D},\quad \text{THEN}\ y_r^1(x) = p_{r,0}^1 + \sum_{d=1}^{D} p_{r,d}^1 x_d,\ \cdots,\ y_r^C(x) = p_{r,0}^C + \sum_{d=1}^{D} p_{r,d}^C x_d \tag{1}$$

where $A_{r,d}$ ($d = 1, 2, \cdots, D$) is the fuzzy set associated with the $d$th feature used in the $r$th rule, $y_r^c(x)$ ($c = 1, 2, \cdots, C$) is the output of the $r$th rule for the $c$th class computed from $x$, and $p_{r,d}^c$ represents the consequent parameter of the $r$th rule associated with the $d$th feature for the $c$th class. As $R = S$, $A_{r,d}$ is also the $r$th ($r = 1, 2, \cdots, S$) fuzzy set defined on the $d$th feature. Here, the fuzzy set $A_{r,d}$ is modeled by the simplified Gaussian membership function (MF) [5, 20]:

$$\mu_{r,d}(x) = e^{-(x_d - m_{r,d})^2} \tag{2}$$

where $\mu_{r,d}(x)$ is the membership value of $x$ computed on $A_{r,d}$, $x_d$ is the $d$th ($d = 1, 2, \cdots, D$) component of $x$, and $m_{r,d}$ represents the center of the $r$th MF defined on the $d$th input variable. Note that the function only uses the $d$th component of $x$, even though the argument of $\mu_{r,d}$ is shown as $x$. In AdaTSK, the firing strength of the $r$th rule, $f_r(x)$, is computed by Ada-softmin, which is defined as

$$f_r(x) = \left( \frac{\mu_{r,1}^{\hat{q}}(x) + \mu_{r,2}^{\hat{q}}(x) + \cdots + \mu_{r,D}^{\hat{q}}(x)}{D} \right)^{1/\hat{q}} \tag{3}$$

where

$$\hat{q} = \left\lceil \frac{690}{\ln \min\{\mu_{r,1}(x), \mu_{r,2}(x), \cdots, \mu_{r,D}(x)\}} \right\rceil \tag{4}$$

and $\lceil \cdot \rceil$ is the ceiling function. Note that $\hat{q}$ is adaptively changed according to the current membership values. Since (3) satisfies the following formula:

$$\lim_{\hat{q} \to -\infty} \left( \frac{\mu_{r,1}^{\hat{q}}(x) + \mu_{r,2}^{\hat{q}}(x) + \cdots + \mu_{r,D}^{\hat{q}}(x)}{D} \right)^{1/\hat{q}} = \min_d \mu_{r,d} \tag{5}$$
Ada-softmin is an approximator of the minimum operator, in which (4) is used to acquire a proper value of $\hat{q}$ that helps (3) approach the minimum of a group of membership values. Following [20], the lower bound of $\hat{q}$ is set to $-1000$ in the simulation experiments: if the $\hat{q}$ calculated by (4) is less than $-1000$, we let $\hat{q}$ be $-1000$. The $c$th ($c = 1, 2, \cdots, C$) component of the system output on $x$ is

$$y^c(x) = \sum_{r=1}^{R} \bar{f}_r(x)\, y_r^c(x) \tag{6}$$

where

$$\bar{f}_r(x) = \frac{f_r(x)}{\sum_{i=1}^{R} f_i(x)} \tag{7}$$

and

$$y_r^c(x) = p_{r,0}^c + \sum_{d=1}^{D} p_{r,d}^c x_d \tag{8}$$

Fig. 1. The neural network structure of the first-order TSK fuzzy system.
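To illustrate Eqs. (2)–(8), the following is a minimal NumPy sketch of the AdaTSK forward pass; it reflects our reading of the formulas, not the authors' implementation, and the array shapes and names are assumptions.

```python
import numpy as np

def adatsk_forward(x, m, p, q_floor=-1000):
    """Forward pass of AdaTSK for one sample (a hedged sketch).

    x : (D,) input;  m : (R, D) MF centers;  p : (R, D + 1, C) consequents.
    Returns the (C,) system output of Eq. (6).
    """
    mu = np.exp(-(x[None, :] - m) ** 2)              # Eq. (2), shape (R, D)
    mu = np.clip(mu, 1e-300, 1.0)                    # guard the logarithm
    log_min = np.log(mu.min(axis=1))
    log_min = np.minimum(log_min, -1e-12)            # avoid division by zero
    q = np.maximum(np.ceil(690.0 / log_min), q_floor)  # Eq. (4) with lower bound
    f = (np.power(mu, q[:, None]).mean(axis=1)) ** (1.0 / q)  # Eq. (3)
    f_bar = f / f.sum()                              # Eq. (7)
    y_rules = p[:, 0, :] + np.einsum('rdc,d->rc', p[:, 1:, :], x)  # Eq. (8)
    return f_bar @ y_rules                           # Eq. (6)
```

The constant 690 keeps the largest term $\mu^{\hat{q}} \approx e^{690}$ just inside double-precision range, which is what makes the minimum computable without underflow even for thousands of features.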
Here, $\bar{f}_r(x)$ is the normalized firing strength of the $r$th rule on $x$, and, as described in (1), $y_r^c(x)$ is the output of the $r$th rule associated with the $c$th class computed from $x$. The neural network structure of the AdaTSK model is shown in Fig. 1. The first layer is the input layer of features. The second layer is the fuzzification layer, whose output is computed by (2) for each node. The third layer is the rule layer, in which the $D$ membership values are used together to compute a firing strength by Ada-softmin. The firing strengths are normalized through (7) in the fourth layer. The lower part, with two fully connected layers, represents the consequent parts behind "THEN" described in (1). The defuzzification process is realized by the last two layers, as shown in (6).

2.2 The Improved Bagging Strategy

When solving high-dimensional datasets, AdaTSK tends to fall into an overfitting dilemma, which reduces its performance. In order to alleviate this issue, we construct an ensemble classifier based on AdaTSK in the framework of an improved bagging strategy, which we introduce in this section in detail.

As an effective technique in ensemble learning, bagging randomly draws samples, with replacement, from the original training set to obtain several subsets. Here, we instead randomly split both the samples and the features into a group of subsets. Moreover, for each subset, part of the samples and features are randomly selected from the remaining subsets and poured into it. Consequently, the subsets overlap each other.
(9)
where xn and zn (n = 1, 2, · · · , N ) are the nth sample and its target label, respectively. Note that xn is a D-dimensional feature vector. The original training set is divided into K subsets by the following two steps. 1. The original N training sample with D features are randomly split into K equal subsets (to the extent possible), i.e., {U1 , U2 , · · · , UK }. The kth (k = 1, 2, · · · , K) subset, Uk , contains Nk samples with Dk features, where Nk < N and Dk < D. 2. For each Uk , we randomly select a proportion of samples and features from the remaining subsets, {U1 , · · · , Uk−1 , Uk+1 , · · · , UK }, and integrate them into Uk . Hence, both the number of samples, Nk , and the number of features, Dk , of Uk are increased. By doing this, K subsets that overlap each other are obtained. Although the same original sample is selected by two different subsets, they are not exactly the same as each subset contains a different set of features. Therefore, the ensemble diversity is guaranteed. For example, assume that 10 samples or features are going to be split into 3 folds, of which the index set is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Firstly, this index set is randomly divided into 3 equal subsets to the extent possible, like, {1, 5, 8, 9}, {6, 7, 10} and {2, 3, 4}. For each subset, such as {1, 5, 8, 9}, 50% elements of the remaining two subsets are randomly selected and incorporated into it. Then, {1, 5, 8, 9} is extended to
Bagging-AdaTSK: An Ensemble Fuzzy Classifier
37
{1, 2, 5, 8, 9, 10}. After random selection, the final three subsets are {1, 2, 5, 8, 9, 10}, {3, 4, 5, 6, 7, 9, 10} and {1, 2, 3, 4, 6, 7, 9}. The index set mentioned in this example is applicable for sample indices as well as feature indices. Where 50% is explained as the overlap rate. Using the split strategy, a high-dimensional dataset is divided into several low-dimensional subsets. In this investigation, two different overlap rates are set for samples and features, which are denoted by ρ1 and ρ2 , respectively. Assume that ρ is a overlap rate of the samples or features, the proportion of the samples or features contained in a subset to the whole training set is γ =
K −1 1−ρ 1 + ρ=ρ+ K K K
(10)
It is obvious that the number of samples or features contained in a subset decreases as $K$ increases, and a smaller $K$ means that more samples or features are assigned to each subset. Both overlap rates lie between 0 and 1; the sensitivity of the model to them is analysed in Subsect. 3.2.

2.3 Bagging-AdaTSK Classifier

In the framework of the improved bagging strategy, $K$ AdaTSK models are independently trained as the base learners. After training, these AdaTSK models are aggregated to predict the target labels. This ensemble classifier is named Bagging-AdaTSK. Several combination methods are popularly used, such as voting and averaging [23]. Comparatively speaking, the final results given by voting have more logical interpretability, while averaging usually achieves better accuracy. Here, the averaging method is used for Bagging-AdaTSK.
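As a concrete illustration of the split described in Sect. 2.2, the following is a hedged Python sketch (our own naming, not the authors' code) that produces K overlapping folds of sample or feature indices:

```python
import numpy as np

def improved_bagging_split(n_items, K, rho, rng=None):
    """Split indices 0..n_items-1 into K overlapping folds (a sketch).

    Each fold first receives a disjoint share of the indices (step 1) and is
    then topped up with a fraction rho of the indices of the other folds
    (step 2), so the folds jointly cover every index.
    """
    rng = rng or np.random.default_rng()
    perm = rng.permutation(n_items)
    base = np.array_split(perm, K)                   # step 1: disjoint split
    folds = []
    for k in range(K):
        others = np.concatenate([base[j] for j in range(K) if j != k])
        extra = rng.choice(others, size=int(round(rho * len(others))),
                           replace=False)            # step 2: overlap
        folds.append(np.sort(np.concatenate([base[k], extra])))
    return folds

# The same routine would be applied separately to samples and features, e.g.
# sample_folds  = improved_bagging_split(N, K=10, rho=0.5)   # rho1
# feature_folds = improved_bagging_split(D, K=10, rho=0.01)  # rho2
```

Since the other folds together hold $n(K-1)/K$ indices, each fold ends up with roughly $n(1/K + \rho(K-1)/K)$ indices, matching the proportion $\gamma$ of Eq. (10).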
Fig. 2. The framework of Bagging-AdaTSK model.
Suppose that the predicted output of the $k$th AdaTSK on a given sample $x$ is $\varphi_k(x)$. The system output of Bagging-AdaTSK is

$$\Phi(x) = \frac{1}{K} \sum_{k=1}^{K} \varphi_k(x) \tag{11}$$

where both $\Phi(x)$ and $\varphi_k(x)$ are $C$-dimensional vectors. The framework of Bagging-AdaTSK is shown in Fig. 2. Two methods are used to optimize the base learners of the proposed Bagging-AdaTSK, i.e., the gradient descent (GD) algorithm and least square error (LSE) estimation. The loss function of an AdaTSK is defined as

$$L = \frac{1}{2N_k} \sum_{n=1}^{N_k} \sum_{c=1}^{C} \left( y^c(x_n) - \bar{y}_c(x_n) \right)^2 \tag{12}$$

where $N_k$ is the number of training samples for the $k$th base learner, and $y^c(x_n)$ and $\bar{y}_c(x_n)$ respectively correspond to the $c$th component of the system output and of the true label vector (transformed by one-hot encoding) for the $n$th input instance, $x_n$ ($n = 1, 2, \cdots, N_k$). The gradients of the loss function with respect to the centers and the consequent parameters are

$$\frac{\partial L}{\partial m_{r,d}} = \frac{1}{N} \sum_{n=1}^{N} 2 \bar{f}_r(x_n) \left( x_{n,d} - m_{r,d} \right) \times \sum_{c=1}^{C} \left( y^c(x_n) - \bar{y}_c(x_n) \right) \left( y_r^c(x_n) - y^c(x_n) \right) \tag{13}$$

and

$$\frac{\partial L}{\partial p_{r,d}^c} = \frac{1}{N} \sum_{n=1}^{N} \bar{f}_r(x_n) \left( y^c(x_n) - \bar{y}_c(x_n) \right) x_{n,d} \tag{14}$$

respectively, where $x_{n,d}$ is the $d$th component of the sample $x_n$. We use the following formula to update them in the $t$th iteration:

$$\omega^{(t+1)} = \omega^{(t)} - \eta \frac{\partial L}{\partial \omega^{(t)}} \tag{15}$$
Bagging-AdaTSK: An Ensemble Fuzzy Classifier
39
the back-propagation, which is also O(DCR) for each instance. As a consequence, the overall complexity of the AdaTSK is O(DCR). In the Bagging-AdaTSK, the input dimension of each base learner is denoted by Dk (k = 1, 2, · · · , K) which is smaller than D as the feature space K D CR , is split. The computation complexity of Bagging-AdaTSK is O k k=1 K where Dk = D(ρ2 + (1 − ρ2 )/K). Hence, O k=1 Dk CR can be rewritten as O((1 + (K − 1)ρ2 )DCR). Note that ρ2 is between 0 and 1 and we set it to a very small value, say 0.01 and 0.001, in the high-dimensional tasks. On the other hand, K is the number of base learners defined by the user, of which the value is not big. Therefore, it can be concluded that the computation complexity increment of Bagging-AdaTSK is not large comparing with the original AdaTSK. Table 1. Summary of the 7 classification datasets. Datasets
#Features
#Classes
Dataset size
ORL
1024
40
400
Colon
2000
2
62
SRBCT
2308
4
83
ARP
2400
10
130
PIE
2420
10
210
Leukemia
7129
2
72
CNS
7129
5
42
3 Simulation Results To demonstrate the effectiveness of Bagging-AdaTSK, it is tested on 7 datasets with feature dimensions varying from 1024 to 7129, which are regarded as high-dimensional datasets according to [20]. Table 1 summarizes the information of these datasets, which includes the number of features (#Features), the number of classes (#Classes) and, the size of dataset. 3.1 The Classification Performance of Bagging-AdaTSK In our experiments, three fuzzy sets are defined on each feature for all these 7 datasets, i.e., R = S = 3. The centers of the membership functions are evenly placed on the interval min x , xmax for each feature, where xmin and xmax are the minimum and maximum value of a feature on the input domain. Specifically, the centers are initialized by r−1 mr,d = xdmin + xdmax − xdmin , R−1
(16)
40
G. Xue et al.
Table 2. The classification results of RF, SVM, BLS, AdaTSK, and three Bagging-AdaTSK models with different optimization strategies. Datasets
RF
SVM
BLS
AdaTSK
Bagging-AdaTSK GD, p
LSE, p
GD, m + p
ORL (1024)
0.9120
0.9440
0.9555
0.9300
0.8768
0.9133
0.8975
Colon (2000)
0.7938
0.7819
0.7062
0.6000
0.7652
0.8155
0.7726
SRBCT (2308)
0.9629
0.9261
0.9703
0.8747
0.9760
0.9772
0.9744
ARP (2400)
0.8608
0.9892
0.9615
0.9754
0.9131
0.9577
0.9469
PIE (2420)
0.9824
0.9886
0.9905
0.9800
0.9914
1.0000
0.9971
Leukemia (7129)
0.9339
0.8546
0.9189
0.8000
0.9384
0.9443
0.9418
CNS (7129)
0.6360
0.7980
0.7730
0.6060
0.8135
0.8125
0.8030
where r = 1, 2, · · · , R, d = 1, 2, · · · , D, xdmin and xdmax represent the minimum and maximum value of the d th feature on the input domain. Since three fuzzy sets are defined on each feature, the values of the centers initialized for the d th feature is xdmin ,
xdmin +xdmax , xdmax 2
. All consequent parameters are initialized to zero.
We build 10 base learners in Bagging-AdaTSK, i.e., K = 10. In other words, 10 AdaTSK classifiers are trained in the framework of the improved bagging strategy. On the other hand, two overlap rates, ρ1 and ρ2 , need to be set in Bagging-AdaTSK. According to (10), if bigger ρ is used, the proportion of the samples or features contained in a subset to the whole training set is bigger. We wish that each subset has enough, but not too many, samples or features to help AdaTSK classifier to achieve satisfactory performance. For the datasets listed in Table 1, their sample sizes are small and the feature dimensions of them are high. Therefore, ρ1 and ρ2 are artificially set to 0.5 and 0.01, respectively. The sensitivity of Bagging-AdaTSK to the overlap rates is analysed in the next subsection. Ten-fold cross-validation mechanism [3, 17] is employed in the simulations, which is repeated 10 times to report the average classification performance of Bagging-AdaTSK. The classification results of Bagging-AdaTSK are compared with those of four algorithms, i.e., Random Forest (RF), Support Vector Machine (SVM), Broad Learning System (BLS) and AdaTSK, on the 7 high-dimensional datasets listed in Table 1. Three fuzzy sets or rules are used in AdaTSK model and each base leaner of Bagging-AdaTSK. Since 10 base learners are contained in Bagging-AdaTSK, which means total 30 rules are used in Bagging-AdaTSK. Correspondingly, 30 trees are adopted for RF. The comparison results are reported in Table 2, where the best results are marked in bold. Using different optimization strategies for Bagging-AdaTSK, three groups of results are obtained. In Table 2, the results listed in the first two columns of Bagging-AdaTSK are acquired by only optimizing the consequent parameters, for which GD and LSE method are used, respectively. As for the third column under Bagging-AdaTSK in Table 2, both the centers and consequent parameters are updated by GD method.
From Table 2, it is easy to conclude that Bagging-AdaTSK outperforms both RF and the AdaTSK classifier no matter which of the aforementioned optimization strategies is used. Therefore, the proposed bagging strategy is effective and helps the AdaTSK model achieve better results. Among the three groups of Bagging-AdaTSK results, the second is the best; in other words, fixing the antecedents and using LSE to estimate the consequent parameters is the best optimization strategy for Bagging-AdaTSK when solving high-dimensional data. Comparing the first group of results with the third, we conclude that optimizing the centers improves the classification performance. However, optimizing the antecedents increases the computational burden, and how to optimize them efficiently needs to be further studied.
Fig. 3. The sensitivity of Bagging-AdaTSK to the overlap rates.
3.2 Sensitivity of Bagging-AdaTSK to the Overlap Rates

Since the sample and feature overlap rates, i.e., ρ1 and ρ2, are parameters specific to Bagging-AdaTSK, we investigate the sensitivity of the model to them in this section. As described in Subsect. 3.1, the second optimization strategy of Bagging-AdaTSK is the best; therefore, the sensitivity analysis of the two overlap rates is based on the LSE method. Both the sample and feature overlap rates are varied over the values {0, 0.05, 0.1, 0.15, ..., 0.95}. When we observe the sensitivity to the sample overlap rate, the feature overlap rate is fixed at 0.01; conversely, ρ1 is fixed at 0.5 when the sensitivity to the feature overlap rate is investigated. The average classification testing accuracies over 10 repeated experiments are shown in Fig. 3, where Fig. 3(a) and Fig. 3(b) correspond to the sample overlap rate and the feature overlap rate, respectively. A sketch of this sweep is given below.
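The protocol of this sweep can be summarized by the following sketch (my addition); `bagging_adatsk_accuracy` is a hypothetical placeholder for training the LSE-based Bagging-AdaTSK and returning its cross-validated test accuracy.

```python
import numpy as np

def bagging_adatsk_accuracy(rho_sample, rho_feature):
    # Placeholder: train the LSE-based Bagging-AdaTSK with the given overlap
    # rates under ten-fold cross-validation and return the mean test accuracy.
    return 0.0

rates = np.arange(0.0, 1.0, 0.05)                               # {0, 0.05, ..., 0.95}
acc_sample = [bagging_adatsk_accuracy(r, 0.01) for r in rates]  # Fig. 3(a)
acc_feature = [bagging_adatsk_accuracy(0.5, r) for r in rates]  # Fig. 3(b)
```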
As shown in Fig. 3(a), Bagging-AdaTSK is not sensitive to the sample overlap rate when ρ1 is greater than 0.3, while on the interval [0, 0.3] the classification performance increases appreciably, especially on the ORL, ARP, and PIE datasets. An interesting observation is that each of these three datasets contains more classes than the other four datasets listed in Table 1. Perhaps classification tasks involving more classes need more samples, which deserves further study. Additionally, if the sample overlap rate is set to 0, the classification accuracies are not very satisfactory on most of the datasets. This means that it is necessary to let the subsets overlap each other so that each subset contains more samples. From Fig. 3(b), it can be seen that the classification performance of Bagging-AdaTSK does not vary greatly with ρ2; hence, Bagging-AdaTSK is not sensitive to the feature overlap rate. Moreover, on the ORL and ARP datasets, the accuracy is lower with ρ2 = 0 than with other values, which indicates that overlapping the features is helpful. Since the datasets used here have thousands of features, a small value of ρ2, say 0.01 or 0.05, is recommended in order to reduce the computational burden.
4 Conclusion

Focusing on improving the classification performance of the AdaTSK model, we propose an ensemble classifier called Bagging-AdaTSK. Firstly, an improved bagging strategy is introduced, in which the original dataset is split into a given number of subsets with respect to both samples and features; these subsets contain different samples and features and overlap each other. Then, an AdaTSK classifier is trained on each subset. After training, these AdaTSK classifiers are combined by averaging to obtain the final predicted labels. The Bagging-AdaTSK classifier is suitable for high-dimensional datasets because they can be divided into several low-dimensional subsets that are easier to handle. In our experiments, Bagging-AdaTSK is tested on 7 datasets with feature dimensions varying from 1024 to 7129. The simulation results demonstrate that the proposed Bagging-AdaTSK is very effective and outperforms its four counterparts: RF, SVM, BLS, and the AdaTSK classifier. In addition, we analyse the sensitivity of Bagging-AdaTSK to the sample and feature overlap rates; the investigation shows that the proposed model is not sensitive to them. How to adaptively determine the optimal values of the two overlap rates for different problems will be studied in future work. Besides, we plan to develop more efficient algorithms to optimize the antecedent parameters to further improve the performance of Bagging-AdaTSK.
References

1. Bian, Y., Chen, H.: When does diversity help generalization in classification ensembles? IEEE Trans. Cybern. (2022, in press). https://doi.org/10.1109/TCYB.2021.3053165
2. Chakraborty, D., Pal, N.R.: A neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification. IEEE Trans. Neural Networks 15(1), 110–123 (2004)
3. Chen, Y., Pal, N.R., Chung, I.: An integrated mechanism for feature selection and fuzzy rule extraction for classification. IEEE Trans. Fuzzy Syst. 20(4), 683–698 (2012)
4. Ebadzadeh, M.M., Salimi-Badr, A.: IC-FNN: a novel fuzzy neural network with interpretable, intuitive, and correlated-contours fuzzy rules for function approximation. IEEE Trans. Fuzzy Syst. 26(3), 1288–1302 (2018)
5. Feng, S., Chen, C.L.P.: Fuzzy broad learning system: a novel neuro-fuzzy model for regression and classification. IEEE Trans. Cybern. 50(2), 414–424 (2020)
6. Gao, T., Zhang, Z., Chang, Q., Xie, X., Wang, J.: Conjugate gradient-based Takagi-Sugeno fuzzy neural network parameter identification and its convergence analysis. Neurocomputing 364, 168–181 (2019)
7. Guo, F., et al.: A concise TSK fuzzy ensemble classifier integrating dropout and bagging for high-dimensional problems. IEEE Trans. Fuzzy Syst. (2022, in press). https://doi.org/10.1109/TFUZZ.2021.3106330
8. Lau, C., Ghosh, K., Hussain, M.A., Hassan, C.C.: Fault diagnosis of Tennessee Eastman process with multi-scale PCA and ANFIS. Chemom. Intell. Lab. Syst. 120, 1–14 (2013)
9. Mizumoto, M.: Pictorial representations of fuzzy connectives, part I: cases of t-norms, t-conorms and averaging operators. Fuzzy Sets Syst. 31(2), 217–242 (1989)
10. Pal, N.R., Eluri, V.K., Mandal, G.K.: Fuzzy logic approaches to structure preserving dimensionality reduction. IEEE Trans. Fuzzy Syst. 10(3), 277–286 (2002)
11. Pal, N.R., Saha, S.: Simultaneous structure identification and fuzzy rule generation for Takagi-Sugeno models. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 38(6), 1626–1638 (2008)
12. Pratama, M., Pedrycz, W., Lughofer, E.: Evolving ensemble fuzzy classifier. IEEE Trans. Fuzzy Syst. 26(5), 2552–2567 (2018)
13. Rini, D.P., Shamsuddin, S.M., Yuhaniz, S.S.: Particle swarm optimization for ANFIS interpretability and accuracy. Soft Comput. 20(1), 251–262 (2014). https://doi.org/10.1007/s00500-014-1498-z
14. Rokach, L.: Ensemble-based classifiers. Artif. Intell. Rev. 33(1), 1–39 (2010)
15. Safari Mamaghani, A., Pedrycz, W.: Genetic-programming-based architecture of fuzzy modeling: towards coping with high-dimensional data. IEEE Trans. Fuzzy Syst. 29(9), 2774–2784 (2021)
16. Wang, B., Pineau, J.: Online bagging and boosting for imbalanced data streams. IEEE Trans. Knowl. Data Eng. 28(12), 3353–3366 (2016)
17. Wang, J., Zhang, H., Wang, J., Pu, Y., Pal, N.R.: Feature selection using a neural network with group lasso regularization and controlled redundancy. IEEE Trans. Neural Netw. Learn. Syst. 32(3), 1110–1123 (2021)
18. Wu, D., Yuan, Y., Huang, J., Tan, Y.: Optimize TSK fuzzy systems for regression problems: minibatch gradient descent with regularization, DropRule, and AdaBound (MBGD-RDA). IEEE Trans. Fuzzy Syst. 28(5), 1003–1015 (2020)
19. Xie, Z., Xu, Y., Hu, Q., Zhu, P.: Margin distribution based bagging pruning. Neurocomputing 85, 11–19 (2012)
20. Xue, G., Chang, Q., Wang, J., Zhang, K., Pal, N.R.: An adaptive neuro-fuzzy system with integrated feature selection and rule extraction for high-dimensional classification problems. arXiv:2201.03187 (2022)
21. Zhang, T., Deng, Z., Ishibuchi, H., Pang, L.M.: Robust TSK fuzzy system based on semi-supervised learning for label noise data. IEEE Trans. Fuzzy Syst. 29(8), 2145–2157 (2021)
22. Zhou, T., Chung, F.L., Wang, S.: Deep TSK fuzzy classifier with stacked generalization and triplely concise interpretability guarantee for large data. IEEE Trans. Fuzzy Syst. 25(5), 1207–1221 (2017)
23. Zhou, Z.H.: Ensemble Methods: Foundations and Algorithms. CRC Press, Boca Raton (2012)
Some Results on the Dominance Relation Between Conjunctions and Disjunctions

Lizhu Zhang and Gang Li(B)

School of Mathematics and Statistics, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
[email protected]
Abstract. Dominance relations on the class of aggregation operators have vital applications in various areas of science, including fuzzy set theory and probabilistic metric spaces. The dominance relations between conjunctions and disjunctions are studied in this paper. We characterize the conjunctions (disjunctions) which dominate all triangular conorms (triangular norms). Moreover, as a generalization of the dominance relation, the weak dominance relation between conjunctions and disjunctions is also discussed.

Keywords: Dominance · Conjunction · Disjunction · Triangular conorm · Triangular norm
1 Introduction

The concept of the dominance relation was introduced in [1]. Then, Schweizer and Sklar [2] discussed the dominance relation for the class of associative binary operations. Domination plays a very important role in constructing Cartesian products of probabilistic metric spaces. Moreover, the domination of t-norms is used in the construction of fuzzy orderings [4], fuzzy equivalence relations [5], the open question of transitivity [6, 7], flexible querying, game theory, and preference modelling. These applications initiated the study of the dominance relation in the broader context of aggregation functions [5, 8, 10, 14]. In particular, the domination between aggregation operations is also vital for aggregation procedures preserving the T-transitivity of fuzzy relations [3, 9]. Besides these applications, dominance is still an interesting mathematical notion in its own right. Due to the fact that all t-norms have a common neutral element and are associative and commutative, dominance of t-norms constitutes a reflexive and antisymmetric relation. Moreover, some classical inequalities and equations [11] are related to the dominance relation, such as the Minkowski inequality and the bisymmetry equation. In [13], the dominance relation between two quasi-overlap functions was discussed. In [16, 17], the researchers studied some special problems of the dominance relation for conjunctions and triangular conorms. Sarkoci [18] examined the characterization of all t-seminorms dominating every triangular conorm. Bentkowska et al. [19] deal with some properties of dominance between binary operations defined on partially ordered sets. Furthermore, the notion of weak dominance between binary operations was
introduced in [11]. It can be viewed as a generalization of the dominance relation. On the other hand, weak dominance can also be treated as a generalization of the modularity equation [15], which is related to the associativity of aggregation operations and is often used in fuzzy theory. It is therefore interesting to discuss the weak dominance relation for conjunctions and triangular conorms. The structure of the paper is as follows. Firstly, we recall some basic definitions of binary operations that will be used in the sequel, together with the notion of (weak) domination between two binary operations. In Sect. 3, we characterize the class of conjunctions which dominate each triangular conorm, and through duality we obtain the class of disjunctions which dominate each triangular norm. Section 4 is devoted to the weak dominance case. Finally, we close the contribution with a short summary.
2 Preliminaries

We recall here definitions of some binary operations which will be used in the sequel.

Definition 1 [20]. A conjunction (disjunction) is any increasing binary operation C (D) : [0, 1]² → [0, 1] fulfilling C(0, 0) = C(0, 1) = C(1, 0) = 0, C(1, 1) = 1 (D(0, 0) = 0, D(0, 1) = D(1, 0) = D(1, 1) = 1). If a conjunction C (disjunction D) has a neutral element e ∈ [0, 1] (i.e., C(x, e) = C(e, x) = x for all x ∈ [0, 1]) such that e = 1 (e = 0), then it is called a triangular seminorm (triangular semiconorm).

Definition 2 [20]. A t-seminorm (t-semiconorm) is any increasing binary operation T (S) : [0, 1]² → [0, 1] with neutral element 1 (0).

Remark 1. Any t-seminorm T (t-semiconorm S) fulfils T(x, y) ≤ min(x, y) (S(x, y) ≥ max(x, y)) for all x, y ∈ [0, 1].

Example 1. The operation T : [0, 1]² → [0, 1] given by

$$T(x, y) = \begin{cases} 0, & (x, y) \in [0, \tfrac{1}{2}]^2, \\ 2(x - \tfrac{1}{2})(y - \tfrac{1}{2}) + \tfrac{1}{2}, & (x, y) \in \left]\tfrac{1}{2}, 1\right]^2, \\ \min(x, y), & \text{otherwise} \end{cases} \tag{1}$$

is a t-seminorm. A triangular seminorm (semiconorm) is called a triangular norm (triangular conorm) if it is associative and commutative.

Definition 3 [21]. A t-norm is a two-place function T : [0, 1]² → [0, 1] such that for all x, y, z ∈ [0, 1] the following conditions are satisfied:
(a) T(x, y) = T(y, x);
(b) T(T(x, y), z) = T(x, T(y, z));
(c) T(x, y) ≤ T(x, z) whenever y ≤ z;
(d) T(x, 1) = x.
The four basic t-norms TM, TP, TL, and TD are usually discussed in the literature. They are defined, respectively, by TM(x, y) = min(x, y), TP(x, y) = x · y, TL(x, y) = max(x + y − 1, 0), and

$$T_D(x, y) = \begin{cases} 0, & (x, y) \in [0, 1[^2, \\ \min(x, y), & \text{otherwise}. \end{cases}$$
By duality we can obtain the definition of a t-conorm and the four basic t-conorms SM, SP, SL, and SD [21]. Now we recall the notion of domination concerning two binary operations.

Definition 4 [1]. Consider two binary functions F, G : [0, 1]² → [0, 1]. We say that F dominates G, denoted by F ≫ G, if

F(G(a, b), G(c, d)) ≥ G(F(a, c), F(b, d))   (2)

for all a, b, c, d ∈ [0, 1].

Remark 2. It is obvious that the t-norm TM dominates every increasing operation and that every increasing operation dominates the t-conorm SM. Moreover, every t-norm dominates itself and TD. A numerical sanity check of Definition 4 is sketched below.
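As a quick numerical illustration of Definition 4 (my addition, not part of the paper), the following sketch tests inequality (2) on a finite grid; such a check can refute dominance but cannot prove it over the whole unit square.

```python
import itertools
import numpy as np

def dominates(F, G, steps=21):
    """Check F(G(a,b), G(c,d)) >= G(F(a,c), F(b,d)) on a uniform grid;
    returns (False, witness) on the first violation found."""
    grid = np.linspace(0.0, 1.0, steps)
    for a, b, c, d in itertools.product(grid, repeat=4):
        if F(G(a, b), G(c, d)) < G(F(a, c), F(b, d)) - 1e-12:
            return False, (a, b, c, d)
    return True, None

t_lukasiewicz = lambda x, y: max(x + y - 1.0, 0.0)   # the t-norm TL

print(dominates(min, max))            # TM dominates SM (cf. Remark 2)
print(dominates(t_lukasiewicz, max))  # every increasing operation dominates SM
```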
3 The Characterization of Conjunctions Dominating All Triangular Conorms

In this section, following the line of study in [18], we offer characterizations of the t-seminorms and conjunctions which dominate all triangular conorms. For the t-seminorms dominating all t-conorms [18], the following result holds.

Theorem 1. The t-seminorm C dominates the class of all t-conorms if

C(x, y) ∈ {0, x, y}   (3)

for any x, y ∈ [0, 1].
Proof. For completeness, we provide the proof here. By Remark 1, we know that C(x, y) ≤ min(x, y) for all (x, y) ∈ [0, 1]². Let S be any t-conorm and x, y, u, v ∈ [0, 1]. If C(x, y) ∈ {0, x, y} for all (x, y) ∈ [0, 1]², then the following cases are considered.

– If C(x, u) = C(y, v) = 0, then we get C(S(x, y), S(u, v)) ≥ 0 = S(0, 0) = S(C(x, u), C(y, v)).
– If C(x, u) = min(x, u) > 0 and C(y, v) = 0, we have S(C(x, u), C(y, v)) = S(min(x, u), 0) = min(x, u) and min(x, u) = C(x, u) ≤ C(S(x, y), S(u, v)).
– If C(x, u) = 0 and C(y, v) = min(y, v) > 0, the proof is similar to the above case.
– If C(x, u) = min(x, u) > 0 and C(y, v) = min(y, v) > 0, then C(S(x, y), S(u, v)) > 0 and C(S(x, y), S(u, v)) = min(S(x, y), S(u, v)). Moreover, by the monotonicity of the t-conorm S, we have S(x, y) ≥ S(C(x, u), C(y, v)) and S(u, v) ≥ S(C(x, u), C(y, v)). Hence, C(S(x, y), S(u, v)) ≥ S(C(x, u), C(y, v)).

Note that one cannot replace the t-seminorm by an arbitrary conjunction in Theorem 1. Next, we consider the case of an increasing operation dominating every triangular conorm.

Theorem 2 [22]. Let C : [0, 1]² → [0, 1] be an increasing operation with C ≤ TM. Then C dominates every t-conorm if and only if

C(x, y) ∈ {0, min(x, y)}   (4)

for any x, y ∈ [0, 1]. Directly from Theorems 1 and 2 we obtain the following result.
Theorem 3 [22]. If C is a conjunction fulfilling the condition (4), then it dominates every t-conorm.

Note that there exist conjunctions which satisfy (3) but do not satisfy (4); for example, C defined by

$$C(x, y) = \begin{cases} \max(x, y), & (x, y) \in [\tfrac{1}{2}, 1]^2, \\ \min(x, y), & \text{otherwise} \end{cases} \tag{5}$$

is a conjunction which does not dominate the t-conorm SP.

Example 2. By Theorem 3 the operation C : [0, 1]² → [0, 1] given by

C(x, y) = TM(x, y) = min(x, y)   (6)

dominates any t-conorm. This result also appears in Remark 2.

Now, we characterize the binary operations dominating every triangular conorm.

Theorem 4. Let C be a binary operation with C(1, 1) = 1. Then C is a conjunction satisfying condition (4) if and only if there exists a decreasing function h : [0, 1] → [0, 1] such that

$$C(x, y) = \begin{cases} 0, & y < h(x), \\ \min(x, y), & y > h(x), \\ 0 \text{ or } \min(x, y), & y = h(x), \end{cases} \tag{7}$$

and, on intervals of constant values of the function h,

$$C(x, u) = \begin{cases} 0, & x < a_u, \\ \min(x, u), & x > a_u, \\ 0 \text{ or } \min(x, u), & x = a_u. \end{cases} \tag{8}$$
Moreover, u ∈ [0, 1], Eu = {x : h(x) = u}, mu = inf Eu, nu = sup Eu, and au ∈ [mu, nu].

Proof. (Necessity) Define the set Ox = {y ∈ [0, 1] : C(x, y) = 0} and let h(x) = sup Ox. According to the definition of a conjunction, we have C(x, 0) ≤ C(1, 0) = 0, so 0 ∈ Ox and Ox is non-empty. Next we prove that h is decreasing. First we note that h(0) = 1, because C(0, 1) = 0. Let x < y. If h(x) = 1, then h(x) ≥ h(y). If h(x) < 1, then for all l > h(x) we have C(x, l) = min(x, l). By the monotonicity of C we get C(y, l) ≥ C(x, l) = min(x, l) > 0; therefore, C(y, l) = min(y, l) for all l > h(x). This means h(y) ≤ h(x), so h is decreasing. Now we prove that au ∈ [mu, nu]. Let u ∈ [0, 1] and mu < nu. Define au = sup{x : C(x, u) = 0}. We divide the proof into two parts:
– If au < mu, then mu > 0 and u < 1. Let x ∈ (au, mu). According to the definition of the set Eu and the monotonicity of the function h, we have h(au) ≥ h(x) > h(mu) = u. By (7) we get C(x, u) = 0, which leads to a contradiction.
– If au > nu, then nu < 1 and u > 0. Let x ∈ (nu, au). According to the definition of the set Eu and the monotonicity of the function h, we have h(au) ≤ h(x) < h(nu) = u. By (7) we obtain C(x, u) = min(x, u) > 0, which leads to a contradiction.

So au ∈ [mu, nu]. By the definition of the point au and (7) we obtain (8).

(Sufficiency) Directly from (7) and (8) we deduce (4). Moreover, because min(x, 0) = 0 for all x ∈ [0, 1], we have C(0, 0) = C(0, 1) = C(1, 0) = 0 and C(1, 1) = 1. Then we prove that C is increasing. First, we prove the monotonicity with respect to the first variable; let x, y, v ∈ [0, 1] with x < y and consider the following cases.

– If C(x, v) = min(x, v), then v ≥ h(x). By the monotonicity of h we have v ≥ h(y), and by (7) and (8) we have C(y, v) = min(y, v) ≥ min(x, v) = C(x, v).
– If C(x, v) = 0, then C(x, v) = 0 ≤ C(y, v).

This means the operation C is increasing with respect to the first variable. Next, we prove that C is increasing with respect to the second variable. Let x, y, v ∈ [0, 1] and x < y. We consider the following cases.

– If y ≤ h(v), then x < h(v) and C(v, x) = 0 ≤ C(v, y).
– If x ≥ h(v), then y > h(v) and C(v, y) = min(v, y) ≥ min(v, x) ≥ C(v, x).
– If x < h(v) < y, then C(v, y) = min(v, y) ≥ 0 = C(v, x).

So C is a conjunction and satisfies (4).

The t-norms satisfying (4) can be characterized by the following result.

Theorem 5. Let C : [0, 1]² → [0, 1] be an increasing operation. Then C is a t-norm satisfying (4) if and only if there exists a subset I of ]0, 1[² with the following properties:
i. I is symmetric, i.e., (x, y) ∈ I implies (y, x) ∈ I.
ii. For all (x, y) ∈ I we have ]0, x] × ]0, y] ⊆ I,

such that C is given by

$$C(x, y) = \begin{cases} 0, & (x, y) \in I, \\ \min(x, y), & \text{otherwise}. \end{cases} \tag{9}$$
Proof. (Sufficiency) The commutativity (a) and the boundary condition (d) are satisfied by definition, and the monotonicity (c) is obvious. Then we prove the associativity (b) of C. We divide the proof into three parts.

– If neither (x, y) nor (y, z) is contained in I, then, taking x ≤ y and z ≤ y, we have C(C(x, y), z) = C(min(x, y), z) = C(x, z) = C(x, min(y, z)) = C(x, C(y, z)).
– If both (x, y) and (y, z) are contained in I, we obtain C(C(x, y), z) = C(0, z) = 0 = C(x, 0) = C(x, C(y, z)).
– If at most one of the pairs (x, y) and (y, z) is contained in I, then C(C(x, y), z) = min(x, y, z) = C(x, C(y, z)).

(Necessity) If C is a t-norm satisfying (4), then C is a conjunction and satisfies (4). By the proof of Theorem 4, take I = {(x, y) : C(x, y) = 0} and h(x) = sup{y : (x, y) ∈ I}. By the commutativity of the t-norm C and Theorem 4, the set I satisfies conditions (i) and (ii) and C has the form (9).

By Proposition 9 and Remark 10 in [12], we can characterize the continuous binary operations satisfying condition (4).

Theorem 6. Let C : [0, 1]² → [0, 1] be a continuous binary operation satisfying C(0, 0) = C(0, 1) = C(1, 0) = 0, C(1, 1) = 1 and (4). Then C = TM.

By duality we may obtain a similar characterization of the disjunctions which are dominated by every t-norm.

Theorem 7 [22]. If D : [0, 1]² → [0, 1] is a disjunction satisfying

D(x, y) ∈ {1, max(x, y)}   (10)

for any x, y ∈ [0, 1], then D is dominated by every t-norm.

Theorem 8. Let D be a binary operation with D(0, 1) = D(1, 0) = D(1, 1) = 1. Then D is a disjunction satisfying condition (10) if and only if there exists a decreasing function g : [0, 1] → [0, 1] such that

$$D(x, y) = \begin{cases} 1, & y > g(x), \\ \max(x, y), & y < g(x), \\ 1 \text{ or } \max(x, y), & y = g(x), \end{cases} \tag{11}$$
and, on intervals of constant values of the function g,

$$D(x, s) = \begin{cases} 1, & x > a_s, \\ \max(x, s), & x < a_s, \\ 1 \text{ or } \max(x, s), & x = a_s. \end{cases} \tag{12}$$
Moreover, s ∈ [0, 1], Es = {x : g(x) = s}, ms = inf Es, ns = sup Es, and as ∈ [ms, ns].

Theorem 9. Let D : [0, 1]² → [0, 1] be an increasing operation. Then D is a t-conorm satisfying (10) if and only if there exists a subset I of ]0, 1[² with the following properties:

iii. I is symmetric, i.e., (x, y) ∈ I implies (y, x) ∈ I.
iv. For all (x, y) ∈ I we have [x, 1[ × [y, 1[ ⊆ I,

such that D is given by

$$D(x, y) = \begin{cases} 1, & (x, y) \in I, \\ \max(x, y), & \text{otherwise}. \end{cases} \tag{13}$$
Theorem 10. Let D : [0, 1]2 → [0, 1] be a continuous binary operation satisfying D(0, 0) = 0, D(0, 1) = D(1, 0) = D(1, 1) = 1 and (10). Then D = SM .
4 The Characterization of Disjunctions Weakly Dominating All Triangular Norms

We recall the definition of weak dominance between two binary operations.

Definition 5 [1]. Consider two binary functions F, G : [0, 1]² → [0, 1]. We say that F weakly dominates G, denoted by F >> G, if

F(G(a, b), c) ≥ G(F(a, c), b)   (14)

for all a, b, c ∈ [0, 1].

For the relation between dominance and weak dominance, the following statement holds.

Proposition 1. Consider two binary functions F, G : [0, 1]² → [0, 1] having a common neutral element e ∈ [0, 1]. If F ≫ G, then F >> G.

Proof. Taking d = e in (2), we get the result.

Contrary to Theorem 1, we have the following result about t-seminorms weakly dominating all t-conorms.

Proposition 2. There exists no t-seminorm F and t-conorm G such that F weakly dominates G.
Proof. On the contrary, suppose that there exist a t-seminorm F and a t-conorm G such that F >> G. Then for arbitrary x ∈ [0, 1], taking a = 1 and b = c = x in (14), we have x = F(1, x) = F(G(1, x), x) ≥ G(F(1, x), x) = G(x, x). Since G(x, x) ≥ SM(x, x) = x, we get G(x, x) = x for arbitrary x ∈ [0, 1]; hence G = SM by the dual result of Proposition 1.9 in [20]. Taking a = 0 and b = c = x in (14), we have F(x, x) = F(G(0, x), x) ≥ G(F(0, x), x) = G(0, x) = x. Since F(x, x) ≤ F(1, x) = x, we get F(x, x) = x for arbitrary x ∈ [0, 1]. However, taking a = c = 1/4 and b = 3/4, we have

F(G(a, b), c) = F(3/4, 1/4) ≤ 1/4 < 3/4 = G(F(a, c), b).

Hence, the result holds.

Corollary 1. There exists no t-seminorm which weakly dominates every t-conorm.

Proposition 3. Let D be a t-semiconorm. If D weakly dominates every t-norm T, then D(a, c) ∈ {1, max(a, c)} for all a, c ∈ [0, 1].

Proof. On the contrary, suppose that there exist a ≤ c in ]0, 1[ such that c < D(a, c) = d < 1. Then, by Proposition 3.63 in [21], we can construct a t-norm defined by

$$T(x, y) = \begin{cases} 0, & (x, y) \in\; ]0, c] \times ]0, 1[\; \cup\; ]0, 1[\; \times\; ]0, c], \\ \min(x, y), & \text{otherwise}. \end{cases}$$

Taking b ∈ ]c, d[ in (14), we have c = D(0, c) = D(T(a, b), c) ≥ T(D(a, c), b) = T(d, b) = b, a contradiction with the assumption.

Remark 3.
i. If a t-semiconorm D satisfies D(a, c) ∈ {1, max(a, c)} for all a, c ∈ [0, 1], then D may not weakly dominate every t-norm T, in contrast with Theorem 1 in Sect. 3 (see Example 3 below).
ii. If the t-semiconorm D weakly dominates every t-norm T, then D has the form (13), by Theorem 9 and Proposition 1.

Example 3. Let the t-semiconorm D : [0, 1]² → [0, 1] be given by

$$D(x, y) = \begin{cases} 1, & (x, y) \in\; ]0, 1] \times [0.5, 1]\; \cup\; [0.5, 1] \times ]0, 1], \\ \max(x, y), & \text{otherwise}. \end{cases}$$
It is obvious that D(a, c) ∈ {1, max(a, c)} for all a, c ∈ [0, 1]. Consider the t-norm TL(x, y) = max{0, x + y − 1}. Taking a = 0.1, b = 0.6, c = 0.5, we have D(TL(a, b), c) = 0.5 < 0.6 = TL(D(a, c), b). Hence, the t-semiconorm D does not weakly dominate the t-norm TL; a numerical check of this computation is sketched below. Moreover, by Proposition 3 and Example 3, we also know that results similar to Theorems 2 and 3 in Sect. 3 do not hold for disjunctions.

Remark 4.
i. Let D : [0, 1]² → [0, 1] be an increasing operation. If D ≥ SM and D weakly dominates every t-norm, then

D(x, y) ∈ {1, max(x, y)}   (15)

for any x, y ∈ [0, 1]. The converse statement is not true (see Example 3 above).
ii. If a disjunction D satisfies D(a, c) ∈ {1, max(a, c)} for all a, c ∈ [0, 1], then D need not weakly dominate every t-norm T (see Example 3 above).
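The following sketch (my addition, not part of the original paper) reproduces the computation in Example 3 and searches a grid for further violations of the weak dominance inequality (14).

```python
import itertools
import numpy as np

def d_example3(x, y):
    """The t-semiconorm of Example 3."""
    if (0 < x <= 1 and 0.5 <= y <= 1) or (0.5 <= x <= 1 and 0 < y <= 1):
        return 1.0
    return max(x, y)

def t_lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)

# The witness from Example 3: D(TL(a, b), c) < TL(D(a, c), b).
a, b, c = 0.1, 0.6, 0.5
print(d_example3(t_lukasiewicz(a, b), c))   # 0.5
print(t_lukasiewicz(d_example3(a, c), b))   # 0.6

# Grid search for all violations of inequality (14).
grid = np.linspace(0.0, 1.0, 21)
violations = [(a, b, c) for a, b, c in itertools.product(grid, repeat=3)
              if d_example3(t_lukasiewicz(a, b), c)
              < t_lukasiewicz(d_example3(a, c), b) - 1e-12]
print(len(violations) > 0)                   # True: D does not weakly dominate TL
```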
5 Conclusion

The dominance relation between binary operations is an interesting relation that arises in several mathematical problems, such as the preservation of certain properties of fuzzy relations, which is important in fuzzy decision making. In this paper, we characterize the conjunctions dominating all triangular conorms in different cases and offer the dual results. Furthermore, we also partially provide the characterization of the t-semiconorms which weakly dominate all t-norms.

Acknowledgements. This work is supported by the National Natural Science Foundation of China under Grant 61977040 and the Natural Science Foundation of Shandong Province under Grant ZR2019MF055.
References

1. Tardiff, R.M.: Topologies for probabilistic metric spaces. Pacific J. Math. 65(1), 233–251 (1976)
2. Schweizer, B., Sklar, A.: Probabilistic Metric Spaces. North-Holland Series in Probability and Applied Mathematics. North-Holland Publishing Co., New York (1983)
3. Saminger, S., Mesiar, R., Bodenhofer, U.: Domination of aggregation operators and preservation of transitivity. Internat. J. Uncertain. Fuzziness Knowledge-Based Systems 10(1), 11–35 (2002)
4. Bodenhofer, U.: A Similarity-Based Generalization of Fuzzy Orderings. Universitätsverlag Rudolf Trauner, Linz, Austria (1999)
5. De Baets, B., Mesiar, R.: T-partitions. Fuzzy Sets Syst. 97(2), 211–223 (1998)
6. Petrík, M.: On generalized Mulholland inequality and dominance on nilpotent triangular norms. In: 2017 Joint 17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems (IFSA-SCIS), pp. 1–6 (2017)
7. Petrík, M.: Dominance on strict triangular norms and Mulholland inequality. Fuzzy Sets Syst. 335, 3–17 (2018)
8. Díaz, S., Montes, S., De Baets, B.: Transitivity bounds in additive fuzzy preference structures. IEEE Trans. Fuzzy Syst. 15(2), 275–286 (2007)
9. Bentkowska, U., Król, A.: Preservation of fuzzy relation properties based on fuzzy conjunctions and disjunctions during aggregation process. Fuzzy Sets Syst. 291, 98–113 (2016)
10. Mesiar, R., Saminger, S.: Domination of ordered weighted averaging operators over t-norms. Soft. Comput. 8(8), 562–570 (2004)
11. Alsina, C., Schweizer, B., Frank, M.J.: Associative Functions: Triangular Norms and Copulas. World Scientific, Hackensack (2006)
12. Martín, J., Mayor, G., Suñer, J.: On binary operations with finite external range. Fuzzy Sets Syst. 146(1), 19–26 (2004)
13. Mezzomo, I., Frazão, H., Bedregal, B., da Silva Menezes, M.: On the dominance relation between ordinal sums of quasi-overlap functions. In: 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1–7 (2020)
14. Petrík, M.: Dominance on continuous Archimedean triangular norms and generalized Mulholland inequality. Fuzzy Sets Syst. 403, 88–100 (2021)
15. Su, Y., Riera, J.V., Ruiz-Aguilera, D., Torrens, J.: The modularity condition for uninorms revisited. Fuzzy Sets Syst. 357, 27–46 (2019)
16. Drewniak, J., Drygaś, P., Dudziak, U.: Relation of domination. Abstracts FSTA, pp. 43–44 (2004)
17. Drewniak, J., Król, A.: On the problem of domination between triangular norms and conorms. J. Electr. Eng. 56(12), 59–61 (2005)
18. Sarkoci, P.: Conjunctors dominating classes of t-conorms. In: International Conference on Fuzzy Sets Theory and its Applications, FSTA (2006)
19. Bentkowska, U., Drewniak, J., Drygaś, P., Król, A., Rak, E.: Dominance of binary operations on posets. In: Atanassov, K.T., et al. (eds.) IWIFSGN 2016. AISC, vol. 559, pp. 143–152. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-65545-1_14
20. De Cooman, G., Kerre, E.E.: Order norms on bounded partially ordered sets. J. Fuzzy Math. 2(2), 281–310 (1994)
21. Klement, E.P., Mesiar, R., Pap, E.: Triangular Norms. Springer, New York (2013). https://doi.org/10.1007/978-94-015-9540-7
22. Drygaś, P., Król, A.: Some remarks on the domination between conjunctions and disjunctions. Ann. Univ. Paedagog. Crac. Stud. Math. 45(1), 67–75 (2007)
Robust Virtual Sensors Design for Linear Systems

Alexey Zhirabok1,2, Alexander Zuev1,2(B), Vladimir Filaretov1,3, Changan Yuan4, Alexander Protcenko1,2, and Kim Chung Il1

1 Far Eastern Federal University, Vladivostok 690091, Russia
[email protected]
2 Institute of Marine Technology Problems, Vladivostok 690091, Russia
3 Institute of Automation and Control Processes, Vladivostok 690014, Russia
4 Guangxi Academy of Science, Nanning 530007, China
Abstract. The problem of designing virtual sensors to solve the problems of control and fault diagnosis in linear systems is studied. The problem is solved in three steps: at the first step, a linear model invariant with respect to the disturbance is designed; at the second step, the possibility to estimate the given variable is checked; finally, stability of the observer is ensured. Relations to design a virtual sensor of minimal dimension estimating a given component of the state vector of the system are obtained. The theoretical results are illustrated by a practical example.

Keywords: Linear systems · Virtual sensors · Canonical forms · Reduced order models
1 Introduction

Different sensors are an integral part of modern complex technical systems; they are used for measuring the state vector components, in particular, to solve the problems of control and fault diagnosis. Clearly, the more components are measured, the simpler the solution that can be obtained. The use of additional physical sensors, however, may result in extra expenses and cannot always be realized in practice. Besides, physical sensors have limited reliability. In this case, virtual sensors are of most interest. There are many papers considering different problems in the design and application of virtual sensors [1–9, 12–16]. Most of these papers consider practical applications of virtual sensors: for health monitoring of automotive engines [1], for active reduction of noise in active control systems [3], for hiding a fault from the controller's point of view [5], in walking legged robots [6], for failure diagnosis in morphing aircraft to increase reliability [7], in the process of fault detection in industrial motors [8], for fault detection, isolation, and data recovery in a bicomponent mixing machine [9], and in the sensor-cloud platform [13]. A new architectural paradigm for remotely deployed sensors, whereby a sensor's software is separated from the hardware, is presented in [14]. In [2, 12], different theoretical aspects of using virtual sensors in linear systems
are considered; in [15], virtual sensors are used for fault-tolerant control in linear descriptor systems. A detailed procedure to design virtual sensors of full dimension for linear systems is suggested in [4]. The main contribution of the present paper is that virtual sensors of minimal dimension estimating prescribed components of the state vector are designed for linear systems. Such sensors are insensitive, or have minimal sensitivity, to the disturbance. The set of prescribed components depends on the problem of control or fault diagnosis under consideration.
2 Problem Solution

2.1 The Main Models

Consider systems described by the linear dynamic model

ẋ(t) = Fx(t) + Gu(t) + Lρ(t),
y(t) = Hx(t),   (1)

where x(t) ∈ Rⁿ, u(t) ∈ Rᵐ, and y(t) ∈ Rˡ are the vectors of state, control, and output; F, G, H, and L are known constant matrices; ρ(t) ∈ Rᵖ is the unmatched disturbance, assumed to be an unknown bounded function of time with ‖ρ(t)‖ ≤ ρ∗.

The problem is as follows: given the variable yv(t) = Hv x(t) for a known matrix Hv, construct a virtual sensor of minimal dimension estimating the variable yv(t). The solution of the problem is based on a reduced-order model of the original system that estimates the variable yv(t) and is insensitive, or has minimal sensitivity, to the disturbance. Such a model can be constructed using different canonical forms, in particular, the identification canonical form (ICF) and the Jordan canonical form (JCF). In addition to yv(t), the ICF-based model should estimate some output variable y∗(t) to generate the residual r(t) needed to guarantee stability. Such a model is described by

ẋv(t) = F∗ xv(t) + G∗ u(t) + J∗ y0(t),
yv(t) = H∗v xv(t) + Qy(t),
y∗(t) = H∗ xv(t),
r(t) = R∗ y(t) − y∗(t),   (2)

where xv(t) ∈ Rᵏ, k < n, is the state vector; F∗, G∗, J∗, H∗, H∗v, Q, and R∗ are matrices to be determined; and

$$y_0(t) = H_0 x(t) = \begin{pmatrix} y(t) \\ y_v(t) \end{pmatrix}, \qquad H_0 = \begin{pmatrix} H \\ H_v \end{pmatrix}.$$

The JCF-based model ensures stability by construction and is described by

ẋv(t) = F∗ xv(t) + G∗ u(t) + J∗ y0(t),
yv(t) = H∗v xv(t) + Qy(t).   (3)
If an ICF- or JCF-based model insensitive to the disturbance exists, the problem can be solved by a Luenberger observer based on the model (2) or (3). Otherwise, a robust model with minimal sensitivity to the disturbance is designed, and the estimation of the variable yv(t) is provided by an observer based on such a model. We first consider ways to design ICF- and JCF-based models insensitive to the disturbance.

2.2 ICF-Based Model Design

The problem is solved in three steps: at the first step, a model of minimal dimension insensitive to the disturbance and estimating the variable y∗(t) is designed; then the possibility to estimate the variable yv(t) is checked; finally, the matrix K∗ ensuring stability is found. To implement the first step, introduce the matrices Φ and R∗ such that xv(t) = Φx(t) and y∗(t) = R∗ y(t). It is known [17–19] that the matrices describing the model satisfy the conditions

R∗ H = H∗ Φ,  ΦF = F∗ Φ + J∗ H0,  ΦG = G∗,  ΦL = 0.   (4)
The additional condition follows from the equation yv(t) = Hv x(t) = H∗v xv(t) + Qy(t); since xv(t) = Φx(t) and y(t) = Hx(t), it follows that

Hv = H∗v Φ + QH.   (5)

Rewrite (5) in the form

$$H_v = (H_{*v} \;\; Q)\begin{pmatrix} \Phi \\ H \end{pmatrix}, \tag{6}$$

which is equivalent to the condition

$$\mathrm{rank}\begin{pmatrix} \Phi \\ H \end{pmatrix} = \mathrm{rank}\begin{pmatrix} \Phi \\ H \\ H_v \end{pmatrix}. \tag{7}$$

Before constructing the model, it is necessary to check the possibility to design a model insensitive to the disturbance. Introduce the matrix L0 of maximal rank such that L0 L = 0. Since the condition of invariance is of the form ΦL = 0, we have Φ = N L0 for some matrix N. Replacing the matrix Φ in the equation R∗ H = H∗ Φ by N L0 gives R∗ H = H∗ N L0, or

$$(R_* \;\; -H_* N)\begin{pmatrix} H \\ L_0 \end{pmatrix} = 0.$$
Clearly, the last relation is equivalent to the condition

$$\mathrm{rank}\begin{pmatrix} H \\ L_0 \end{pmatrix} < \mathrm{rank}(H) + \mathrm{rank}(L_0). \tag{8}$$

A similar replacement in ΦF = F∗ Φ + J∗ H0 and Hv = H∗v Φ + QH gives N L0 F = F∗ N L0 + J∗ H0 and Hv = H∗v N L0 + QH, which are equivalent to the conditions

$$\mathrm{rank}\begin{pmatrix} L_0 F \\ L_0 \\ H_0 \end{pmatrix} < \mathrm{rank}(L_0 F) + \mathrm{rank}\begin{pmatrix} L_0 \\ H_0 \end{pmatrix}, \qquad \mathrm{rank}\begin{pmatrix} L_0 \\ H \end{pmatrix} = \mathrm{rank}\begin{pmatrix} L_0 \\ H \\ H_v \end{pmatrix}, \tag{9}$$

respectively; these checks are easy to perform numerically, as sketched below.
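A minimal numerical sketch of these existence checks (my addition; the matrices below are purely illustrative), where L0 is computed as a basis of the left null space of L:

```python
import numpy as np
from scipy.linalg import null_space

def left_annihilator(L):
    """Maximal-rank L0 with L0 @ L = 0 (rows span the left null space of L)."""
    return null_space(L.T).T

def rk(*blocks):
    return np.linalg.matrix_rank(np.vstack(blocks))

# Hypothetical data for illustration only.
F = np.array([[-1., 1., 0.], [1., -2., 1.], [0., 1., -2.]])
H = np.array([[1., 0., 0.], [0., 0., 1.]])
Hv = np.array([[1., 0., 0.]])
L = np.array([[0.], [1.], [0.]])
L0 = left_annihilator(L)
H0 = np.vstack([H, Hv])

cond8 = rk(H, L0) < rk(H) + rk(L0)                      # condition (8)
cond9 = (rk(L0 @ F, L0, H0) < rk(L0 @ F) + rk(L0, H0)   # condition (9), 1st part
         and rk(L0, H) == rk(L0, H, Hv))                # condition (9), 2nd part
print(cond8, cond9)
```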
If (8) or (9) is not satisfied, a model invariant with respect to the disturbance does not exist, and one has to find a robust solution. Assume that both (8) and (9) are satisfied and construct the model. The matrices F∗ and H∗ are sought in the canonical form

$$F_* = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 0 \end{pmatrix}, \qquad H_* = (1 \;\; 0 \;\; 0 \;\; \dots \;\; 0). \tag{10}$$
Clearly, this is always possible if (F∗, H∗) is observable. If (F∗, H∗) is unobservable, system (2) can be transformed into the observable canonical form [10], and the matrices describing the observable part of this form can then be presented in the form (10) with a smaller dimension. Using these matrices, one obtains from (4) equations for the rows of the matrices Φ and J∗:

Φ1 = R∗ H,  Φi F = Φi+1 + J∗i H0, i = 1, ..., k − 1,  Φk F = J∗k H0,   (11)

where Φi and J∗i are the i-th rows of the matrices Φ and J∗, i = 1, ..., k. As shown in [17], Eqs. (11) can be transformed into the single equation (R∗ −J∗1 ... −J∗k) W(k) = 0, where

$$W^{(k)} = \begin{pmatrix} H F^k \\ H_0 F^{k-1} \\ \dots \\ H_0 \end{pmatrix}. \tag{12}$$
The condition ΦL = 0 of insensitivity to the disturbance can be taken into account in the form [17] (R∗ −J∗1 ... −J∗k) L(k) = 0, where

$$L^{(k)} = \begin{pmatrix} HL & HFL & \dots & HF^{k-1}L \\ 0 & H_0 L & \dots & H_0 F^{k-2} L \\ \dots & \dots & \dots & \dots \\ 0 & 0 & \dots & 0 \end{pmatrix}.$$

The last equation and (12) result in the single equation

(R∗ −J∗1 ... −J∗k)(W(k) L(k)) = 0.   (13)

Equation (13) has a nontrivial solution if

rank(W(k) L(k)) < l + (l + 1)k.   (14)
To construct the model, find from (14) the minimal dimension k and a row (R∗ −J∗1 ... −J∗k) satisfying (13). Then calculate the rows of the matrix Φ based on (11). At the second step, the possibility to estimate the variable yv(t) is checked based on the condition (7). If it is satisfied, the variable yv(t) can be estimated by the model; otherwise, one finds another solution of (13) with the same or an incremented dimension k. Assuming that (7) is satisfied for some k, we find the matrices H∗v and Q from (6) and set G∗ := ΦG. As a result, a model of minimal dimension insensitive to the disturbance and estimating the variable yv(t) has been designed.

2.3 Observer Design

To transform the model into an observer, introduce the estimation error ev(t) = Φx(t) − xv(t) and write down the equation for ev(t), taking into account relations (4):

ėv(t) = ΦFx(t) + ΦGu(t) − (F∗ xv(t) + G∗ u(t) + J∗ y0(t) + K∗ r(t))
= (ΦF − J∗ H0)x(t) − F∗ xv(t) − K∗ (R∗ y(t) − y∗(t))
= F∗ Φx(t) − F∗ xv(t) − K∗ (R∗ Hx(t) − H∗ xv(t))
= F∗ ev(t) − K∗ (H∗ Φx(t) − H∗ xv(t)) = (F∗ − K∗ H∗) ev(t).

It follows from (10) that the pair (F∗, H∗) is observable; therefore, a matrix K∗ exists such that F∗ − K∗ H∗ is a stable matrix. Set K∗ = (K1 K2 ... Kk)ᵀ; then

$$F_* - K_* H_* = \begin{pmatrix} -K_1 & 1 & 0 & \dots & 0 \\ -K_2 & 0 & 1 & \dots & 0 \\ \dots & \dots & \dots & \dots & \dots \\ -K_k & 0 & 0 & \dots & 0 \end{pmatrix}.$$
It is known that if λ1, λ2, ..., λk are the desired eigenvalues of the observer, then K1 = −(λ1 + ... + λk), K2 = λ1λ2 + ... + λk−1λk, ..., Kk = (−1)ᵏ λ1 ⋯ λk; a short computational sketch of this gain selection is given below.
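A small sketch (my addition, using numpy): the coefficients Ki are exactly those of the characteristic polynomial with the desired roots.

```python
import numpy as np

def observer_gain(eigenvalues):
    """K1..Kk from desired observer eigenvalues: coefficients of
    prod(s - lambda_i) = s^k + K1 s^(k-1) + ... + Kk."""
    return np.poly(eigenvalues)[1:]          # drop the leading 1

K = observer_gain([-1.0, -2.0])              # -> [3., 2.]  (K1 = 3, K2 = 2)

# Check: the companion-form matrix F* - K* H* has the desired spectrum.
k = len(K)
A = np.zeros((k, k))
A[:, 0] = -K
A[:-1, 1:] = np.eye(k - 1)
print(np.sort(np.linalg.eigvals(A)))         # approx [-2., -1.]
```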
2.4 JCF-Based Model Design

In this case, the matrix F∗ is of the form

$$F_* = \begin{pmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \dots & \dots & \dots & \dots \\ 0 & 0 & \dots & \lambda_k \end{pmatrix}. \tag{15}$$
The equation ΦF = F∗ Φ +J∗ H0 is presented in the form of k independent equations: Φi F = λi Φi + J∗i H0 , i = 1, 2, ..., k,
(16)
where Φi and J∗i are the i-th rows of the matrices Φ and J∗, respectively. The additional condition Φi L = 0 (insensitivity to the disturbance) can be taken into account as follows. As shown above, Φ = N L0 for some matrix N. As a result, (16) can be rewritten as

$$(N_i \;\; -J_{*i})\begin{pmatrix} L_0 (F - \lambda_i I_n) \\ H_0 \end{pmatrix} = 0, \quad i = 1, 2, \dots, k, \tag{17}$$

where In is the n × n identity matrix. This equation has a solution if and only if

$$\mathrm{rank}\begin{pmatrix} L_0 (F - \lambda I_n) \\ H_0 \end{pmatrix} < \mathrm{rank}(L_0 (F - \lambda I_n)) + \mathrm{rank}(H_0). \tag{18}$$

When (18) is not satisfied for any λ < 0, a model insensitive to the disturbance does not exist, and one has to use the robust methods. The values λi < 0 and the rows Φi = Ni L0 in (17) must be chosen in such a way that the matrix Φ with the minimal number of rows satisfies the condition (7); then the matrices H∗v and Q are found from (6). Finally, the matrix G∗ = ΦG is calculated; a numerical sketch of this solvability test is given below.

Remark 1. If (17) has no solutions, a model insensitive to the disturbance cannot be constructed; in this case one has to use the robust method described below.
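The solvability test (18) is easy to automate; the sketch below (my addition, with purely illustrative matrices) scans candidate values λ < 0 and keeps those for which (17) has a nontrivial solution.

```python
import numpy as np

def jcf_candidates(F, H0, L0, lambdas):
    """Return the candidate eigenvalues among `lambdas` for which the
    rank condition (18) holds, i.e., (17) has a nontrivial solution."""
    n = F.shape[0]
    good = []
    for lam in lambdas:
        top = L0 @ (F - lam * np.eye(n))
        stacked = np.vstack([top, H0])
        if np.linalg.matrix_rank(stacked) < (np.linalg.matrix_rank(top)
                                             + np.linalg.matrix_rank(H0)):
            good.append(lam)
    return good

# Hypothetical example matrices (for illustration only).
F = np.array([[-1., 1., 0.], [1., -2., 1.], [0., 1., -2.]])
H0 = np.array([[0., 1., 0.], [0., 0., 1.]])
L0 = np.array([[1., 0., 0.]])
print(jcf_candidates(F, H0, L0, lambdas=[-0.5, -1.0, -2.0]))   # [-1.0]
```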
3 Robust Model Design

3.1 ICF-Based Model

If for the ICF-based model the condition (14) is not satisfied for all k < n, a model insensitive to the disturbance does not exist. In this case, a robust model can be designed as follows. Sensitivity of the model (2) to the disturbance ρ(t) is estimated by the norm ‖ΦL‖F of the matrix ΦL, which can be presented as ‖(R∗ −J∗1 ... −J∗k) L(k)‖F. To minimize the contribution of the disturbance, one has to minimize the norm ‖(R∗ −J∗1 ... −J∗k) L(k)‖F under the condition (12). To solve this problem, find the minimal dimension k for which Eq. (12) has several linearly independent solutions of the form (R∗ −J∗1 ... −J∗k). Represent the set of such solutions in the form

$$B = \begin{pmatrix} R_*^{(1)} & -J_{*1}^{(1)} & \dots & -J_{*k}^{(1)} \\ & \dots & & \\ R_*^{(n_*)} & -J_{*1}^{(n_*)} & \dots & -J_{*k}^{(n_*)} \end{pmatrix}, \tag{19}$$

where n∗ is the number of all solutions for some k. It can be shown that an arbitrary linear combination of the rows of the matrix B with a vector of weight coefficients w = (w1, ..., wn∗) yields a solution as well. The problem is to find the vector w such that ‖w‖ = 1 and the norm ‖wBL(k)‖F is minimal; the constraint ‖w‖ = 1 is used to avoid the trivial solution w = 0. To solve this problem, find the singular value decomposition of the matrix BL(k):

BL(k) = UL ΣL VL,   (20)

where UL and VL are orthogonal matrices and ΣL = (diag(σ1, ..., σc) 0) or

$$\Sigma_L = \begin{pmatrix} \mathrm{diag}(\sigma_1, \dots, \sigma_c) \\ 0 \end{pmatrix},$$

depending on the numbers of rows and columns of the matrix BL(k); here c = min(n∗, kp) and 0 ≤ σ1 ≤ ... ≤ σc are the singular values of the matrix BL(k) ordered by magnitude [11]. Choose the transposed first column of the matrix UL as the vector of weight coefficients w = (w1, ..., wn∗). It follows from the singular value decomposition and the properties of orthogonal matrices that the norm of the matrix wBL(k) equals the minimal singular value σ1.

Theorem 1. The vector w = (w1, ..., wn∗) yields the optimal solution, with minimal norm of the vector (R∗ −J∗1 ... −J∗k) L(k).
Proof. It follows immediately from the choice of the vector w = (w1, ..., wn∗) and the properties of the singular value decomposition [11].

Note that if σ1 = 0, then wBL(k) = 0. This means that the linear combination of the solutions represented by the rows of the matrix B with the vector of weight coefficients w = (w1, ..., wn∗) yields a solution insensitive to the disturbance. If σ1 ≠ 0, such a linear combination yields the solution with the minimal value of the norm ‖(R∗ −J∗1 ... −J∗k) L(k)‖F.

Then one finds the row (R∗ −J∗1 ... −J∗k) = wB and the matrix Φ from (11). At the second step, the possibility to estimate the variable yv(t) is checked based on the condition (7). If it is satisfied, the variable yv(t) can be estimated by the model; otherwise, one finds another vector w related to a singular value greater than σ1. Assuming that (7) is satisfied for some k, one finds the matrices H∗v and Q from (6) and sets G∗ := ΦG and L∗ := ΦL. As a result, the model with minimal sensitivity to the disturbance estimating the variable yv(t) has been designed:

ẋv(t) = F∗ xv(t) + G∗ u(t) + J∗ y0(t) + L∗ ρ(t),
yv(t) = H∗v xv(t) + Qy(t),
y∗(t) = H∗ xv(t).

Finally, this model can be transformed into an observer by the feedback K∗ r(t).

3.2 JCF-Based Model

For the JCF-based model, (17) is simplified as follows:

$$(\Phi_i \;\; -J_{*i})\begin{pmatrix} F - \lambda_i I_n \\ H_0 \end{pmatrix} = 0, \quad i = 1, 2, \dots, k. \tag{21}$$
For i = 1, 2, ..., k one chooses λi < 0 for which (21) has a solution and finds all solutions in the form Φi⁽¹⁾, ..., Φi⁽ⁿⁱ⁾. Represent the set of such rows in the form

$$\Phi_{*i} = \begin{pmatrix} \Phi_i^{(1)} \\ \dots \\ \Phi_i^{(n_i)} \end{pmatrix}. \tag{22}$$

Then one finds the singular value decomposition of the matrix Φ∗i L = UL∗ ΣL∗ VL∗, chooses the transposed first column of the matrix UL∗ as the vector of weight coefficients w = (w1, ..., wni), and calculates the row Φi = wΦ∗i; if ni = 1, set Φi := Φi⁽¹⁾. The rows Φ1, ..., Φk are combined into the matrix

$$\Phi = \begin{pmatrix} \Phi_1 \\ \dots \\ \Phi_k \end{pmatrix},$$

and the possibility to estimate the variable yv(t) is checked based on the condition (7). If it is satisfied, the variable yv(t) can be estimated. Otherwise, for some λi < 0, another
vector w = (w1, ..., wni) for a singular value σ > σ1 is found and a new matrix Φi is calculated. Assuming that (7) is satisfied for some k, one finds the matrices H∗v and Q from (6) and sets G∗ := ΦG and L∗ := ΦL. As a result, the model with minimal sensitivity to the disturbance estimating the variable yv(t) has been designed:

ẋv(t) = F∗ xv(t) + G∗ u(t) + J∗ y0(t) + L∗ ρ(t),
yv(t) = H∗v xv(t) + Qy(t).

Unlike the ICF approach, the obtained model is an observer itself; the SVD-based weight selection common to both robust designs is sketched below.
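Both robust designs reduce to the same numerical step: picking unit-norm weights from the SVD of the disturbance-contribution matrix. A sketch of this step (my addition) follows; note that numpy orders singular values in descending order, so the minimizing left singular vector is the last column of U.

```python
import numpy as np

def min_sensitivity_weights(M):
    """Unit-norm w minimizing ||w @ M||, where M = B @ L_k (ICF case)
    or M = Phi_star_i @ L (JCF case)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=True)
    w = U[:, -1]          # left singular vector of the smallest singular value
    return w, float(np.linalg.norm(w @ M))

# Illustrative 3 x 2 case (three stacked solutions, two disturbance columns):
M = np.array([[0.9, 0.1], [0.8, -0.2], [0.7, 0.3]])
w, attained = min_sensitivity_weights(M)
print(attained)   # ~0 here: more solutions than disturbance columns allows
                  # an exactly insensitive combination (the sigma_1 = 0 case)
```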
4 Example

Consider the control system

ẋ1 = u1/ϑ1 − a1(x1 − x2),
ẋ2 = u2/ϑ2 + a1(x1 − x2) − a2(x2 − x3) + ρ1,
ẋ3 = a2(x2 − x3) − a3(x3 − ϑ) + ρ2,   (23)
y1 = x2,  y2 = x3,

where a1 = ϑ4√(2ϑ7)/ϑ1, a2 = ϑ5√(2ϑ7)/ϑ2, and a3 = ϑ6√(2ϑ7)/ϑ3. The Eqs. (23) constitute a linearized model of the well-known three-tank system (Fig. 1). The system consists of three consecutively connected tanks with cross-section areas ϑ1, ϑ2, and ϑ3. The tanks are linked by pipes with cross-section areas ϑ4 and ϑ5. Liquid flows into the first and the second tanks and leaves the third one through a pipe with cross-section area ϑ6 located at height ϑ; ϑ7 is the gravitational constant. The levels of liquid in the tanks are x1, x2, and x3, respectively. Assume for simplicity that ϑ1 = ... = ϑ7 = 1 and ϑ = 0; then a1 = a2 = a3 = 1. The system is described by the matrices

$$F = \begin{pmatrix} -1 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -2 \end{pmatrix}, \quad G = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad H = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad L = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

The problem is to estimate the variable yv(t) = x1(t). To solve the problem, we use the JCF approach. Since L0 = (1 0 0), Eq. (17) takes the form

$$(N_i \;\; -J_{*i})\begin{pmatrix} -1-\lambda & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} = 0.$$

We set λ := −1 and obtain k = 1, N = 1, and J∗ = (1 0 0), which yields Φ = (1 0 0), G∗ = (1 0), and L∗ = 0. The model (3) takes the form

ẋv(t) = u1(t) − xv(t) + y1(t),
yv(t) = xv(t).   (24)
Fig. 1. Three-tank system
Clearly, this model is stable, and it can be used to estimate the variable x1(t) = yv(t) without additional feedback. Note that in [20] a similar estimate was obtained with an ICF-based model with ρ1(t) = 0 in (23), and that model has dimension 2. It can also be shown that if ρ1(t) ≠ 0, the ICF-based model gives only an approximate solution with L0 = 0. For simulation, consider the system (23) and the model (24); one takes u1(t) = 1 from t = 1 and u2(t) = 0.5 from t = 5, with ρ1(t) = −0.3 from t = 6 and ρ2(t) = −0.4 from t = 10. Simulation results for the model (24) are shown in Fig. 2; a simulation sketch is given below.
Fig. 2. The variable x1 (t) and its estimation based on (24).
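The simulation scenario can be reproduced with a simple Euler scheme (my addition; it assumes the reconstructed measurement y1 = x2 and unit parameters):

```python
import numpy as np

# Euler simulation of the three-tank system (23) (unit parameters) and the
# virtual sensor (24); inputs and disturbances follow the scenario in the text.
dt, T = 0.001, 15.0
x = np.zeros(3)      # tank levels x1, x2, x3
xv = 0.0             # virtual sensor state
history = []
for n in range(int(T / dt)):
    t = n * dt
    u1 = 1.0 if t >= 1 else 0.0
    u2 = 0.5 if t >= 5 else 0.0
    rho1 = -0.3 if t >= 6 else 0.0
    rho2 = -0.4 if t >= 10 else 0.0
    dx1 = u1 - (x[0] - x[1])
    dx2 = u2 + (x[0] - x[1]) - (x[1] - x[2]) + rho1
    dx3 = (x[1] - x[2]) - x[2] + rho2
    y1 = x[1]                          # measured level (y1 = x2)
    xv += dt * (u1 - xv + y1)          # model (24)
    x += dt * np.array([dx1, dx2, dx3])
    history.append((t, x[0], xv))
print(history[-1])   # xv tracks x1 despite the disturbances (cf. Fig. 2)
```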
5 Conclusion

In this paper, the problem of virtual sensor design has been studied for systems described by linear dynamic models under disturbances. The suggested approach allows one to obtain a virtual sensor of minimal dimension based on both the identification and the Jordan canonical forms. The virtual sensors are designed to be insensitive, or minimally sensitive, to the disturbance.
Acknowledgment. This paper was supported by the Russian Science Foundation, project 22-29-01303.
References

1. Ahmed, Q., Bhatti, A., Iqbal, M.: Virtual sensors for automotive engine sensors fault diagnosis in second-order sliding modes. IEEE Sens. J. 11(9), 1832–1840 (2011)
2. Albertos, P., Goodwin, G.: Virtual sensors for control applications. Annu. Rev. Control. 26, 101–112 (2002)
3. Berkhoff, A., Hekman, T.: Active noise control using finite element-based virtual sensors. In: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK (2019)
4. Blanke, M., Kinnaert, M., Lunze, J., Staroswiecki, M.: Diagnosis and Fault-Tolerant Control. Springer, Heidelberg (2006). https://doi.org/10.1007/978-3-540-35653-0
5. Galavizh, A., Hassanabadi, A.: Designing fuzzy fault tolerant controller for a DC microgrid based on virtual sensor. In: Proceedings 7th International Conference on Control, Instrumentation and Automation, Tabriz, Iran (2021)
6. Hashlamon, I., Erbatur, K.: Joint sensor fault detection and recovery based on virtual sensor for walking legged robots. In: Proceedings IEEE 23rd International Symposium on Industrial Electronics, Istanbul, Turkey, pp. 1210–1204 (2014)
7. Heredia, G., Ollero, A.: Virtual sensor for failure detection, identification and recovery in the transition phase of a morphing aircraft. Sensors 10, 2188–2201 (2010)
8. Hosseinpoor, Z., Arefi, M., Razavi-Far, R., Mozafari, N., Hazbavi, S.: Virtual sensors for fault diagnosis: a case of induction motor broken rotor bar. IEEE Sens. J. 21(4), 5044–5051 (2021)
9. Jove, E., Casteleiro-Roca, J., Quntian, H., Mendez-Perez, J., Calvo-Rolle, J.: Virtual sensor for fault detection, isolation and data recovery for bicomponent mixing machine monitoring. Informatica 30(4), 671–687 (2019)
10. Kwakernaak, H., Sivan, R.: Linear Optimal Control Systems. Wiley, Hoboken (1972)
11. Lou, X., Willsky, A., Verghese, G.: Optimally robust redundancy relations for failure detection in uncertain systems. Automatica 22, 333–344 (1986)
12. Luzar, M., Witczak, M.: Fault-tolerant control and diagnosis for LPV system with H-infinity virtual sensor. In: Proceedings 3rd Conference on Control and Fault-Tolerant Systems, Barcelona, Spain, pp. 825–830 (2016)
13. Roy, C., Roy, A., Misra, S.: DIVISOR: dynamic virtual sensor formation for overlapping region in IoT-based sensor-cloud. In: Proceedings 2018 IEEE Wireless Communications and Networking Conference, Barcelona, Spain (2018)
14. Trevathan, J., Read, W., Sattar, A., Schmidtke, S., Sharp, T.: The virtual sensor concept. In: Proceedings 2020 IEEE SENSORS, Rotterdam, Netherlands (2020)
15. Wang, Y., Rotondo, D., Puig, V., Cembrano, G.: Fault-tolerant control based on virtual actuator and sensor for discrete-time descriptor systems. IEEE Trans. Circuits Syst. 67(12), 5316–5325 (2020)
16. Witczak, M.: Fault Diagnosis and Fault Tolerant Control Strategies for Nonlinear Systems. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-03014-2
17. Zhirabok, A., Shumsky, A., Solyanik, S., Suvorov, A.: Fault detection in nonlinear systems via linear methods. Int. J. Appl. Math. Comput. Sci. 27, 261–272 (2017)
18. Zhirabok, A., Zuev, A., Shumsky, A.: Sensor fault identification in mechatronic systems described by linear and nonlinear models. In: Proceedings 29th IEEE International Symposium on Industrial Electronics, Delft, The Netherlands, pp. 1071–1076 (2020)
19. Zhirabok, A., Zuev, A., Shumsky, A.: Sensor fault identification in nonlinear dynamic systems. IFAC-PapersOnLine 53(2), 750–755 (2020)
20. Zhirabok, A., Ir, K.C.: Virtual sensors for the functional diagnosis of nonlinear systems. J. Comput. Syst. Sci. Int. 61, 67–75 (2022). https://doi.org/10.1134/S1064230722010130
Clustering Analysis in the Student Academic Activities on COVID-19 Pandemic in Mexico

G. Miranda-Piña1(B), R. Alejo1, E. Rendón1, E. E. Granda-Gutiérrez2, R. M. Valdovinos3, and F. del Razo-López1
(TecNM) Campus Toluca, Av. Tecnológico s/n, Agrícola Bellavista, 52149 Metepec, México [email protected] 2 UAEM University Center at Atlacomulco, Universidad Autónoma del Estado de México, Carretera Toluca-Atlacomulco Km. 60, 50450 Atlacomulco, México 3 Faculty of Engineering, Universidad Autónoma del Estado de México, Cerro de Coatepec s/n, Ciudad Universitaria, 50100 Toluca, México
Abstract. The pandemic caused by the COVID-19 disease has affected all aspects of the life of the people in every region of the world. The academic activities at universities in Mexico have been particularly disturbed by two years of confinement; all activities were migrated to an online modality where improvised actions and prolonged isolation have implied a significant threat to the educational institutions. Amid this pandemic, some opportunities to use Artificial Intelligence tools for understanding the associated phenomena have been raised. In this sense, we use the K-means algorithm, a well-known unsupervised machine learning technique, to analyze the data obtained from questionaries applied to students in a Mexican university to understand their perception of how the confinement and online academic activities have affected their lives and their learning. Results indicate that the K-means algorithm has better results when the number of groups is bigger, leading to a lower error in the model. Also, the analysis helps to make evident that the lack of adequate computing equipment, internet connectivity, and suitable study spaces impact the quality of the education that students receive, causing other problems, including communication troubles with teachers and classmates, unproductive classes, and even accentuate psychological issues such as anxiety and depression. Keywords: COVID-19 · K-means · Machine learning · Student academic activities
1 Introduction Coronavirus disease (COVID-19) is an infectious respiratory illness caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The pandemic caused by This work has been partially supported under grants of project 11437.21-P from TecNM and 6364/2021SF from UAEMex. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 D.-S. Huang et al. (Eds.): ICIC 2022, LNAI 13395, pp. 67–79, 2022. https://doi.org/10.1007/978-3-031-13832-4_6
68
G. Miranda-Piña et al.
COVID-19 is a problem the world faces due to its high transmission and case fatality rate. COVID-19 was declared a public health emergency by the World Health Organization (WHO) just two months after the first outbreak was recorded in Wuhan, China at the end of 2019 [19]. The virus is transmitted from one person to another attacking the respiratory system of the host, exhibiting fever, muscle pain, dry cough, and shortness of breath (in severe cases). The symptoms appear in approximately 10–14 days, but although the carriers of the virus do not exhibit any symptoms, they can spread the virus without knowing (asymptomatic subjects) and its spread is quite accelerated [5, 20]. At the origin of the pandemic, there was no antiviral agent to treat the infection, or a vaccine [4], so prevention measures were the best strategy against COVID-19 [12]. Then, some measures included mandatory confinement in cities, temperature scanning to identify people with symptoms associated with this pathology, suspension of public transport, travel restrictions, and border closures, among others [2, 6, 8, 16, 18]. In the international context, the timely response of the government and public and private organizations was key to mitigating the possible risk of contagion. In the case of Mexico, on March 20, the General Health Council declared a health emergency derived from a rapid spread of positive cases. The initial strategy included voluntary and mandatory lockdown and considered the suspension of non-essential activities until April 30, 2020, because the magnitude of the pandemic was not yet known; however, as the pandemic evolved, the lockdown period lasted for two years. Although the pandemic has impacted almost all aspects of our daily lives, from social, financial, labor, and health, academic activities were especially affected. To mitigate the COVID-19 propagation in students, all universities in Mexico closed their facilities in March 2020, and face-to-face classes migrated to the online modality. An unprecedented disruption in the history of higher education in the country occurred, and the short, medium and long-term impacts are yet difficult to quantify. The academic life of students and teachers has been affected in different ways; they were forced to enter an unplanned dynamic of online activities, which affected their daily lives and the continuity of their learning and mobility. The latter resulted in undesired changes in the behavior of the students. A lot of research on the behavior of COVID-19, from a medical point of view, has been performed [17], but also the impact of the pandemic on student behavior from the social and academic points of view is essential. Researchers from different fields of science (epidemiologists, health doctors, biochemists, etc.) are trying to learn more about the COVID-19, as well as its impact on the daily lives of citizens; then, there are currently many scientific papers that try to discover the behavior and evolution of patients [4]. Also, scientists have used artificial intelligence (AI), Big Data, and machine learning (ML) techniques to find trends in the evolution of COVID-19 [12], considering the large amount of data generated in hospitals and medical centers. These data are of different natures, such as demographics, previous illnesses, eating habits, and type of activity, among others [10]. AI techniques provide valuable insights and solutions for real-world problems based on data that otherwise are incomprehensible to humans. 
Clustering is a powerful machine learning approach widely used to analyze data. It is an unsupervised technique that uses unlabeled data to identify different trends in the data by creating groups or categories, and it can even reveal which data features are
of significant importance in the dataset. In this sense, we present a machine learning study based on a clustering approach to analyze the behavior of a group of students at a Mexican university with the aim of understanding how the COVID-19 pandemic has affected their academic life.
2 Related Work

Relevant literature on the topic of inquiry is presented in this section, including statistics, machine learning, and clustering.

Campedelli et al. [3] present a study about the temporal distribution of civil disorders as a result of the coronavirus pandemic in 2020. Specifically, the temporal clustering and self-excitability of events were discussed. The authors applied K-means clustering and the Hawkes process to data from three countries (Mexico, India, and Israel). The main result was that the temporal clustering of pandemic-related demonstrations is a common feature in the studied countries.

Sengupta et al. [25] propose K-means clustering to find the similarities between the two most affected districts in India by exploring two features: population density and specialty hospitals. As a result, similar groups of the analyzed districts could be ranked based on the burden placed on the healthcare system in terms of the number of confirmed cases, population density, and the number of hospitals dedicated to specialized COVID-19 treatment.

Li et al. [14] explore the use of clustering (K-means) and classification techniques (decision tree) to find characteristics of patients infected with COVID-19. Two hundred twenty-two patients from Wuhan city were studied, and two groups were formed (common type and high-risk type).

Alanezi et al. [1] propose a Twitter sentiment analysis to determine the impact of social distancing on people during the COVID-19 pandemic, comparing the K-means clustering and Mini-Batch K-means clustering approaches. Two datasets (English and Arabic) were used. Results showed that, based on word frequency, the people of Italy and India were more optimistic than people from other countries during the pandemic.

Yang et al. [26] employ a decision tree model to predict death in severe patients using training data with two classes (395 survivors and 57 non-survivors). Demographic, clinical, and laboratory features were used. As a result, the decision tree found that male COVID-19 patients were more prone to experience severe illness and death. Clinical characteristics and laboratory examinations were significantly different between severe and non-severe groups and between survivors and non-survivors.

From the foregoing, it can be inferred that clustering methods are widely used as a basic tool to explain differences between groups in a population or dataset, including the study of the impact of COVID-19 in various fields.
3 Cluster Model

Clustering is an unsupervised learning technique whose goal is to find or discover groups or partitions in datasets or object collections [22]. These partitions are usually called
groups, so that the objects that belong to the same group are similar to each other and dissimilar to the objects of the other groups [9]. Clustering is one of the essential tasks in data mining and analysis [24]; it has been widely used in anomaly detection and identification of outstanding features in datasets in different areas of knowledge, such as biology, anthropology, materials science, medicine, statistics, and mathematics, to name a few [7, 23]. A wide variety of clustering methods have been developed since their beginnings in the 1950s [13, 15], and they have been divided into two groups: partitioning and hierarchical. Despite its age, the K-means method is one of the most widely used partitioning algorithms, and today it is the de facto standard for exploring and classifying unknown datasets [11].

3.1 K-means Algorithm

Most partition algorithms are based on the optimization of a criterion function [22]; for K-means, this function is generally represented by E (Eq. 1), and its value depends on the partitions or groups (C_i) in the dataset X:

E = Σ_{i=1}^{K} Σ_{x ∈ C_i} ‖x − m_i‖²    (1)
where E is the sum of the squared error over all the objects in the dataset X with respect to the centers (or means) m_i (Eq. 2) of the groups C_i, and x is a point in space that represents a given object in a multidimensional space [9], with

m_i = (1/|C_i|) Σ_{x_j ∈ C_i} x_j    (2)
The K-means algorithm starts by selecting or calculating k centers or initial means m_i^0; depending on the selection criteria, commonly k objects are randomly taken from X. Next, each object x_j ∈ X is assigned to its closest center m_i. Subsequently, new centers or means m_i (Eq. 2) are calculated until the algorithm converges to a minimum value of E (Eq. 1) or up to a maximum number of repetitions Q, which is established at the beginning of the procedure, i.e., this process is repeated until |E^(q) − E^(q−1)| < Δ or q = Q, where q = {1, 2, ..., Q} corresponds to the repetition number at that moment. This process is explained in detail in Algorithm 1.
Algorithm 1. Pseudo-code of the K-means algorithm
Input: Dataset X, number of groups (k);
Output: Obtained groups {C_1, C_2, ..., C_k};
1.  Set the convergence criterion: minimum error (Δ = 0.0001) and maximum number of repetitions (Q = 5000);
2.  Randomly assign k objects of X as centers or initial means (m_i^0);
3.  q = 1;
4.  repeat
5.    for i = 1 to k
6.      C_i* ← nearest(X, m_i);  // C_i* is the subset of objects whose nearest center is m_i
7.      C_i ← C_i*;              // assign the objects to their nearest center
8.    end for
9.    for i = 1 to k
10.     n_i = |C_i|;             // calculate the new centers
11.     m_i = (1/n_i) Σ_{x_j ∈ C_i} x_j;
12.   end for
13.   q++;
14. until |E^(q) − E^(q−1)| < Δ or q = Q
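For illustration, Algorithm 1 maps almost line by line onto a short Python/NumPy routine. This is only a minimal sketch under our own conventions (fixed random seed, dense distance matrix), not the authors' implementation:

import numpy as np

def kmeans(X, k, delta=1e-4, Q=5000, seed=0):
    # Algorithm 1: random initial centers, nearest-center assignment,
    # mean update, and the (delta, Q) stopping rule.
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    E_prev = np.inf
    for _ in range(Q):
        d = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)
        labels = d.argmin(axis=1)              # assign objects to their closest center
        E = ((X - m[labels]) ** 2).sum()       # squared error E, Eq. (1)
        for i in range(k):                     # recompute the means, Eq. (2)
            members = X[labels == i]
            if len(members):
                m[i] = members.mean(axis=0)
        if E_prev - E < delta:                 # convergence criterion
            break
        E_prev = E
    return labels, m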
3.2 Linkage Rules

Several methods exist for evaluating the dissimilarity between clusters, which are identified as Linkage Rules (LR) [22]. LR allow evaluating cluster performance. In this work, we use the distance between the samples x_j of the same group C_i and its center m_i to evaluate the intra-group error (Eq. 3):

intraE_i = Σ_{x_j ∈ C_i} dist(x_j, m_i)    (3)

To assess the inter-group error, we use the distance between different centers (dist(m_i, m_j), where i ≠ j), as presented in Eq. 4:

interE_{i,j} = dist(m_i, m_j)    (4)
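Both errors can be computed directly with NumPy, as in the short sketch below; the Euclidean metric is assumed for dist(·, ·), and the function names are ours:

import numpy as np

def intra_errors(X, labels, centers):
    # intraE_i, Eq. (3): summed distance of the samples of group C_i
    # to their own center m_i, computed for every cluster i.
    return np.array([np.linalg.norm(X[labels == i] - c, axis=1).sum()
                     for i, c in enumerate(centers)])

def inter_errors(centers):
    # interE_{i,j}, Eq. (4): pairwise distances between cluster centers.
    c = np.asarray(centers)
    return np.linalg.norm(c[:, None, :] - c[None, :, :], axis=2)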
4 Experimental Setup

In this section, the experimental details that allow proper replication of the results of this research and support the conclusions are described.

4.1 Dataset

The dataset used for this work contains 1282 instances collected through questionnaires applied to students at the National Institute of Technology of Mexico, Campus Toluca. It is worth mentioning that, to maintain the privacy and security of each student, the questionnaires were anonymous. All personal data were also omitted: date of birth, identification number, address of permanent residence, and other private information elements.
After applying the questionnaire, the answers in text format were transformed into numerical form, so that all data have the same data type to facilitate the work. This was done by assigning arbitrary numbers to each response; for example, if the answer is closed, 0 was given to identify a "no" and 1 a "yes," while 2 was used for "maybe" or "more than 1". In the case of the sex section, 1 was set for women and 0 for men. The number label was assigned according to the total number of possible answers, starting from 0; if a question had four different answers, the labels would be 0, 1, 2, 3. After applying the labels to the data, 43 attributes were obtained, which will be used to apply the K-means algorithm (a short code sketch of this encoding follows Table 1). Likewise, to obtain more details about the information considered for the task and to be able to determine which elements influenced the taking of virtual classes from March 2020 to July 2021, the collected data was organized into seven categories, establishing a label for each attribute and its description, as shown in Table 1.

Table 1. Descriptive table of the 43 attributes with their respective labels organized by category.

Academic:
  A0   School grade
  A3   Current semester
  A4   Career
  A5   Number of subjects currently taken
  A6   Number of subjects taken last semester
  A7   Number of subjects dropped in the current semester
  A8   Did you suspend a semester between March 2020 and July 2021?
  A10  Number of class hours dedicated per week

Personal:
  A1   Sex
  A2   Age
  A14  Do you receive external economic support to the family income?
  A15  Do you have a job?
  A16  Civil status
  A17  Number of children

Perception:
  A9   Internet service satisfaction level
  A11  Do you think preventive confinement affected your learning quality?
  A12  Do you think preventive confinement affected communication with your teachers and classmates?
  A13  Do you think the level of knowledge acquired during the online classes was deficient?

Health:
  A18  Do you suffer from any degenerative or chronic disease?
  A19  Do you suffer from any disability?
  A20  Do you suffer or have you suffered from COVID-19?
  A21  Does anyone in your family have or has had COVID-19?
  A22  Have you lost someone in your family due to COVID-19?

Tools for online classes:
  A23  Computer
  A24  Internet service
  A25  Light
  A26  Video camera
  A27  Microphone
  A28  Photocopier
  A29  Smartphone

Common problems:
  A30  Absence of the teacher
  A31  Too much homework
  A32  Lack of dynamism in the classes
  A33  Lack of study spaces
  A34  Little opening of subjects
  A35  Technical difficulties
  A36  Lack of organization of the teacher
  A37  Schedule overlap between school and personal activities
  A38  Acquired knowledge is deficient
  A39  Lack of proper computer equipment

Psychological effects:
  A40  Stress
  A41  Anxiety
  A42  Depression
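As mentioned before Table 1, each textual answer was replaced by a numeric label; a minimal sketch of such an encoding is given below, where the mapping tables are hypothetical, since the paper quotes only a few example assignments:

CLOSED = {"no": 0, "yes": 1, "maybe": 2, "more than 1": 2}  # hypothetical table
SEX = {"man": 0, "woman": 1}

def encode(answer, mapping):
    # Turn a textual questionnaire answer into its numeric label.
    return mapping[answer.strip().lower()]

# e.g., encode("Yes", CLOSED) -> 1; encode("Woman", SEX) -> 1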
4.2 K-means Model Performance

To test the K-means model, we use the g-mean metric of the intra-cluster distances (Eq. 5); it provides a simple measure to test the K-means performance on all the clusters it identifies, i.e., in a single value we can see the overall performance of the K-means algorithm. The g-mean is characterized by being sensitive to local performance; in other words, if any value used by this metric is low or high, it is reflected in the final value. In this work, the g-mean is computed using the distances between samples of the same group, relative to its mean or centroid, as presented in the following equation:

g-mean = (intraE_1 · intraE_2 · intraE_3 · ... · intraE_k)^{1/k}    (5)

where k is the total number of clusters and the intra-group error of each cluster is represented as intraE_k (Eq. 3). It is worth mentioning that low values of the g-mean imply a better K-means model performance than high values.
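For illustration, Eq. (5) can be evaluated as follows; computing the product in log space is our own precaution against numerical overflow for large k, not something prescribed by the paper:

import numpy as np

def g_mean(errors):
    # Geometric mean of the per-cluster intra-group errors, Eq. (5).
    errors = np.asarray(errors, dtype=float)
    return float(np.exp(np.mean(np.log(errors))))

# Model selection as in Sect. 5 (kmeans and intra_errors are the earlier sketches):
# for k in (6, 10, 17, 27, 32):
#     labels, centers = kmeans(X, k)
#     print(k, g_mean(intra_errors(X, labels, centers)))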
5 Results and Discussion

In this section, the main results obtained in the experimentation with the K-means algorithm are presented. Figure 1 shows the g-mean values obtained after testing with different numbers of clusters, where the big red points represent the low g-mean values for k clusters. To measure the performance of each cluster, we employ Eq. 3, which assesses the distance between the samples of the same group and its center; then we use the g-mean metric (Eq. 5) to obtain a single error value and determine how effective the algorithm is for different values of k. We chose the g-mean metric because it is sensitive to the performance on each cluster; for example, if one of the clusters has a high error, it is reflected in the g-mean value. Thus, it ensures that the lowest g-mean error corresponds to the best K-means performance over all clusters.

In accordance with Fig. 1, the numbers of clusters with the lowest errors are k = 6, 10, 17, 27, and 32. However, to simplify the analysis in the resulting graphics (see Figs. 2, 3 and 4), we only studied the first three cases: k = 6, 10, and 17; plotting every case implies very saturated background images, which could hinder focusing on the data analysis. Finally, the experimental results presented in Fig. 1 exhibit that a larger number of clusters implies a decrease in the g-mean value, which may indicate that the clustering algorithm is working better.

Figures 2, 3 and 4 exhibit the behavior of the 43 attributes on each centroid (which corresponds to an individual cluster), and each line in the graphics represents the value of each attribute in its corresponding centroid or cluster. To analyze these figures, we focus on attributes where the lines show different behavior; for example, in Fig. 2, attribute 7 presents very similar values in all centroids, whereas attribute 33 has very different values in each centroid. In other words, the behavior of this group (or cluster) of people is different. Thus, it allows the discrimination of the attributes that describe the behavior of distinct groups or clusters. In addition, it is observed that some lines stand out more, because we chose to highlight the clusters with a more marked difference from the others; for example, in the six-center test, clusters “c2” and “c6” are the most different from the rest, while in the ten-center test clusters “c2”, “c7” and “c9” make the difference. Finally, the seventeen-center test shows that the most different clusters are “c3”,
Fig. 1. g-mean (y axis) of intraE (Eq. 3) for different values of k (x axis).
Fig. 2. Results obtained by the K-means model using six centers
“c7”, “c13”, and “c17”. A similar analysis is commonly performed in other works, but using other concepts like Parallel Coordinates [21].
Fig. 3. Results obtained by the K-means model using ten centers
In Figs. 2, 3 and 4 it is observed that some attributes, such as A1, A11, A12, A13, A21, A22, A23, A26, A31, A32, A33, A34, A37, A38, A39, A41, and A42, are the ones that show the greatest difference between clusters. Taking into account Table 1 and the previously highlighted attributes, the categories involved are “personal”, “perception”, “health”, “tools for online classes”, “common problems” and “psychological effects”. However, due to the number of attributes included in each category, some categories stand out more than others, such as “perception”, “common problems”, and “psychological effects”. These categories show which aspects were the most relevant for the students who took classes virtually from March 2020 to July 2021. In other words, the results show that the students were affected by the common problems that emerged while taking virtual classes, perceiving poor quality in their learning and little communication with teachers and classmates, consequences that in turn brought psychological effects such as anxiety and depression.
Fig. 4. Results obtained by the K-means model using seventeen centers
6 Conclusion

In this paper, we presented an analysis of how preventive confinement affected the academic life of students at the National Technological Institute of Mexico, Campus Toluca, applying a clustering technique with the K-means algorithm and evaluating the performance with the g-mean metric. Experimental results show that the test with seventeen clusters has a lower error value using the g-mean; this result indicates that when a greater number of groups exists, the performance of the algorithm is better.

From a qualitative viewpoint, the results shown in the figures discussed in Sect. 5 evidence the affectations that the students perceived during the preventive confinement: they perceived low quality in the educational system, their level of knowledge was poor, and the communication with their teachers and classmates was minimal, so reaching an agreement was a difficult task for those involved. Also, the shortage of adequate computing and communications equipment and the lack of comfortable study spaces generated widespread difficulties in establishing an effective online environment. These factors could cause psychological effects in the students, such as anxiety and depression, generating more problems in their academic life.

In this work, the K-means clustering method has been used as a tool to evaluate university students' perceptions, and the quantitative results exhibit good agreement with the qualitative insight of the students. In future work, deeper investigations could be addressed to evaluate other clustering techniques, taking the results of this study as a reference, given that this clustering method is widely recognized in the literature as a benchmark algorithm.
References

1. Alanezi, M.A., Hewahi, N.M.: Tweets sentiment analysis during COVID-19 pandemic. In: 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), pp. 1–6 (2020). https://doi.org/10.1109/ICDABI51230.2020.9325679
2. Boldog, P., Tekeli, T., Vizi, Z., Dénes, A., Bartha, F.A., Röst, G.: Risk assessment of novel coronavirus COVID-19 outbreaks outside China. J. Clin. Med. 9(2), 571 (2020). https://doi.org/10.3390/jcm9020571
3. Campedelli, G.M., D'Orsogna, M.R.: Temporal clustering of disorder events during the COVID-19 pandemic. PLoS ONE 16(4), e0250433 (2021). https://doi.org/10.1371/journal.pone.0250433
4. Carracedo, S., Palmero, A., Neil, M., Hasan-Granier, A., Saenz, C., Reveiz, L.: The landscape of COVID-19 clinical trials in Latin America and the Caribbean: assessment and challenges. Rev. Panam. Salud Publica 44, e177 (2020). https://doi.org/10.26633/RPSP.2020.177
5. Chen, N., et al.: Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. The Lancet 395(10223), 507–513 (2020). https://doi.org/10.1016/S0140-6736(20)30211-7
6. Cheng, V.C.C., Wong, S.-C., To, K.K.W., Ho, P.L., Yuen, K.-Y.: Preparedness and proactive infection control measures against the emerging novel coronavirus in China. J. Hosp. Infect. 104(3), 254–255 (2020). https://doi.org/10.1016/j.jhin.2020.01.010
7. Ghosal, A., Nandy, A., Das, A.K., Goswami, S., Panday, M.: A short review on different clustering techniques and their applications. In: Mandal, J.K., Bhattacharya, D. (eds.) Emerging Technology in Modelling and Graphics. AISC, vol. 937, pp. 69–83. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-7403-6_9
8. Gostin, L.O., Wiley, L.F.: Governmental public health powers during the COVID-19 pandemic: stay-at-home orders, business closures, and travel restrictions. JAMA 323(21), 2137 (2020). https://doi.org/10.1001/jama.2020.5460
9. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann Publishers, Waltham, Mass. (2006)
10. Hawkins, R.B., Charles, E.J., Mehaffey, J.H.: Socio-economic status and COVID-19-related cases and fatalities. Pub. Health 189, 129–134 (2020). https://doi.org/10.1016/j.puhe.2020.09.016
11. Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31(8), 651–666 (2010). https://doi.org/10.1016/j.patrec.2009.09.011
12. Jamshidi, M., et al.: Artificial intelligence and COVID-19: deep learning approaches for diagnosis and treatment. IEEE Access 8, 109581–109595 (2020). https://doi.org/10.1109/ACCESS.2020.3001973
13. Kaufman, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Inter-Science (1990)
14. Li, Z., et al.: Efficient management strategy of COVID-19 patients based on cluster analysis and clinical decision tree classification. Sci. Rep. 11, 9626 (2021). https://doi.org/10.1038/s41598-021-89187-3
15. MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Cam, L.M.L., Neyman, J. (eds.) Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press (1967)
16. McAleer, M.: Prevention is better than the cure: risk management of COVID-19. J. Risk Financ. Manage. 13(3) (2020). https://doi.org/10.3390/jrfm13030046
17. Melin, P., Monica, J.C., Sanchez, D., Castillo, O.: Multiple ensemble neural network models with fuzzy response aggregation for predicting COVID-19 time series: the case of Mexico. Healthcare 8(2) (2020). https://doi.org/10.3390/healthcare8020181
18. NPR: Chinese authorities begin quarantine of Wuhan city as coronavirus cases multiply (2020). https://n.pr/3vAxwBA
19. World Health Organization: WHO statement regarding cluster of pneumonia cases in Wuhan, China (September 2020)
20. Patel, A., Jernigan, D.B., nCoV CDC Response Team: Initial public health response and interim clinical guidance for the 2019 novel coronavirus outbreak - United States. Morb. Mortal. Wkly Rep. (MMWR) 69(5), 140–146 (2020). https://doi.org/10.15585/mmwr.mm6905e1
21. Rendon, E., Alejo, R., Garcia Rivas, J.L.: Clustering algorithms: an application for adsorption kinetic curves. IEEE Lat. Am. Trans. 19(3), 507–514 (2021). https://doi.org/10.1109/TLA.2021.9447701
22. de Sá, P.M.: Pattern Recognition: Concepts, Methods and Applications. Springer, Heidelberg (2001). https://doi.org/10.1007/978-3-642-56651-6
23. Sánchez, M.S., Valdovinos, R.M., Trueba, A., Rendón, E., Alejo, R., López, E.: Applicability of cluster validation indexes for large data sets. In: 2013 12th Mexican International Conference on Artificial Intelligence, pp. 187–193 (2013)
24. Saxena, A., et al.: A review of clustering techniques and developments. Neurocomputing 267, 664–681 (2017)
25. Sengupta, P., Ganguli, B., Senroy, S., Chatterjee, A.: An analysis of COVID-19 clusters in India: two case studies on Nizamuddin and Dharavi (October 2020). https://doi.org/10.21203/rs.3.rs-68814/v1
26. Yang, Q., et al.: Clinical characteristics and a decision tree model to predict death outcome in severe COVID-19 patients. BMC Infect. Dis. 21(1), 783 (2021). https://doi.org/10.1186/s12879-021-06478-w
Application of Stewart Platform as a Haptic Device for Teleoperation of a Mobile Robot

Duc-Vinh Le and CheolKeun Ha(B)

School of Mechanical Engineering, University of Ulsan, Ulsan 44610, South Korea
[email protected]
Abstract. In this study, a haptic device based on a Stewart Platform is developed with a dual-loop position-based admittance control. Admittance control is a common technology used in haptic interfaces, and it has two control loops. An outer loop, called the admittance model, transforms a force into the desired position and orientation. An inner position loop based on a fault-tolerant control is used to ensure that the movement of the haptic device follows the reference trajectory resulting from the admittance model. The fault-tolerant control in this research is a combination of Nonsingular Fast Terminal Sliding Mode Control (NFTSMC) with an improved reaching law and an Extended State Observer (ESO), which is used to estimate and compensate for disturbances, uncertainties, and faults in the system. The ESO mentioned in this paper can reduce the peaking value and enhance the tolerance to measurement noise compared to the traditional ESO. Accordingly, this fault-tolerant control will enhance the performance of the system under uncertainties and disturbances, and make the haptic handle move smoothly even in the presence of faults in the system. Finally, the haptic device is applied for teleoperation of a mobile robot with force feedback that helps the operator prevent the robot from colliding with obstacles and improves task performance. The experimental results demonstrate the effectiveness of the proposed system.

Keywords: Haptic device · Admittance control · Teleoperation
1 Introduction

Teleoperation is a popular technology in robotics that enables humans to control robots remotely. These days, teleoperation is widely used to replace humans in many fields and hazardous environments such as surgery, deep-water exploration, space exploration, and nuclear power plants. Teleoperation includes a master device (haptic device), a slave robot, and communication channels. The movement of the haptic device gives a position command sent to the slave robot. Feedback information such as interaction forces and images is fed back to the master device. This is called bilateral teleoperation, which helps operators feel more immersed and enhances user performance while doing a task.

For example, consider a situation such as controlling a mobile robot: an operator moves the haptic handle
to control the movement of the robot, and images showing the environment are sent to the operator. However, if the information is just images without depth information, it will be challenging for the user to prevent the robot from hitting obstacles. Therefore, depth information is required. When the robot approaches obstacles, force feedback rendered from the depth information is given to the operator through the haptic device. This feedback helps the operator notice that the robot is moving toward the obstacle. Hence, the development of a haptic device with feedback is necessary for the teleoperation system.

There have been many investigations to advance haptic performance [1–4]. A haptic device based on an admittance control is a simple and efficient approach that calculates a displacement corresponding to a force input. The relationship between the force and movement is imposed by a mass-damper-spring system. An admittance control often has two loops. The external loop, the admittance model, is used to transform the force input into movements of the handle of the haptic device. The inner loop, called position control, is used to track the desired position given by the external loop. In the position control, unknown uncertain dynamics and disturbances cause a decrease in system stability. In addition to unknown dynamics and disturbances, faults in the system seriously deteriorate the performance. In previous literature, the presence of faults was not considered in the haptic device; hence, this research will investigate a fault-tolerant method for the inner position control of the haptic device.

Fault-tolerant technologies for tracking control have increasingly attracted many researchers over the years [5–8]. The remarkable feature of these technologies is to design a proper control law to tolerate some random faults in sensors, actuators, and other parts and guarantee whole-system stability. In active fault-tolerant control (AFTC), an estimation module is used to observe the faults in the system, and various methodologies have been studied in fault estimation [5–7]. An extended state observer (ESO) is an effective and easy-to-implement method for estimating uncertainties, disturbances, and faults. Nevertheless, the conventional ESO has several downsides, such as sensitivity to measurement noise and the peaking problem, which may cause performance deterioration of the system. To eliminate these disadvantages of the traditional ESO, a new ESO [9] was proposed by Ran et al. for uncertain nonlinear systems. The ESO in [9] demonstrated efficiency in decreasing the peaking value and augmenting insensitivity to measurement noise. Therefore, thanks to those notable features, this paper develops an AFTC for the inner loop control of the haptic device by combining a Nonsingular Fast Terminal Sliding Mode Control (NFTSMC) with an improved reaching law [10] and the ESO [9]. NFTSMC not only has the valuable attributes of robustness to uncertainties and disturbances and low sensitivity to changes in the system parameters, but also guarantees that the system quickly approaches the equilibrium point in a finite time. Besides, we proposed a new reaching law [10] to further enhance the performance of sliding mode control. Consequently, the proposed controller inherits the advantages of NFTSMC with the improved reaching law [10] and the ESO [9], and can considerably improve the efficiency and stability of the system under the effect of uncertainties and faults.
In this research, the actuator fault is considered and the fault-tolerant control is proposed to estimate and compensate for the bias and loss of effectiveness faults (gain faults) mentioned in [11]. Overall, our key idea is to build a haptic device based on the Stewart Platform using the admittance model and the proposed controller. The admittance model is used to
render the force impacting on the haptic handle, measured by a force/torque sensor, into the position and orientation of the haptic handle. The position and orientation, considered as the desired movements, are fed to the proposed controller to make the haptic handle follow the desired trajectory. The proposed controller will enhance the performance of the haptic device and make the haptic handle move smoothly under the existence of actuator faults. To assess the effectiveness of the haptic device using the suggested controller, the experimental results compare the proposed controller with the others. Finally, the haptic device is applied for controlling a mobile robot and receiving force feedback from the robot to help the operator avoid collisions and enhance task performance.

The paper is organized as follows: the admittance model is described in Sect. 2; the active fault-tolerant control is designed in Sect. 3; the experimental results demonstrating the efficiency of the proposed controller and the teleoperation of the mobile robot are shown in Sect. 4; the conclusions are given in Sect. 5.
2 Admittance Model

The structure of the haptic device is shown in Fig. 1; it includes some main components: a Stewart Platform, a force/torque sensor (F/T sensor) mounted on the upper platform, and a handle mounted on the F/T sensor. The admittance model regulates the relationship between the movement of the haptic handle and a contact force on the handle. The admittance equation for 1-DOF is described as

x_r(s)/F(s) = 1/(M_i s² + B_i s + K_i)    (1)
where F is the force impacting on the handle, x_r represents the position of the haptic handle in task space, while M_i, B_i, and K_i describe the Cartesian inertia, viscosity, and stiffness of the mechanical system, respectively. For a 6-DOF haptic device, x_r has six elements, described as

[t_x, t_y, t_z, γ, β, α]^T = 1/(M_i s² + B_i s + K_i) · [f_x, f_y, f_z, m_x, m_y, m_z]^T    (2)

where t_x, t_y, and t_z are the position of the haptic handle; α (Heave), β (Sway), and γ (Surge) are the orientation of the haptic handle; f_x, f_y, and f_z are the forces measured by the force/torque sensor; and m_x, m_y, and m_z are the torques measured by the force/torque sensor.
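As a sketch, the per-axis admittance relation (1)–(2) can be integrated numerically to turn the measured wrench into a reference pose. The explicit-Euler discretization, the 1 kHz rate, and the class interface are our assumptions; the gains are those used later in Subsect. 4.1:

import numpy as np

class Admittance:
    # Explicit-Euler integration of M*xdd + B*xd + K*x = F per axis, Eqs. (1)-(2).
    def __init__(self, M=1.0, B=70.0, K=800.0, dt=0.001, n=6):
        self.M, self.B, self.K, self.dt = M, B, K, dt
        self.x = np.zeros(n)    # reference pose (t_x, t_y, t_z, gamma, beta, alpha)
        self.xd = np.zeros(n)   # reference velocity

    def step(self, wrench):
        # wrench: measured (f_x, f_y, f_z, m_x, m_y, m_z); returns desired pose x_r.
        xdd = (np.asarray(wrench) - self.B * self.xd - self.K * self.x) / self.M
        self.xd += xdd * self.dt
        self.x += self.xd * self.dt
        return self.x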
Fig. 1. Haptic device based on Stewart platform
3 Design of the Inner Position Control

3.1 Dynamics of the Haptic Device

The proposed haptic device is designed by using the admittance model and the inner position control shown in Fig. 2. This research develops the position control method (fault-tolerant control) based on the dynamic model, which can ensure the static performance of the haptic device. The dynamic model of the haptic device based on the Stewart Platform can be described as follows:

F = M(X)Ẍ + V(X, Ẋ) + G(X) + d = J^T τ    (3)
where X = [t_x, t_y, t_z, γ, β, α]^T; F denotes the force vector in task space; J represents the Jacobian matrix; τ ∈ R^n is the force vector in joint space; V(X, Ẋ) is the Coriolis and centrifugal force vector; G(X) is the gravity force vector; and d denotes the unknown disturbance. The parameters M, V, G in (3) can be divided into nominal and uncertain elements as follows: M = M_m + ΔM, V = V_m + ΔV, G = G_m + ΔG, where ΔM, ΔV, ΔG are unknown dynamic uncertainties and M_m, V_m, G_m are the nominal dynamics. Then, the dynamic Eq. (3) of the Stewart Platform can be expressed as

F = M_m Ẍ + V_m + G_m + ξ = J^T τ    (4)

where ξ = ΔM Ẍ + ΔV + ΔG + d.
According to [11], the actuator fault in the system can be modeled as

τ_a = (I − η(t))τ + ϕ(t),  t > t_e    (5)

where ϕ(t) = [ϕ_1(t), ϕ_2(t), ..., ϕ_n(t)]^T with ϕ_1(t), ϕ_2(t), ..., ϕ_n(t) bounded functions (bias faults), and η(t) = diag(η_1(t), η_2(t), ..., η_n(t)) with η_1(t), η_2(t), ..., η_n(t) unknown lost-control-rate functions (gain faults), 0 ≤ η_1(t), η_2(t), ..., η_n(t) ≤ 1. I denotes an identity matrix, τ_a is the vector of joint torques when faults appear, and t_e is the time when the faults arise. Substituting (5) into (4), we get:

Ẍ = M_m^{-1} υ − M_m^{-1} (J^T η(t)τ − J^T ϕ(t) + ξ)    (6)

where υ = F − V_m − G_m.

3.2 Extended State Observer-Based Estimation for Faults

To estimate and compensate for the uncertainties, disturbances, and faults, the estimation module based on the ESO is described in this subsection. In state space, we can express Eq. (6) as:

ψ̇_1 = ψ_2,
ψ̇_2 = M_m^{-1} υ + ψ_3    (7)
where ψ_1 = X ∈ R^n, ψ_2 = Ẋ ∈ R^n, and ψ_3 = −M_m^{-1}(J^T η(t)τ − J^T ϕ(t) + ξ). ψ_3 denotes the extended state of the system. The standard ESO has several drawbacks, such as the peaking phenomenon and sensitivity to high-frequency measurement noise. Therefore, to reduce the effect of these drawbacks on the system, the new extended state observer proposed in [9] can be designed for the system (7) as:

θ̇_1 = (ρ_1/σ)(ψ_1 − θ_1),   ψ̂_2 = (ρ_1/σ)(ψ_1 − θ_1),
θ̇_2 = M_m^{-1} υ + (ρ_2/σ)(ψ̂_2 − θ_2),   ψ̂_3 = (ρ_2/σ)(ψ̂_2 − θ_2)    (8)

where θ_1, θ_2 ∈ R^n, 0 < σ < 1 is a small positive constant, and ρ_1, ρ_2 are positive constants; ψ̂_1, ψ̂_2, ψ̂_3 are observer states. The convergence of the ESO (8) was shown in [9]: for the system (7) and the ESO (8), there exist ε > 0 and T > 0 such that ‖ψ_i(t) − ψ̂_i(t)‖ ≤ ε, 2 ≤ i ≤ 3, ∀t ≥ T.

3.3 Inner Position Control for the Haptic Device

The fault-tolerant control based on Nonsingular Fast Terminal Sliding Mode Control is designed for the Stewart Platform in this part. We select the sliding mode manifold of NFTSMC as

s = e + μ_1 e^{p_1/p_2} + μ_2 ė^{r_1/r_2}    (9)
where r_1, r_2, p_1, and p_2 are positive odd integers, 1 < r_1/r_2 < 2, p_1/p_2 > r_1/r_2, and μ_1, μ_2 are positive constants. The tracking error e = X_r − X, where X_r is the reference position and orientation of the center of the upper platform and X is the actual position and orientation of the center of the upper platform. The derivative of the sliding surface (9) is calculated as

ṡ = ė + μ_1 (p_1/p_2) |e|^{p_1/p_2 − 1} ė + μ_2 (r_1/r_2) |ė|^{r_1/r_2 − 1} (Ẍ_r − Ẍ)    (10)

From (7) and (10), it can be rewritten as

ṡ = ė + μ_1 (p_1/p_2) |e|^{p_1/p_2 − 1} ė + μ_2 (r_1/r_2) |ė|^{r_1/r_2 − 1} (Ẍ_r − M_m^{-1} υ − ψ_3)    (11)
We can design the AFTC law with an improved reaching law [10] as follows:

F = F_e + F_sw = J^T τ    (12)

in which the equivalent part is

F_e = M_m [ Ẍ_r + (1/μ_2)(r_2/r_1) ė^{2 − r_1/r_2} + (μ_1/μ_2)(p_1/p_2)(r_2/r_1) |e|^{p_1/p_2 − 1} ė^{2 − r_1/r_2} − ψ̂_3 + ω_2 |s|^h sgn(s) ] + V_m + G_m    (13)

and the switching part is

F_sw = M_m ω_1 tanh(s/δ)    (14)

with ω_1 = 2ω_3 / (λ + (1 − λ) exp(−ζ(|s| − 1))), h = k if |s| ≥ 1 and h = 1 if |s| < 1, and k, ω_2, ω_3, λ, ζ, δ being positive constants, 0 < λ < 1, k > 1.
Theorem: Consider the state-space form of the Stewart Platform described in (7), the non-singular fast terminal sliding surface presented in (9), the ESO given in (8), and the controller designed in (12); then the tracking error will converge to an equilibrium point in a finite time.
s r1 r1 r1 r1 r1 −1 r1 −1 −1 − μ2 ω2 |˙e| r2 |s|h+1 + μ2 s|˙e| r2 (ψ 3 − ψ3 ) V˙ = −μ2 ω1 s|˙e| r2 tanh r2 δ r2 r2 (16)
86
D.-V. Le and C. Ha
s r1 r1 r1 r1 −1 −1 − μ2 |s||˙e| r2 ω2 |s|h − ε ≤ 0 V˙ ≤ −μ2 ω1 s|˙e| r2 tanh r2 δ r2
(17)
The system will be stable if the following condition is satisfied: ω2 |s|h ≥ ε
(18)
When the state system approaches the sliding surface (|s| < 1), (18) leads to |s| ≥ ωε2 . The value ε can be very small, e.g., 0.001, 0.0001, etc. due to the convergence of ESO (8). We can select a large enough value ω2 , then ε/ω2 can be near zero. In other words, s will approach the convergence region ε/ω2 in a finite time, then the tracking error will converge to the equilibrium point within a finite time, and the theorem is proven. The finite time t s of the sliding surface that is needed to travel from e(t r ) to e(t r + t s ) can be determined in [12] as ⎞ ⎛ r1 r1 r1 1−r2 /r2 |e(0)| − 1 − 1 r 2 r GF ⎝ , r2 ; 1 + r2 ; −μ1 |e(0)|p1 /p2 −1 ⎠ ts = 2
p1 p1 r1 r1 r1 r 1 μ1 r2 − 1 p2 − 1 r2 p2 − 1 r2 (19) where GF(.) is a Gauss’ hypergeometric function.
Fig. 2. Admittance control scheme for the haptic device
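Before moving to the experiments, here is a minimal one-axis Euler discretization of the observer (8); the step size, the scalar input u = M_m^{-1}υ, and the class interface are our assumptions, while the gains are those of Subsect. 4.1:

class NewESO:
    # One-axis discretization of the observer (8).
    def __init__(self, rho1=1.0, rho2=1.0, sigma=0.09, dt=0.001):
        self.rho1, self.rho2, self.sigma, self.dt = rho1, rho2, sigma, dt
        self.theta1 = 0.0
        self.theta2 = 0.0

    def step(self, psi1, u):
        # psi1: measured position; u: known input term Mm^{-1} * upsilon.
        psi2_hat = self.rho1 / self.sigma * (psi1 - self.theta1)      # velocity estimate
        psi3_hat = self.rho2 / self.sigma * (psi2_hat - self.theta2)  # lumped-fault estimate
        self.theta1 += self.dt * psi2_hat          # theta1' = (rho1/sigma)(psi1 - theta1)
        self.theta2 += self.dt * (u + psi3_hat)    # theta2' = u + (rho2/sigma)(psi2_hat - theta2)
        return psi2_hat, psi3_hat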
4 Experiment

4.1 The Performance of the Proposed Controller

This subsection compares the efficiency of NFTSMC without the ESO (NFTSMC), NFTSMC with the standard ESO (AFTC-ESO1), and the suggested fault-tolerant method (12) using the ESO (8) (Proposed Controller) for a haptic device based on a Stewart Platform, which was mainly constructed from six MightyZap 12Lf-17F-90 actuators and the moving and fixed platforms shown in Fig. 1. A six-axis force/torque sensor RFT80-6A01 is mounted on the moving platform, and the handle is installed on the sensor.
The NFTSMC without the ESO can be expressed as:

F = M_m [ Ẍ_r + (1/μ_2)(r_2/r_1) ė^{2 − r_1/r_2} + (μ_1/μ_2)(p_1/p_2)(r_2/r_1) e^{p_1/p_2 − 1} ė^{2 − r_1/r_2} + ω_1 sign(s) + ω_2 s ] + V_m + G_m    (20)

On the other hand, the standard ESO for the system (7) can be given as:

ψ̂̇_1 = ψ̂_2 + (ρ_1/σ)(ψ_1 − ψ̂_1),
ψ̂̇_2 = M_m^{-1} υ + (ρ_2/σ²)(ψ_1 − ψ̂_1) + ψ̂_3,
ψ̂̇_3 = (ρ_3/σ³)(ψ_1 − ψ̂_1)    (21)

where ψ̂_1, ψ̂_2, ψ̂_3 are observer states, σ < 1 is a small positive constant, and ρ_1, ρ_2, ρ_3 are positive constants selected such that the polynomial s³ + ρ_1 s² + ρ_2 s + ρ_3 is Hurwitz. Then, the AFTC-ESO1 controller is designed similarly to (12), but ψ̂_3 in the controller is an observer state of (21).

To test the robustness of the proposed controller compared with the other controllers, it was assumed that faults occur at legs 2, 4, and 6 from the tenth second. The torque functions of the actuators in the presence of faults were defined in (5), where the gain faults can be assumed as η_1(t) = 0, η_2(t) = 0.2 + 0.3 sin(πt), η_3(t) = 0, η_4(t) = 0.3 + 0.1 cos(3t + 2), η_5(t) = 0, η_6(t) = 0.25 + 0.2 cos(t + 7), and the bias faults can be assumed as ϕ_1(t) = 0, ϕ_2(t) = 0.3 cos(0.5t + 10), ϕ_3(t) = 0, ϕ_4(t) = 0.2 sin(3t), ϕ_5(t) = 0, and ϕ_6(t) = sin(t + 5).

For the ESO (8), the parameters are set as ρ_1 = 1, ρ_2 = 1, and σ = 0.09. The parameters of the standard ESO (21) are given as ρ_1 = 3, ρ_2 = 3, ρ_3 = 1, and σ = 0.09. The parameters in the AFTC laws (12) and (20) are selected as μ_1 = 0.1, μ_2 = 0.02, p_1 = 27, p_2 = 19, r_1 = 21, r_2 = 19, ω_3 = 0.1, ω_2 = 400, λ = 0.1, ζ = 0.2, δ = 0.1, k = 1.1. The random movements of the upper platform were given by the force impacting on the handle through the admittance model (2), where M_i = 1, B_i = 70, K_i = 800.

For the teleoperation of a mobile robot, just two degrees of freedom (DOF) of the Stewart Platform are used. Thus, we considered the performance of the x and y directions only and neglected the remaining DOFs in this study. In addition to examining the result figures, the mean absolute error (MAE) (22) of each controller was also used for the performance evaluation:

MAE = (1/m) Σ_{i=1}^{m} |X_{ri} − X_i| = (1/m) Σ_{i=1}^{m} |e_i|    (22)
where X_{ri} is the reference trajectory, X_i is the practical trajectory, m is the sample size, and e_i = X_{ri} − X_i.

Figure 3 and Table 1 show the experimental results of the haptic device using NFTSMC, AFTC-ESO1, and the Proposed Controller for the x and y directions. It can be seen that the performance of AFTC-ESO1 and the Proposed Controller was better than that of NFTSMC in
the first ten seconds, due to the estimation and compensation, by the ESO (8) and the standard ESO (21) in the control laws, of the unknown uncertainties and disturbances of the system. Next, after ten seconds, the actuator faults occurred and the controllers showed significantly different performances. Thanks to the compensation of the ESO (8) and the ESO (21), AFTC-ESO1 and the Proposed Controller presented superior performance compared to NFTSMC for the control tasks even though the faults arose. The Proposed Controller using the ESO (8) was a bit more effective than AFTC-ESO1. Besides, the handle of the haptic device using NFTSMC did not move smoothly in the presence of faults, which causes an uncomfortable state for the operator. In summary, the haptic device using the Proposed Controller has high precision and moves smoothly compared with the other controllers.
Fig. 3. The performance of the haptic device using NFTSMC, AFTC-ESO1, and the Proposed Controller
Table 1. The mean absolute errors for x and y directions

         NFTSMC   AFTC-ESO1   Proposed Controller
MAE-x    0.0026   0.0016      0.0015
MAE-y    0.0024   0.0014      0.0012
4.2 Teleoperation of a Mobile Robot in the Virtual Environment

Figure 4 shows the teleoperation scheme of a mobile robot. The proposed haptic device, with the fault assumption of Subsect. 4.1, was tested for teleoperation of the slave robot in the virtual environment built in Gazebo and shown in Fig. 5. The virtual environment includes a mobile robot and obstacles. The mobile robot is equipped with a Lidar sensor to detect obstacles. The laser scanner provides an angular resolution of 1°, an angular range of 360°, and a distance range of approximately 3.5 m at a scan rate of approximately 300 rpm. The obstacles were of different shapes and sizes and comprised cylinders, cubes, and walls. The haptic device was connected to the computer and communicated with the mobile robot via the ROS (Robot Operating System) software platform. According to [4], a logical position of the haptic handle can be mapped to the motion parameters of the mobile robot. For this research, the position commands x and y were mapped to the speed rate and turning rate, respectively, as shown in Fig. 6 (a short sketch of this mapping follows the figure). In this experiment, the operator remotely controls the mobile robot by moving the handle of the haptic device. The mission is for the operator to drive the mobile robot from the start point to the goal point as fast as possible while simultaneously avoiding the obstacles on the given route shown in Fig. 5.
Fig. 4. Teleoperation scheme of a mobile robot
The position of the haptic handle and the contact force on the handle measured by the force/torque sensor are shown in Fig. 7. Due to the measurement noise and the sensitivity of the F/T sensor, there were contact force oscillations lower than 5 N in Fig. 7b, but they did not cause difficulty in controlling the haptic handle. It took approximately 145 s to complete the mission, and the robot approached the obstacles two times.
Fig. 5. A mobile robot in a virtual environment
Fig. 6. Mapping a logical point (x, y) to motion parameters (speed rate, turning rate)
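A possible implementation of this mapping is sketched below; the linear law, the workspace half-range, the velocity limits, and the deadband are illustrative choices, not values taken from the paper:

def handle_to_command(x, y, x_max=0.05, y_max=0.05,
                      v_max=0.5, w_max=1.0, deadband=0.005):
    # Map the handle position (m) to (speed_rate, turning_rate).
    def scale(p, p_max, c_max):
        if abs(p) < deadband:            # ignore small unintended offsets
            return 0.0
        p = max(-p_max, min(p_max, p))   # saturate at the workspace limit
        return c_max * p / p_max         # assumed linear mapping
    return scale(x, x_max, v_max), scale(y, y_max, w_max)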
When the robot came close to obstacles two and four, force feedback was provided to the haptic device to drive the handle backward at the 16th and 83rd seconds, which made the contact force increase and the robot move away from the obstacles. In the real field, this would protect the robot from damage due to an unexpected collision. These results illustrate the adequacy and effectiveness of the proposed haptic device for the teleoperation of a mobile robot.
Fig. 7. Experimental results of teleoperation. a) Movement of the haptic handle to control a mobile robot. b) Contact force on the haptic handle
5 Conclusion

In this paper, the development of a haptic device based on a Stewart Platform using the admittance model and fault-tolerant control was presented. The admittance model was used to convert the force input into the handle position. The position control based on a fault-tolerant algorithm was developed to track the desired position given by the admittance model. To prove the effectiveness of the proposed fault-tolerant controller compared to the other controllers, the same desired trajectory and fault functions were applied to all controllers. The tracking performance in the experiment showed the good performance of the Proposed Controller compared with NFTSMC and AFTC-ESO1. Furthermore,
the Proposed Controller illustrated its robustness and improved accuracy even though unknown disturbances and faults appeared in the system. The handle of the haptic device using the Proposed Controller moved more smoothly than with the other controllers, which makes controlling the haptic device comfortable for the operator. Finally, teleoperation of a mobile robot in a virtual environment was implemented to assess the proposed haptic device. The results demonstrated that the proposed master device was effective for teleoperation of the mobile robot. In the future, the teleoperation of an actual mobile robot via wireless communication will be investigated. Furthermore, the proposed haptic device will be applied for teleoperation of a 6-DOF system such as an unmanned aerial vehicle (UAV).
References

1. Abdossalami, A., Sirouspour, S.: Adaptive control for improved transparency in haptic simulations. IEEE Trans. Haptics 2, 2–14 (2009). https://doi.org/10.1109/TOH.2008.18
2. Park, H., Lee, J.M.: Adaptive impedance control of a haptic interface. Mechatronics 14, 237–253 (2004). https://doi.org/10.1016/S0957-4158(03)00040-0
3. Na, U.J.: A new impedance force control of a haptic teleoperation system for improved transparency. J. Mech. Sci. Technol. 31, 6005–6017 (2017). https://doi.org/10.1007/s12206-017-1145-6
4. Ju, C., Son, H.I.: Evaluation of haptic feedback in the performance of a teleoperated unmanned ground vehicle in an obstacle avoidance scenario. Int. J. Control. Autom. Syst. 17, 168–180 (2019). https://doi.org/10.1007/s12555-017-0721-y
5. Van, M., Ge, S.S., Ren, H.: Finite time fault tolerant control for robot manipulators using time delay estimation and continuous nonsingular fast terminal sliding mode control. IEEE Trans. Cybern. 47, 1681–1693 (2017). https://doi.org/10.1109/TCYB.2016.2555307
6. Zhang, H., Han, J., Luo, C., Wang, Y.: Fault-tolerant control of a nonlinear system based on generalized fuzzy hyperbolic model and adaptive disturbance observer. IEEE Trans. Syst. Man Cybern. Syst. 47, 2289–2300 (2017). https://doi.org/10.1109/TSMC.2017.2652499
7. Le, Q.D., Kang, H.J.: Implementation of fault-tolerant control for a robot manipulator based on synchronous sliding mode control. Appl. Sci. 10, 1–19 (2020). https://doi.org/10.3390/app10072534
8. Yin, S., Luo, H., Ding, S.X.: Real-time implementation of fault-tolerant control systems with performance optimization. IEEE Trans. Ind. Electron. 61, 2402–2411 (2014). https://doi.org/10.1109/TIE.2013.2273477
9. Ran, M., Li, J., Xie, L.: A new extended state observer for uncertain nonlinear systems. Automatica 131, 109772 (2021). https://doi.org/10.1016/j.automatica.2021.109772
10. Le, D.-V., Ha, C.: Finite-time fault-tolerant control for a Stewart platform using sliding mode control with improved reaching law. IEEE Access 10, 43284–43302 (2022). https://doi.org/10.1109/access.2022.3165091
11. Guo, J., Qi, J., Wu, C.: Robust fault diagnosis and fault-tolerant control for nonlinear quadrotor unmanned aerial vehicle system with unknown actuator faults. Int. J. Adv. Robot. Syst. 18, 1–14 (2021). https://doi.org/10.1177/17298814211002734
12. Yang, L., Yang, J.: Nonsingular fast terminal sliding-mode control for nonlinear dynamical systems. Int. J. Robust Nonlinear Control 23 (2015). https://doi.org/10.1002/rnc.1666
Geometric Parameters Calibration Method for Multilink Manipulators

Anton Gubankov1,2(B), Dmitry Yukhimets1,2, Vladimir Filaretov1,2, and Changan Yuan3
1 Institute of Automation and Control Processes, Vladivostok 690041, Russia
[email protected]
2 Sevastopol State University, Sevastopol 299053, Russia
3 Guangxi Academy of Science, Nanning 530007, China
Abstract. The paper considers the problem of calibration of kinematic (geometric) parameters of multilink manipulators. The structure of the developed method is described. The calibration procedure is similar to the conventional method of calculating the robot tool center point. Data from the joint angle sensors of the multilink manipulator are used. This data corresponds to the rotation angles at which the end effector of the robot reaches points of the same surface with different orientations. The end effector can be moved to the surface points in automatic or manual mode by means of a teach pendant. The proposed method can be used without complex and expensive equipment for external measurement of the position and orientation of the end effector in the base coordinate system. The above surface can be produced by means of 3D printing. Problems of the implementation of the proposed method are considered.

Keywords: Multilink manipulator · Calibration · Geometric parameters
1 Introduction

At present, automation enters many spheres of life, and manipulators replace people in production. Most of the basic technological operations (processing, cutting, arc welding, and others) performed by multilink manipulators (MM) have strict accuracy requirements [1, 2]. The accuracy of moving the end effector to a required position with a desired orientation in the base coordinate system (BCS) directly depends on the accuracy of determining the kinematic (geometric) parameters. In automatic operation mode, the MM controller, when solving the direct kinematics problem, uses the nominal values of the kinematic parameters specified in the technical documentation. Therefore, the actual position at the same joint rotation angles will differ from the calculated one, and these deviations may reach several millimeters. This is because the nominal kinematic parameters of an MM often differ from the real ones due to the limited manufacturing accuracy of their components (links, joints, etc.) and inaccuracies during assembly. When performing high-precision operations in automatic mode, such errors are unacceptable.

Currently, MM parameters are identified by means of coordinate measurement machines, laser trackers, high-precision stereo cameras, and so on [3–7]. Purchase of
this equipment requires significant investments. Therefore, the creation of a method for calibration of the kinematic parameters of manipulators without the use of complex additional equipment is an urgent task that significantly increases the quality of work of industrial robots. Many papers have been devoted to the development of such methods; see, for example, [7–16].

The paper [11] presents a method of identification using a sensor probe. The identification process uses three spheres located at precisely known distances from each other. The spheres are rigidly attached to an inclined platform, the orientation of which changes five times. Three base spheres with a diameter of 2 inches were attached to the platform and spaced at precisely known distances with a nominal value of 300 mm (from center to center). The distance between the centers of each pair of spheres was measured by a Mitutoyo coordinate measuring machine with an error of ±2.7 µm. An iterative algorithm is used to estimate the identified values of the kinematic parameters of the manipulator, which minimize the distance errors. Simulation results showed the reliability of the identified model with respect to measurement noise. Experimental tests demonstrate a significant increase in the accuracy of the robot: the distance accuracy assessment showed that the average and maximum errors were reduced from 0.698 mm to 0.086 mm and from 1.321 mm to 0.127 mm, respectively. Despite the great experimental results, implementing this method requires high-precision, expensive equipment.

Papers [12–14] present methods which are very close to each other. The main idea is to collect data for the identification procedure with the help of the end effector, which is moved to points of the same plane. Data collection can be automated and becomes easy for such a procedure. The main drawback is the quality of the MM parameter identification: our own experience and a survey of the literature show worse results compared to other similar methods. This can be explained by the fact that the equation of a plane has four unknown parameters, while when the end effector reaches points lying in this plane, one can get only three coordinates. To solve this problem, some simplifications must be made, which can worsen the identification results.

There are also a lot of confidential commercial solutions that are widely used in car production lines. To provide high precision of technological operations, they perform parameter identification of MM by means of optical sensors and calibration artefacts. These artefacts are fixtures with precisely produced spheres, pyramids, and so on. The positions of, for example, sphere centers are measured by means of laser trackers. The identification procedure can be conducted in automatic mode, but its implementation cost is very high.

In [15, 16], an original method was proposed that allows identifying the geometric parameters of an MM. It is based on data obtained when the end effector (probe) is moved to the same point in space, which is fixed by another probe. In this paper, a development of this approach is proposed: the MM end effector is moved to points which belong to the same surface. The surface is produced by means of a 3D printer using cheap and low-shrinkage filament (for example, Polylactic Acid, PLA) or a shrinkage compensation option. The proposed method significantly simplifies the process of collecting the primary data necessary for calibration, and its automation is easily implemented. This is the contribution of the proposed paper.
2 Description of MM Parameter Calibration Method

In this paper, the method of parameter identification is considered on the example of a six-degree-of-freedom MM with a series kinematic scheme of PUMA type (see Fig. 1). To perform identification, it is necessary to use the kinematic MM model, which, taking into account the Denavit-Hartenberg notation, has the following form:

A_f(Φ, Q) = ∏_{k=1}^{6} A_k(ϕ_k, q_k)    (1)
where A_f = [R_f, X_f; O, 1] ∈ R^{4×4} is the homogeneous transformation matrix describing the position and orientation of the MM last link (flange) in the BCS Oxyz associated with the MM base; R_f ∈ R^{3×3} is the orientation matrix of the MM flange in the BCS; X_f ∈ R^{3×1} is the coordinate vector of the MM flange in the BCS; O ∈ R^{1×3} is a zero row vector; Φ = (ϕ_1^T, ..., ϕ_6^T)^T, ϕ_k = (θ_k, a_k, d_k, α_k), k = 1, ..., 6, is the Denavit-Hartenberg parameter matrix; Q = (q_1, ..., q_6)^T is the vector of MM joint angles, which is measured by means of built-in sensors; and

A_k(ϕ_k, q_k) =
[ cos(q_k + θ_k)   −sin(q_k + θ_k)cos(α_k)   sin(q_k + θ_k)sin(α_k)    a_k cos(q_k + θ_k)
  sin(q_k + θ_k)    cos(q_k + θ_k)cos(α_k)   −cos(q_k + θ_k)sin(α_k)   a_k sin(q_k + θ_k)
  0                 sin(α_k)                  cos(α_k)                  d_k
  0                 0                         0                         1 ]

is the Denavit-Hartenberg transformation matrix.
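For illustration, the transformation of Eq. (1) translates directly into Python/NumPy; the function names and the row-per-joint layout of the parameter matrix Φ are our conventions:

import numpy as np

def dh_transform(theta, a, d, alpha, q):
    # Denavit-Hartenberg matrix A_k(phi_k, q_k) from Eq. (1).
    c, s = np.cos(q + theta), np.sin(q + theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s * ca,  s * sa, a * c],
                     [s,  c * ca, -c * sa, a * s],
                     [0.,     sa,      ca,     d],
                     [0.,     0.,      0.,    1.]])

def flange_pose(phi, q):
    # A_f(Phi, Q): chain the six joint transforms, Eq. (1).
    # phi: 6x4 array of rows (theta_k, a_k, d_k, alpha_k); q: six joint angles.
    A = np.eye(4)
    for k in range(6):
        A = A @ dh_transform(*phi[k], q[k])
    return A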
of nominal geometric solves direct and inverse kinematics problems using matrix parameters of Denavit-Hartenberg, compiled according to data from technical documentation. According to [16] to estimate real parameters of matrix F, it is necessary to perform n series of measurements of vectors Q. Each i-th series consists of mi vectors Q, which are formed as a result of manually drive by operator (or automatically) end effector to mi arbitrary points belong to i-th surface. The orientation of the end effector may change or may remain constant. For example, in automatic mode it is easier to perform measurements with a constant orientation of the touch sensor. i of For each vector Qji , i = 1, n , j = (1, mi ) it is possible to match vector X˜ tool,j i tool center point (TCP) coordinates Xtool,j in BCS, which, taking into account model
of parameters, will be calculated as follows: (1), using matrix 6 i X i ˜ ˜ R f ,j tool,j =
Qji = Aij , Aik,j ATCP , (2) O 1 k=1 where R˜ if ,j ∈ R3×3 is orientation matrix of manipulator flange in BCS for the j-th mea E XTCP , E ∈ R3×3 is unit diagonal matrix; X TCP surement in the i-th series; ATCP = O 1 is coordinate vector of TCP in flange coordinate system Of x f yf zf .
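For illustration, a minimal Python sketch of the kinematic model (1)–(2) is given below; the function names are ours and the inputs are placeholders, not the robot's actual values.

```python
import numpy as np

def dh_matrix(theta, a, d, alpha, q):
    """Denavit-Hartenberg transform A_k(phi_k, q_k) for one joint."""
    c, s = np.cos(q + theta), np.sin(q + theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [c, -s * ca,  s * sa, a * c],
        [s,  c * ca, -c * sa, a * s],
        [0.0,    sa,      ca,     d],
        [0.0,   0.0,     0.0,   1.0],
    ])

def flange_pose(phi, q):
    """A_f of eq. (1): the chained product of the six joint transforms."""
    A = np.eye(4)
    for (theta, a, d, alpha), qk in zip(phi, q):
        A = A @ dh_matrix(theta, a, d, alpha, qk)
    return A  # upper-left 3x3 block is R_f, the last column holds X_f

def tcp_position(phi, q, x_tcp):
    """TCP coordinates in the BCS, following eq. (2): A_f * A_TCP."""
    A_tcp = np.eye(4)
    A_tcp[:3, 3] = x_tcp
    return (flange_pose(phi, q) @ A_tcp)[:3, 3]
```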
Fig. 1. Manipulator with series kinematic scheme and surface for identification process

Coordinates $\tilde{X}_{tool,j}^i$ calculated using (2) will differ from the coordinates of the actual position of the TCP due to the difference between the MM parameters used and their real values. Therefore, the points $\tilde{X}_{tool,j}^i$ will lie above or below the $i$-th surface. As the surface for the identification process, it is proposed to use a part of a hyperbolic paraboloid:

$$z = \frac{(x - x_0)^2}{A^2} - \frac{(y - y_0)^2}{B^2} + z_0, \qquad (3)$$

where $x_0$, $y_0$, $z_0$ are the coordinates of the initial surface displacement, and $A$, $B$ are the coefficients of the hyperbolic paraboloid. The choice of this surface type is due to the simplicity of its mathematical description and of its 3D printing. If a point does not lie on the surface, then (3) is violated. The quality of the identification of MM parameters is therefore estimated according to how well the points belong to the surface:

$$J(\Phi, \Theta) = \sum_{i=1}^{n} \sum_{j=1}^{m_i} \left( \frac{\big(x_{tool,j}^i - x_0^i\big)^2}{A_i^2} - \frac{\big(y_{tool,j}^i - y_0^i\big)^2}{B_i^2} + z_0^i - z_{tool,j}^i \right)^2, \qquad (4)$$

As a result, the task of MM parameter identification takes the following form:

$$J\big(\hat{\Phi}, \hat{\Theta}\big) = \min_{\Phi, \Theta} J(\Phi, \Theta). \qquad (5)$$
The Levenberg-Marquardt numerical optimization method will be used to estimate the parameters of the manipulator. To do this, the initial measurement data must be presented in the following form:
$$r_j^i = \frac{\big(x_{tool,j}^i - x_0^i\big)^2}{A_i^2} - \frac{\big(y_{tool,j}^i - y_0^i\big)^2}{B_i^2} + z_0^i - z_{tool,j}^i, \quad i = \overline{1,n}, \; j = \overline{1,m_i},$$

$$R_i = \big(r_1^i, \ldots, r_{m_i}^i\big)^T, \quad P = \big(R_1^T, \ldots, R_n^T\big)^T \in \mathbb{R}^L, \quad L = \sum_{i=1}^{n} m_i. \qquad (6)$$
The cost function (4), taking into account (6), can be rewritten as follows:

$$J = \frac{1}{2} P^T P. \qquad (7)$$

The matrix of IR parameters can be presented as follows:

$$\vartheta = \big(\varphi_1^T, \ldots, \varphi_6^T, X_{0,1}^T, \ldots, X_{0,n}^T\big)^T \in \mathbb{R}^{24+3n}, \qquad (8)$$
where $X_{0,i}$ is the vector of the initial displacement for the $i$-th surface; the $i$-th surface means a distinct location and orientation of the surface in the working zone of the MM. As a result of the algorithm's work, an estimate of the vector $\vartheta$ of manipulator parameters is formed, at which the points $\tilde{X}_{tool,j}^i$ converge to the minimum distance from the $i$-th surface in each $i$-th series of measurements.
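To make the estimation step concrete, the following Python sketch assembles the residual vector of eq. (6) and hands it to SciPy's Levenberg-Marquardt solver; it assumes the `tcp_position` helper from the previous sketch, a single surface with the coefficients used later in Sect. 3, and placeholder inputs `measured_q` and `x_tcp`.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(theta_vec, measured_q, x_tcp):
    """Stack the r_j^i of eq. (6): signed surface errors for all measurements."""
    phi = theta_vec[:24].reshape(6, 4)   # DH parameters theta, a, d, alpha
    x0, y0, z0 = theta_vec[24:27]        # surface displacement X_0,i, eq. (8)
    A2, B2 = 120.0, 60.0                 # A^2, B^2 of the paraboloid (assumed)
    res = []
    for q in measured_q:
        x, y, z = tcp_position(phi, q, x_tcp)
        res.append((x - x0) ** 2 / A2 - (y - y0) ** 2 / B2 + z0 - z)
    return np.asarray(res)

# theta0 stacks the nominal DH parameters and a rough guess of the surface origin:
# sol = least_squares(residuals, theta0, args=(measured_q, x_tcp), method="lm")
# sol.x then holds the identified parameter vector, minimizing J = 0.5 * P^T P.
```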
3 Simulation Results

Numerical simulation was carried out to verify the proposed method for the identification of IR kinematic parameters. The Mitsubishi RV-2FB robot, which has a PUMA kinematic scheme, was considered as the MM (see Fig. 1). Two different locations and orientations of the surface were considered (see Fig. 2). They are described as follows:

$$z = \frac{(y - y_{01})^2}{B_1^2} - \frac{(x - x_{01})^2}{A_1^2} + z_{01}, \quad x = \frac{(y - y_{02})^2}{B_2^2} - \frac{(z - z_{02})^2}{A_2^2} + x_{02}, \qquad (9)$$

and the coefficients have the following values: $A_1 = \sqrt{120}$, $B_1 = \sqrt{60}$; $A_2 = \sqrt{60}$, $B_2 = \sqrt{120}$; $x_{01} = 20$, $y_{01} = 20$, $z_{01} = 100$; $x_{02} = 100$, $y_{02} = 20$, $z_{02} = 20$.
Fig. 2. Two locations and orientations of the same surface
Four cases were investigated in the MATLAB programming environment to study various practical situations, such as errors in driving the end effector to the surface, inaccuracies in surface 3D printing, an unknown initial position of the surface, and the influence of the orientation of the end effector on the quality of identification of MM parameters:

1. Without taking into account errors of driving the tool to surface points, with a known initial location of the surfaces $x_0^i$, $y_0^i$, $z_0^i$.
2. Taking into account errors of driving the tool to surface points, with a known initial location of the surfaces. These errors were formed randomly, with an amplitude not exceeding 0.1 mm.
3. Taking into account errors of driving the tool to surface points, with an unknown initial location of the surfaces.
4. Without taking into account errors of driving the tool to surface points, with an unknown initial location of the surfaces.

To additionally evaluate the quality of identification, the distances between the points $\tilde{X}_{tool,j}^i$ and the closest points to them on the surface were calculated using the following formula:

$$h_j^i = \sqrt{\big(x_j^i - x_{tool,j}^i\big)^2 + \big(y_j^i - y_{tool,j}^i\big)^2 + \big(z_j^i - z_{tool,j}^i\big)^2}, \qquad (10)$$

where $x_{tool,j}^i$, $y_{tool,j}^i$, $z_{tool,j}^i$ are the TCP coordinates in the BCS, calculated using (2) and the matrix $\hat{\Phi}$ of identified parameters, and $x_j^i$, $y_j^i$, $z_j^i$ are the coordinates of a point on the surface. Since the point on the surface closest to $(x_{tool,j}^i, y_{tool,j}^i, z_{tool,j}^i)$ is unknown, this task was solved as an optimization problem. The cost function was (10), with the only difference that, depending on the surface location and orientation used (see Fig. 2), the corresponding unknown coordinate takes the form (9). This reduced the complexity of the optimization task and allowed fewer calculations to be performed. The MATLAB function fminsearch was used to evaluate the distances; a Python analogue is sketched below.
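The following hedged sketch performs this distance check with SciPy's Nelder-Mead simplex (the counterpart of MATLAB's fminsearch); the surface constants correspond to the first location in (9), and the function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def surface_z(x, y, x0=20.0, y0=20.0, z0=100.0):
    """First location of the paraboloid in (9): z as a function of (x, y)."""
    return (y - y0) ** 2 / 60.0 - (x - x0) ** 2 / 120.0 + z0

def distance_to_surface(p_tool):
    """h_j^i of eq. (10): minimal distance from a TCP point to the surface."""
    def sq_dist(xy):
        x, y = xy
        return ((x - p_tool[0]) ** 2 + (y - p_tool[1]) ** 2
                + (surface_z(x, y) - p_tool[2]) ** 2)
    # Start the simplex search from the TCP point's own (x, y) projection.
    sol = minimize(sq_dist, x0=[p_tool[0], p_tool[1]], method="Nelder-Mead")
    return np.sqrt(sol.fun)
```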
Table 1 shows the values of the cost function (4) at the beginning and at the end of the optimization algorithm. Figure 3 shows the process of its change in all four cases.

Table 1. Values of cost function J before and after identification

N     1             2            3            4
J0    4.0349·10^6   4.0334·10^6  4.0334·10^6  4.0349·10^6
J     1.145·10^-8   3.6014       7.067·10^-9  3.8773
Fig. 3. Process of cost function change in all four cases
Consider the first case: without taking into account errors of driving the tool to surface points, with a known initial location of the surfaces. The real and identified parameters are
given below:
       θ, °       a, mm     d, mm     α, °
       0          0.4       295       −89.885
       −89.885    229.6     0.4       −0.17
Φ =    −90        50.3      0         −89.77
       −0.17      0.2       270.3     90.17
       0.115      0.3       0.2       −90.115
       180        0         0.2       70.3

       θ, °         a, mm     d, mm     α, °
       −3.98·10^−4  0.3965    295.0058  −89.8859
       −89.8855     229.6027  0.2120    −0.1717
Φ̂ =    −90.0011     50.3003   0.2120    −89.7726
       −0.1674      0.1981    270.3067  90.171
       −0.0668      0.0769    0.0469    −89.9881
       179.7395     0         0.2004    70.3002
Graphs of distances to surfaces before and after identification (red – before, blue – after) are shown in Fig. 4.
Fig. 4. Graphs of distances to surfaces in the first case. (Color figure online)
The value of the cost function after identification became very close to zero, the distances between the working point of the tool and the surfaces do not exceed 8 µm, and 15 of the 24 parameters differ from the real ones only in the thousandths. This confirms the correctness of the proposed idea for the identification of MM parameters.
The second case takes into account errors of driving the tool to surface points, with a known initial location of the surfaces. The identified parameters are as follows:

       θ, °       a, mm     d, mm     α, °
       −0.2044    1.4783    293.0184  −89.3426
       −90.1441   225.6131  0.2037    −3.102
Φ̂ =    −89.7445   49.1007   0.2034    −88.5544
       0.7578     2.1141    270.6     88.1236
       −3.3607    0.0163    0.0414    −86.2471
       202.2377   2.3347    70.7721   0
The distances decreased by a factor of about six and became less than 0.2 mm (Fig. 5).
Fig. 5. Graphs of distances to surfaces in the second case
In the third case, when errors of driving the tool to surface points are taken into account and the initial locations of the surfaces are unknown, the distances remained close to the initial ones (see Fig. 6), and the estimates of the angular parameters and the surface locations contain errors:

       θ, °       a, mm     d, mm     α, °
       −0.4482    2.2839    292.404   −86.2744
       −89.955    224.3485  0.3064    −10.0637
Φ̂ =    −89.7105   48.2523   0.2976    −87.2783
       −2.8071    2.3594    270.6748  87.7263
       −5.1981    0.0202    0.2507    −89.5555
       193.0757   3.4386    71.2744   0
x01 = 20, y01 = 20, z01 = 100; x̂01 = 19.9737, ŷ01 = 19.6393, ẑ01 = 99.1191;
x02 = 100, y02 = 20, z02 = 20; x̂02 = 97.8133, ŷ02 = 19.0257, ẑ02 = 23.477.

This case is close to the real one because, when conducting real research on the identification of MM parameters, the exact initial displacements of the surfaces are unknown and errors are present during data collection. Nevertheless, it is possible to significantly reduce the error of driving the tool to surface points if a touch sensor or a micrometer mounted on the MM flange is used. To analyze this situation, the fourth experiment was conducted (see Fig. 7). As a result, the distances to the surfaces did not exceed 0.028 mm, 15 of the 24 parameters differ from the real ones by thousandths, and the initial displacements of the surfaces differ from the actual ones by hundredths of a millimeter:
Fig. 6. Graphs of distances to surfaces in the third case
x01 = 20, y01 = 20, z01 = 100; x̂01 = 20.0205, ŷ01 = 20.02, ẑ01 = 99.9794;
x02 = 100, y02 = 20, z02 = 20; x̂02 = 100.0176, ŷ02 = 20.0127, ẑ02 = 20.0188;
Fig. 7. Graphs of distances to surfaces in the fourth case
       θ, °       a, mm     d, mm     α, °
       −0.0013    0.3917    295.0019  −89.8865
       −89.8859   229.6038  0.203     −0.1697
Φ̂ =    −90.0007   50.3017   0.203     −89.7697
       −0.168     0.2       270.305   90.1712
       −0.0319    0.12      0.0773    −90.015
       186.1388   0.1788    70.3174   0
4 Conclusions and Discussion

A modification of the MM parameter identification method described in [16] is proposed. Its difference is the use of a calibration fixture printed on a 3D printer. Its production is cheap and accessible to a wide range of researchers and engineers, and printers from the middle price range allow parts to be produced with an accuracy of up to 0.1 mm. The proposed method is universal: as a surface, any other shape with a known mathematical description can be used. The cost of the 3D-printed surface model and of implementing a touch sensor or micrometer is significantly less than the purchase of coordinate measuring machines, laser trackers or high-precision stereo cameras. In addition, the use of a touch sensor makes it possible to automate and speed up the data collection process at the initial stage of identifying the parameters of the manipulator. Simulation has confirmed the operability and effectiveness of the proposed method, but it certainly requires experimental validation. Although the accuracy of identifying MM parameters is similar to [16], it is further necessary to determine the requirements for
the printing of the calibration surface, for the material (filament) used, and for the limitations of the proposed calibration method. Two or more different locations and orientations of the surface are redundant for geometric parameter calibration; they can instead be used for the identification of MM elastostatic parameters when heavy end effectors are employed.

Acknowledgement. This work is supported by the Russian Science Foundation (grant 22-19-00392).
References

1. Filaretov, V., Zuev, A., Yukhimets, D., Gubankov, A., Mursalimov, E.: The automatization method of processing of flexible parts without their rigid fixation. Procedia Eng. 100, 4–13 (2015). https://doi.org/10.1016/j.proeng.2015.01.336
2. Filaretov, V., Yukhimets, D., Zuev, A., Gubankov, A., Mursalimov, E.: Method of combination of three-dimensional models of details with their CAD-models at the presence of deformations. In: Proceedings of the 12th IEEE International Conference on Automation Science and Engineering, pp. 257–261. IEEE (2016). https://doi.org/10.1109/COASE.2016.7743415
3. Hollerbach, J., Khalil, W., Gautier, M.: Model identification. In: Siciliano, B., Khatib, O. (eds.) Springer Handbook of Robotics, pp. 113–138. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-32552-1_6
4. Nubiola, A., Bonev, I.A.: Absolute calibration of an ABB IRB 1600 robot using a laser tracker. Robot. Comput.-Integr. Manuf. 29, 236–245 (2013). https://doi.org/10.1016/j.rcim.2012.06.004
5. Elatta, A.Y., Gen, L.P., Zhi, F.L., Daoyuan, Y., Fei, L.: An overview of robot calibration. Inf. Technol. J. 3, 74–78 (2004). https://doi.org/10.3923/itj.2004.74.78
6. Gang, C., Tong, L., Ming, C., Xuan, J.Q., Xu, S.H.: Review on kinematics calibration technology of serial robots. Int. J. Precis. Eng. Manuf. 15, 1759–1774 (2014). https://doi.org/10.1007/s12541-014-0528-1
7. Klimchik, A., Pashkevich, A., Wu, Y., Caro, S., Furet, B.: Design of calibration experiments for identification of manipulator elastostatic parameters. J. Mech. Eng. Autom. 2, 531–542 (2012)
8. Khalil, W., Garcia, G., Delagarde, J.F.: Calibration of the geometric parameters of robots without external sensors. In: Proceedings of the 1995 IEEE International Conference on Robotics and Automation, pp. 3039–3044. IEEE (1995). https://doi.org/10.1109/ROBOT.1995.525716
9. Roning, J., Korzun, A.: A method for industrial robot calibration. In: Proceedings of the International Conference on Robotics and Automation, pp. 3184–3190. IEEE (1997). https://doi.org/10.1109/ROBOT.1997.606773
10. Edwards, C., Galloway, R.L.: A single-point calibration technique for a six degree-of-freedom articulated arm. Int. J. Rob. Res. 13, 189–198 (1994). https://doi.org/10.1177/027836499401300301
11. Joubair, A., Bonev, I.A.: Kinematic calibration of a six-axis serial robot using distance and sphere constraints. Int. J. Adv. Manuf. Technol. 77(1–4), 515–523 (2014). https://doi.org/10.1007/s00170-014-6448-5
12. Khalil, W., Besnard, B., Lemoine, P.: Comparison study of the geometric parameters calibration methods. Int. J. Robot. Autom. 15, 56–67 (2000)
13. Tang, G.R., Liu, L.S.: A study of three robot calibration methods based on flat surfaces. Mech. Mach. Theory 29, 195–206 (1994). https://doi.org/10.1016/0094-114X(94)90030-2
14. Zhong, X.L., Lewis, J.M.: A new method for autonomous robot calibration. In: Proceedings of the 1995 IEEE International Conference on Robotics and Automation, pp. 1790–1795. IEEE (1995). https://doi.org/10.1109/ROBOT.1995.525529
15. Gubankov, A.S., Yukhimets, D.A.: Identification method of kinematic parameters of multilink industrial manipulator. In: Proceedings of the IEEE 7th International Conference on Systems and Control, pp. 327–331. IEEE (2018). https://doi.org/10.1109/ICoSC.2018.8587637
16. Gubankov, A., Yukhimets, D.: Development and experimental studies of an identification method of kinematic parameters for industrial robots without external measuring instruments. Sensors 22, 3376 (2022). https://doi.org/10.3390/s22093376
A Kind of PWM DC Motor Speed Regulation System Based on STM32 with Fuzzy-PID Dual Closed-Loop Control

Wang Lu, Zhang Zaitian, Cheng Xuwei(B), Ren Haoyu, Chen Jianzhou, Qiu Fengqi, Yan Zitong, Zhang Xin, and Zhang Li

21st Department, 32272 Unit of PLA, Chengdu 610214, China
[email protected]
Abstract. In order to meet transportation and transmission requirements in daily life, a dual closed-loop PWM DC motor speed regulation system based on STM32 is designed, including the hardware circuit and its control program. The control system and the control method are each optimized to broaden the system's applicability, and fuzzy control is used to improve its closed-loop control performance. The experimental results show that although PI control provides good dynamic and steady-state performance, fuzzy control offers faster start-up, no speed overshoot and better anti-interference ability compared with PI; it can automatically adjust the controller coefficients and has good adaptability.

Keywords: Digital control · PID · PWM · Fuzzy control
1 Introduction

DC motors have good starting and braking performance and are widely used in electric drive automatic control systems, such as rolling mills and their auxiliary machinery, mine hoists and other fields. In actual industrial DC speed regulation systems, dual closed-loop control of speed and current is widely used. A traditional DC motor dual closed-loop speed regulation system mostly adopts a simple structure with a PID controller. However, parameter variation and the nonlinear characteristics of the controlled object make it difficult for a PID controller to reach the optimal state. Fuzzy control produces its output dynamically according to empirical rules and, with its nonlinear characteristics, can overcome the limitations of PID, so that the speed regulation system has both a fast dynamic response and a high degree of stability, and its robustness is further improved. In this paper, a DC motor PWM speed regulation system based on an STM32 microcontroller is designed. Using closed-loop PWM control, its excellent performance can be further exploited, meeting high demands on load capacity and driving ability.
2 System Framework

The requirements of common application scenarios of the motor are as follows: the power supply voltage is 72 V, the motor voltage is 60–72 V, the power capacity is 1.5–3 kW, and the maximum speed is 50 km/h. The structure of the PWM DC motor speed regulation system based on STM32 is shown in Fig. 1.
Fig. 1. Flow block diagram of STM32 control system
Under program control, the system collects the speed and current feedback signals through the on-chip AD converter; after signal processing, these are converted into the control signal of the PWM output. By changing the duty cycle, feedback control is accomplished, and thus a dual closed-loop system of speed and current is realized. By measuring the motor parameters, establishing a mathematical model and calculating the control loop parameters, the PWM drive signal is generated. With the development of microcontrollers and computer control algorithms, a series of improved PID algorithms have appeared, such as integral-separation PID, positional PID, and Fuzzy-PID, which are more flexible and stable than traditional PID. Considering the motor's forward/reverse rotation and braking needs, an H-bridge drive circuit is adopted, which is a high-voltage, high-current full-bridge driver used to drive DC or stepper motors. In order to overcome the disadvantages of a speed-only loop and obtain better dynamic performance, a current control loop is embedded in the speed loop and the armature current is integrated into the closed loop as a feedback model, yielding the speed and current dual closed-loop control system. A sketch of the per-period control computation is given below.
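As an illustration only, the following sketch shows the kind of computation each control period performs: the outer speed PI produces the current reference, and the inner current PI produces the PWM duty cycle. It is written in Python for readability, and all gains, limits and names are hypothetical placeholders rather than the authors' STM32 firmware.

```python
def pi_step(err, state, kp, ki, dt, limit):
    """PI regulator with output clamping (simple anti-windup)."""
    state["i"] = min(max(state["i"] + ki * err * dt, -limit), limit)
    return min(max(kp * err + state["i"], -limit), limit)

def control_period(n_ref, n_meas, i_meas, spd_state, cur_state, dt=1e-3):
    """One sampling period of the dual closed loop."""
    i_ref = pi_step(n_ref - n_meas, spd_state, kp=0.5, ki=20.0, dt=dt, limit=40.0)
    duty = pi_step(i_ref - i_meas, cur_state, kp=0.05, ki=5.0, dt=dt, limit=1.0)
    return max(duty, 0.0)  # duty cycle in [0, 1], written to the PWM timer

# Example usage with zero-initialized integrator states:
# spd, cur = {"i": 0.0}, {"i": 0.0}
# duty = control_period(1000.0, 950.0, 18.0, spd, cur)
```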
3 Modeling and Analyzing of Dual Closed-Loop

By analyzing the motor working process and state parameters, the motor mathematical model is built with the proper parameters, and the closed-loop parameters are optimized through simulation and experiments. The motor parameters are shown in Table 1.
Table 1. Parameters of the motor.

Rated voltage             72 V
Rated current             20 A
Rated speed               1000 r/min
Rated torque              10 N·m
Armature loop resistance  1 Ω
Armature inductance       0.001 H
Flywheel inertia GD²      0.1 N·m²
3.1 Dual Closed-Loop Control System

The motor is modeled with the parameters in Table 1. The electromagnetic time constant T_L and the electromechanical time constant T_m are

$$T_L = \frac{L}{R}, \qquad (1)$$

$$T_m = \frac{GD^2 R}{375 K_e C_m}. \qquad (2)$$
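As a worked check of (1)–(2) with the Table 1 values, the following sketch estimates K_e from the rated operating point and uses the common engineering relation C_m = 9.55 K_e; both relations are our assumptions, since the paper does not list K_e or C_m explicitly.

```python
# Table 1 values
U_N, I_N, n_N = 72.0, 20.0, 1000.0   # rated voltage [V], current [A], speed [r/min]
R, L, GD2 = 1.0, 0.001, 0.1          # armature resistance [ohm], inductance [H], GD^2

K_e = (U_N - I_N * R) / n_N          # EMF constant from the rated point (assumed)
C_m = 9.55 * K_e                     # torque constant, C_m = (30/pi) * K_e (assumed)

T_L = L / R                          # eq. (1): electromagnetic time constant
T_m = GD2 * R / (375 * K_e * C_m)    # eq. (2): electromechanical time constant
print(f"T_L = {T_L:.4f} s, T_m = {T_m:.4f} s")  # roughly 0.001 s and 0.01 s
```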
Fig. 2. Mathematical model of the motor
In the open-loop model with direct starting, the current can reach 62 A during the start-up stage; the overcurrent phenomenon is obvious and can easily damage devices. The reason is that the motor armature is initially static and the armature winding connects directly to the power supply, which amounts to a near short circuit and produces a large instantaneous current, so it is necessary to add a starting resistance to reduce the current at start-up. Given 1000 r/min, building the model in Fig. 2 and running a 0.5 s simulation under rated load and positive rotation, the speed and armature current shown in Fig. 3 and Fig. 4 are obtained.
Fig. 3. Speed waveform
Fig. 4. Armature current wave form
It can be seen that the starting time is about 0.05 s, the speed acceleration decreases significantly after 0.03 s, and the speed then approaches its final value slowly. The motor has poor open-loop starting characteristics: the speed acceleration decays constantly throughout the starting stage and cannot remain constant. The motor current quickly jumps to 62 A and then falls sharply, because the electromotive force induced in the armature winding's magnetic field reduces the current. This sharp change manifests itself as a large decay of torque and a sharply reduced speed acceleration, so the motor cannot maintain constant torque during the start-up stage; the start-up characteristics are poor. Therefore, the feedback is divided into a current and speed dual feedback loop control, and the STM32F103 is selected as the digital controller. Figure 5 shows the steady-state structure diagram of the dual closed-loop speed regulation system. If the trigger requires the output UCT of the ACR to be positive, the input Ui* of the ACR is negative because the regulator has a reverse input, so the voltage Un* at the ASR input is required to be positive.
Fig. 5. Dual closed-loop control schematic
Fig. 6. Dual closed-loop control speed characteristics
The current loop is a closed loop composed of the current regulator ACR and negative current feedback, whose aim is to stabilize the current. The speed loop is a closed loop composed of the speed regulator ASR and negative speed feedback. The working process of the dual closed-loop speed regulation system can be simply divided into two stages. First, in the start-up stage, for a given suddenly added step signal U*_n, due to mechanical inertia the speed deviation voltage ΔU_n is extremely large, the speed regulator ASR rapidly saturates, and its output reaches the limit value U*_im and holds. Meanwhile, the current negative feedback loop keeps the armature current constant and the speed rises linearly, as shown in Fig. 6. Then, after the speed reaches the given value and an overshoot is generated, the difference between the fixed signal of the speed loop and the feedback signal crosses the zero point with a polarity transition, and the speed remains constant, that is, n = U*_n/α. The speed and current feedback coefficients can be calculated respectively by the formulas

$$\alpha = \frac{U^*_{nm}}{n_{max}}, \qquad (3)$$

$$\beta = \frac{U^*_{im}}{I_{dm}}, \qquad (4)$$
where U*_nm and U*_im are the given maximum output of the speed loop and the limit value of the current loop, respectively. The transfer functions of the regulators of the dual closed-loop speed regulation system are

$$W_{ASR}(s) = K_n \cdot \frac{\tau_n s + 1}{\tau_n s}, \qquad (5)$$

$$W_{ACR}(s) = K_i \cdot \frac{\tau_i s + 1}{\tau_i s}. \qquad (6)$$
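For reference, the PI transfer functions (5)–(6) can be built as LTI objects as in the sketch below; the gain and time-constant values are illustrative placeholders, not the tuned values of the actual system.

```python
from scipy import signal

def pi_regulator(K, tau):
    """W(s) = K * (tau*s + 1) / (tau*s), the form of eqs. (5)-(6)."""
    return signal.TransferFunction([K * tau, K], [tau, 0.0])

W_asr = pi_regulator(K=11.7, tau=0.087)  # speed regulator ASR (placeholder tuning)
W_acr = pi_regulator(K=1.0, tau=0.017)   # current regulator ACR (placeholder tuning)
```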
3.2 Simulation of Motor PID Dual Closed-Loop Model

The closed-loop motor model is constructed with the above parameters, and a 4 s simulation is carried out for a given speed of 1000 r/min under rated load; a load disturbance of ±5 N·m is added at 1 s and 1.75 s, respectively, and the results shown in Fig. 7 are obtained.
Fig. 7. Closed-loop perturbation waveform
It can be seen that the armature current in the start-up stage quickly reaches twice the rated current value and then holds. Thus, the goal of rapid adjustment of the motor speed is achieved, which ensures a good motor start-up characteristic and realizes constant-torque start-up.
4 Fuzzy-PID Design and Simulation

The DC motor itself is a nonlinear system; in order to simplify calculation and design, a linear approximation is adopted. However, when more complex nonlinear changes of the system occur, it cannot be well adjusted in real time. In order to further improve control system performance, fuzzy PID control is introduced to strengthen the system control capabilities. The basic fuzzy control system is shown in Fig. 8. The implementation process of the fuzzy control algorithm is as follows. As the input signal of the fuzzy controller, the
error E between the given signal and the feedback is converted into a fuzzy variable by fuzzy language; a fuzzy subset is then evaluated by the control rules to obtain the fuzzy control variable u, and finally de-fuzzification is used to obtain the precise control amount U. For different applications, the input of the fuzzy controller can be expanded into multiple variables to achieve the desired effect.
Fig. 8. Fuzzy control basic schematic
The fuzzy rule base provides the necessary definitions of fuzzy control, including the discretization, quantization, and normalization of the domain of the language control rules, as well as the division of input and output spaces, the definition of membership functions, and so on. In this fuzzy control system, the speed feedback signal changes quickly during the start-up stage, so the input set is expanded to seven states: {Negative Large, Negative Medium, Negative Small, Zero, Positive Small, Positive Medium, Positive Large}. The basic domains of the error E, the error change EC, and the fuzzy controller output are [−xe, xe], [−xec, xec] and [−ye, ye], respectively. The conversion of a basic domain [a, b] to the fuzzy domain [−n, +n] is

$$y = \frac{2n}{b - a} \cdot \left( x - \frac{a + b}{2} \right). \qquad (7)$$

The membership function is determined by the expert's empirical method; ultimately, a triangular membership function is adopted. After analysis based on expert experience, the control rules of the dual-input, single-output system are determined. Adding the fuzzy controller to the closed-loop control, the simulation model shown in Fig. 9 is obtained.
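A minimal sketch of the fuzzification step follows: eq. (7) maps a physical range [a, b] onto the fuzzy domain [−n, n], and a triangular membership function grades each linguistic term. The term centers and the sample numbers are illustrative assumptions.

```python
import numpy as np

def to_fuzzy_domain(x, a, b, n):
    """Eq. (7): linear mapping of the basic domain [a, b] onto [-n, n]."""
    return 2.0 * n / (b - a) * (x - (a + b) / 2.0)

def tri_mf(y, left, center, right):
    """Triangular membership degree of y in one linguistic term."""
    if left < y <= center:
        return (y - left) / (center - left)
    if center < y < right:
        return (right - y) / (right - center)
    return 0.0

centers = np.linspace(-6, 6, 7)  # NL, NM, NS, ZE, PS, PM, PL over [-6, 6]
e = to_fuzzy_domain(x=120.0, a=-1000.0, b=1000.0, n=6)   # speed error example
degrees = [tri_mf(e, c - 2, c, c + 2) for c in centers]  # membership in each term
```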
Fig. 9. Fuzzy-controlled PID closed-loop control
The PID dual closed-loop control and fuzzy PID control models are simulated synchronously. Given 1000 r/min, a ±5 N·m load disturbance is added at 1 s, with a duration of 1 s.
Fig. 10. Comparison of RPM
Fig. 11. Burst load disturbance RPM
It can be seen from Fig. 10 and Fig. 11 that the linear constant-torque period is longer under fuzzy control, and the starting speed is faster without overshoot, which is better than traditional PID control. When a sudden load (or load-reduction) disturbance appears, the motor response is more rapid under fuzzy control, the speed change amplitude and rate are smaller, and the dynamic anti-interference performance is stronger. For both start-up and anti-interference performance, the fuzzy control system performs better than traditional PID control: it has stronger regulation performance, smaller speed variation, faster response, and greatly improved dynamic performance.
5 Conclusion

In this paper, a DC motor dual closed-loop PWM full-bridge PID control system based on STM32 is designed. The system current overshoot is less than 10%, the speed overshoot is less than 20%, steady state is reached without static error, and constant-torque start-up is achieved; the system has good anti-interference performance and can meet the expected household power drive scenario. System performance is further improved so that the control system can meet the
demand of power drive applications. To improve DC motor performance, a fuzzy control method is applied to the dual closed-loop PID control system. Experimental results show that the designed fuzzy controller improves system performance, with better start-up characteristics and no steady-state error or overshoot, and that the digital fuzzy control system improves the anti-interference performance.
Machine Learning and Data Mining
Research on Exchange and Management Platform of Enterprise Power Data Unification Summit

Yangming Yu(B), Zhiyong Zha, Bo Jin, Geng Wu, and Chenxi Dong

Information and Communication Branch of Hubei Epc, Hubei, China
[email protected]
Abstract. In recent years, with the continuous expansion of power data sharing applications by government, enterprise, and social users, and the rapid growth of the scope of data service products and the volume of data service access, the continuous advancement of power data sharing interactive services makes it necessary to improve the data sharing operation management and control mechanism. On the one hand, the efficient daily operation and maintenance of data products must be supported and standardized, including data catalog management, data product management, and product removal; on the other hand, the operation of the data sharing and interaction process must be strengthened. This paper proposes the design of a data sharing service platform that meets the following requirements. First, it establishes access service security, providing unified access for government, enterprise and other customers' access requests made in different ways. Second, it establishes standardized data service demand management, providing data catalog query services for government and enterprise customers, accepting their data needs, forwarding them to business departments for processing, and giving feedback on the handling of those needs. Third, it provides unified data product access services, including data product query, data product application, and data product online access. Fourth, it provides unified data interface services, including data interface query services. The platform guarantees the normal operation of external power data sharing services, establishes an internal data sharing management and control mechanism, and supports the company's data service operation managers in carrying out data product operation management, data service listing and removal, data service authority control, and data service operation monitoring and analysis.

Keywords: Enterprise power data · Data sharing · Management platform
1 Introduction

In recent years, with the continuous expansion of the demands of government, enterprise, and social users for power data sharing applications, the scope of data service products and the volume of data service access have also rapidly expanded [10]. At present, there
is a lack of effective prevention and control in data interaction, limited support for access protocols, insufficient security protection [11], and a lack of data permission control and operation monitoring [6], which brings risks to data security. Therefore, it is necessary to strengthen support for the different access protocols of multi-channel users, improve the encryption and security authentication of the data interaction process, and strengthen safety prevention and control capabilities such as fuse recovery, service current limiting, and anti-replay in the data interaction service process, in order to ensure data and application security and improve the efficiency of power data sharing and interaction [1].

At the same time, with the continuous advancement of power data sharing and interactive services [4], it is necessary to improve the data sharing operation management and control mechanism, covering data product management, product listing and removal, and so on; the monitoring and management of the data sharing interaction process should be strengthened so as to comprehensively control the development of data sharing services and enhance data security prevention and control.

At present, the interface protocols provided by the power grid resource middle platform, the customer service business middle platform, the data middle platform and other systems are not uniform, and the multi-system mesh connection method is expensive. Giving the data services of these business and data middle platforms access to data sharing applications would require a large investment in transformation costs, and the later operation and maintenance costs would also be large. Therefore, it is necessary to establish a unified interface management application, promote the unified management of interface services and multi-protocol adaptation, lay a solid foundation for supporting data sharing and interactive services, and support the integration and efficient external sharing of power data.

From the point of view of management benefits, building a data sharing platform completes the in-depth data sharing application, expands the customer service channels on the power side, and improves internal business efficiency; the government side obtains high-quality, multi-dimensional customer electricity data, improving the accuracy of decision-making and further improving the internal work efficiency of both parties. From the perspective of social benefits, by improving online interaction between government and enterprises, both parties can carry out more in-depth data and product sharing, assist the government in making economic decisions more efficiently, promote people's livelihood services, reduce the number of errands, and achieve mutual benefit and win-win results.

Based on the previous construction sequence of government data sharing data and platforms, there are three modes of data sharing with the government. The first is that, through the government-enterprise data sharing application deployed in the Internet area, the business data of the management information area is dispatched to the government pre-database deployed on the government private network to support government data sharing. The second is
that, through the data exchange server deployed in the Internet area, the indicator platform of the dispatching management information area, the electric energy service management platform and other business data are sent to the government pre-database deployed on the government private network to support the sharing of government data. The third is that the provincial government directly calls, via the Internet, the interface server of the Provincial Economic and Information Commission deployed in the Internet area to obtain power data, supporting real-time government calls. Data sharing applications with enterprises are realized through the energy big data platform (which has an external network database). The first two modes have the same technical route, and the third mode is direct interaction on the Internet.

The data sharing service provides data services for the government, enterprises and other customers, so that they can access data services in different ways, and manages the data service requirements and data product requirements of the government and enterprises. The data sharing carried out so far mainly adopts the following three modes: the first synchronizes business data in real time through the data interface, the second synchronizes data periodically through the data exchange service, and the third provides data access services for the Provincial Economic and Information Commission through a fixed interface. In order to establish a more flexible, open and secure power data interactive sharing capability, a unified external data sharing interactive service is created relying on the Internet, expanding the data service access modes, supporting the unified access of user data services from external network channels such as government and enterprises, and protecting data sharing security, thereby providing support for data sharing applications.

The data sharing service platform proposed in this paper meets the following business requirements. First, it establishes access service security: it provides unified access for the requests of government, enterprise and other customers made in different ways; establishes data interaction security encryption measures to prevent security issues in the data transmission process; and establishes data service guarantee measures such as anti-replay protection, fusing protection, and current limiting protection to ensure stable and secure data services. Second, it establishes standardized data service demand management to provide data catalog query services for government and enterprise customers, accept their data requirements, forward them to business departments for processing, and give feedback on how the requirements were handled. Third, it provides unified data product access services, including data product query, data product application, and data product online access. Fourth, it provides unified data interface services, including data interface query, data interface application, and data interface invocation services for the business systems of government, enterprise and other customers.
In addition, in order to ensure the normal operation of external power data sharing services, an internal data sharing management and control mechanism is established to support the company's data service operation managers in carrying out data product operation management, data service listing and removal, data service authority control, and data service operation monitoring and analysis. Among them, data product operation management
supports operators in carrying out data catalog management and data product definition; data service listing and removal supports data product release, product removal, and product preview; data service authority control supports operator account authority management, abnormal account management, data product access rights control, and so on; and data service operation monitoring and analysis supports operators in monitoring data service status, user registration, and the like.
2 Related Work

The development of social media technology and Internet of Things (IoT) technology has resulted in a massive increase in unstructured data obtained from various sensors, cameras, digital surveillance equipment, video conferencing, and VoIP telephony. According to the International Data Corporation (IDC), this data volume will double every two years, and by 2020 we will generate and replicate 44 ZB (44 trillion gigabytes) of data annually. The term "big data" is used in the context of this explosion of data to describe its enormous volume. Literally understood, the concept of big data seems relatively simple at first, but it is actually a rather abstract concept, and there is no unified definition yet. In its report "Big Data: The Next Frontier for Innovation, Competition, and Productivity," the McKinsey Global Institute defines big data as a vast collection of data that is too large to capture, store, manage and analyze [7]. A similar academic definition was proposed by Dumbill (2013) [5]: "Big data refers to data that exceeds the capacity of traditional database systems; not only is it huge in quantity, it also changes at a very fast rate, and its structure cannot adapt to existing database architectures. In order to obtain valuable information from these data, new and alternative methods must be adopted to process them." According to the above definitions, enterprise organizations must adopt new analytical tools and models to deal with such huge and complex data. This means that companies need to develop new capabilities to facilitate the flow of external and internal information, and to turn data into strategic resources for designing, producing and delivering innovative products and services that meet new and growing customer demands. This new capability needs to properly handle the five basic characteristics of big data in order to truly tap the huge value behind it [3].

Data sharing refers to the centralized storage of heterogeneous data from different data sources at the logical and physical levels, and the realization of unified access. By realizing data sharing, centralized management and control of resources can be achieved more effectively, and the efficiency of data utilization can be significantly improved. With the rapid development of China's smart grid, the massive heterogeneous data resources generated in production and operation are growing exponentially, but data sharing faces many problems, such as data heterogeneity [2], data storage [9], and data mining [8]. Therefore, this paper studies the problems existing in the data sharing of Chinese electric power enterprises. By building a unified and universal data integration platform, it aims to help electric power enterprises achieve high-quality data sharing effectively and provide powerful data support for improving their management and service levels. The front end of the enterprise energy consumption data sharing and analysis platform realizes customized function development
according to the business needs of the tower company, and builds core functional modules such as daily electricity consumption analysis, monthly electricity consumption analysis, electricity bill management, energy consumption analysis, threshold setting, and system management. The back end of the system relies on the power data center to clean, analyze and process the power data of multiple business systems, such as the daily and monthly electricity consumption data in the electricity consumption collection system and the user electricity bill information and basic user information in the marketing business application system, completing daily electricity consumption analysis, monthly electricity consumption analysis, and electricity bill management [12]. Then, through data interfaces, data import and data screening, and using advanced information technologies such as big data and artificial intelligence, the relevant data of the tower company's base station energy consumption monitoring system are integrated to build a big data analysis model of tower base station energy consumption. This model is mainly used to identify abnormal power consumption of tower base stations and realize early fault warning.
3 Proposed Architecture

3.1 Business Architecture

Based on the existing business model and the results of informatization construction, and on the premise of information security, a reasonable technical architecture and implementation plan are formulated and the data interaction interfaces are standardized, realizing data sharing and interconnection between power companies and government big data platforms. The overall business structure is shown in Fig. 1.

3.2 Application Architecture

A unified external sharing and exchange platform for electric power data is built to support applications such as the full life cycle of data services, service sharing, and monitoring and analysis under the premise of safe and available data services, creating a unified technical support system for enterprise data interaction and sharing, so as to securely share power data with government users, enterprise users and social users. The overall application architecture is shown in Fig. 2. It mainly includes the following three platforms.

Data sharing and exchange application platform (external network). The original government-enterprise data sharing extranet application is upgraded and deployed on a unified HUAWEI CLOUD, expanding the data service access methods, supporting unified access of omni-channel user data services, protecting data sharing security, and providing support for data sharing applications.
Fig. 1. The total business architecture
Fig. 2. Application architecture
Data sharing and exchange application platform (intranet). It supports internal management applications through three functions: authority account management, data management, and monitoring management. Account authority management: through integration with the energy big data platform, user accounts are managed, including account registration review, account authorization management, and dynamic account
management, with security-risk accounts added to the blacklist. It also controls data rights, including data service rights control for users, black and white list control, data desensitization control and other data rights processing. Data management: unified management of the data provided by the system, including data catalog registration, review, release control, and resource linking, with support for data collection management, including database tables and custom data service management. Service access management: management and control of the release process of data services and data products, covering steps such as access, testing, and release, with support for unified access to data services and data products. Monitoring and management: from the perspective of operators and maintenance personnel, the operation of the system is monitored to ensure its orderly operation.

Interface management platform. The new enterprise-level interface management platform realizes service connection with each business middle platform, data middle platform, and technology middle platform through applications such as interface management, interface adaptation, service orchestration, service routing, and service monitoring, and allows data services whose original interface protocols change little or not at all to quickly access the data exchange and sharing management application.

3.3 Technical Architecture

The platform follows the company's enterprise technology architecture, absorbs and draws on advanced architecture experience at home and abroad, is based on the cloud platform and the data center, and refers to the Pace-Layered Application Strategy and SOLID software architecture principles. See Fig. 3 for details. These include the following. The IaaS layer, based on the cloud platform, provides servers, networks, storage, operating systems and firewalls, supplying the underlying resources and services for the deployment and operation of microservices and front-end applications, with the company providing resources and components uniformly. The PaaS layer uses relational databases and the big data platform to provide data storage and computing services for the service layer. The business service layer is composed of microservices with different functions, providing services such as business logic processing and data processing for the application layer, and providing functions such as registration/discovery, load balancing, routing, and configuration between microservices.

3.4 Data Architecture

On the data sharing interactive application platform (external network) side, user accounts, permissions and access data are cached, and no business data is retained; on the data interactive application platform (intranet) side, management permission data and related log data are persistently retained. See Fig. 4 for details.
Fig. 3. Technology architecture
Fig. 4. The framework of data sharing and exchanging application
3.5 Integration Architecture

The interface management platform realizes functions such as interface registration, interface adaptation, interface orchestration, and interface monitoring, and, together with the business middle platform, data middle platform, and technology middle platform, forms the enterprise middle platform. As the follow-up data sharing and exchange platform will be the unified channel for data interaction between power companies and government, enterprise and social users, it is recommended to upgrade the original government-enterprise sharing applications. The data sharing and exchange platform is divided into two parts: the intranet data sharing and exchange management platform and the extranet data sharing and exchange application platform. The intranet platform integrates with the data API services of the various business and data middle platforms through the enterprise-level interface management platform, and performs account authority management, service access management, data management, and monitoring management for data interface services, data micro-application products and externally shared data. The extranet data sharing and exchange application platform is a sub-application of the energy big data application. The management information area and the Internet area interact through information penetration across a strong security isolation device. The external network data sharing application, on the one hand, relies on the government dedicated line channel to provide the government with data sharing services in a variety of interactive ways, including data push and interface requests; on the other hand, through the firewall, it provides a variety of power data products for Internet users, including government users, enterprise users, and social users.
3.6 Security Architecture

The data security protection system is designed and implemented around the security of the full life cycle of data. It improves data security defense-in-depth capabilities at various levels, including the application layer, data resource layer, network layer, and basic platform layer, to ensure the reliability, availability, authenticity, validity, and privacy of data and services.
4 Conclusion

Due to the continuous expansion of power data sharing applications by government, enterprise, and social users, the rapid expansion of the scope of data service products and the volume of data service access, and the continuous advancement of power data sharing interactive services, it is necessary to improve the data sharing operation and control mechanism. Based on this, this paper proposes the design of a data sharing service platform that meets four requirements: first, establishing access service security, with unified access for government, enterprise and other customers' access requests made in different ways; second, establishing standardized data service requirement management, providing data catalog query services for government and enterprise customers, accepting their data needs, forwarding them to business departments for processing, and feeding back the handling of those needs; third, providing unified data product access services, including data product query, data product application, and data product online access; and fourth, providing unified data interface services, including data interface query services. The platform guarantees the normal operation of external power data sharing services, establishes an internal data sharing management and control mechanism, and supports the company's data service operation managers in data product operation management, data service listing and removal, data service authority control, and data service operation monitoring and analysis.
References

1. Antoniu, G., Deverge, J.-F., Monnet, S.: Building fault-tolerant consistency protocols for an adaptive grid data-sharing service. PhD thesis, INRIA (2004)
2. Castano, S., De Antonellis, V.: Global viewing of heterogeneous data sources. IEEE Trans. Knowl. Data Eng. 13(2), 277–297 (2001)
3. Davenport, T.H., Patil, D.J.: Data scientist. Harvard Bus. Rev. 90(5), 70–76 (2012)
4. Dong, X., Jiadi, Y., Luo, Y., Chen, Y., Xue, G., Li, M.: Achieving an effective, scalable and privacy-preserving data sharing service in cloud computing. Comput. Secur. 42, 151–164 (2014)
5. Dumbill, E.: Making sense of big data (2013)
6. Kim, S.-H., Kim, N.-K., Chung, T.-M.: Attribute relationship evaluation methodology for big data security. In: 2013 International Conference on IT Convergence and Security (ICITCS), pp. 1–4. IEEE (2013)
7. Manyika, J., et al.: Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute (2011)
8. Frand, J.: Data mining: what is data mining. J. Frand's web page at UCLA. https://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm. Accessed 29 May 2011
9. Piramanayagam, S.N., Chong, T.C.: Developments in Data Storage: Materials Perspective. John Wiley & Sons (2011)
10. Sun, J., Xu, G., Zhang, T., Xiong, H., Li, H., Deng, R.: Share your data carefree: an efficient, scalable and privacy-preserving data sharing service in cloud computing. IEEE Trans. Cloud Comput. (2021)
11. Yang, G., Yang, M., Salam, S., Zeng, J.: Research on protecting information security based on the method of hierarchical classification in the era of big data. J. Cybersec. 1(1), 19 (2019)
12. Zhu, J., Zhuang, E., Jian, F., Baranowski, J., Ford, A., Shen, J.: A framework-based approach to utility big data analytics. IEEE Trans. Power Syst. 31(3), 2455–2462 (2015)
Application of Deep Learning Autoencoders as Features Extractor of Diabetic Foot Ulcer Images

Abbas Saad Alatrany1,2(B), Abir Hussain1,3, Saad S. J. Alatrany4, and Dhiya Al-Jumaily1

1 School of Computer Science and Mathematics, Liverpool John Moores University, Liverpool, UK
[email protected]
2 University of Information Technology and Communications, Baghdad, Iraq
3 Department of Electrical Engineering, University of Sharjah, Sharjah, UAE
4 Imam Ja'afar Al-Sadiq University, Baghdad, Iraq
Abstract. Diabetic foot ulcer is one of the most common diabetic complications and can lead to amputation if not treated appropriately and in time. Diagnosis by professionals can be extremely successful, but it comes at a great cost; therefore, early automated detection tools are required to help diabetic people. In the current work, we test the ability of deep learning autoencoders to extract appropriate features that can be fed to machine learning algorithms to classify normal or abnormal skin areas. The proposed model was trained and tested on 754 foot photos of healthy and diabetic-ulcer-affected skin from several individuals. By benchmarking various machine learning algorithms, our extensive research and experiments showed that the features extracted by autoencoder models yield high accuracy for early diagnosis when passed to a support vector machine with a polynomial kernel, with an accuracy of 0.933 and an F1 score of 0.939.

Keywords: Autoencoder · Deep learning · Machine learning · DFU · Diabetic foot ulcer
1 Introduction

Diabetes mellitus, or diabetes, is a metabolic condition characterised by elevated blood sugar levels [1]. Insulin is a hormone that transports sugar from the bloodstream into cells, where it can be stored or used for energy. In diabetes, the body either generates inadequate insulin or is unable to adequately use the insulin it does produce. Diabetes comes in a variety of forms. Diabetes type 1 is an autoimmune illness: the immune system attacks and destroys cells in the pancreas, where insulin is produced, and it is still unclear what causes this attack. This type of diabetes affects approximately 10% of diabetics [2]. The second kind, type 2 diabetes (T2D), occurs when sugar builds up in the bloodstream and the body becomes insulin resistant [3]. Finally, prediabetes
is defined as having blood sugar levels that are higher than normal but not high enough for a diagnosis of T2D. Each form of diabetes has its own set of symptoms, causes, and treatments [4]. Diabetic peripheral neuropathy (DPN) is a serious consequence of diabetes that impairs the sensory nerve supply to the feet, causing infections, structural alterations, and the development of diabetic foot ulcers (DFUs); it affects 30–50 percent of persons with diabetes [5]. Recurrent stress over a region prone to high vertical stress is a major cause of diabetic foot ulcers in patients with DPN [6]. According to research analysing 785 million outpatient visits by diabetic patients in the United States over a six-year period, diabetic foot ulcers and associated infections are a substantial risk factor for emergency department visits and hospital admissions [7]. Machine learning (ML) is the study of computer algorithms that aid in the formulation of correct predictions and reactions in specific situations, emulating the intelligent behaviour of humans. In general, machine learning is about using what has been learnt in the past to make better predictions in the future: it is the development of programmes that allow us to analyse data from various sources, select relevant data, and use that data to predict the behaviour of the system in similar or different scenarios [8]. Although previous studies achieved good results, machine learning performance for the early diagnosis of diabetic foot ulcers still needs to be improved. This paper is organised into several sections. Section 2 outlines recent literature. Section 3 describes the materials and methods used in the current work, while Sect. 4 presents the results obtained from the experiments. Finally, Sect. 5 concludes the paper.
2 Related Work
Goyal et al. [9] proposed a novel CNN architecture named DFUNet that combines significant aspects of CNN design. They used convolution layers both in depth and in parallel to improve the extraction of features important for DFU classification. Their idea is to decrease the number of layers in the network and use larger filters to learn the feature maps from the input images. Parts of the network employ layers in a parallel manner to extract concatenated features, using tenfold cross-validation (CV) as internal validation; the deep learning model reached an AUC of 0.96, outperforming other deep learning architectures such as AlexNet and LeNet. Das et al. [10] suggested DFU_SPNet, a deep learning model to classify healthy and unhealthy samples. Their network used stacked parallel layers with different kernel sizes; this variety of sizes aided the learning of both global and local feature abstractions. The dataset contained 1679 image patches, the majority annotated as abnormal and 641 annotated as normal, and was divided into 80% for training and 20% for testing. The authors trained the network with various optimizers and hyperparameters, and the best combination reached an AUC of 0.97. For DFU detection, Yap and his team [11] employed Faster R-CNN [12]. The ML models were trained using the dataset from the DFU grand challenge [13], which consists of 4000 images divided equally between training and testing. All images were reduced to 640 × 480 pixels to boost the efficiency of the deep learning algorithms and to minimise processing
expenses. A modified R-CNN model with deformable convolution achieved the best results, with an F1 score of 0.74. The paper also concludes that an ensemble of several deep learning algorithms can improve the F1 score but has little effect on the mean average precision. Khandakar et al. [14] report the detection results of six deep CNN models for categorizing single-foot thermograms into control and diabetes groups. The dataset contains 167 foot-pair thermogram images from 122 diabetic individuals and 45 healthy controls. The study employed five-fold CV, with each fold separated into an 80% training and 20% testing set; 20% of the training data formed the validation set. DenseNet201 exceeded the other five CNN models studied, with an overall sensitivity of 94.01% for the detection of diabetic foot ulceration. Scebba and colleagues [15] present a deep learning model named "detect and segment" to segment clinical wound areas in images. Deep neural networks were used to determine the wound's location and separate it from the background. The model was tested on various datasets of clinical wounds; a Matthews correlation coefficient of 0.85 was reported for the diabetic foot ulcer dataset. The aims of the present work are: • To propose and implement a deep learning model employing an autoencoder, based on the idea of compressing the feature volume to facilitate the training of state-of-the-art machine learning algorithms. • To enhance the classification performance of normal vs. abnormal skin areas.
3 Materials and Methods
3.1 Dataset
The diabetic foot ulcer dataset used in this paper was obtained on request from [16–18]. The data was gathered as a standardised collection of colour images of diabetic foot ulcers from various individuals. The dataset includes 754 photos of diabetic patients' feet with DFU and healthy skin from the diabetic centre at Nasiriyah Hospital in the southern region of Iraq. These photographs were taken with a Samsung Galaxy Note 8 and an iPad at various brightness levels and angles. Regions of interest (ROIs) were cropped into small patches: a considerable area around the ulcer that comprises essential tissue from both normal and abnormal skin classes. The patches were then annotated by a specialist, with a medical professional marking the ground-truth labels as normal or abnormal skin patches. A total of 1609 skin patches were collected, 542 of which were normal and 1067 abnormal DFU. For computational simplicity, the patches were rescaled to 128 × 128 pixels in the current work.
3.2 Background of Machine Learning Algorithms
Machine learning is a branch of artificial intelligence in which computer algorithms learn from data independently. This section provides an overview of the ML algorithms used in the current work to classify healthy and unhealthy skin areas.
Support Vector Machine
Support vector machines (SVMs) are among the most accurate and robust machine-learning algorithms [19]. In a two-class learning task, SVMs identify the classification function that best separates the classes in the training data. By utilizing different kernel functions, different degrees of nonlinearity and flexibility can be incorporated into the model. Support vector machines have attracted much academic attention in recent years, since they can be derived from advanced statistical theories and bounds on their generalisation error can be computed. In the medical literature, performance comparable to or better than that of other machine learning algorithms has been documented [19].
Random Forest
Random forest is an ensemble approach that uses a group of decision trees, obtaining a prediction from each tree and then aggregating the individual forecasts into the model's output prediction [20]. It employs the bagging principle, i.e., bootstrap and aggregation. "Bootstrap" refers to selecting random samples from a dataset with replacement; aggregation is the process of combining all of the predictions into the final result. Bagging helps reduce model overfitting [20].
Naive Bayes
In its simplest form, the naive Bayes model is a Bayesian probability model. Naive Bayes classifiers rest on a strong independence assumption: the likelihood of one feature has no impact on the likelihood of another, so for a set of n features the classifier makes 2n! independence assumptions. Evaluating a particular problem domain with n features and m classes, the classifier learns a mapping between the set of attributes and the set of labels: it calculates a prior probability of each datapoint for each class and then assigns the datapoint to the class with the highest probability [21].
Logistic Regression
Logistic regression is a machine learning model usually used for two-class problems. The classifier maximises the likelihood of the data given to the model [22].
Autoencoders
An autoencoder (AE) is a type of unsupervised artificial neural network. AEs learn features from unlabeled datapoints automatically, and their principle falls within the field of data dimensionality reduction. A typical autoencoder is made up of three or
more layers: an input layer, a set of hidden layers, and a reconstruction or output layer. A shallow or simple autoencoder is a single-hidden-layer neural network that uses an encoding process to convert the original data (input values) into compressed data of lower dimensionality, which is then mapped to an output layer that approximates the original data via a decoding process [23]. The encoder half derives the feature vector y from an input datapoint x as follows:

y = f(w_e x + b_e)   (1)

whereas Eq. 2 describes the decoder half, which reconstructs the input:

x̂ = g(w_d y + b_d)   (2)

where w_e and w_d represent the weights of the encoder and decoder halves, respectively, and b_e and b_d represent their biases. These parameters are learnt by minimizing the input–output error. The loss function can be written as:

Loss = ||x − x̂||²   (3)
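As an illustration of Eqs. 1–3, the following is a minimal Keras sketch of such a convolutional autoencoder for the 128 × 128 × 3 patches of this dataset. The layer counts and filter sizes are illustrative, not the published model:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Encoder: conv + pooling blocks compress a 128x128x3 patch (Eq. 1).
inp = layers.Input(shape=(128, 128, 3))
x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inp)
x = layers.MaxPooling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), padding="same", activation="relu")(x)
bottleneck = layers.MaxPooling2D((2, 2), name="bottleneck")(x)

# Decoder: a mirror of the encoder reconstructs the input (Eq. 2).
x = layers.Conv2DTranspose(16, (3, 3), strides=2, padding="same",
                           activation="relu")(bottleneck)
x = layers.Conv2DTranspose(32, (3, 3), strides=2, padding="same",
                           activation="relu")(x)
out = layers.Conv2D(3, (3, 3), padding="same", activation="sigmoid")(x)

autoencoder = models.Model(inp, out)
# Mean squared error corresponds to the reconstruction loss of Eq. 3;
# training fits the images against themselves: autoencoder.fit(X, X, ...).
autoencoder.compile(optimizer="adam", loss="mse")
```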
3.3 Proposed Machine Learning Model
The architecture of the encoder and of the overall model is illustrated in Fig. 1 and Fig. 2. This architecture is proposed to improve the extraction of the key characteristics linked to DFU classification. The autoencoder's value comes from the fact that it removes noise from the input images, leaving only a high-value representation. Because the algorithms can learn the patterns in the data from a smaller selection of high-value inputs, the performance of the machine learning algorithms can be improved. The compression provided by autoencoders reduces the training time significantly, allowing the building of lighter ML models that can work efficiently on low-performance computers. The suggested encoder contains 28 layers, divided into four blocks. The layer types are defined as follows:
Input layer: 128 × 128 patches of two classes, normal and abnormal.
Convolutional layer: the first layer after the input layer, which takes the inputs and extracts the various features. In this layer, a mathematical operation is performed between the input and a filter: a filter of size N × N is slid over the input, and the dot product between the filter and the corresponding image region is computed. This procedure produces a feature map that captures information such as corners and edges, which is then passed on to further layers to learn higher-level features from the input image. In the current work, a filter size of 3 × 3 is used. A batch normalization layer and an activation layer follow each convolutional layer.
Fig. 1. The proposed encoder
Pooling layer: the primary purpose of the pooling layer is to reduce the size of the feature map produced by the convolutional layer in order to cut computational costs. Of the different pooling procedures, max pooling is used in the current work to reduce the feature size.
Activation layer: activation layers learn and estimate continuous and sophisticated variable-to-variable relationships in the network. Of the activation functions regularly utilised, ReLU was used in our case.
Flatten layer: the preceding layers' outputs are flattened and supplied to the machine learning models. The features are taken from the bottleneck of the autoencoder and supplied, along with the label of each datapoint, to state-of-the-art machine learning models for binary classification of normal vs. abnormal skin areas.
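A short sketch of this hand-over from the trained autoencoder to a classical classifier, assuming the Keras autoencoder above and scikit-learn; X_train, y_train, X_test are placeholder arrays of patches and labels, not names from the paper:

```python
import numpy as np
from sklearn.svm import SVC
from tensorflow.keras import models

# Reuse the trained autoencoder up to its bottleneck as a feature extractor.
encoder = models.Model(autoencoder.input,
                       autoencoder.get_layer("bottleneck").output)

# Encode the image patches and flatten the bottleneck feature maps.
feats_train = encoder.predict(X_train).reshape(len(X_train), -1)
feats_test = encoder.predict(X_test).reshape(len(X_test), -1)

# Polynomial-kernel SVM, the best performer reported in Sect. 4.
clf = SVC(kernel="poly", degree=3)
clf.fit(feats_train, y_train)
y_pred = clf.predict(feats_test)
```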
Fig. 2. The proposed model
4 Results
4.1 Evaluation Metrics
The experimental findings are assessed using five evaluation metrics: accuracy, precision, recall, F1 score, and AUC, which are used intensively in the literature [24, 25]. Accuracy is calculated as:

Accuracy = (TP + TN) / (TP + FP + TN + FN)

where TP, TN, FP, and FN represent the numbers of true positives, true negatives, false positives, and false negatives, respectively. Precision differs from accuracy in that it is concerned solely with the number of positive samples predicted to be positive:

Precision = TP / (TP + FP)

Data imbalance is a regular occurrence in the classification of medical conditions, hence recall is required. It is defined as:

Recall = TP / (TP + FN)

The F1 score is the harmonic mean of precision and recall:

F1 score = 2 × (Precision × Recall) / (Precision + Recall)
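These metrics can be computed directly with scikit-learn; a short sketch, assuming y_test, y_pred, clf, and feats_test from the pipeline above:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

scores = clf.decision_function(feats_test)  # continuous scores for the AUC
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print("AUC      :", roc_auc_score(y_test, scores))
```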
4.2 Results
Figure 3 shows the area under the curve achieved by the benchmarked machine learning algorithms. The reported AUCs ranged from 0.90 to 0.934. Logistic regression and naive Bayes had equivalent AUC measurements; the support vector machine with a linear kernel was the second-best classifier, with an AUC of 0.90, whereas the SVM with a polynomial kernel achieved the best result, with an AUC of 0.934.
Fig. 3. AUC plot of ML model performance
The performance of the benchmarked ML algorithms on the features retrieved by the autoencoder's encoder component is shown in Table 1. All results lie in the range of 0.844 to 1.00. The support vector machine with a polynomial kernel performed best in terms of accuracy and F1 score, with 0.933 and 0.939, respectively. Both random forest and naive Bayes achieved a precision of 1.00, meaning that every case they classified as abnormal was indeed abnormal. Logistic regression reached values of 0.921, 0.916, 0.948, and 0.922 for accuracy, precision, recall, and F1 score, respectively, while the SVM with a linear kernel performed worst, with an accuracy of 0.905. A robust model can provide the assurance needed to put an ML model into production. Considering only model performance and disregarding model robustness might
have substantial consequences, particularly in critical ML applications such as disease risk prediction. Our model requires more testing, such as evaluation on a different dataset, before it can be considered a robust method; this has been set aside as future work.

Table 1. Comparison of state-of-the-art algorithms

Classifier | Accuracy | Precision | Recall | F1-score
RF         | 0.924    | 1.00      | 0.862  | 0.925
SVM        | 0.905    | 0.887     | 0.948  | 0.916
SVM_Poly   | 0.933    | 0.947     | 0.931  | 0.939
LR         | 0.921    | 0.916     | 0.948  | 0.922
NB         | 0.915    | 1.00      | 0.844  | 0.915
Example outputs of the best-performing machine learning model are illustrated in Fig. 4.
Fig. 4. Example outputs of the proposed model
The current work is benchmarked against other models from the literature, as shown in Table 2. According to the F1 score, our proposed model comes second. Interestingly, the proposed model achieved the highest precision among its competitors, which is extremely important in the medical field, where accurate prediction of cases is critical. Although the results of our study converge with those of other studies, our proposed model is less complex than the other deep learning models: Goyal et al. [9] constructed a CNN model with 15 convolutional layers, Das et al. [10] used 13 convolutional layers, and Alzubaidi et al. [16] proposed a deep learning model with 17 convolutional layers arranged in both a parallel and a sequential fashion. In contrast, the proposed
model contains only 8 convolutional layers, which results in an efficient, light deep learning model.

Table 2. Comparison with previous work

Model           | Accuracy | Precision | Recall | F1-score
DFUNet [9]      | 0.925    | 0.945     | –      | 0.939
DFU_SPNet [10]  | 0.964    | 0.926     | 0.984  | 0.954
CA-DetNet [11]  | –        | 0.719     | 0.768  | 0.743
DFU_QUTNet [16] | –        | 0.942     | 0.926  | 0.934
Proposed model  | 0.933    | 0.947     | 0.931  | 0.939
This study demonstrates that autoencoders can be an effective feature extraction tool for diabetic foot ulcer images. Comparing multiple machine learning models on the task of predicting a foot ulcer has given us insight into the degree to which machine learning models are able to detect abnormal skin areas.
5 Conclusion
In this study, we used a deep learning technique as a feature extractor and subsequently trained multiple machine learning models to classify healthy and DFU skin regions. The encoder component of the autoencoder architecture assisted in the extraction of key characteristics from the input images. To the best of our knowledge, this is the first time an autoencoder model has been applied to diabetic foot ulcer classification. All of the machine learning models yielded high performance; the highest classification accuracy was attained using a support vector machine with a polynomial kernel.
References
1. Kaul, K., Tarr, J.M., Ahmad, S.I., Kohner, E.M., Chibber, R.: Introduction to diabetes mellitus. In: Ahmad, S.I. (ed.) Diabetes: An Old Disease, a New Insight, pp. 1–11. Springer, New York, NY (2013). https://doi.org/10.1007/978-1-4614-5441-0_1
2. Saberzadeh-Ardestani, B., et al.: Type 1 diabetes mellitus: cellular and molecular pathophysiology at a glance. Cell J. (Yakhteh) 20(3), 294 (2018)
3. Chatterjee, S., Khunti, K., Davies, M.J.: Type 2 diabetes. The Lancet 389(10085), 2239–2251 (2017)
4. Khan, R.M.M., Chua, Z.J.Y., Tan, J.C., Yang, Y., Liao, Z., Zhao, Y.: From pre-diabetes to diabetes: diagnosis, treatments and translational research. Medicina 55(9), 546 (2019)
5. Tesfaye, S.: Neuropathy in diabetes. Medicine 43(1), 26–32 (2015)
6. Bus, S.A., et al.: Footwear and offloading interventions to prevent and heal foot ulcers and reduce plantar pressure in patients with diabetes: a systematic review. Diabetes/Metabol. Res. Rev. 32, 99–118 (2016)
7. Skrepnek, G.H., Mills, J.L., Sr., Lavery, L.A., Armstrong, D.G.: Health care service and outcomes among an estimated 6.7 million ambulatory care diabetic foot cases in the US. Diabetes Care 40(7), 936–942 (2017)
8. Subasi, A.: Chapter 3 - Machine learning techniques. In: Subasi, A. (ed.) Practical Machine Learning for Data Analysis Using Python, pp. 91–202. Academic Press (2020)
9. Goyal, M., Reeves, N.D., Davison, A.K., Rajbhandari, S., Spragg, J., Yap, M.H.: DFUNet: convolutional neural networks for diabetic foot ulcer classification. IEEE Trans. Emerg. Top. Comput. Intell. 4(5), 728–739 (2020)
10. Das, S.K., Roy, P., Mishra, A.K.: DFU_SPNet: a stacked parallel convolution layers based CNN to improve diabetic foot ulcer classification. ICT Express (2021)
11. Yap, M.H., et al.: Deep learning in diabetic foot ulcers detection: a comprehensive evaluation. Comput. Biol. Med. 135, 104596 (2021)
12. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Presented at the Proceedings of the 28th International Conference on Neural Information Processing Systems, vol. 1. Montreal, Canada (2015)
13. Cassidy, B., et al.: The DFUC 2020 dataset: analysis towards diabetic foot ulcer detection. touchREVIEWS in Endocrinology 17(1), 5–11 (2021)
14. Khandakar, A., et al.: A machine learning model for early detection of diabetic foot using thermogram images. Comput. Biol. Med. 137, 104838 (2021)
15. Scebba, G., et al.: Detect-and-segment: a deep learning approach to automate wound image segmentation. Inform. Med. Unlocked 29, 100884 (2022)
16. Alzubaidi, L., Fadhel, M.A., Oleiwi, S.R., Al-Shamma, O., Zhang, J.: DFU_QUTNet: diabetic foot ulcer classification using novel deep convolutional neural network. Multimedia Tools Appl. 79(21), 15655–15677 (2020)
17. Alzubaidi, L., Fadhel, M.A., Al-Shamma, O., Zhang, J., Santamaría, J., Duan, Y.: Robust application of new deep learning tools: an experimental study in medical imaging. Multimedia Tools Appl. 81(10), 13289–13317 (2022). https://doi.org/10.1007/s11042-021-10942-9
18. Alzubaidi, L., et al.: Towards a better understanding of transfer learning for medical imaging: a case study. Appl. Sci. 10(13), 4523 (2020)
19. Orru, G., Pettersson-Yeo, W., Marquand, A.F., Sartori, G., Mechelli, A.: Using support vector machine to identify imaging biomarkers of neurological and psychiatric disease: a critical review. Neurosci. Biobehav. Rev. 36(4), 1140–1152 (2012)
20. Sarica, A., Cerasa, A., Quattrone, A.: Random forest algorithm for the classification of neuroimaging data in Alzheimer's disease: a systematic review. Front. Aging Neurosci. 9, 329 (2017)
21. Salmi, N., Rustam, Z.: Naïve Bayes classifier models for predicting the colon cancer. In: IOP Conference Series: Materials Science and Engineering, vol. 546, no. 5, p. 052068. IOP Publishing (2019)
22. Christodoulou, E., Ma, J., Collins, G.S., Steyerberg, E.W., Verbakel, J.Y., Van Calster, B.: A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 110, 12–22 (2019)
23. Lopez Pinaya, W.H., Vieira, S., Garcia-Dias, R., Mechelli, A.: Chapter 11 – Autoencoders. In: Mechelli, A., Vieira, S. (eds.) Machine Learning, pp. 193–208. Academic Press (2020)
24. Alatrany, A., Hussain, A., Mustafina, J., Al-Jumeily, D.: A novel hybrid machine learning approach using deep learning for the prediction of Alzheimer disease using genome data. In: Huang, D.-S., Jo, K.-H., Li, J., Gribova, V., Premaratne, P. (eds.) ICIC 2021. LNCS (LNAI), vol. 12838, pp. 253–266. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-84532-2_23
25. Alatrany, A.S., Hussain, A., Mustafina, J., Al-Jumeily, D.: Stacked machine learning model for predicting Alzheimer's disease based on genetic data. In: 2021 14th International Conference on Developments in eSystems Engineering (DeSE), 7–10 Dec 2021, pp. 594–598 (2021). https://doi.org/10.1109/DeSE54285.2021.9719449
MPCNN with Knowledge Augmentation: A Model for Chinese Text Classification Xiaozeng Zhang(B) and Ailian Fang(B) School of Computer Science and Technology, East China Normal University, Shanghai 200062, China [email protected], [email protected]
Abstract. Since TextCNN was proposed, convolutional neural networks have shined in the field of NLP, especially for text classification tasks. In recent years, many models based on convolutional neural networks have been proposed, and DPCNN is one of them: it expands the receptive field by continuously deepening the network, thereby capturing the long-distance dependencies of text. However, the traditional DPCNN suffers from single-scale feature extraction and ignores the importance of different semantic spaces and different n-gram features. This paper improves on it and proposes the MPCNN model, which extracts grammatical features at multiple scales and integrates different semantic spaces to build multiple pyramids. At the same time, considering the lack of contextual information and insufficient semantics in short Chinese texts, a knowledge base is used to obtain the concept set of the text, and the existing text is then enhanced with this prior knowledge. Experiments on the THUCNews dataset achieve better classification results than other mainstream classification models. Keywords: Chinese text classification · Convolutional neural network · KA-MPCNN · Knowledge augmentation
1 Introduction
With the rapid development of the Internet, various network platforms have emerged in recent years. As a primary channel of daily communication, they have produced an explosion of data, and using and managing these data effectively has become an urgent task for natural language processing. Text classification is a basic task whose efficiency and accuracy directly affect downstream tasks such as sentiment analysis and information retrieval, so it plays a pivotal role. There are three main approaches to text classification. The first is rule-based: relevant experts formulate rules, and classification is completed by matching against them. This approach is easy to understand but requires a vast knowledge base and professionals, so it is expensive to build and poorly portable. The second is based on classical machine learning, which classifies via manually constructed features and machine learning algorithms. This method relies on complex feature engineering, and the quality of feature construction directly affects the accuracy of
classification. The third approach is based on deep learning, which automatically extracts text features through neural networks to complete the classification task. With the proposal of pre-trained word vector methods such as Word2vec [1] and GloVe [2], deep learning has been applied successfully to the text field, and designing efficient network architectures has gradually become the focus of most researchers. Typical feature extractors are RNNs and CNNs. RNNs model sequences naturally, so variants such as LSTM [3] and GRU [4] are widely used in NLP, but they lack the weight sharing and small parameter count of CNNs. TextCNN [5], proposed by Kim, uses convolution kernels of different sizes to extract features for classification; although it is a shallow network, its effect is excellent, so it has received extensive attention. Since then, convolutional neural networks have also enjoyed a place in NLP, and various architectures have been continuously proposed. However, most are shallow convolutional networks, simple and efficient by design, so for a while it was thought that deepening the network made little sense in the text domain. Later, Johnson proposed DPCNN [6], which achieved good results through repeated convolution and pooling for dimensionality reduction, deepening the network to capture long-distance dependencies. The VDCNN [7] proposed by Conneau et al. was also carefully designed: its 29-layer network achieves good accuracy on large datasets, proving that deep networks still have room for imagination. Compared with English, Chinese has rich semantics and complex part-of-speech composition. Moreover, because short texts have little vocabulary and information, sparse features, and lack the contextual information that resolves fuzzy semantics, their classification is very difficult. There have been many related studies: Cao proposed cw2vec [8], which uses stroke n-grams to capture the semantic and word-formation information of Chinese words, and Sun [9] uses radical dictionaries to extract features; all of these explore on the basis of the original text. The main contributions of this paper are as follows:
1. For the problem of single-scale feature extraction in DPCNN, region embedding is performed at different scales and pyramids are constructed in different semantic spaces. We evaluate short texts and long texts in the THUCNews dataset; the accuracy rates are improved by 1.11% and 0.87%, respectively.
2. Given the lack of semantics of short texts, domain knowledge is added for semantic enhancement, which exceeds the original DPCNN by 1.97% on the THUCNews dataset and is also a significant improvement over other mainstream models.
1.1 Related Work
Kalchbrenner proposed the DCNN [10] model based on a two-layer convolutional network. Through one-dimensional convolution and a dynamic k-max pooling layer, sentences are semantically modelled without changing the width of the input sequence, which handles sentences of varying length. The model adopts one-dimensional wide convolution, which increases the sentence length and preserves text edge information, while the dynamic k-max pooling layer extracts a corresponding amount of semantic feature information from sentences of different lengths to ensure the uniformity
of subsequent convolutional layers. Unlike TextCNN, however, DCNN convolves only partial dimensions of a word in its one-dimensional convolutions, destroying the integrity of word semantics. Kim proposed the TextCNN [5] model based on a single-layer convolutional network, extracting different n-gram features through convolution kernels of different sizes and then splicing them through a max pooling layer. Unlike DCNN, TextCNN convolves whole rows (i.e. all dimensions of a word vector), preserving semantic integrity with better interpretability. Although it is a shallow network, TextCNN has demonstrated excellent performance on different datasets. Zhang [11] conducted a deeper analysis and discussion of TextCNN, with detailed experiments exploring the impact of hyperparameters such as the sizes and number of convolution kernels and the pooling parameters. Conneau proposed the character-level deep model VDCNN [7], which uses small convolutions and 1/2 pooling operations to build a 29-layer network. While pooling reduces the feature map, the number of filters is increased, and finally k-max pooling expresses cases where the same type of feature appears multiple times, achieving better results on large datasets. Johnson proposed the word-level, efficient, deep convolutional network DPCNN [6]. The author first uses region embedding to compress adjacent semantics, then repeatedly performs down-sampling with a stride of 2 and equal-length convolution on the compressed result. The feature dimension is reduced while transforming within the same semantic space, and the convolution and sampling are repeated, continuously shortening the feature length before final classification. To avoid gradient vanishing caused by the deepening network, a shortcut is set up after sampling, which significantly alleviates the problem. Experimental results show that deep convolutional networks still have room for imagination. However, DPCNN only extracts syntactic features at a single scale, ignoring the importance of different semantic spaces. Kang [12], starting from linguistic research, explores word meanings and grammatical reasoning unique to Chinese, and proposes the large-scale analogical reasoning dataset CA8, which provides Chinese word vectors with different representations (sparse and dense) and context features (words, n-grams, characters, etc.), including dozens of word vectors trained on corpora from various fields (Baidu Baike, Wikipedia, People's Daily, Zhihu, Weibo, literature, finance, ancient Chinese, etc.).
2 Model
This paper proposes the KA-MPCNN model, which further extends the original DPCNN: it uses multiple region embeddings to compress semantics, extracts different grammatical features from different semantic spaces, and integrates a knowledge base to obtain the prior knowledge of the text, using textual concept sets for knowledge augmentation. The model consists of two main parts, knowledge augmentation and the multi-pyramid (Fig. 1).
Fig. 1. Model structure.
2.1 Knowledge Augmentation
We use knowledge in the news domain as the knowledge base; Chinese word vectors trained on different co-occurrence information, with words, characters, and triples as contextual features, are used as additional input to obtain the prior knowledge of the text. The typical enhancement methods are summation and splicing. Although both are simple, their effect is not satisfactory. Summation adds the corresponding dimensions; this loses the original text and knowledge information and shifts the semantic space, so text semantics that were originally very different may conflict after domain knowledge is added, as their semantic spaces become close. Splicing merges the two by dimension expansion; although the original information is retained, the key dimensions receive no special attention. We instead use convolution to enhance the original text with knowledge: the two representations form a two-channel input, a convolution kernel of size (1, 1) fuses the information across channels, and learnable parameters let the network weigh the two channels automatically rather than simply adding or concatenating them (Fig. 2). We denote the text word-vector matrix of a document as S:

S = [w1, w2, w3, ..., wn]   (1)

wi = [e1, e2, e3, ..., ed]^T   (2)
Fig. 2. Knowledge augmentation.
n is the length of the text after word segmentation, wi is the word vector of the i-th word with dimension d, and the size of S is n × d. We take the domain word vectors as S^knowledge:

S^knowledge = [w1^knowledge, w2^knowledge, w3^knowledge, ..., wn^knowledge]   (3)

The output of knowledge augmentation is:

conv(S, S^knowledge)   (4)
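A minimal PyTorch sketch of this fusion step, under the assumption that both the text matrix and the domain-knowledge matrix have shape n × d; the class and tensor names are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class KnowledgeAugmentation(nn.Module):
    """Fuse text and domain-knowledge embeddings with a learnable 1x1 conv."""
    def __init__(self):
        super().__init__()
        # Two input channels (text, knowledge) -> one fused channel, Eq. 4.
        self.fuse = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=1)

    def forward(self, s, s_knowledge):
        # s, s_knowledge: (batch, n, d) word-vector matrices, Eqs. 1-3.
        x = torch.stack([s, s_knowledge], dim=1)  # (batch, 2, n, d)
        return self.fuse(x).squeeze(1)            # (batch, n, d)

# Example: a batch of 8 documents, 32 words each, 300-dim embeddings.
ka = KnowledgeAugmentation()
out = ka(torch.randn(8, 32, 300), torch.randn(8, 32, 300))
```

Because the 1 × 1 kernel carries learnable weights for each channel, the network can decide how much the domain knowledge should influence each fused value, rather than committing to a fixed sum or a doubled dimensionality.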
2.2 Multi-pyramid
We first convolve a text region containing multiple words with a convolution kernel of size (k, d) to generate a region embedding, compressing semantics and capturing more complex grammatical features. If the sequence length after region embedding is n, we say the sequence has n regions, each containing its k-gram features. To enrich the region-embedding representation, the region embedding then undergoes two equal-length convolutions: the information of each region and of its left and right neighbours is compressed into the embedding of that region, so that each region is represented by higher-level semantics modified by contextual information, as shown in Fig. 3. To capture the long-range dependencies of the text, the receptive field is enlarged by stacking convolutional layers.
Fig. 3. Pre-Convolution.
For images, there are gaps between low-level and high-level features, such as from "points" and "lines" to "faces", and it is difficult to trace this feature hierarchy. Text features, by comparison, are much flatter, going from 1-grams to 2-grams to 3-grams, which fully satisfies semantic substitution; the number of feature maps is therefore fixed in this process, and it is carried out in the same semantic space. Afterwards, 1/2 pooling is applied repeatedly, halving the feature length each time and forming multiple pyramids (Fig. 4).
Fig. 4. Multiple pyramids.
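The stage just described, two equal-length convolutions followed by 1/2 pooling, can be sketched as a simplified PyTorch module; the kernel size, channel count, and shortcut placement are illustrative rather than the exact published configuration:

```python
import torch
import torch.nn as nn

class PyramidBlock(nn.Module):
    """One pyramid stage: two equal-length convs plus 1/2 pooling."""
    def __init__(self, channels):
        super().__init__()
        # padding=1 keeps the sequence length unchanged (equal-length conv).
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU()
        # stride-2 pooling halves the sequence length.
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):          # x: (batch, channels, seq_len)
        x = x + self.act(self.conv2(self.act(self.conv1(x))))  # shortcut
        return self.pool(x)        # (batch, channels, seq_len // 2)

# Stacking blocks shortens the sequence: 64 -> 32 -> 16 -> 8.
x = torch.randn(4, 250, 64)
for block in [PyramidBlock(250) for _ in range(3)]:
    x = block(x)
```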
3 Experiment
The experiments use the THUCNews dataset, comprising 14 categories reintegrated and divided by Tsinghua University based on the original Sina News classification system. We select 10 categories, each containing 20,000 documents; the 200,000 items in total are divided into training, validation, and test sets at a ratio of 8:1:1. All headlines are treated as short text, with an average length of 19.24, and all body content as long text, with an average length of 846.26. We first preprocess the collected text: jieba is used for word segmentation, the stop-word list provided by Harbin Institute of Technology filters the tokens, and word2vec then trains the word vectors. The mainstream models compared with ours are TextCNN [5], TextRNN [13], C-LSTM [14], LSTM-Attention [15], and DPCNN [6]. The evaluation measure is accuracy:

Accuracy = (T_TP + T_TN) / (T_TP + T_TN + F_FP + F_FN)   (5)
Here T_TP is the number of positive examples classified correctly, T_TN the number of negative examples classified correctly, F_FP the number of negative examples wrongly classified as positive, and F_FN the number of positive examples wrongly classified as negative. The loss function is the cross-entropy:

Loss = −(1/N) Σ_N [ y log ŷ + (1 − y) log(1 − ŷ) ]   (6)
where N is the total number of samples, ŷ is the predicted label of a sample, and y is its real label.

3.1 Short Text

Table 1. Short text results

Model name       | Accuracy (%)
TextCNN          | 87.90
TextRNN          | 86.81
C-LSTM           | 86.78
Bilstm-Attention | 87.76
DPCNN            | 87.69
MPCNN            | 88.80
KA-MPCNN         | 89.66
Short texts lack contextual information due to sparse features and insufficient semantics. Relying only on the original text, the accuracy of the current mainstream models is not ideal (Table 1).
The best baseline is TextCNN, reaching 87.90%, which once again verifies its simplicity and efficiency; the well-designed DPCNN also achieves a good result at 87.69%. The expanded MPCNN model is 1.11% higher than DPCNN and 0.9% higher than TextCNN: because the model contains many convolution kernels of different sizes, it can extract features at different scales, and classifying on these multi-scale features improves accuracy significantly. KA-MPCNN reaches an accuracy of 89.66%, 1.97% higher than DPCNN: by acquiring prior knowledge of the text, the existing text is augmented and the richness of its semantics is enhanced.
3.2 Long Text
Table 2. Long text results

Model name       | Accuracy (%)
TextCNN          | 94.64
TextRNN          | 93.61
C-LSTM           | 91.79
Bilstm-Attention | 93.62
DPCNN            | 93.58
MPCNN            | 94.45
KA-MPCNN         | 94.21
For long texts, the accuracy of every model is clearly much higher, because long texts contain a large amount of information with prominent, easily extracted features (Table 2). TextCNN still achieves the best result, while the improved MPCNN outperforms DPCNN by 0.87% and comes close to TextCNN. KA-MPCNN's accuracy, however, drops by 0.24%: after long texts are segmented, the resulting vocabulary is large, the semantics of some words differ little from the external knowledge, and some words form dataset-specific semantics, which leads to a decrease in accuracy. Short text does not have this problem: its vocabulary is small and word semantics are ambiguous and insufficient, so introducing domain knowledge provides a good enhancement.
4 Conclusion For the task of Chinese news classification, this paper analyzes the problems of DPCNN, and combines with knowledge augmentation to propose the KA-MPCNN model. The
MPCNN with Knowledge Augmentation
149
model enters different semantic spaces through multiple region embeddings, then extracts features at different scales and builds a multi-pyramid for fusion; the knowledge base supplies the prior knowledge of the text, which is enhanced with the text concept set and matched with the THUCNews dataset. Compared with other mainstream models, it has obvious advantages. Because of the expansion of DPCNN, the cost of the improved accuracy is a loss of efficiency. Future work will focus on optimizing efficiency, considering depth-wise separable convolution to reduce the number of parameters and verifying the model's generalization. In addition, Word2vec is a traditional static word vector, fixed once training is completed; how to improve on it is also a focus of future work.
References
1. Le, Q., Mikolov, T.: Distributed representations of sentences and documents, vol. 4, pp. 2931–2939. Beijing, China (2014)
2. Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Conference on Empirical Methods in Natural Language Processing (2014)
3. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
4. Cho, K., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. Computer Science (2014)
5. Kim, Y.: Convolutional neural networks for sentence classification. Eprint Arxiv (2014)
6. Johnson, R., Tong, Z.: Deep pyramid convolutional neural networks for text categorization. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, vol. 1: Long Papers (2017)
7. Conneau, A., Schwenk, H., Barrault, L., Lecun, Y.: Very deep convolutional networks for text classification (2017)
8. Shaosheng, C., Wei, L., Jun, Z., Xiaolong, L.: cw2vec: learning Chinese word embeddings with stroke n-gram information, pp. 5053–5061. New Orleans, LA, United States (2018)
9. Sun, Y., Lin, L., Tang, D., Yang, N., Ji, Z., Wang, X.: Radical-enhanced Chinese character embedding. Lect. Notes Comput. Sci. 8835, 279–286 (2014)
10. Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. Eprint Arxiv, vol. 1 (2014)
11. Zhang, Y., Wallace, B.: A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. Computer Science (2015)
12. Kang, R., Zhang, H., Hao, W., Cheng, K., Zhang, G.: Learning Chinese word embeddings with words and subcharacter n-grams. IEEE Access 99, 1–1 (2019)
13. Liu, P., Qiu, X., Huang, X.: Recurrent neural network for text classification with multi-task learning. AAAI Press (2016)
14. Zhou, C., Sun, C., Liu, Z., Lau, F.C.M.: A C-LSTM neural network for text classification. Comput. Sci. 1(4), 39–44 (2015)
15. Peng, Z., Wei, S., Tian, J., Qi, Z., Bo, X.: Attention-based bidirectional long short-term memory networks for relation classification. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 2: Short Papers (2016)
An Improved Mobilenetv2 for Rust Detection of Angle Steel Tower Bolts Based on Small Sample Transfer Learning Zhiyu Cheng(B) , Jun Liu, and Jinfeng Zhang State Grid Anhui Electric Power Co., LTD., Hefei 230022, Anhui, China [email protected]
Abstract. Rust detection on angle steel tower bolts is dangerous and laborious, so it needs the assistance of intelligent technology. To be deployable on resource-limited equipment such as mobile or embedded terminals, this paper proposes a new lightweight neural network, based on deep learning, to classify angle steel tower bolts as rusty or rust-free. The network improves on MobileNetV2. To suit a small-sample training environment, a transfer learning strategy is used. To make up for the loss of feature information caused by convolution, a double-branch structure is used to capture feature information better. Using an attention mechanism, the network automatically learns the importance weights of features during training, further improving separability. On the bolt rust detection task, the proposed network achieves an accuracy of 97.06% and an F1 score of 96.3% while keeping the parameter count small. Compared with other excellent neural networks, the proposed method has better classification performance and is suitable for outdoor inspection. Keywords: Rust detection · Improved Mobilenetv2 · Small samples · Transfer learning
1 Introduction
The angle steel tower is a large-scale structure with strong bearing capacity. It plays an important supporting role in many settings, such as electric power transmission and signal transmission. Exposed to the natural environment for long periods, it is corroded by rain and baked by the sun; the resulting deformation and rust are a challenge brought by the use of angle steel towers [1]. Bolts play an important role in fastening the angle steel tower: the tower is mainly connected by bolts, supplemented by tower feet and hanging points [2]. A problem with one bolt may cause irreparable damage, so it is very important to inspect the bolts of angle steel towers. Early bolt inspection adopted manual investigation, which requires a person to climb the tower and check the bolts in turn.
It is not suited to bad weather, easily causes casualties, and suffers from low efficiency and high risk. A mobile-terminal or UAV-based method is therefore needed to assist bolt maintenance. With the development of computer technology and the deepening of vision research, image processing methods have attracted extensive attention, and machine-learning-based algorithms are often used in image detection tasks. Traditional machine learning methods need feature extractors such as LBP and HOG to extract discriminative feature information, which is then fed to classifiers such as SVMs [3] and decision trees [4]. Although the classification performance is improved, the degree of manual involvement is high. With the advent of the AlexNet [5] model in 2012, convolutional neural networks developed rapidly and are now widely used in all aspects of natural image processing: recognition methods such as Faster R-CNN [6] and Mask R-CNN [7], classification methods such as VGG [8] and ResNet [9], and segmentation methods such as U-Net [10]. Methods based on convolutional neural networks need no manual feature extraction and offer a high degree of automation. In recent years, deep learning has also been gradually applied to defect detection to promote industrial automation. For example, Wang et al. [11] proposed a deep learning detection method combining a deep belief network and a BP network for a two-class solar-cell experiment with only 100 training samples; Tang et al. [12] analyzed the effect of different deep learning methods on transmission line component defect detection; and Jia et al. [13] applied a convolutional neural network to seven transformer faults, reaching a test accuracy of 91.6%, more than 2% higher than a BP neural network. These networks greatly improve detection accuracy and sorting efficiency. A network deployed on a mobile terminal needs small bandwidth, low computation, and low delay [14]. For example, Liu et al. [15] combined dynamic threshold segmentation and defect-region extraction to design low-delay online detection of two classes of rail defects; Yao et al. [16] proposed MagnetNets, an efficient, low-delay convolutional neural network with better performance in magnetic defect detection; and Sun et al. [17] designed a MobileNet-based fault detection method for outdoor transmission-line cracking, embedded in terminal equipment and put into practical use, which has strong practical significance. Because rust detection for angle steel tower bolts must consider not only the parameter count and computational complexity of an embedded network but also high detection accuracy, we propose a lightweight neural network for angle steel tower bolt rust detection based on deep learning. As shown in Fig. 1, the lightweight network MobileNetV2 [18] is used as the base network and improved, and transfer learning is used in the small-sample training environment to reduce the possibility of overfitting. On this basis, a channel attention mechanism [19] is added to select channels from the features obtained by the base network, giving higher weights to important channels and highlighting the relevant features. At the same time, a context path is added so that the network obtains richer feature information, making up for the loss of important information during convolution and effectively improving the detection
performance of the network. Compared with other excellent networks, the proposed network obtains better classification results for the rust detection of angle steel tower bolts while having fewer parameters and lower computational complexity. It can assist electricians' maintenance work to a certain extent and plays a positive role.
Fig. 1. Proposed network framework
2 Materials and Methods
2.1 Dataset
Fig. 2. Angle steel tower bolt image
Focusing on angle steel tower bolts, this paper studies the efficient detection of bolt rust. Two categories of images were collected: bolt images with rust and bolt images without rust. As shown in Fig. 2, the images are colour images of varying resolution. A picture may contain multiple bolts at the same time, some rusty and some rust-free, with varying degrees of rust. The samples are divided into a training set and a test set. A training set that is too small leads to overfitting and unreasonable training results, so before training the training images are augmented by rotation, translation, and cropping. After data enhancement, there are 1512 rusty training images, 1288 rust-free training images, and 68 test images.
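This augmentation step can be reproduced with standard tooling; a sketch using torchvision, cropping to the 64 × 64 input size listed in Table 1. The paper does not report the exact transform parameters, so the values below are illustrative:

```python
from torchvision import transforms

# Rotation, translation, and cropping, as described for the training set.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=30),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.RandomResizedCrop(size=64, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
```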
2.2 Model Framework
2.2.1 Mobilenetv2
Convolutional neural networks have powerful feature extraction abilities. To a certain extent, increasing the depth and breadth of a network gives it stronger high-level semantic expression, better learning ability, and better performance [20], but as the model grows it also brings a huge amount of computation and memory occupancy [21]. For a standard convolution, given an input X ∈ R^(C×H×W), a kernel of size K, and an output feature map Y ∈ R^(C′×h×w), and ignoring the bias, the FLOPs of the convolution are:

FLOPs(conv2d) = C × C′ × K × K × h × w   (1)

where C is the number of input channels, C′ the number of output channels, H, W the height and width of the input, and h, w the height and width of the output feature map. The FLOPs of a standard convolution thus grow multiplicatively with the channel counts and the kernel size; when the input and output channels and the kernel are large, the computation becomes too heavy for resource-limited scenarios. MobileNetV2 replaces standard convolution with depthwise separable convolution and designs the network structure to reduce the computation of the model. Depthwise separable convolution splits the standard convolution into two parts: depthwise convolution, which convolves each channel separately, and pointwise convolution, which uses 1 × 1 kernels for feature fusion [21]. To produce the same feature map, the FLOPs with depthwise separable convolution are:

FLOPs(dwconv) = h × w × (K × K × C + C × C′)   (2)
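To make the saving concrete, a small sketch evaluating Eqs. 1 and 2 for a hypothetical layer (the dimensions are illustrative):

```python
def flops_standard(c_in, c_out, k, h, w):
    # Eq. 1: standard convolution.
    return c_in * c_out * k * k * h * w

def flops_separable(c_in, c_out, k, h, w):
    # Eq. 2: depthwise + pointwise convolution.
    return h * w * (k * k * c_in + c_in * c_out)

# Example layer: 3x3 kernel, 64 -> 128 channels, 32x32 output map.
std = flops_standard(64, 128, 3, 32, 32)   # ~75.5 M
sep = flops_separable(64, 128, 3, 32, 32)  # ~9.0 M
print(std / sep)                           # roughly 8.4x fewer FLOPs
```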
In this way, one multiplicative term is decomposed into a sum of two smaller terms, and the amount of computation is greatly reduced. To further improve the feature expression ability of the network, MobileNetV2 adopts linear bottlenecks to strengthen the low-level feature expression, and its inverted residual structure makes the network better suited to layers with few channels, giving the network high classification performance. MobileNetV2 is also often used as a backbone in classification, segmentation, and object detection.
2.2.2 Feature Extraction Module
To obtain a lightweight network suited to classifying the angle steel tower bolt dataset, this paper proposes a new model structure based on MobileNetV2, which makes the framework suitable for a small number of samples, obtains better classification performance, and advances the automation of angle steel tower bolt detection. The proposed network structure is shown in Fig. 1. It is a double-branch structure. Branch 1 uses some layers
of pre-trained MobileNetV2 as the extractor of the underlying features, and adds a channel attention mechanism on top of the base network, so that the network can select the most discriminative feature information according to the characteristics of the image. The pooling layer removes unimportant redundant feature information, reduces the parameter count and complexity of the model, and reduces overfitting; at the same time, it maintains the translation, rotation, and scale invariance of the model [22]. Because the rusty areas are scattered and some are small, to avoid losing discriminative regions during convolution, the second branch uses pooling layers to reduce the image dimension, expand the receptive field, and reduce the model's parameters, followed by 1 × 1 convolutions that project into a high-dimensional space. After the two sets of features are superimposed along the channel dimension, a convolution extracts the high-level semantics, and finally a fully connected layer integrates all the extracted feature information [23] to obtain the final classification. The parameter settings of the whole feature extractor are shown in Table 1; Bottleneck is the basic module of MobileNetV2, with different structures for different strides.
2.2.3 Channel Attention Mechanism
In convolutional neural networks, the convolution operation can improve network performance, and the channel attention mechanism can also realize this function [24]. To obtain the importance factors of the different channels, an attention mechanism similar to the SE [25] network is adopted, which compresses the spatial dimensions and integrates their information. The mechanism has two parts, as shown in Fig. 3: squeeze and excitation. In the squeeze part, global pooling integrates the spatial information: for the base-network output F ∈ R^(C×H×W), the compressed descriptor is

z = F_sq(F) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} F(i, j)   (3)
where z ∈ R^(C×1×1) captures the information distribution over the C feature maps [25]. In the excitation part, unlike SE, a single convolution layer replaces SE's two fully connected layers to increase the learnable weights, giving

S = F_ex(z) = σ(δ(w_1 z + b))   (4)

where δ denotes the ReLU function and σ the sigmoid function; this further improves the performance of the attention mechanism. The feature map is then multiplied by these weights, so that each channel receives its own importance and salient regions are highlighted:

F′ = F · S,  F′ ∈ R^(C×H×W)   (5)
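A minimal PyTorch sketch of this block, mapping Eq. 3 to a global average pool, Eq. 4 to a single 1 × 1 convolution, and Eq. 5 to channel-wise rescaling; this is an illustrative reading of the description, not the authors' released code:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style attention with one conv layer instead of two FC layers."""
    def __init__(self, channels):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)            # Eq. 3
        self.excite = nn.Sequential(                      # Eq. 4
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(),
            nn.Sigmoid(),
        )

    def forward(self, f):            # f: (batch, C, H, W)
        s = self.excite(self.squeeze(f))
        return f * s                 # Eq. 5: channel-wise rescaling

attn = ChannelAttention(320)         # 320 channels, as in Table 1
out = attn(torch.randn(2, 320, 2, 2))
```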
Finally, the feature map obtained after the attention-based selection is superimposed with the features of the second branch.

Table 1. Feature proposing network

Branch  | Input size    | Operate type | Step | Output channel
Branch1 | 64 × 64 × 3   | Conv2d       | 2    | 32
        | 32 × 32 × 32  | Bottleneck   | 1    | 16
        | 32 × 32 × 16  | Bottleneck   | 2    | 24
        | 16 × 16 × 24  | Bottleneck   | 2    | 32
        | 8 × 8 × 32    | Bottleneck   | 2    | 64
        | 4 × 4 × 64    | Bottleneck   | 1    | 96
        | 4 × 4 × 96    | Bottleneck   | 2    | 160
        | 2 × 2 × 160   | Bottleneck   | 1    | 320
        | 2 × 2 × 320   | Attention    | 1    | 320
Branch2 | 32 × 32 × 3   | avgpool      | 2    | 64
        | 16 × 16 × 64  | avgpool      | 2    | 64
        | 8 × 8 × 64    | Conv2d       | 2    | 64
        | 4 × 4 × 64    | avgpool      | 2    | 64
        | 2 × 2 × 64    | avgpool      | 2    | 64
        | 2 × 2 × 384   | Conv2d       | 1    | 160
Fig. 3. Channel attention mechanism
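Putting Table 1 together with the description above, the double-branch forward pass can be sketched roughly as follows, reusing the ChannelAttention module from the previous sketch and torchvision's pretrained MobileNetV2 for branch 1. The layer slice, the branch-2 layout, and the classifier head are approximations of Table 1, not the authors' code:

```python
import torch
import torch.nn as nn
from torchvision import models

class BoltRustNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        # Branch 1: pretrained MobileNetV2 features up to the 320-channel
        # bottleneck (transfer learning for the small-sample setting).
        backbone = models.mobilenet_v2(weights="IMAGENET1K_V1").features
        self.branch1 = nn.Sequential(*list(backbone)[:18])
        self.attn = ChannelAttention(320)
        # Branch 2: pooling plus 1x1 conv, compensating for information
        # lost in branch 1's convolutions.
        self.branch2 = nn.Sequential(
            nn.AvgPool2d(2), nn.AvgPool2d(2),
            nn.Conv2d(3, 64, kernel_size=1), nn.ReLU(),
            nn.AvgPool2d(2), nn.AvgPool2d(2), nn.AvgPool2d(2),
        )
        # Fuse the concatenated 320 + 64 channels, then classify.
        self.fuse = nn.Conv2d(384, 160, kernel_size=1)
        self.classifier = nn.Linear(160 * 2 * 2, num_classes)

    def forward(self, x):                        # x: (batch, 3, 64, 64)
        f1 = self.attn(self.branch1(x))          # (batch, 320, 2, 2)
        f2 = self.branch2(x)                     # (batch, 64, 2, 2)
        f = torch.relu(self.fuse(torch.cat([f1, f2], dim=1)))
        return self.classifier(f.flatten(1))
```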
3 Experimental Analysis
3.1 Related Parameter Settings
The method is implemented in Python 3. The main running environment is Windows, and the GPU is an NVIDIA GeForce GTX 1080 Ti.
SGD is used as the optimizer during training. The learning rate is set to 0.0001, the momentum to 0.95, the number of iterations to 60, and the training batch size to 2. Transfer learning is used to initialize some layers of the network, after which the whole network is trained to obtain better classification results.
3.2 Loss Function and Evaluation Index
During training, to better measure the difference between the predicted and real values, the cross-entropy loss is used:

loss_cls = (1/n) Σ_{i=1}^{n} ( − Σ_{j=0}^{c−1} Ż_{i,j} log Z_{i,j} )   (6)

where n is the batch size, c the number of defect classes, Ż the true value, and Z the predicted value. To measure the advantages of the proposed method, several evaluation indicators are used. Accuracy is calculated as:

Accuracy = (TP + TN) / (TP + FN + FP + TN)   (7)

The F1 score is calculated as:

F1 = 2 × P × R / (P + R),  where P = TP / (TP + FP),  R = TP / (TP + FN)   (8)
where TP denotes samples predicted rusty whose real label is rusty, TN samples predicted rust-free whose real label is rust-free, FN samples predicted rust-free whose real label is rusty, and FP samples predicted rusty whose real label is rust-free. Accuracy measures the proportion of correct predictions over the whole test set, while the F1 score measures the balance between the network's precision and recall. In addition, to measure the lightness of the network, the FLOPs index is compared with that of other networks.
3.3 Optimizer Selection
In back propagation, the learnable parameters must be updated so that the predictions of supervised learning approach the real values; the optimizer realises this function, and different optimizers use different update rules. We compare the test results of four optimizers, as shown in Fig. 4. When the SGD optimizer with momentum is used, the network obtains the best classification performance; SGD also has a regularizing effect suited to small-sample training, reaching an accuracy of 97.06%, better than the other three optimizers. As shown in Fig. 4, the F1 score of SGD is also the highest, indicating good precision and recall. Adam improves on RMSprop; where the two reach the same accuracy, the former has a better F1 score.
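A sketch of the training configuration from Sect. 3.1 (SGD with momentum, cross-entropy loss), using the BoltRustNet skeleton above. The freezing policy is an assumption: the paper states only that some layers are initialized by transfer learning, not which ones stay frozen:

```python
import torch.nn as nn
import torch.optim as optim

model = BoltRustNet()

# Transfer learning: keep the early pretrained MobileNetV2 layers frozen
# (assumed policy; the paper does not specify the frozen depth).
for param in model.branch1[:10].parameters():
    param.requires_grad = False

criterion = nn.CrossEntropyLoss()  # Eq. 6
optimizer = optim.SGD((p for p in model.parameters() if p.requires_grad),
                      lr=0.0001, momentum=0.95)
```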
An Improved Mobilenetv2 for Rust Detection
157
1.2
0.9706 0.963
1
0.9412 0.926
0.9412 0.923 0.8235 0.786
0.8 0.6 0.4 0.2 0 SGD
Adam Accuracy
RMSprop
Adadelta
F1 score
Fig. 4. Comparison of different optimizers in terms of Accuracy and F1 score
3.4 Ablation Experiment In order to verify the function of each module, the Ablation Experiment of the model is done, as shown in Table 2. When only the basic network is used, the accuracy is 95.59%, indicating that mobilenetv2 itself has strong feature extraction ability, and the pre trained mobilenetv2 can be well migrated to other data sets. When another branch is added on this basis, that is, the combination of pooling layer and convolution layer is used to reduce the feature scale and change the channel, which can not only realize the function of making up information, but also produce less parameters and flops. The accuracy obtained after the increase is the same as that of the basic network, but reaches this effect earlier. The method proposed in this paper has better classification performance, with an accuracy of 97.06%, 1.47% higher than the basic network, and the F1 score can reach 96.3%. Table 2. Ablation experiment Methods
Backbone √
Second path
√
√
√
√
Attention
√
Accuracy
Precision
Recall
F1-score
0.9559
0.929
0.963
0.946
0.9559
0.929
0.963
0.946
0.9706
0.963
0.963
0.963
3.5 Test Curve In order to more accurately observe the difference between the classical network and our network, as shown in Fig. 5, the change curve of the test results of the classical
158
Z. Cheng et al.
lightweight network is shown. According to the change of the loss curve, 60 iterations are set. It can be found that the method proposed in this paper can reach the test optimal value faster.
Fig. 5. Classification accuracy of different lightweight networks
3.6 Comparative Experiment In order to highlight the effectiveness and superiority of the proposed method, it is compared with other classical networks. The results are shown in Table 3. Due to the shallow network and only 8 layers, the Alex net model has limited ability to extract image features. The accuracy of bolt image is only 86.76% and the accuracy is low, so there is still much room for improvement. Vgg16 starts from the depth of the network, improves the classification performance by deepening the number of layers of the network model, and adopts the batch specification layer to forcibly standardize the input distribution of the previous layer to the standard normal distribution with mean value of 0 and variance of 1, so that the input value of the nonlinear transformation function falls into the area sensitive to the input, so as to avoid the problem of gradient disappearance, so that the network of each layer can learn itself and accelerate the learning of the network, At the same time, it has regularization effect and can prevent gradient explosion. The accuracy obtained is much higher than that of Alexnet and has strong feature extraction ability. It is often used as a pre training network in other tasks, but increasing the number of convolutions will increase the amount of calculation and memory occupancy, which is not suitable for the condition of limited resources. Googlenet [26] network from the breadth of the model, and further compress the occupied space of the model. Using inception structure can obtain multi-scale feature information, and this method can also obtain better results. Squeezenet [27], Shufflenet [28] and Mobilenet are improved from convolution operation to compress the model. Squeezenet network uses global pooling layer instead of full connection layer as category output layer, which can further reduce the amount of calculation of the model. Compared with other networks, the proposed method has the best accuracy and F1 value, and is more suitable for the rust detection of angle steel tower bolts to a certain extent.
An Improved Mobilenetv2 for Rust Detection
159
Table 3. Comparative experiment Methods
Accuracy
Precision
Recall
F1-score
Alexnet
0.8676
0.875
0.778
0.824
VGG16
0.9559
0.962
0.926
0.944
GoogleNet
0.9412
0.96
0.889
0.923
SqueezeeNet
0.9559
0.962
0.926
0.944
ShuffleNetv2
0.9412
0.926
0.926
0.926
MobileNetv2
0.9559
0.929
0.963
0.946
Ours
0.9706
0.963
0.963
0.963
3.7 Comparison of Flops in Different Networks Flops is an index to measure the lightness of the network. When the value of flops is larger, it shows that the complexity of the model is higher and more computation is needed. As shown in Fig. 6, it shows the comparison of flops of different models, in which the abscissa is the flops value and the ordinate is the F1 score. It can be seen from the abscissa that AlexNet and Google net networks adopt standard convolution and have high flops, while other networks adopt model compression, so they have low flops. The flops of the network proposed in this paper are slightly higher than shufflenetv2, but lower than mobilenetv2 and SqueezeNet. From the vertical coordinate observation, it can be found that the method in this paper has the highest F1 score, which is much higher than the value of shufflenetv2. In general, the proposed method has a better balance in bolt detection accuracy and light weight, and is more suitable for bolt rust detection.
Fig. 6. Comparison of flops values of different models
160
Z. Cheng et al.
4 Conclusion In order to solve the danger of bolt rust detection of electrical angle steel tower, a bolt detection network suitable for mobile terminal and embedded application is proposed. The network is improved based on mobilenetv2. The idea of migration learning is applied to the situation of small bolt samples, and the network is fully trained to improve the accuracy of the network. In order to obtain more abundant local defect information and prevent the loss of important features, a double branch structure is adopted, and the attention mechanism is used in the network to highlight the important distinguishing features. The proposed flops network has good detection accuracy and low concentration of data. Compared with other excellent convolutional neural networks, the proposed method has more advantages. Acknowledgement. This work was supported by the State Grid Anhui Electric Power Co., Ltd. (No. 52120019007G).
References 1. Chen, X., Liu, Y., Xie, Y., et al.: Defect analysis of power grid angle steel tower. Compreh. Corrosion Control 2017(3) (2017) 2. Wang, Z., Lu, W., He, X., et al.: Analysis on welding quality of angle steel tower of transmission line. Qinghai Electric Power 36(002), 36–40 (2017) 3. Zhang, F.Y., Liu, R.J.: Study on the parts surface defect detection method based on Modifed SVM algorithm. Appl. Mech. Mater. 541–542, 1447–1451 (2014) 4. Aghdam, S.R., Amid, E., Imani, M.F.: A fast method of steel surface defect detection using decision trees applied to LBP based features. Industrial Electronics & Applications. IEEE (2012) 5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, pp. 1097–1105. Curran Associates Inc. (2012) 6. Ren, S., He, K., Girshick, R., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017) 7. He, K., Gkioxari, G., Dollár, P., et al.: Mask R-CNN. IEEE (2017) 8. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of International Conference on Learning Representations, pp. 1150–1210. San Diego (2015) 9. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of IEEE Conference on Computer Vision & Pattern Recognition, pp. 770–778. IEEE (2016) 10. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation. Springer, Cham (2015) 11. Xianbao, W., Jie, L., Minghai, Y., et al.: Solar cell surface defect detection method based on deep learning method. Patt. Recogn. Artif. Intell. 000(006), 517–523 (2014) 12. Yong, T., Jun, H., Wenli, W., et al.: Application of deep learning in component identification and defect detection of transmission line research. Electron. Measure. Technol. 041(006), 60–65 (2018) 13. Jia, J., Yu, T., Wu, Z., et al.: Transformer fault diagnosis method based on convolutional neural network. Electric Measure. Inst. 13(v.54; no.664), 69–74 (2017)
An Improved Mobilenetv2 for Rust Detection
161
14. Hu, T., Zhu, Y., Tian, L., et al.: Lightweight convolutional neural network architecture for mobile platform. Calc. Mech. Eng. 045(001), 17–22 (2019) 15. Ze, L., Wei, W., Ping, W.: Design of machine vision system for rail surface defect detection. Electron. Measure. J. Instrum. 11, 1012–1017 (2010) 16. Yao, M., Yang, Z.: Research on real-time defect detection method based on lightweight convolutional neural network. Calc. Mach. Measure. Control 027(006), 22–25, 40 (2019) 17. Sun, C., Lu, Y., Chang, H., et al.: Transmission line fault identification method based on neural network. Sci. Technol. Eng. 019(020), 283–288 (2019) 18. Sandler, M., Howard, A., Zhu, M., et al.: MobileNetV2: inverted residuals andlinear bottlenecks: mobile networks for classification, detection and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520. IEEE (2018) 19. Sandler, M., Howard, A., Zhu, M., et al.: MobileNetV2: inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520. IEEE (2018) 20. Yu, C., Wang, J., Peng, C., Gao, C., Yu, G., Sang, N.: BiSeNet: Bilateral segmentation network for real-time semantic segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 334–349. Springer, Cham (2018). https://doi.org/ 10.1007/978-3-030-01261-8_20 21. Zhou, F., Jin, L., Dong, J.: A review of convolutional neural networks. J. Comput. Sci. 6 (2017) 22. Daohui, G., Hongsheng, L., Liang, Z., Ruyi, L., Peiyi, S.: Miao qiguang lightweight neural network architecture synthesis description. J. Software 31(09), 7–33 (2020) 23. Yuan, M., Zhou, C., Huang, H., et al.: Overview of convolution neural network pooling methods. Software Eng. Appl. Use 9(5), 13 (2020) 24. Sainath, T.N., Mohamed, A., et al.: Deep convolutional neural networks for LVCSR. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8614–8618. IEEE, Vancouver, Canada (2013) 25. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1 26. Hu, J., Shen, L., Albanie, S., et al.: Squeeze-and-excitation networks. In: Proceedings of the IEEE/CVF Conference on Computer vision and Pattern Recognition, pp. 7132–7141. IEEE (2018) 27. Szegedy, C., Liu, W., Jia, Y., et al.: Going deeper with convolutions. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 1–9. IEEE (2015) 28. Iandola, F.N., Han, S., Moskewicz, M.W., et al.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and =2015), selecting the most recent works; after these checks the resulting number was equal to 123. Moreover, others 9 papers found on Google Scholar have been added, reaching the total number of 132 papers. The resulting papers distribution according to years is plotted in the Fig. 1, showing an increase of research works in this context in the last years.
Fig. 1. Number of publications related to Big Data and Distributed Deep Learning from 2015 to 2021
Then we manually removed 46 papers by filtering on title, 18 papers by filtering on abstract, and another 38 papers by filtering on content. The research methodology process is summarized in Fig. 2:
Fig. 2. Research methodology
3 Background

3.1 Distributed Deep Learning

Distributed Deep Learning refers to a set of techniques and algorithms used to distribute the training of Deep Learning models across multiple hardware architectures. In this context a distinction should be made between the levels of parallelization, i.e., the Single-Machine level and the Cluster level (or Multiple-Machine level). In the first case, if the machine has two or more processors, multiprocessing makes it possible to run several processes on it. Moreover, for different ML or DL tasks, specific frameworks also make it possible to distribute the workload across multiple Central Processing Units (CPUs) or Graphics Processing Units (GPUs), or across both CPUs and GPUs on the same machine, in order to speed up the process. On the other hand, training complex models on Big Data is a computationally intensive task, and often a single machine is not enough to complete it. In this scenario it is common to face two main problems:

– The computational resources are not sufficient to handle the training process, especially the memory requested by the process due to the complexity of the model.
– The training process cannot be completed in the desired time period, owing to the complexity of the model and the data used for training.

In these cases, parallelization at the Multiple-Machine level comes into play. As the name suggests, with this method the computation is distributed among multiple
machines connected to the same network. As we will see in the next sections, this method makes it possible to reduce the computational load by distributing the data and the models between machines connected to the same network. However, this approach makes process management more complex, since processes must be handled both at the single-machine level and at the cluster level, by properly configuring the network and using a cluster manager. Furthermore, due to the exchange of messages between nodes, there are other network-related aspects that cannot be ignored, such as network latency, bandwidth, communication management, and so on.

3.2 Data Parallelism and Model Parallelism

As mentioned before, two techniques can be applied to the training process in order to reduce the computational load on a single machine: Data Parallelism and Model Parallelism.

Data Parallelism
Introduced by Hillis and Steele [12], the Data Parallelism method deals with the problem of the complexity and size of the data used during the training process by partitioning them across different processors. Specifically, given a cluster with N machines, the data are partitioned into N batches and the model is replicated N times (see Fig. 3). Then both the model and the data batches are distributed across the cluster machines, and each model replica is trained locally on its own subset of data (of size equal to 1/N).
Fig. 3. Data parallelism across three machines. The data are partitioned into three batches and each machine holds a replica of the same original model, so Model1 = Model2 = Model3
It should be noticed that the local trainings are not performed in the same time period (especially for clusters with heterogeneous hardware), and this introduces a problem during the computation of Gradient Descent, in particular Stochastic Gradient Descent [13]. In these cases, the so-called Asynchronous Stochastic Gradient Descent is computed [14]. Moreover, according to [15] there is a trade-off between utilization (i.e., hardware efficiency) and generalization (i.e., statistical accuracy of the model), depending on the batch size. In general, utilization is maximized by increasing the batch size, but the convergence of SGD is not guaranteed with a large batch size (or it can be very slow) [16].
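As a concrete illustration of this scheme (our own sketch, not code taken from any of the surveyed frameworks), the following minimal PyTorch example uses DistributedDataParallel: each process keeps a full model replica and a disjoint 1/N data shard, and the gradients are averaged across replicas during the backward pass. It is meant to be launched with one process per worker, e.g. torchrun --nproc_per_node=2 ddp_sketch.py.

```python
# Minimal data-parallel sketch with PyTorch DistributedDataParallel (DDP).
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

dist.init_process_group("gloo")          # rank/world size come from torchrun

torch.manual_seed(0)                     # same synthetic dataset on every rank
data = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32,
                    sampler=DistributedSampler(data))  # disjoint 1/N shards

model = DDP(nn.Linear(10, 2))            # local replica of the shared model
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()  # gradient all-reduce happens here
        opt.step()

dist.destroy_process_group()
```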
Model Parallelism
Differently from data parallelism, this approach aims at distributing the layers of the model across multiple devices (CPUs, GPUs, or both). Given a number of parts into which to split the model (generally equal to the number of GPUs/CPUs available), each part is sent to the related device, and the training is performed sequentially between the layers residing on different devices (see Fig. 4). In this case the batch is shared between all layers in all sub-models, so the trade-off between utilization and generalization does not arise. On the other hand, the model parallelism approach introduces different drawbacks:

• Splitting the model while guaranteeing load balance between machines is challenging.
• When a device is working, the remaining ones are idle.
• Since the layers of a Deep Neural Network (DNN) are strictly dependent, the communication during forward and backward propagation between sequential layers on different devices is complex, leading to a worsening of performance.
Fig. 4. Model parallelism with four GPUs. The dataset is shared among the GPUs, each of which holds a subset of the model layers.
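Before moving on to hybrid strategies, the layer-wise split can be sketched as follows (an illustrative example of ours, assuming a machine with two GPUs named cuda:0 and cuda:1): the two halves of the network live on different devices and the activations cross the device boundary in forward(). Note that while part2 computes, part1 sits idle, which is exactly the second drawback listed above.

```python
# Minimal model-parallel sketch: two halves of the network on two devices.
import torch
import torch.nn as nn

class TwoDeviceNet(nn.Module):
    def __init__(self, dev0="cuda:0", dev1="cuda:1"):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.part1 = nn.Sequential(nn.Linear(784, 512), nn.ReLU()).to(dev0)
        self.part2 = nn.Linear(512, 10).to(dev1)

    def forward(self, x):
        h = self.part1(x.to(self.dev0))
        return self.part2(h.to(self.dev1))   # inter-device activation transfer

model = TwoDeviceNet()
logits = model(torch.randn(32, 784))
# loss.backward() sends the gradients back across the same boundary.
```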
Another strategy is Hybrid Parallelism, which consists in the combination of Data Parallelism and Model Parallelism in order to overcome the problems of the two individual strategies. The work in [15] describes different implementations of such a parallelization strategy [17, 18].

3.3 Parameter Update Sharing and Communication

In DDL tasks the sharing of parameter updates (and the communication in general) relies on the network architecture, in particular on the presence of a central entity called
Parameter Server [19]; in its absence, the communication takes place through a decentralized technique, generally the All Reduce technique [20] or one of its variants. The choice among these approaches depends on several factors, such as network topology, bandwidth, and latency.

Parameter Server
For centralized networks and for the data parallelism approach, the choice of the communication strategy between workers often falls on the Parameter Server (PS). The PS can be seen as an entity in charge of computing the global gradient during the training processes, by averaging the individual gradients coming from each worker. This reduces both the computational load on each worker and the network load, since less information is sent/received by each node with respect to All Reduce. However, it is important to specify that the PS is a logical concept and can be configured in different ways. In the simplest case the PS is a central actor that receives the gradients from all workers, computes the average, and sends it back to all workers, as shown in Fig. 5.
Fig. 5. Centralized Parameter Server; the weight update is the average of the updated weights coming from all workers.
But the PS can also be composed of two or more different actors, as in the case of the Hierarchical Parameter Server (HPS) [21]. However, this approach introduces different drawbacks:

• It suffers from possible network bottlenecks related to the available network bandwidth; this can become a huge problem when the number of workers is very high.
• In the case of synchronous computation of the general gradient, the performance depends on the slowest machine; on the other hand, an asynchronous approach can cause slow convergence of SGD.
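Before turning to the decentralized alternative, the synchronous PS step described above can be sketched in a few lines (a minimal NumPy illustration of ours; the aggregation rule is the plain gradient average described in the text):

```python
import numpy as np

class ParameterServer:
    """Central actor: averages worker gradients and broadcasts new weights."""

    def __init__(self, params, lr=0.1):
        self.params, self.lr = params.astype(float), lr

    def step(self, worker_grads):
        g = np.mean(worker_grads, axis=0)   # global gradient = average
        self.params -= self.lr * g          # apply the update once, centrally
        return self.params                  # "broadcast" back to the workers

ps = ParameterServer(np.zeros(4))
grads = [np.random.randn(4) for _ in range(3)]   # one gradient per worker
new_params = ps.step(grads)                      # every worker receives these
```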
All Reduce
The All Reduce technique is a distributed approach to parameter update sharing. Differently from the PS, instead of having a single entity for the global gradient computation, the aggregation task is distributed across all machines.
Fig. 6. All Reduce algorithm: all nodes are connected to each other (step 1), they compute and then exchange their updates (step 2). Next each node aggregates the updates (step 3, S1 is the aggregation of all w1 in machine 1 and so on) and exchanges the result with the other ones.
In particular, each machine sends a subset of its parameters to the other ones. Then the aggregation of the shared parameters is computed, and each result is shared among all machines in the cluster (see Fig. 6). However, the network load is increased due to the exchange of parameters between all nodes. An optimization is given by the Ring All Reduce, in which a node communicates and exchanges information only with its neighbors in a cyclical way, creating a ring communication pattern. The exchange of parameters is carried out until each node contains all the parameters needed for the aggregation. Then every node performs the aggregation and sends the results to its neighbors. The last step consists in sharing the aggregation results among all nodes. Another possible variant of the All Reduce algorithm is the Tree-based All Reduce [22].
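The chunk-passing logic of Ring All Reduce can be made concrete with a small NumPy simulation (our illustration; neighbour-to-neighbour messages are modeled as array copies between list entries): a reduce-scatter phase leaves each node with one fully aggregated chunk, and an all-gather phase circulates the completed chunks until every node holds the full result.

```python
import numpy as np

def ring_allreduce(vectors):
    n = len(vectors)                                  # one vector per node
    chunks = [np.array_split(v.astype(float), n) for v in vectors]

    # Reduce-scatter: after n-1 steps node i holds the full sum of one chunk.
    for step in range(n - 1):
        sends = [chunks[i][(i - step) % n].copy() for i in range(n)]
        for i in range(n):                            # all "messages" at once
            left = (i - 1) % n
            chunks[i][(left - step) % n] += sends[left]

    # All-gather: completed chunks circulate until every node has them all.
    for step in range(n - 1):
        sends = [chunks[i][(i + 1 - step) % n].copy() for i in range(n)]
        for i in range(n):
            left = (i - 1) % n
            chunks[i][(i - step) % n] = sends[left]

    return [np.concatenate(c) for c in chunks]

result = ring_allreduce([np.full(6, k) for k in (1.0, 2.0, 3.0)])
assert all(np.allclose(r, 6.0) for r in result)       # 1 + 2 + 3 everywhere
```

Each node sends and receives only one chunk per step, which is what bounds the per-link traffic independently of the number of nodes.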
3.4 Synchronous and Asynchronous Communication

As mentioned before, the exchange of parameters can take place in two different ways: synchronous and asynchronous.

Synchronous Communication
In the PS configuration, every node sends its parameter update to the PS, which waits for the updates from all nodes. Once it has received all updates, it computes the new parameters and sends them to all nodes. For the All Reduce approach the procedure is similar, but every node waits for the updates from all the other ones. It is obvious that the overall performance depends on the slowest worker, and the other nodes are idle until they receive the last update.

Asynchronous Communication
In order to mitigate the problem of the synchronous communication approach, it is possible to use an asynchronous communication mode. For this purpose the HOGWILD! algorithm has been developed [23]. According to HOGWILD!, the dataset is partitioned into N equal partitions (equal to the number of workers), and a general model is created and stored in a shared memory accessible by all workers. During the computation, every worker computes the gradient for every batch and writes it into the shared model without blocking the other workers. All workers can retrieve the updated parameters from the general model whenever they need them. This approach is used for centralized architectures, such as the PS. Moreover, for distributed architectures HOGWILD++ has been developed. A token containing the model is created and, during the training, it is passed from one worker to its neighbor, creating a ring topology (Ring All Reduce). Once it is passed to the next worker, the difference between the parameters contained in the token and the parameters of the worker's model is computed. Then the token is passed to the next worker and the process is repeated until the end of the training process. Another approach is Stale Synchronous Parallelism [24], in which each worker computes the gradient independently and, when a node reaches a pre-defined maximum staleness, a global synchronization of the gradient is performed for all nodes.
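A minimal HOGWILD!-style sketch (ours, using the shared-memory mechanism that PyTorch exposes via share_memory(); the data shards and hyperparameters are invented for illustration) shows the lock-free pattern: the model parameters live in shared memory and several worker processes update them concurrently without any synchronization.

```python
# HOGWILD!-style asynchronous SGD: lock-free updates to shared parameters.
import torch
import torch.multiprocessing as mp
import torch.nn as nn

def worker(model, data, targets):
    # Each worker optimises the *shared* parameters on its own data shard.
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    loss_fn = nn.MSELoss()
    for _ in range(100):
        opt.zero_grad()
        loss_fn(model(data), targets).backward()
        opt.step()                        # lock-free write to shared weights

if __name__ == "__main__":
    model = nn.Linear(5, 1)
    model.share_memory()                  # place parameters in shared memory
    shards = [(torch.randn(64, 5), torch.randn(64, 1)) for _ in range(4)]
    procs = [mp.Process(target=worker, args=(model, x, y)) for x, y in shards]
    for p in procs: p.start()
    for p in procs: p.join()
```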
4 Distributed Deep Learning Frameworks

In this section, different DDL frameworks are presented and classified according to their characteristics and capabilities. The frameworks examined have been selected from among the most used ones and those present in the literature; in this review some of them are only cited and not analyzed in depth, due to the lack of related documentation. The frameworks considered are listed in Table 2.
Table 2. List of frameworks for Distributed Machine Learning and Deep Learning examined

Framework          | Work
PyTorch            | [25]
TensorFlow 2       | [26]
Caffe2             | [27]
ChainerMN          | [28]
BigDL 2.0          | [29]
SINGA              | [30]
Elephas            | [31]
TensorFlowOnSpark  | [32]
Amazon Sage Maker  | [33]
OneFlow            | [34]
Horovod            | [35]
SDDLF              | [36]
TFSM               | [37]
ShmCaffe           | [38]
Another retrieved framework, Qubole, is not discussed due to the lack, in its documentation, of technical aspects relevant to this review. We did not include Keras because it is built on top of TensorFlow. Moreover, Caffe2 is deprecated and has been integrated into PyTorch, but it is still available and some frameworks are based on this library.

4.1 Classification Criteria

The classification criteria considered for this review are the following:

1. Supported Hardware, i.e., whether the framework supports CPU computation, GPU computation, or both.
2. Parallelization mode, i.e., whether the framework supports model parallelism, data parallelism, or both.
3. Parameters Update Sharing mode, i.e., whether the framework allows the use of a Parameter Server or supports a decentralized approach (e.g., All Reduce, Ring All Reduce, and so on).
4. SGD Computation mode, i.e., whether the framework follows an asynchronous or synchronous approach.
5. Framework compatibility, i.e., whether the framework works in standalone mode or can work with other frameworks (such as Apache Spark).
4.2 Discussion

The examined criteria concern the supported hardware (GPU/CPU), the parallelization mode, the Parameters Update Sharing mode (Parameter Server or decentralized approach), and the SGD Computation mode (asynchronous or synchronous approach). Criterion 1 is very important, especially for clusters of heterogeneous hardware. At the moment, only SDDLF and BigDL 2.0 do not support GPU computation, but they probably will in future versions. As regards the parallelization mode, all frameworks support data parallelism, and only Caffe2, BigDL 2.0, Elephas, TFSM and ShmCaffe do not support model parallelism. The Parameters Update Sharing mode (Criterion 3) used by most frameworks is a centralized approach with a PS. Finally, both SGD computation modes are supported by most of the frameworks; in particular, Caffe2, ChainerMN and BigDL 2.0 do not support asynchronous SGD computation, while SDDLF and TFSM do not support synchronous SGD computation. The comparison of all the functionalities and capabilities of each framework just discussed is summarized in Table 4. The framework compatibility criterion relates to the ability to work in standalone mode (i.e., with the framework's own back end), with an external computation engine (such as Apache Spark), or with other frameworks. We chose to specify Apache Spark because it is one of the most popular engines for Big Data processing. The results of the comparison according to this criterion are shown in Table 3.

Table 3. DDL frameworks classification according to framework compatibility
Framework          | Stand Alone | Apache Spark | Others
PyTorch            | X           |              |
TensorFlow 2       | X           |              |
Caffe2             | X           |              |
ChainerMN          | X           |              |
BigDL 2.0          |             | X            |
Elephas            |             | X            | X
SINGA              | X           |              |
TensorFlowOnSpark  |             | X            |
OneFlow            | X           |              |
Horovod            |             |              | X
SDDLF              |             | X            |
Amazon Sage Maker  |             |              | X
TFSM               | X           |              |
ShmCaffe           |             |              | X
Table 4. Framework comparison according to parallelism modalities, hardware support, parameter sharing modalities and Stochastic Gradient Descent Computation

Framework          | Model par. | Data par. | CPU | GPU | Param. Server | Decentralized | Sync SGD | Async SGD
PyTorch            | X          | X         | X   | X   | X             | X             | X        | X
TensorFlow 2       | X          | X         | X   | X   | X             | X             | X        | X
Caffe2             |            | X         | X   | X   | X             |               | X        |
ChainerMN          | X          | X         | X   | X   |               | X             | X        |
BigDL 2.0          |            | X         | X   |     | X             |               | X        |
Elephas            |            | X         | X   | X   | X             |               | X        | X
SINGA              | X          | X         | X   | X   | X             |               | X        | X
TensorFlowOnSpark  | X          | X         | X   | X   | X             |               | X        | X
OneFlow            | X          | X         | X   | X   | X             | X             | X        | X
Horovod            | X          | X         | X   | X   |               | X             | X        | X
SDDLF              | X          | X         | X   |     | X             |               |          | X
TFSM               |            | X         | X   | X   | X             |               |          | X
Amazon Sage Maker  | X          | X         | X   | X   | X             | X             | X        | X
ShmCaffe           |            | X         | X   | X   |               | X             | X        | X
We can observe that half of the frameworks analyzed can work in standalone mode. ShmCaffe and TensorFlowOnSpark, as their names suggest, are based respectively on Caffe and on TensorFlow plus Spark. BigDL 2.0 (which includes BigDL and Analytics Zoo) offers different solutions to distribute the computation of DL for Big Data, by using DLlib, a distributed deep learning library for Apache Spark, and the Orca context, which scales out TensorFlow and PyTorch pipelines for distributed big data. Elephas is an extension of Keras that allows for the distribution of the workload by using Spark. The Horovod framework can work with TensorFlow, Keras, PyTorch, and Apache MXNet in order to facilitate the distribution of the DL workload. Finally, Amazon Sage Maker supports TensorFlow, PyTorch, MXNet, HuggingFace and Scikit-Learn.
5 Conclusion

In this work we provided a review of Distributed Deep Learning frameworks for Big Data analytics. We started by introducing two main problems present in the context of DL and Big Data analytics, and then we analyzed the methods and approaches to face them. To accomplish this task, different queries were performed on SCOPUS and Google Scholar and, starting from them, 14 works were selected, including those related to DDL frameworks and those related to parallelization and distribution algorithms and techniques. After examining the main theoretical aspects behind DDL, we selected 14 frameworks and compared them according to the following criteria: Supported Hardware (GPU/CPU), Parallelization mode (model or data), Parameters Update Sharing mode (Parameter Server or decentralized approach), SGD Computation (asynchronous or synchronous approach), and Framework compatibility. As we can see, many frameworks support the main functionalities requested in the DDL context. To the best of our knowledge, no other work in the literature shows and compares the differences in capabilities between DDL frameworks. Our work can be a useful guide and starting point for developers in choosing the right framework to address specific needs. As future work, an interesting topic could be the comparison of these frameworks according to computational performance, such as the computational resources used and the model training and inference times, when scaling from one to multiple machines.
References

1. De Mauro, A., Greco, M., Grimaldi, M.: A formal definition of Big Data based on its essential features. Libr. Rev. 65(3), 122–135 (2016). https://doi.org/10.1108/LR-06-2015-0061
2. Gupta, D., Rani, R.: A study of big data evolution and research challenges 45(3), 322–340 (2018). https://doi.org/10.1177/0165551518789880
3. Apache Software Foundation: Apache Hadoop (2010). https://hadoop.apache.org
4. Joydeep, S.S., Thusoo, A.: Apache Hive (2011). https://hive.apache.org/
5. L. AMP and A. S. Foundation: Apache Spark (2014). https://spark.apache.org/
6. Backtype and Twitter: Apache Storm (2011). https://storm.apache.org/
7. Apache Software Foundation: Apache Flink. https://flink.apache.org/
8. Goldstein, I., Spatt, C.S., Ye, M.: Big data in finance. Rev. Financ. Stud. 34(7), 3213–3225 (2021). https://doi.org/10.1093/RFS/HHAB038
9. Cui, Y., Kara, S., Chan, K.C.: Manufacturing big data ecosystem: a systematic literature review. Robot. Comput. Integr. Manuf. 62, 101861 (2020). https://doi.org/10.1016/J.RCIM.2019.101861
10. Carnimeo, L., et al.: Proposal of a health care network based on big data analytics for PDs. J. Eng. 2019(6), 4603–4611 (2019). https://doi.org/10.1049/JOE.2018.5142
11. Buongiorno, D., et al.: Deep learning for processing electromyographic signals: a taxonomy-based survey. Neurocomputing 452, 549–565 (2021). https://doi.org/10.1016/J.NEUCOM.2020.06.139
12. Hillis, W.D., Steele, G.L.: Data parallel algorithms. Commun. ACM 29(12), 1170–1183 (1986). https://doi.org/10.1145/7902.7903
13. Gardner, W.A.: Learning characteristics of stochastic-gradient-descent algorithms: a general study, analysis, and critique. Signal Process. 6(2), 113–133 (1984). https://doi.org/10.1016/0165-1684(84)90013-6
14. Zheng, S., et al.: Asynchronous stochastic gradient descent with delay compensation (2017)
15. Ben-Nun, T., Hoefler, T.: Demystifying parallel and distributed deep learning. ACM Comput. Surv. 52(4) (2019). https://doi.org/10.1145/3320060
16. Goyal, P., et al.: Accurate, large minibatch SGD: training ImageNet in 1 hour, June 2017. https://doi.org/10.48550/arxiv.1706.02677
17. Dean, J., et al.: Large scale distributed deep networks. In: Advances in Neural Information Processing Systems, vol. 2, pp. 1223–1231 (2012)
18. Chilimbi, T., Suzue, Y., Apacible, J., Kalyanaraman, K.: Project Adam: building an efficient and scalable deep learning training system. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2014), pp. 571–582 (2014). https://www.usenix.org/conference/osdi14/technical-sessions/presentation/chilimbi
19. Li, M., et al.: Scaling distributed machine learning with the parameter server. In: Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, pp. 583–598 (2014)
20. Patarasuk, P., Yuan, X.: Bandwidth optimal all-reduce algorithms for clusters of workstations. J. Parallel Distrib. Comput. 69(2), 117–124 (2009). https://doi.org/10.1016/j.jpdc.2008.09.002
21. Zhao, W., et al.: Distributed hierarchical GPU parameter server for massive scale deep learning ads systems, March 2020. https://doi.org/10.48550/arxiv.2003.05622
22. Yang, C., Amazon, A.W.S.: Tree-based Allreduce communication on MXNet. Technical report (2018)
23. Niu, F., Recht, B., Ré, C., Wright, S.J.: HOGWILD!: a lock-free approach to parallelizing stochastic gradient descent. In: Advances in Neural Information Processing Systems 24, 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011, June 2011. https://doi.org/10.48550/arxiv.1106.5730
24. Ho, Q., et al.: More effective distributed ML via a stale synchronous parallel parameter server. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 1, pp. 1223–1231 (2013)
25. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems, vol. 32 (2019). https://doi.org/10.48550/arxiv.1912.01703
26. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems, March 2016. https://doi.org/10.48550/arxiv.1603.04467
27. Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. In: MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, pp. 675–678, June 2014. https://doi.org/10.48550/arxiv.1408.5093
28. Akiba, T., Fukuda, K., Suzuki, S.: ChainerMN: scalable distributed deep learning framework, October 2017. https://doi.org/10.48550/arxiv.1710.11351
29. Dai, J., et al.: BigDL: a distributed deep learning framework for big data. In: Proceedings of the ACM Symposium on Cloud Computing, pp. 50–60 (2019). https://doi.org/10.1145/3357223.3362707
30. Ooi, B.C., et al.: SINGA: a distributed deep learning platform. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 685–688 (2015). https://doi.org/10.1145/2733373.2807410
31. Elephas: Distributed Deep Learning with Keras and Pyspark. http://maxpumperla.com/elephas/. Accessed 22 Mar 2022
32. TensorFlowOnSpark. https://github.com/yahoo/TensorFlowOnSpark. Accessed 22 Mar 2022
33. Liberty, E., et al.: Elastic machine learning algorithms in Amazon SageMaker. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 731–737 (2020). https://doi.org/10.1145/3318464.3386126
34. Yuan, J., et al.: OneFlow: redesign the distributed deep learning framework from scratch, October 2021. https://doi.org/10.48550/arxiv.2110.15032
35. Sergeev, A., Del Balso, M.: Horovod: fast and easy distributed deep learning in TensorFlow, February 2018. https://doi.org/10.48550/arxiv.1802.05799
36. Khumoyun, A., Cui, Y., Hanku, L.: Spark based distributed deep learning framework for big data applications. In: 2016 International Conference on Information Science and Communication Technology, ICISCT 2016, December 2016. https://doi.org/10.1109/ICISCT.2016.7777390
37. Lim, E.J., Ahn, S.Y., Park, Y.M., Choi, W.: Distributed deep learning framework based on shared memory for fast deep neural network training. In: 9th International Conference on Information and Communication Technology Convergence, ICTC 2018, pp. 1239–1242, November 2018. https://doi.org/10.1109/ICTC.2018.8539420
38. Qubole data service (2011). https://docs.qubole.com/en/latest/user-guide/qds.html
Efficient Post Event Analysis and Cyber Incident Response in IoT and E-commerce Through Innovative Graphs and Cyberthreat Intelligence Employment Rafał Kozik1,2 , Marek Pawlicki1,2(B) , Mateusz Szczepański1,2 , Rafał Renk1,3 , and Michał Choraś1,2 1 ITTI Sp. z o.o., Poznań, Poland
[email protected]
2 Bydgoszcz University of Science and Technology, Bydgoszcz, Poland 3 Adam Mickiewicz University, Poznań, Poland
Abstract. Currently, the doors are open to novel paradigms of the use of connected technologies. E-commerce and Internet of Things devices are experiencing substantial growth in popularity. However, this unprecedented increase in popularity comes at the price of a wide attack surface. This paper proposes an efficient Post Event Analysis and Incident Response procedure implemented with the use of graph databases and cyberthreat intelligence platforms to raise the security capabilities of an organisation by granting the ability to relate current detections both to the organisation's own past events and to incidents reported by other organisations. This approach allows the user to gain a pre-emptive advantage over the malicious actors by learning from the experiences of others. Keywords: Graphs · Recommendation and mitigation · Cyber threat intelligence
1 Introduction

Recently, two growing trends in the use of novel technologies present both enormous benefits and significant threats to society - these are the Internet of Things (IoT) and e-commerce. The e-commerce market in Europe is constantly growing [3] and plays a pivotal role in the European Digital Single Market. The percentage of customers employing the Internet to perform their shopping activities surpasses 70% [2]. The state of affairs in the European Digital Single Market pushes e-commerce services across Europe into an economy which can cause cascading effects in case of disturbances - making it a sort of critical infrastructure. The stakes are high, as e-commerce-related fraud exceeds 50% of total card fraud losses in some European countries, adding up to upwards of €1.3 billion a year in the whole EU [4, 6]. The list of impacts of security violations on e-commerce includes monetary effects like the operational costs, but also less tangible effects, like loss of reputation, harm to people and damage to the environment [11].
The influx of e-commerce users makes the field a potent, attractive attack surface for malicious actors. The kaleidoscopic mix of relations between standards, legislation and privacy issues in the context of e-commerce makes the protection against cyberattacks a difficult endeavour [1]. The threat landscape is of such a level of intricacy that a traditional, reactive approach leaves the protected organisation exposed. The defenders need to cooperate using cyberthreat intelligence (CTI) platforms to be adequately prepared [13, 19]. The Internet of Things domain flourishes with its expanding range of applications. Naturally, with the growing adoption of the devices - from homes and offices to health monitoring and industrial applications, transportation and others - their security becomes a pressing concern [8]. The list of IoT-related data breaches is staggering [18], with over 1.5 billion known breaches in 2020 [12]. No cybersecurity player in the field can possess all the necessary knowledge on their own, along with the relevant situational details of an attack [9]. In line with the ability to benefit from CTI is the organisation's ability to learn from its own experience. Being able to draw from the history of incidents in one's own organisation, along with the ability to draw insights from the community, hones the capability to detect and handle new incidents, effectively hardening the organisation over time. An effective incident handling procedure sits at the heart of efficient cyberthreat response and mitigation, containing the incident and decreasing the financial, reputational and legal losses [7]. This paper proposes an efficient Post Event Analysis and Incident Response procedure implemented with the use of graph databases and cyberthreat intelligence platforms to raise the security capabilities of an organisation by granting the ability to relate current detections both to the organisation's own past events and to incidents reported by other organisations. The paper is structured as follows: Sect. 2 overviews the related works, then Sect. 3 delves into the details of the proposed solution. Section 4 elaborates on the knowledge accumulation process. The paper ends with conclusions.
2 Related Works and Post Event Analysis Methods

The cybersecurity primer [16] defines post event analysis as the step in incident handling which focuses on learning and reasoning from the incident. The manuscript regards post-incident activity as one of the most important steps in incident handling, prompting team discussions on the knowledge built during the incident handling process and encouraging review, learning and improvement of incident response capabilities. The authors of [14] review models of cybersecurity incident handling, finalising the paper with a proposition of an effective process. Based on the surveyed models, they distinguish post-incident analysis as the third and final phase of incident response, right after Incident Analysis and Response, which follows the Detection and Diagnosis phases.
In the NIST Computer Security Incident Handling Guide Recommendations of the National Institute of Standards and Technology [10], Post-Incident Activity is included in the incident handling checklist. The main phases of incident handling in NIST SP 800-61 are: Preparation, Detection and Analysis, Containment, Eradication and Recovery, and Post-Incident Activity. The authors of [17] and [15] outline a post-incident analysis process which encompasses: Examination, Identification of Evidence, Collection of Evidence, Analysis of Evidence, and Documentation of the Process and Results. In the 'Good Practice Guide for Incident Management' published by ENISA [5], the proposed incident response process follows these steps: Detection, Triage, Analysis, Incident Response. This approach is general enough that ENISA suggests adapting it to the stakeholders' purposes, expanding on the necessary elements. The same guide proposes the Incident Resolution Cycle, which includes: Data Analysis, Resolution Research, Action Proposed, Action Performed, Eradication and Recovery, flowing in a circular, iterative fashion.
3 Post Event Analysis in Practice

This section presents techniques and procedures for accumulating and eventually querying the knowledge database containing the observations and facts about the observed system. The knowledge database is primarily focused on countermeasures and mitigations, so that the solution is able to provide supportive capabilities (recommendations on reaction and mitigation actions) for the operator. Apart from the events obtained from the observed system, the knowledge database is also enriched with content obtained from third-party sources (including cyberthreat intelligence exchange websites, CTI systems like MISP, and well-known repositories like GitHub, where cyber-related data is publicly shared). The knowledge is maintained in a machine-readable format and is able to adequately facilitate response and mitigation. The solution requires a human-in-the-loop to provide the sources from which the known cybersecurity-related data is fetched, as well as adequate rules that analyse the accumulated knowledge and produce inferred facts (e.g., recommendations for reaction to specific threats).

3.1 Proposed Architecture

The internal architecture of the Post-Event Analysis solution is presented in Fig. 1. There are several key elements, loosely coupled in order to facilitate the main capabilities of the tool. These elements interact with other entities by means of an asynchronous publish/subscribe message bus and direct HTTP calls. The publish-subscribe message bus is shared internally in order to avoid point-to-point communication channels and thus to reduce the level of coupling between the various sub-components.
Fig. 1. Post Event Analysis tool architecture
It must be noted that the knowledge database does not collect raw events. Instead, it stores only a small part of the original data (e.g., hashes, unique identifiers, relevant measurements, etc.) using the STIX notation as a serialization language. The end-user can dynamically decide which attributes of the original data should be stored in the knowledge database. This can be configured at the Parser level. Effectively, it means that the knowledge is serialised using the STIX format, which eventually allows users to consistently reason about the observed events and propose an adequate reaction. This is because the solution uses a consistent language (the same semantics) to describe and name entities. The reasoning is implemented by means of an inference rule engine. The rules for reasoning are provided by an expert (e.g., the administrator of the system). These are expressed using a dedicated language, which is explained in the next section.
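As an illustration of this serialisation step (a sketch of ours using the open-source python-stix2 library; the IP address and object names are hypothetical, not taken from the described system), an observed indicator, a course of action and the relationship between them can be expressed and bundled as STIX objects:

```python
# Serialising a selected part of an event as STIX 2.1 objects: only the
# relevant attribute (a suspicious IP), the reaction, and their relation
# are stored, never the raw event data.
from stix2 import Indicator, CourseOfAction, Relationship, Bundle

indicator = Indicator(
    name="Malicious IP observed by IDS",
    pattern="[ipv4-addr:value = '203.0.113.7']",   # hypothetical address
    pattern_type="stix",
)
action = CourseOfAction(name="Block source IP at the firewall")
link = Relationship(action, "mitigates", indicator)

print(Bundle(indicator, action, link).serialize(pretty=True))
```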
The engine and the knowledge database are managed by means of the KDBM (Knowledge Database Manager) service. In principle, the service monitors the execution of the reasoning rules (e.g., their order and results). Moreover, the KDBM listens to the internal message data bus and translates the requests into specific calls executed either on the knowledge database or on the rule engine. All the above-mentioned sub-components can be centrally managed using a dedicated web-based graphical user interface composed of a front-end (a Single Page Application running inside the web browser) and a back-end (running on the server side).

3.2 Graphical User Interface

The Graphical User Interface (GUI) has been developed using the Quasar framework. The tool adopts a classic SPA (Single Page Application) architecture. Thanks to the Vue.js components used by the Quasar framework, the integration of the GUI is straightforward and guarantees a consistent look-and-feel across the various components.

3.3 Key Components and Capabilities

The events are represented in the knowledge database using the STIX notation language. Therefore, the items can be searched (see Fig. 2) and analysed (see Fig. 3). The items are categorised by their type (e.g., Attack Pattern, Indicator, Course of Action, etc.) according to the STIX specification. In Fig. 3, an example is shown where the user starts the analysis from the indicator of a malicious IP address, identifies what kind of course of action has been taken, and what kind of attack pattern it is related to.
Fig. 2. Searching relevant items in the knowledge database
Fig. 3. Retrieving relevant details about the items stored in the knowledge database
Moreover, the end-user can drill down deeper into the analysis and check whether other indicators related to the selected attack pattern have been recorded in the database (see Fig. 4).
Fig. 4. An indicator related to SQL Injection attack pattern
From the user's point of view, the rule engine appears as a list of rules that are executed in the desired order. Typically, a rule starts with a predicate (a pattern that is matched against the knowledge stored in the database) and ends with a rule action that is executed whenever the rule is triggered. Apart from the rule's body, the user may also specify the priority of the rule as well as the name of the group the rule belongs to. These two properties allow the user to control how the rules are executed. The priority allows the user to form a chain of rules; for instance, we may want to first apply a "blocking reaction" for the
strong indicators and "send the notification" for the remaining ones. On the other hand, the grouping property allows us to build even more complex scenarios; for instance, when a rule with a certain priority is triggered, it can stop the execution of the rules with lower priorities.
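The tool's dedicated rule language is not reproduced here; the following hypothetical Python sketch of ours only mirrors the structure just described - a predicate matched against stored facts, an action, a priority and a group - with all names and thresholds invented for illustration:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    predicate: Callable[[dict], bool]   # pattern matched against a stored fact
    action: Callable[[dict], None]      # executed when the rule triggers
    priority: int = 0                   # higher-priority rules run first
    group: str = "default"

rules = [
    Rule("block-strong-indicator",
         predicate=lambda f: f.get("confidence", 0) >= 90,
         action=lambda f: print("block", f["value"]),
         priority=10, group="reaction"),
    Rule("notify-weak-indicator",
         predicate=lambda f: f.get("confidence", 0) < 90,
         action=lambda f: print("notify admin about", f["value"]),
         priority=1, group="reaction"),
]

fact = {"value": "203.0.113.7", "confidence": 95}
for rule in sorted(rules, key=lambda r: -r.priority):
    if rule.predicate(fact):
        rule.action(fact)
        break   # a triggered higher-priority rule stops lower-priority ones
```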
4 Knowledge Accumulation and Leveraging CTI

The Post-Event Analysis solution builds up-to-date knowledge and maintains it in a machine-readable format. The following paragraphs take a more in-depth look into the process of knowledge accumulation. The proposed solution facilitates efficient and effective Post Event Analysis by garnering the indicators of compromise coming from detection tools and querying CTI platforms to fully contextualise the incident with regard to similar occurrences in the past. The gathered intelligence is used to formulate a graph of all the relationships encompassing the incident. This graph makes it possible to reason about the incident and formulate adequate mitigation rules, as well as to significantly augment the post-event analysis process by relating the current incident to similar events in the past, of both the affected organisation and other organisations that fed the CTI. In Fig. 5, an indicator of a detected ransomware event is shown. The solution suggests a reaction - warning the administrator about a possible incident, as seen in Fig. 6.
Fig. 5. An indicator of a possible ransomware attack
Fig. 6. Possible reactions to ransomware
Using playbooks, the solution can notify the hunting team or the coordinator - this is a webhook that propagates the notification. One of the reactions extracts the malware name. This makes it possible to query CTI platforms for more details, essentially searching previous events and identifying the name of the malware.
In this particular situation, cross-referencing the knowledge base allows the operator to identify that the threat is, in fact, most likely the Conti malware, as seen in Fig. 8, since past events contained instances of this malware. Knowledge about this threat is currently important for the operator, so it is useful to query CTI for Indicators of Compromise (IoC). This allows the operator to be better prepared for an unfolding attack. The solution fetches the hashes, IP addresses, domain names, etc. related to this malware. This informs the operator that if any event relates to those IoC, the incident requires the utmost attention. The extracted IoC can be seen in Fig. 7.
Fig. 7. Indicators of Compromise extracted from CTI
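Such a query could, for instance, be issued against a MISP instance with the PyMISP client. The sketch below is our illustration only; the URL, API key and the choice of attribute types are placeholders rather than part of the described solution:

```python
# Pulling IoCs for a named malware from a MISP instance with PyMISP.
from pymisp import PyMISP

misp = PyMISP("https://misp.example.org", "API_KEY", ssl=True)

# Search attributes of past events matching the malware name and keep
# the indicator types the operator cares about.
attributes = misp.search(controller="attributes", value="Conti",
                         pythonify=True)
iocs = [(a.type, a.value) for a in attributes
        if a.type in ("ip-dst", "domain", "sha256", "md5")]
for ioc_type, value in iocs:
    print(ioc_type, value)
```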
In Fig. 8, a set of recommendations can be seen - one stemming from the fact that an IP address has performed a malicious action and, because this IP is related to the IoC from the earlier detection, another warning that a possible malware infection has been detected. In the explanation panel, the details about the indicator, the attack pattern and the course of action are provided. The solution also offers more detailed recommendation and mitigation strategies, referred to as the containment plan.
Fig. 8. Response and mitigation measures based on the accumulated knowledge
In Fig. 9, an example of a graph that expresses the accumulated knowledge is displayed. The graph encompasses two cyberattacks - a brute force and a portscan attack, which are the two main stages of an unfolding cyberattack scenario. This representation allows for the correlation of specific suspicious actions of malicious users.
Fig. 9. An example of the accumulated knowledge in a graph format
5 Conclusions

Post Event Analysis is a crucial step in the incident handling process. With the rising numbers of cyber-incidents, novel paradigms of response are necessary. This paper presented an innovative approach to post event analysis and incident response and mitigation by leveraging novel developments in graph databases and the emergence of cyberthreat intelligence platforms. This innovative mixture grants the user the advantage of employing other users' experience to gain a pre-emptive edge in the fight against malicious actors. Acknowledgement. This work is co-funded under the ELEGANT project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 957286. This work is co-funded by the ENSURESEC project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 883242.
References

1. 2020 Global Threat Intelligence Report. The nature of security: be resilient to thrive. https://tinyurl.com/4ayv32xx. Accessed 04 May 2021
2. Ecommerce in Europe: €717 billion in 2020. https://tinyurl.com/hy3x8kwa. Accessed 04 May 2021
3. European ecommerce report (2019). https://tinyurl.com/4wpmrv52. Accessed 04 May 2021
4. Fraud losses in e-commerce on UK-issued credit cards - 2019 - Statista. https://tinyurl.com/9bx7dr3n. Accessed 04 May 2021
5. Good practice guide for incident management - ENISA. https://www.enisa.europa.eu/publications/good-practice-guide-for-incident-management. Accessed 05 Apr 2022
6. Survey on "scams and fraud experienced by consumers". Final report. https://tinyurl.com/e6n97hf2. Accessed 04 May 2021
7. Ab Rahman, N.H., Choo, K.-K.R.: A survey of information security incident handling in the cloud. Comput. Secur. 49, 45–69 (2015)
8. Ahmad, R., Alsmadi, I.: Machine learning approaches to IoT security: a systematic literature review. Internet of Things 14, 100365 (2021)
9. Alkalabi, W., Simpson, L., Morarji, H.: Barriers and incentives to cybersecurity threat information sharing in developing countries: a case study of Saudi Arabia. In: 2021 Australasian Computer Science Week Multiconference. ACSW2021. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3437378.3437391
10. Cichonski, P., Millar, T., Grance, T., Scarfone, K., et al.: Computer security incident handling guide. NIST Spec. Publ. 800(61), 1–147 (2012)
11. Couce-Vieira, A., Insua, D.R., Kosgodagan, A.: Assessing and forecasting cybersecurity impacts. Decis. Anal. 17(4), 356–374 (2020)
12. Eshghi, B.: IoT cybersecurity in 2022: vulnerabilities countermeasures. AI Multiple (2022). Accessed 03 Nov 2022
13. Gong, S., Lee, C.: Cyber threat intelligence framework for incident response in an energy cloud platform. Electronics 10(3), 239 (2021)
14. Lee, J.W., Song, J.G., Son, J.Y., Choi, J.G.: Propositions for effective cyber incident handling (2018)
15. Pauna, A., Moulinos, K., Lakka, M., May, J., Tryfonas, T.: Can we learn from SCADA security incidents. White Paper, European Union Agency for Network and Information Security, Heraklion, Crete, Greece (2013)
16. Prasad, R., Rohokale, V.: Cyber Security: The Lifeline of Information and Communication Technology. Springer, Cham (2020)
17. Spyridopoulos, T., Tryfonas, T., May, J.: Incident analysis & digital forensics in SCADA and industrial control systems (2013)
18. U.S. Department of Health and Human Services Office for Civil Rights: breach portal: notice to the secretary of HHS breach of unsecured protected health information (2022). Accessed 03 Nov 2022
19. Xie, W., Yu, X., Zhang, Y., Wang, H.: An improved shapley value benefit distribution mechanism in cooperative game of cyber threat intelligence sharing. In: IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 810–815 (2020). https://doi.org/10.1109/INFOCOMWKSHPS50562.2020.9162739
Federated Sparse Gaussian Processes Xiangyang Guo, Daqing Wu, and Jinwen Ma(B) Department of Information and Computational Sciences, School of Mathematical Sciences and LMAM, Peking University, Beijing 100871, China [email protected]
Abstract. In this paper, we propose a federated sparse Gaussian process (FSGP) model, which combines the sparse Gaussian process (SGP) model with the framework of federated learning (FL). Sparsity enables the reduction in the time complexity of training a Gaussian process (GP) from O(N³) to O(NM²) and the space complexity from O(N²) to O(NM), where N is the number of training samples and M (M ≪ N) the number of inducing points. Furthermore, FL aims at learning a shared model using data distributed on more than one client under the condition that local data on each client cannot be accessed by other clients. Therefore, our proposed FSGP model can not only deal with large datasets, but also preserve privacy. FSGPs are trained through variational inference and applied to regression problems. In experiments, we compare the performance of FSGPs with that of federated Gaussian processes (FGPs) and SGPs trained using the datasets consisting of all local data. The experimental results show that FSGPs are comparable with SGPs and outperform FGPs. Keywords: Sparse Gaussian Processes · Variational inference · Federated learning · Preserve privacy
1 Introduction
Gaussian Processes (GPs) have proven to be a powerful and popular model for diverse applications in machine learning and data mining, e.g., the classification of images of handwritten digits, the learning of the inverse dynamics of a robot arm, and dimensionality reduction [1–4]. Unfortunately, the time complexity of training GPs scales as O(N^3) and the space complexity as O(N^2), where N denotes the number of training samples, which makes GPs unaffordable for large datasets. To overcome this limitation, many sparse Gaussian process (SGP) models have been proposed [5–14], which allow the reduction in the time complexity from O(N^3) to O(NM^2) and the space complexity from O(N^2) to O(NM), where M (M ≪ N) is the number of inducing points. Among these SGP models, the one proposed by Titsias [13, 14] achieves state-of-the-art performance, and it is utilized to construct our proposed federated sparse Gaussian process (FSGP) model.
In modern machine learning, big models are widely needed, and training them requires large datasets. However, there are two major challenges which strongly hinder
the training of big models. Firstly, in most industries, such as finance and healthcare, data exists in the form of isolated islands [15]. Secondly, due to the need for the preservation of privacy, the isolated data cannot be grouped to train a machine learning model [15]. Fortunately, federated learning (FL), first proposed by McMahan et al. [16], provides a solution to these two problems. Assume that there are K (K > 1) clients or participants, each of which possesses its own local dataset. Then, FL aims to learn a shared model using the K local datasets under the condition that local data on each client is not accessible to other clients. Many machine learning models have been combined with the framework of FL, such as federated Gaussian processes (FGPs) [17], federated linear regression [15, 18], SecureBoost [18, 19], federated deep neural networks [16, 18], and federated reinforcement learning [18, 20]. FL has been applied to a wide range of applications, including computer vision, natural language processing, recommendation systems, finance, healthcare, education, urban computing, smart cities, edge computing, the Internet of Things, blockchain, and 5G mobile networks [18].
In the FGP model [17], predictions about test outputs on a client are based only on its own local training dataset, which leads to poor predictions. To tackle this problem, in this paper, we propose an FSGP model which integrates the SGP model and the framework of FL. In FSGPs, each client can make predictions using the local training datasets of other clients without seeing them. Here, only horizontal FL and the client-server architecture [18] are considered. The objective function to be optimized takes the same form as that in McMahan et al. [16]. Thus, the FederatedAveraging algorithm [16, 18] is used to train the proposed FSGP model. We compare our proposed FSGPs with FGPs and SGPs on two synthetic datasets and one real-world dataset. When training FSGPs, each training dataset is randomly divided into K subsets whose sizes are determined at random, so that imbalanced splits are also covered. In contrast, SGPs are trained using the whole training datasets. The experimental results show that the performance of our proposed FSGP model is comparable with that of SGPs and better than that of FGPs.
The rest of this paper is organized as follows. Section 2 shortly introduces the SGP model proposed by Titsias [13, 14]. In Sect. 3, we elaborate on the FederatedAveraging algorithm and how the FSGP model preserves privacy. Section 4 presents the experimental results on three datasets, and we conclude this paper in Sect. 5.
2 Related Models
In this section, we briefly introduce the SGP model proposed by Titsias [13] and Titsias [14]. A GP, denoted as {f(x) | x ∈ X}, is a collection of random variables indexed by x ∈ X ⊆ R^D, any finite subset of which follows a Gaussian distribution. It is fully specified by its mean function m(x) and covariance or kernel function c(x, x'), where

m(x) = E[f(x)],  c(x, x') = E[(f(x) − m(x))(f(x') − m(x'))]   (1)
For simplicity, m(x) is usually assumed to be zero. Then, we choose the squared exponential function, defined by

c(x, x'; θ) = θ_0^2 exp{ −(1/2) Σ_{d=1}^{D} (x_d − x'_d)^2 / θ_d^2 },   (2)

as the kernel function, where θ_d, d = 0, 1, ..., D are positive hyperparameters that are optimized in the training process. More details about covariance functions can be found in Rasmussen and Williams [1].
Suppose that we have a training dataset D = {(x_n, y_n)}_{n=1}^{N}, where y_n is obtained by adding i.i.d. Gaussian noise, subject to N(0, σ^2), to f_n = f(x_n). Let X, f, and y denote all training inputs, all corresponding latent function values, and all training outputs, respectively. Then, the training process is performed by maximizing the log-likelihood function, given by

L(y; θ, σ) = (1/N) log p(y) = (1/N) log N(y | 0, C_NN + σ^2 I_N),   (3)
w.r.t. θ and σ, where C_NN = c(X, X; θ) and I_N is the identity matrix. After the training process, given a test point (x_*, y_*), the aim of the prediction process is to calculate the conditional distribution p(y_* | y). We have

[y; y_*] ~ N( 0, [ C_NN + σ^2 I_N, c_{*N}^T ; c_{*N}, c_{**} + σ^2 ] ),   (4)

where c_{*N} = c(x_*, X; θ) and c_{**} = c(x_*, x_*; θ). It follows that

y_* | y ~ N( c_{*N} (C_NN + σ^2 I_N)^{−1} y, c_{**} + σ^2 − c_{*N} (C_NN + σ^2 I_N)^{−1} c_{*N}^T )   (5)
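As a worked illustration of Eqs. (2)–(5), a minimal NumPy sketch of exact GP prediction might look as follows; the function names and array layout are our own, not the paper's:

import numpy as np

def se_kernel(A, B, theta0=1.0, lengthscales=1.0):
    # Squared exponential kernel of Eq. (2): theta0^2 * exp(-0.5 * sum_d (a_d - b_d)^2 / theta_d^2)
    diff = A[:, None, :] - B[None, :, :]
    return theta0 ** 2 * np.exp(-0.5 * np.sum(diff ** 2 / lengthscales ** 2, axis=-1))

def gp_predict(X, y, x_star, sigma, **kern):
    # Exact GP predictive mean/variance of Eq. (5); costs O(N^3) time and O(N^2) memory
    C_NN = se_kernel(X, X, **kern)
    c_sN = se_kernel(x_star[None, :], X, **kern)[0]                  # c_{*N}
    c_ss = se_kernel(x_star[None, :], x_star[None, :], **kern)[0, 0]  # c_{**}
    K = C_NN + sigma ** 2 * np.eye(len(X))
    alpha = np.linalg.solve(K, y)                                    # (C_NN + sigma^2 I)^{-1} y
    mean = c_sN @ alpha
    var = c_ss + sigma ** 2 - c_sN @ np.linalg.solve(K, c_sN)
    return mean, var

The two linear solves make the cubic cost explicit, which is precisely what the sparse model below avoids.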
From Eqs. (3)–(5), we see that the time complexity of training GPs scales as O(N^3) and the space complexity as O(N^2), since we need to store C_NN + σ^2 I_N and calculate its inverse and determinant. That makes GPs intractable for large datasets. Next, we shortly introduce the SGP model that can overcome the above limitation.
M inducing points {(z_m, u_m)}_{m=1}^{M} are introduced to construct an SGP, where z_m, m = 1, ..., M are pseudo-inputs independent of X, and u_m = f(z_m). Let Z and u be all the pseudo-inputs and all inducing variables, respectively. Then, it is obtained that

L(y; θ, σ) = (1/N) log p(y)
           = (1/N) log ∫ p(u, f, y) df du
           = (1/N) log ∫ q(u, f) [ p(u) p(f|u) p(y|f) / q(u, f) ] df du
           ≥ (1/N) ∫ q(u, f) log [ p(u) p(f|u) p(y|f) / q(u, f) ] df du,   (6)
in which q(u, f) is any probability distribution over (u, f), and the inequality is obtained through Jensen's inequality. The coefficient 1/N is used to eliminate the impact of the scale of the gradients. Assume that q(u, f) = q(u) p(f|u), where q(u) is an unconstrained Gaussian distribution with mean vector μ and covariance matrix Σ. It follows that

L(y; θ, σ) ≥ F(θ, σ, Z, q(u)) = (1/N) ∫ q(u) p(f|u) log [ p(u) p(y|f) / q(u) ] df du.   (7)
Fixing θ, σ and Z, the q*(u) that maximizes F(θ, σ, Z, q(u)) can be found analytically. The mean vector and covariance matrix of q*(u) are

μ* = (1/σ^2) C_MM A^{−1} C_MN y  and  Σ* = C_MM A^{−1} C_MM,   (8)

respectively, where C_MM = c(Z, Z; θ), C_MN = c(Z, X; θ), and A = C_MM + σ^{−2} C_MN C_MN^T. Then, we have

L(y; θ, σ) ≥ F(θ, σ, Z) = F(θ, σ, Z, q*(u)) = (1/N) log N(y | 0, Q_NN + σ^2 I_N) − (1/(2Nσ^2)) tr(C),   (9)

where Q_NN = C_MN^T C_MM^{−1} C_MN and C = C_NN − C_MN^T C_MM^{−1} C_MN. Next, the estimation of θ and σ by maximizing L(y; θ, σ) is replaced with the joint estimation of θ, σ, and Z by maximizing F(θ, σ, Z). This replacement enables the reduction in the time and space complexity. After the above maximization, we can calculate an approximation of the true conditional distribution p(y_* | y). We have

p(y_* | y) = ∫ p(u | y) p(f | u, y) p(y_* | u, f) df du.   (10)
By substituting p(u|y) with q*(u) and p(y_* | u, f) with p(y_* | u), we obtain an approximate distribution

q(y_*) = ∫ q*(u) p(y_* | u) du.   (11)

q(y_*) is a Gaussian distribution, whose mean and variance are

m_y(x_*) = (1/σ^2) c_{*M} A^{−1} C_MN y   (12)

and

c_y(x_*) = c_{**} + σ^2 − c_{*M} (C_MM^{−1} − A^{−1}) c_{*M}^T,   (13)

respectively, where c_{*M} = c(x_*, Z; θ).
3 Federated Sparse Gaussian Processes
3.1 FederatedAveraging Algorithm
Suppose that there are K clients and the k-th one possesses a local training dataset D_k = {(x_n^k, y_n^k)}_{n=1}^{N_k}, k = 1, ..., K. Furthermore, assume that D = ∪_{k=1}^{K} D_k and N = Σ_{k=1}^{K} N_k. To conduct federated learning, we use a factorized target function w.r.t. clients to approximate the true likelihood, i.e., p(y) ≈ Π_{k=1}^{K} p(y_k), which leads to log p(y) ≈ Σ_{k=1}^{K} log p(y_k). This approximation has been applied to the training of distributed GPs [21, 22]. As shown in Sect. 2, (1/N_k) log p(y_k) has a lower bound F_k(θ, σ, Z), which is defined on D_k in the way shown in Eq. (9). F_k(θ, σ, Z), k = 1, ..., K have common parameters. Therefore, Σ_{k=1}^{K} (N_k / N) F_k(θ, σ, Z) can be viewed as an approximate lower bound of L(y; θ, σ) = (1/N) log p(y). The form of this lower bound is similar to that of the objective function of the federated optimization problem in McMahan et al. [16]. Thus, we use the FederatedAveraging algorithm proposed by McMahan et al. [16] to train an FSGP. Algorithm 1 gives the local update processes on clients.
The FederatedAveraging algorithm performed by the server is presented in Algorithm 2.
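Purely as an illustration of the server-side loop, a minimal Python sketch is given below; it assumes each client object exposes a local_update(w) routine that runs a few gradient steps on its local bound F_k, together with a num_samples count. All names here are our own, not the paper's.

import random
import numpy as np

def federated_averaging(clients, w0, T, rho):
    # Server-side FederatedAveraging [16]: in each of T rounds, max{K*rho, 1}
    # clients refine the shared parameters w = {theta, sigma, Z} on their local
    # data, and the server averages the results weighted by sample counts N_k.
    w = np.array(w0, dtype=float)
    n_selected = max(int(len(clients) * rho), 1)
    for _ in range(T):
        selected = random.sample(clients, n_selected)
        updates = [(c.num_samples, c.local_update(w.copy())) for c in selected]
        total = sum(n for n, _ in updates)
        w = sum((n / total) * np.asarray(wk) for n, wk in updates)
    return w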
In the two algorithms, w represents {θ, σ, Z} for simplicity. Since F_k(θ, σ, Z) has the coefficient 1/N_k, it is rational to consider the scales of the gradients of F_k(θ, σ, Z), k = 1, ..., K to be the same. Thus, we use the same learning rate sequence for different clients. To improve the training efficiency, only max{Kρ, 1} clients are selected to update model parameters locally in one round, where ρ ∈ (0, 1). In addition, we can employ privacy-preserving techniques, such as fully homomorphic encryption [23, 24], to ensure data security when transmitting gradients [18].
3.2 Prediction
After an FSGP is trained through the above FederatedAveraging algorithm, we can use Eq. (12) and Eq. (13) to calculate the approximate predictive distribution q(y_*). To show that the calculation can preserve privacy, we rewrite Eq. (12) and Eq. (13) as

m_y(x_*) = (1/σ^2) c_{*M} ( C_MM + (1/σ^2) Σ_{k=1}^{K} C_MN_k C_MN_k^T )^{−1} Σ_{k=1}^{K} C_MN_k y_k   (14)

and

c_y(x_*) = c_{**} + σ^2 − c_{*M} ( C_MM^{−1} − ( C_MM + (1/σ^2) Σ_{k=1}^{K} C_MN_k C_MN_k^T )^{−1} ) c_{*M}^T,   (15)
respectively, where C_MN_k = c(Z, X_k; θ). From Eq. (14) and Eq. (15), we see that if a client wants to calculate q(y_*), it solely needs the values of C_MN_k C_MN_k^T and C_MN_k y_k from the other clients. Since D_k cannot be recovered from the values of C_MN_k C_MN_k^T and C_MN_k y_k (see Theorem 1), the prediction is privacy-preserving.
Theorem 1. D_k cannot be recovered from the values of C_MN_k C_MN_k^T and C_MN_k y_k.
Proof. Since an input x and a pseudo-input z_m are both real vectors, it is rational to consider that x is impossible to be equal to z_m. Thus, any entry of C_MN_k belongs to the open interval (0, θ_0^2). View each row of C_MN_k as a point in (0, θ_0^2)^{N_k}. (0, θ_0^2)^{N_k} is an open set and the convex hull Δ of the M points is a subset of it. It follows that there exist infinitely many rotation transformations around the origin, denoted as φ, such that φ(Δ) is still a subset of (0, θ_0^2)^{N_k}. Each φ can be regarded as an N_k × N_k orthogonal matrix Q_φ. Then, we have

C_MN_k C_MN_k^T = (C_MN_k Q_φ)(C_MN_k Q_φ)^T   (16)

and

C_MN_k y_k = (C_MN_k Q_φ)(Q_φ^T y_k).   (17)
Therefore, we cannot infer C_MN_k and y_k from the values of C_MN_k C_MN_k^T and C_MN_k y_k. Since C_MN_k cannot be recovered, X_k cannot be recovered either. We can easily generalize this result to other covariance functions.
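To make the information flow concrete, here is a minimal sketch of the prediction of Eq. (14) and Eq. (15), assuming each client k transmits only the pair (C_MN_k C_MN_k^T, C_MN_k y_k); all function and variable names are illustrative:

import numpy as np

def federated_predict(c_sM, c_ss, C_MM, stats, sigma):
    # stats is a list of per-client pairs (C_MNk @ C_MNk.T, C_MNk @ y_k);
    # the raw inputs X_k and outputs y_k never leave their clients (Theorem 1).
    G = sum(S for S, _ in stats)                  # sum_k C_MNk C_MNk^T
    b = sum(v for _, v in stats)                  # sum_k C_MNk y_k
    A = C_MM + G / sigma ** 2
    A_inv = np.linalg.inv(A)
    mean = c_sM @ A_inv @ b / sigma ** 2                                    # Eq. (14)
    var = c_ss + sigma ** 2 - c_sM @ (np.linalg.inv(C_MM) - A_inv) @ c_sM  # Eq. (15)
    return mean, var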
4 Experiments
In this section, we present the experimental results on two synthetic datasets and one real-world dataset. The first dataset is drawn from the following function of one variable:

f(x) = 3 sin(2πx/20), x ∈ [−10, 10].   (18)

The 500 training inputs are evenly distributed in the above interval and the corresponding outputs are obtained by adding i.i.d. Gaussian noises, subject to N(0, 0.5^2), to the latent function values. The 300 test samples are generated in the same way. The second synthetic dataset is generated similarly. The latent function is

f(x) = 2.5 sin(2π(x_1 + x_2)/90), x ∈ [−25, 25]^2.   (19)

This dataset consists of 4900 (70 × 70) training samples and 900 (30 × 30) test samples. The Gaussian noises follow N(0, 0.4^2). The third dataset is the KIN40K dataset, which contains 10000 training samples and 30000 test samples from R^8 × R.
We use the root mean squared error (RMSE) to measure the performance of SGPs, FGPs and FSGPs, which is defined as

RMSE = sqrt( (1/L) Σ_{l=1}^{L} (t_l − y_l)^2 ),   (20)

where {y_l}_{l=1}^{L} and {t_l}_{l=1}^{L} are test outputs and the corresponding predictions, respectively. It is clear that smaller RMSE values imply better performance.
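For reference, Eq. (20) is a one-liner in NumPy:

import numpy as np

def rmse(predictions, targets):
    # Root mean squared error of Eq. (20); lower values indicate better predictions
    return np.sqrt(np.mean((np.asarray(targets) - np.asarray(predictions)) ** 2))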
Fig. 1. Synthetic dataset 1
In all three experiments, T, P and λ are set to 5000, 3 and 0.1, respectively. Then, we sequentially set K = 5, 10, 10 and Kρ = 2, 5, 5, respectively. θ, σ, and Z are
initialized as (1, ..., 1)^T, 0.1, and a random subset of X, respectively. When training SGPs and FSGPs, θ, σ, and Z have the same initial values. Furthermore, the imbalance problem is considered in the experiments by randomly determining the sizes of the training subsets. In the first experiment, the difference between the maximum and minimum subset sizes is 59. In the other two experiments, the differences are 634 and 1299, respectively.
Fig. 2. Synthetic dataset 2
The results on the three datasets are presented in Fig. 1, Fig. 2, and Fig. 3, respectively. In all three experiments, FSGPs perform better than FGPs. On the two synthetic datasets, FSGPs outperform FGPs slightly when the number of inducing variables is large enough. However, on the KIN40K dataset, FSGPs obviously outperform FGPs when the number of inducing variables is large enough, since the unknown latent function in KIN40K is more complex than the two synthetic latent functions. In addition, we see that FSGPs and SGPs have similar ability, that is to say, FSGPs are comparable with SGPs. The three results show that although the whole training datasets are divided into small subsets in training an FSGP, we can obtain comparable performance through the federated aggregation algorithm.
Fig. 3. KIN40K dataset
5 Conclusion
We have proposed an FSGP model that not only retains the scalability of SGPs, but also can learn a shared model using isolated datasets stored on more than one client. The FSGP model can preserve privacy since, in the training process, we need not transport the data stored on one client to the other clients, and in the test process, the data cannot be recovered. The experimental results on two synthetic datasets and one real-world dataset show that the performance of our proposed FSGP model is comparable with that of SGPs and better than that of FGPs in terms of the criterion we adopt. Two interesting topics for the future are to develop a more effective algorithm to accelerate the training process and to combine vertical federated learning with GPs.
Acknowledgement. This work is supported by the National Key R & D Program of China (2018AAA0100205).
References
1. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. The MIT Press (2006)
2. Bishop, C.M. (ed.): Pattern Recognition and Machine Learning. ISS, Springer, New York (2006). https://doi.org/10.1007/978-0-387-45528-0
3. Guo, X., Li, X., Ma, J.: Variational EM algorithm for Student-t mixtures of Gaussian processes. In: Huang, D.S., Jo, K.H., Li, J., Gribova, V., Hussain, A. (eds.) Intelligent Computing Theories and Application. ICIC 2021. Lecture Notes in Computer Science, vol. 12837, pp. 552–563. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-84529-2_47
4. Gao, X.B., Wang, X.M., Tao, D.C.: Supervised Gaussian process latent variable model for dimensionality reduction. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 41(2), 425–434 (2011)
5. Smola, A.J., Bartlett, P.: Sparse greedy Gaussian process regression. In: Advances in Neural Information Processing Systems, vol. 13, pp. 619–625 (2001)
6. Csato, L., Opper, M.: Sparse online Gaussian processes. Neural Computation 14, 641–648 (2002)
7. Lawrence, N.D., Seeger, M., Herbrich, R.: Fast sparse Gaussian process methods: the informative vector machine. In: Advances in Neural Information Processing Systems, vol. 15 (2003)
8. Schwaighofer, A., Tresp, V.: Transductive and inductive methods for approximate Gaussian process regression. In: Advances in Neural Information Processing Systems, vol. 15 (2003)
9. Seeger, M., Williams, C.K.I., Lawrence, N.D.: Fast forward selection to speed up sparse Gaussian process regression. In: Proceedings of the 9th International Workshop on Artificial Intelligence and Statistics, PMLR R4, pp. 254–261 (2003)
10. Williams, C.K.I., Seeger, M.: Using the Nyström method to speed up kernel machines. In: Advances in Neural Information Processing Systems, vol. 13. MIT Press (2001)
11. Quinonero-Candela, J., Rasmussen, C.E.: A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research 6, 1939–1959 (2005)
12. Snelson, E., Ghahramani, Z.: Sparse Gaussian processes using pseudo-inputs. In: Advances in Neural Information Processing Systems, vol. 18 (2006)
13. Titsias, M.K.: Variational model selection for sparse Gaussian process regression. Technical report, School of Computer Science, University of Manchester (2009)
14. Titsias, M.K.: Variational learning of inducing variables in sparse Gaussian processes. In: Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, PMLR 5, pp. 567–574 (2009)
15. Yang, Q., Liu, Y., Chen, T., Tong, Y.: Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology 10 (2019)
16. McMahan, H.B., Moore, E., Ramage, D., Hampson, S., Arcas, B.A.: Communication-efficient learning of deep networks from decentralized data. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54, pp. 1273–1282 (2017)
17. Yue, X., Kontar, R.A.: Federated Gaussian process: convergence, automatic personalization and multi-fidelity modeling. https://doi.org/10.48550/arXiv.2111.14008
18. Yang, Q., Liu, Y., Cheng, Y., Kang, Y., Chen, T., Yu, H.: Federated Learning (2019)
19. Cheng, K., et al.: SecureBoost: a lossless federated learning framework. IEEE Intelligent Systems (2021)
20. Zhuo, H.H., Feng, W., Lin, Y., Xu, Q., Yang, Q.: Federated deep reinforcement learning. arXiv (2020). https://arxiv.org/abs/1901.08277
21. Deisenroth, M.P., Ng, J.W.: Distributed Gaussian processes. In: Proceedings of the 32nd International Conference on Machine Learning, PMLR 37, pp. 1481–1490 (2015)
22. Xie, A., Yin, F., Xu, Y., Ai, B., Chen, T., Cui, S.: Distributed Gaussian processes hyperparameter optimization for big data using proximal ADMM. IEEE Signal Processing Letters 26(8), 1197–1201 (2019)
23. Acar, A., Aksu, H., Uluagac, A.S., Conti, M.: A survey on homomorphic encryption schemes: theory and implementation. ACM Computing Surveys 51, 1–35 (2018)
24. Armknecht, F., et al.: A guide to fully homomorphic encryption. IACR Cryptology ePrint Archive (2015)
Classification of Spoken English Accents Using Deep Learning and Speech Analysis
Zaid Al-Jumaili1(B), Tarek Bassiouny1, Ahmad Alanezi2, Wasiq Khan2, Dhiya Al-Jumeily2, and Abir Jaafar Hussain1,2
1 Department of Electrical Engineering, University of Sharjah, Sharjah, UAE
[email protected]
2 School of Computer Science and Mathematics, Liverpool John Moores University, Liverpool, UK
Abstract. Accent detection, which is also known as dialect recognition, represents an emerging topic in speech processing. The classification of spoken accents can provide details about people's backgrounds and their demographic information, which can help in several domains. In this research, a convolutional neural network is utilised for the detection of three accents of English speakers: the American, British, and Indian accents. The data was collected from publicly available resources with speech samples from the required English accents. Experimental results indicated the robustness of deep learning algorithms for the classification of English spoken accents, which have the potential to be utilised in diverse applications, specifically within the security domain.
Keywords: Deep learning for dialect detection · Accent classification · Spoken English accent
1 Introduction
There are several thousand spoken languages (approximately 7000) across the globe. One of the most popular is English, which is spoken in more than 70 countries around the world [1] and is the third most spoken language worldwide. Most of these countries use English as their official language even though their people may speak different native languages. However, the English spoken in these countries varies with respect to accent. The accent is the way of pronouncing words in the language. For example, the American accent is different from the British accent, and Indian and Bangla people have other accents. The difference in accents may lead to many problems: people may not understand each other, or may not tell whether a person speaks positively or negatively, leading to trust issues between people. Various studies indicate that listeners are less likely to believe someone who speaks with an unfamiliar accent unless that person speaks in a confident tone of voice [1].
Furthermore, the accent or dialect of a person provides information about their origin and ethnicity [2]. In border control, the origin and the ethnicity of suspects allow law enforcement authorities to have significant details about the origin and
the country that those suspects came from, and permit them to identify the speaker's identity. Artificial intelligence (AI) and machine learning (ML) can provide automatic detection and classification of accents and dialects, handling the characterization of speakers' accents in any spoken language. It should be noted that accent recognition and classification are often utilised as a primary phase for speech recognition [3].
Various works have been presented for the classification of accents using machine learning techniques. For instance, Chen et al. [4] used Naïve Bayes, logistic regression and Support Vector Machines (SVM), achieving 57.12% accuracy using SVM for the classification of German and Mandarin non-native speakers. Furthermore, Radzikowski et al. [5] described the problem of non-native speech recognition and identified the issues of training automatic speech recognition systems. In their approach, the authors utilised an accent modification methodology, training an autoencoder based on a convolutional neural network to modify the spectrogram of non-native speech into one that looks similar to a native speaker's. The authors claimed that their simulation results showed a noteworthy improvement with respect to speech recognition accuracy. Research proposed by Chen et al. [6] utilised a Gaussian mixture model for the classification of speaker gender and accented speech. The authors used Mandarin corpus data collected through Microsoft Research Asia, consisting of 748 speakers labelled with female and male genders; the accents are regional, including Beijing, Shanghai and Guangdong, with speaking rates set to slow, normal or fast. The authors studied the association between the number of sounds in the test data and the accent identification error, achieving an average accuracy of around 80%.
In this paper, we propose the classification of multiple accents of spoken English, including American, British, and Indian accents, utilising a convolutional neural network (CNN). The proposed algorithm is used for the purpose of obtaining details about the geographical origin and the ethnicity of people, which has diverse applications. The main contributions of this work include the following:
1. Utilization of a convolutional neural network for the detection of English spoken accents.
2. Using transfer learning for the analysis of the Mel spectrogram.
2 Methodology
In this section, we describe the dataset we used and the preprocessing of the data.
2.1 Dataset
In this paper, we have used an existing dataset, AccentDB, which contains recordings in 9 different English accents, from which we took three of the most widely known accents of English: American, British, and Indian [9].
The number of samples, duration, and the number of speakers are summarized in Table 1.
2.2 Preprocessing
We obtained the three-accent dataset from AccentDB [9], where the researchers collected samples of four Indian-English accents and a metropolitan Indian-English accent, namely Bangla, Indian, Malayalam, Odiya, and Telugu, and four native-English accents, namely American, Australian, British, and Welsh. Three accents are selected: American, British, and Indian. The audio signals are transformed into a visual form; spectrograms were obtained by passing raw audio waveforms through filter banks. It should be noted that a Mel spectrogram, which is a spectrogram transformed to the Mel scale, was used for our proposed system, as illustrated in Fig. 1. A Mel spectrogram applies a frequency-domain filter bank to time-windowed audio sources. A spectrogram is a visual representation of a signal's frequency spectrum, where the frequency spectrum of a signal is the frequency range that the signal contains. The Mel scale mimics how the human ear works; studies demonstrate that we cannot discern sounds at high frequencies, but we can notice variations at lower frequencies [11].
Table 1. The details of AccentDB [9]
Accent       Number of Samples   Duration      Number of Speakers
Bangla       1528                2 h 13 min    2
Malayalam    2393                3 h 32 min    3
Odiya        748                 1 h 11 min    1
Telugu       1515                2 h 10 min    2
American     5760                5 h 44 min    8
Australian   1440                1 h 21 min    2
British      1440                1 h 26 min    2
Indian       1440                1 h 29 min    2
Welsh        720                 0 h 43 min    1
Total        16984               19 h 49 min   23
A log Mel energy spectrum over time represents the short-term power of an audio source on the Mel-frequency scale. The log-Mel energy spectrum is made up of the Mel-frequency spectral coefficients (MFSC). The Mel frequency scale defines a perceptual scale of frequencies that are subjectively perceived to be equivalent in terms of hearing experience. The function B for calculating the Mel frequency m from a frequency f in hertz, as well as its inverse, is as follows:

B(f) = 2595 log10(1 + f/700)   (1)
Fig. 1. Mel spectrogram plot of an audio signal
B^{-1}(m) = 700(10^{m/2595} − 1)   (2)
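Both conversions are one-liners; a small Python sketch of Eq. (1) and Eq. (2):

import numpy as np

def hz_to_mel(f):
    # Eq. (1): B(f) = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Eq. (2): B^{-1}(m) = 700 * (10^{m / 2595} - 1)
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)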
To compute the MFSC of an audio signal, the signal is first divided into frames of 20–40 ms in length. According to the literature, shorter frames do not provide enough data samples for an accurate spectrum estimate, while longer frames do not account for potentially frequent signal changes within a frame. Frames are overlapped, and a weighted window (e.g., Hanning) is utilized in the Discrete Fourier Transform (DFT) computation to eliminate artifacts induced by rectangular windowing. Because samples at the beginning and end of a frame have lower weights, overlapping is utilized to capture the influence of these samples in the prior and subsequent frames. After an audio frame has been gathered and windowed, the Fast Fourier Transform (FFT) approach is utilized to compute the Fourier transform. Because it is mirrored in frequency, only the first half of the FFT is used. A triangular overlapping filterbank with N triangular filters is used to calculate the MFSC. To keep the spectrogram within a specified frequency range, a lower and a higher frequency are supplied. For speech signals with a sample frequency greater than 16000 Hz, a value of 300 Hz for the lower frequency is appropriate. Then, N + 2 equally spaced frequencies (m) in the Mel domain are formed between the lower and higher frequencies. These edge frequencies are then translated back to the frequency domain, and their FFT bin numbers are computed from the number of FFT bins (K) and the sampling frequency (fs).
The collection of N + 2 edge frequency bin values of the filters spread uniformly in the Mel domain is denoted by f, and the amplitude of the n-th filter at frequency bin k is denoted by H_n(k). The power spectrum of the FFT is then multiplied by the filterbank. To calculate the MFSC, sum the products of each separate filter and take the log of each total, as indicated in the equation below:

MFSC(n) = log( Σ_{k=0}^{K} H_n(k) · |F(k)|^2 ),  n = 1, 2, 3, ..., N   (3)
Following the computation of the N MFSC coefficients, they are combined to yield an N × B image, where B is the number of frames examined in the spectrum. The use of log-Mel energy spectrum images as input to the CNN distinguishes portions or segments of a noisy voice signal with speech content from those with no speech content or pure noise. The edge frequencies are related in the frequency and Mel domains as follows: lower frequencies are spaced closer together in the frequency domain than higher frequencies, while they are uniformly spaced in the Mel domain.
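As an illustration of the pipeline just described, a short sketch using the librosa library (our assumed toolkit; the paper does not name one), with illustrative frame, hop and filter settings:

import librosa
import numpy as np

def log_mel_spectrogram(path, n_mels=64, fmin=300.0):
    # Load the recording, window it into short overlapping frames, take the FFT,
    # apply a triangular Mel filterbank, and log-compress: the MFSC pipeline above.
    y, sr = librosa.load(path, sr=None)
    S = librosa.feature.melspectrogram(
        y=y, sr=sr,
        n_fft=int(0.025 * sr),          # 25 ms frames
        hop_length=int(0.010 * sr),     # 10 ms hop (overlapping frames)
        window="hann",                  # weighted window to reduce artifacts
        n_mels=n_mels, fmin=fmin, fmax=sr / 2.0,
    )
    return np.log(S + 1e-10)            # N x B image fed to the CNN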
3 Convolutional Neural Network
A convolutional neural network is a deep learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects or objects in the image, and differentiate one from the other. A convolutional neural network (CNN) is a kind of deep feedforward neural network with convolution computation and a deep structure [7]. It has excellent feature extraction ability and superior performance with image and audio signal inputs. A CNN consists of three types of layers: the convolution layer, the pooling layer, and the fully connected layer. There are many CNN architectures, such as LeNet, AlexNet, GoogLeNet, VGGNet, ResNet, and ZFNet. When using a CNN for pattern recognition, the input data need to be organized as several feature maps to be fed into the CNN [8]. This is intuitive in image-processing applications, in which the input is organized as a two-dimensional (2-D) array of pixel values at the x and y (horizontal and vertical) coordinate indices [8]. For color images, the RGB (red, green, blue) values can be viewed as three different 2-D feature maps [8]. CNNs run a small window over the input image at both training and testing time, so that the weights of the network that looks through this window can learn from various features of the input data regardless of their absolute position within the input [8].
4 Proposed Design
The proposed accent detection system uses a Convolutional Neural Network (MobileNetV2) for feature extraction, as shown in Fig. 2, while Fig. 3 shows the structure of the deep learning model utilized in our experiments.
Fig. 2. Proposed Accent detection system
Fig. 3. Block diagram of deep learning model
MobileNetV2, a convolutional neural network design that aims to perform effectively on mobile devices, was used as the pretrained neural network. It is built on an inverted residual structure, with residual connections between bottleneck layers. As a source of non-linearity, the intermediate expansion layer filters features using lightweight depthwise convolutions. The MobileNetV2 design has an initial fully convolutional layer with 32 filters, followed by 19 residual bottleneck layers. In our suggested model, we applied transfer learning. Transfer learning is a branch of machine learning (ML) that focuses on storing and transferring information learned while dealing with one problem to another, similar problem.
Depthwise separable convolutions, which are a type of factorized convolution that decreases computational cost when compared to conventional convolutions, are already
included in MobileNet V1. Because feature maps can be stored in low-dimensional subspaces and non-linear activations cause information loss, the bottleneck layers were necessary to overcome these challenges (refer to Fig. 3). A bottleneck block works by first executing a point-wise convolution to expand the low-dimensional input feature map to a higher-dimensional space suited for non-linear activations, followed by applying the ReLU6 activation function. Following that, a depthwise convolution with 3×3 kernels is applied, followed by the ReLU6 activation function. Using another point-wise convolution, the feature map is projected back to a low-dimensional subspace. Finally, when the starting and ending feature maps are the same size, a residual link is inserted to optimize gradient flow during backpropagation.
In simple words, transfer learning in machine learning is the process of reusing previous models to address a new challenge or problem; previous training information is reused to aid in the completion of a new task [12]. After obtaining the Mel spectrograms for all of our classes, we fed them into our proposed transfer learning model, which we then trained using 100 epochs, a batch size of 16, and a learning rate of 0.0001.
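A sketch of this transfer-learning setup in TensorFlow/Keras (an assumed framework; the input size and classification head are illustrative), using the stated 100 epochs, batch size 16 and learning rate 0.0001:

import tensorflow as tf

def build_accent_model(num_classes=3, input_shape=(224, 224, 3)):
    # Pretrained MobileNetV2 backbone with frozen weights; only the new head is trained
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet")
    base.trainable = False
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# model.fit(train_spectrograms, train_labels, epochs=100, batch_size=16,
#           validation_data=(val_spectrograms, val_labels))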
5 Results
In this section, the simulation results for our proposed accent detection system are presented. Figure 5 shows that the loss per epoch decreases for both the validation and testing losses, indicating a decent learning rate since they converge together, while Fig. 4 displays the accuracy per epoch, indicating a high level of accuracy using our proposed deep transfer learning model, with an overall accuracy over 90%.
The confusion matrix is seen in Fig. 6, which is used to assess the classification model's performance. The matrix contrasts the actual target values with the predictions of the machine learning model, and the results indicated an acceptable accuracy. Table 2 illustrates the accuracy per class, which displays the total accuracy of each class in the trained model. We noticed accuracies of 0.96, 0.98, and 0.97 in our transfer learning model for the American, British, and Indian accents, respectively.
Fig. 4. Accuracy vs epochs
Fig. 5. Loss vs epochs
Table 3 compares our proposed system with the literature. As can be noted, the proposed technique shows high accuracy when benchmarked against other techniques. While the proposed system generated good results in terms of the three classes, much work is needed to gather more data and simulate other types of accents, which could be gathered from various geographical regions. These details may provide further insights into the structures and ethnicity of various languages and the histories of the corresponding peoples.
Fig. 6. Confusion matrix of our proposed system
Table 2. Accuracy, Precision, Recall and F1 Score for each class for a certain number of samples

Accent Class   Number of samples   Accuracy   Precision   Recall   F1 Score
American       222                 0.9595     0.9595      0.9221   0.9404
British        222                 0.9820     0.9685      0.9773   0.9729
Indian         222                 0.9685     0.9369      0.9674   0.9519
Table 3. Benchmarking our proposed system against the use of ML for the detection of accents

Literature Review      Methodology                                     Accent types                                       Testing Accuracy
Chen et al. [4]        Naive Bayes, logistic regression, SMO, LibSVM   Mandarin & German                                  Around 59% (Accuracy)
An Sheng et al. [10]   Gradient Boosting, Random Forest, MLP and CNN   Foreign-Accented English                           Around 90% (Precision)
Radzikowski [5]        Autoencoder & CNN-RNN                           English language pronounced by Japanese speakers   Several issues during training and testing, with low accuracy
Korkmaz et al. [2]     KNN                                             Turkish                                            90%
Rizwan et al. [13]     SVM & ELM                                       English                                            Around 77%
Proposed method        Deep transfer learning                          English accent                                     Around 95% (accuracy)
6 Conclusion
In this paper, an accent detection and classification system is proposed with the aim of being used for the detection of lies in border control. Deep transfer learning is utilised in our proposed system, which has provided significantly improved results. A good classification of accents can provide an insight into the origin as well as the heritage of speakers. This is certainly required for languages with multiple areas and dialects that are spoken widely by various populations, such as the English language. Our extensive simulation results showed a good accuracy of over 90% when using deep transfer learning. Further research is required to gather and simulate our own dataset for the purpose of detecting lies in border control, as well as for use in court proceedings, healthcare, border security and domestic lie detection. Another direction of research will involve the use of NIST speaker recognition evaluation data.
References
1. FluentU, English Language and Culture Blog. https://www.fluentu.com/blog/english/how-many-countries-speak-english. Accessed 1 Feb 2022
2. Korkmaz, Y., Boyacı, A.: A comprehensive Turkish accent/dialect recognition system using acoustic perceptual formants. Applied Acoustics 193, 108761 (2022)
3. Faria, A.: Accent classification for speech recognition. In: Machine Learning for Multimodal Interaction, pp. 285–293 (2005)
4. Chen, P., Lee, J., Neidert, J.: Foreign accent classification. https://cs229.stanford.edu/proj2011/ChenLeeNeidert-ForeignAccentClassification.pdf. Accessed 2 Feb 2022
5. Radzikowski, K., Wang, L., Yoshie, O., Nowak, R.: Accent modification for speech recognition of non-native speakers using neural style transfer. EURASIP J. Audio, Speech, Music Processing 2021(1), 1 (2021). https://doi.org/10.1186/s13636-021-00199-3
6. Chen, T., Huang, C., Chang, C., Wang, J.: On the use of Gaussian mixture model for speaker variability analysis. In: The International Conference SLP, Denver, CO (2002)
7. IBM Cloud Education (2020). https://www.ibm.com/cloud/learn/convolutional-neural-networks. Accessed 2 Feb 2022
8. Abdel-Hamid, O., Mohamed, A., Jiang, H., Deng, L., Penn, G., Yu, D.: Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio, Speech, Language Process. 22(10), 1533–1545 (2014)
9. A Database of Non-Native Accents to Assist Neural Speech Recognition, retrieved from AccentDB: https://accentdb.org/about/index.html. Accessed 3 Feb 2022
10. An Sheng, L.M., Wei Xiong Edmund, M.: Deep Learning Approach to Accent Classification. https://cs229.stanford.edu/proj2017/final-reports/5244230.pdf. Accessed 3 Feb 2022
11. Detect COVID-19 From Mel Spectrogram (30 June 2021). Analytics Vidhya: https://www.analyticsvidhya.com/blog/2021/06/how-to-detect-covid19-cough-from-mel-spectrogram-using-convolutional-neural-network/
12. Transfer Learning for Machine Learning (29 June 2021). Seldon: https://www.seldon.io/transfer-learning
13. Rizwan, M., Anderson, D.V.: A weighted accent classification using multiple words. Neurocomputing 277, 120–128 (2018). https://doi.org/10.1016/j.neucom.2017.01.116
A Stable Community Detection Approach for Large-Scale Complex Networks Based on Improved Label Propagation Algorithm
Xiangtao Chen(B) and Meijie Zhao
College of Computer Science and Electronic Engineering, Hunan University, Changsha 410000, China
[email protected]
Abstract. The detection of community structure plays an important role in understanding the properties and characteristics of complex networks. The label propagation algorithm (LPA) is a classical and effective method, but it suffers from unnecessary node update operations and an unstable label update strategy. In this paper, a stable community detection approach for large-scale complex networks based on an improved label propagation algorithm (CBR-LPA) is proposed. CBR-LPA first finds the initial boundary nodes and then performs label propagation on the boundary nodes. Specifically, we search for the boundary nodes by node closeness, and only the boundary nodes have their labels re-detected. Moreover, we modify the label propagation strategy to reduce the generation of random choices. Experimental results on both synthetic and real-world networks show that the CBR-LPA algorithm can discover stable and high-quality community structures in a short execution time, and that the initial boundary nodes found by our proposed approach are effective.
Keywords: Community detection · Label propagation · Boundary node · Complex networks
1 Introduction
With the rapid development of science and technology, many complex networks have emerged, such as bio-information, email correspondence, social and scientist cooperation networks. A graph structure is usually used to represent a complex network, and community is an essential structural property. Community detection is one of the key techniques for mining the features and patterns of complex networks.
Label propagation is a classical class of community detection algorithms. Raghavan et al. [1] first proposed a community detection algorithm based on label propagation, LPA. The algorithm has low time complexity and can perform community detection relatively quickly. However, the algorithm tends to discover a single large community, and often after five iterations, 95% of nodes are already correctly clustered [1]. The algorithm has high randomness and unstable results due to its label update rules.
As observed in previous studies, often after five iterations 95% of the network nodes are already correctly clustered, and most of the nodes that update their labels are boundary nodes. Therefore, in 2019 Mursel et al. [2] proposed the G-CN algorithm, which finds communities by using boundary nodes to identify the borderlines between communities. The algorithm only performs label propagation on boundary nodes. However, the algorithm initially identifies all nodes as boundary nodes, which results in an unclear division of the initial boundary nodes. The algorithm also suffers from unstable results. As shown in Fig. 1, we conducted an experiment on the karate [3] dataset, where the red nodes indicate the initial boundary nodes found by the G-CN algorithm. Figure 2 also gives the real boundary nodes, where the red nodes represent the real boundary nodes and the blue and green nodes represent two different real communities. We also ran experiments on the stability of the G-CN algorithm on the karate dataset, using the modularity evaluation metric over 50 runs, as shown in Fig. 3. It can be seen that the G-CN algorithm is not stable.
Fig. 1. Initial boundary nodes found by the G-CN algorithm on the karate dataset. Fig. 2. The real boundary nodes of the karate dataset.
Fig. 3. Modularity values for the G-CN algorithm for fifty experiments on the karate dataset
In this paper, we propose a stable community detection approach for large-scale complex networks based on an improved label propagation algorithm. Firstly, a unique label is assigned to each node, and the boundary nodes are obtained using the node closeness score suggested in this paper. Then a node is randomly selected from the set of boundary nodes, the community scores around the node are calculated based on the common neighborhood and node degree, and the community label with the highest score is used as the new label of the node. If a neighbor node becomes a boundary node, it is re-detected. The algorithm stops when the set of boundary nodes is empty.
The main contributions of this paper include the following three aspects.
1. Suggesting a community detection algorithm based on node closeness and boundary node re-checking.
2. Proposing a method for calculating the node closeness score to find boundary nodes more accurately, and suggesting a label propagation strategy on boundary nodes.
3. Conducting extensive experiments on a variety of datasets, including 11 real datasets and artificially synthesized datasets with different network structure complexities. Experimental results show that the proposed algorithm can shorten the running time and improve accuracy.
The paper is organized as follows: Sect. 2 describes related research on community detection; Sect. 3 introduces the CBR-LPA algorithm; Sect. 4 presents the experimental analysis; and the final section concludes the paper.
2 Related Work
The LPA algorithm has low time complexity and a good classification effect, and it is suitable for large-scale networks. However, the algorithm has randomness in the label propagation process. In response to this problem, researchers have done a lot of work on this algorithm. The paper [4] proposes a label propagation algorithm, GLLPA, which introduces a new label initialization strategy that uses node influence and label influence to solve the instability problem of the label propagation idea. Yan et al. [5] proposed a density-peak based label propagation community discovery algorithm, which finds clustering centers based on density peaks. Li et al. [6] proposed the MELPA algorithm, which reconstructs the topology of the network and then develops a new label propagation strategy based on node influence, node closeness, label frequency and propagation characteristics. Mahdi et al. [7] proposed the DPNLP algorithm, which uses local information to form the network into an initial association structure for label propagation. Li et al. [8] propose a stable community detection algorithm based on density peak clustering and label propagation, which takes advantage of the topological information of networks and improves robustness.
As shown in [1], with the LPA algorithm, often after five iterations 95% of nodes are already correctly clustered. The selection rules of the G-CN algorithm for the initial boundary nodes lead to the initial set of updated nodes being all the nodes in the network. Researchers have done a lot of research on boundary nodes. The paper [9] performs graph traversal while deciding whether a node is a boundary node by calculating node influence, and merges the boundary nodes and neighboring communities in the network to form the community structure. The SD-GCN [10] algorithm updates the labels of the boundary nodes simultaneously based on the scores of the degree centrality and the number of common neighbors, and then uses a density-based approach to merge communities.
In order to address these shortcomings, this paper proposes a community detection algorithm based on node closeness and boundary node re-checking. Targeting boundary nodes for label updates reduces the number of label update operations, thus achieving
stability of results and shorter running time. At the same time, the label update rules are improved by combining the community common neighborhood and node degree to reduce the randomness of the algorithm and improve the quality of the result.
3 Proposed Algorithm
In this paper, a stable community detection approach for large-scale complex networks based on an improved label propagation algorithm (shortened as CBR-LPA) is proposed. CBR-LPA includes the following two stages.
3.1 Boundary Nodes Generation
For ease of description, we use the concepts of Xie and Szymanski [11] to mark the nodes. If a node and all of its neighbors are in the same community, the node is called an internal node. If a node is not an internal node, then the node is called a boundary node; a boundary node has at least one neighbor node that is not in the same community. As shown in Fig. 4, the nodes in the dotted line constitute community C; the black nodes in community C are internal nodes, and the green nodes represent boundary nodes.
Fig. 4. Schematic diagram of boundary nodes
We study the initial boundary nodes and propose a formula to calculate the node closeness based on the network structure characteristics of the boundary nodes:

I_score(v) = Σ_{u∈Γ(v)} J_uv / degree(v)   (1)

where J_uv is the Jaccard index between the neighborhoods of u and v, and Γ(v) is the set of neighbors of v. The smaller I_score(v) is, the smaller the closeness between the node and its neighbors, and the more likely it is to become a boundary node.
Based on the node closeness of Eq. (1) proposed above, we propose Algorithm 1 to find the initial boundary nodes of the network. The second line indicates that node label initialization is performed first. Lines 4 to 10 indicate that if the degree of a node is not one or two, it is not directly added to the initial set of boundary nodes; lines 6 to 9 indicate that nodes with closeness less than α join the initial boundary node set S.
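For illustration, a sketch of the closeness score of Eq. (1) and the resulting initial boundary set, using networkx (an assumed choice); the degree-based filtering of Algorithm 1 is omitted for brevity and all names are ours:

import networkx as nx

def initial_boundary_nodes(G, alpha=0.2):
    # Nodes whose average Jaccard similarity to their neighbours (Eq. (1))
    # falls below alpha are taken as the initial boundary set S.
    S = set()
    for v in G.nodes():
        nbrs = set(G[v])
        if not nbrs:
            continue
        score = 0.0
        for u in nbrs:
            nbrs_u = set(G[u])
            union = len(nbrs | nbrs_u)
            score += len(nbrs & nbrs_u) / union if union else 0.0  # Jaccard index J_uv
        if score / G.degree(v) < alpha:
            S.add(v)
    return S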
The steps for initializing the node labels are shown in Algorithm 2, where lines 1 to 4 assign a unique label to each node, and lines 5 to 15 perform a round of label propagation to obtain the initial labels.
3.2 Label Propagation Based on the Number of Common Neighbors and Node Degree
We propose a community group approach to update nodes' labels based on the number of common neighbors and node degree:

L(v) = argmax_k B_v(k)   (2)

B_v(k) = Σ_{u∈Γ(v), L(u)=k} b_v(u)   (3)

b_v(u) = |∩_uv| + 1/degree(u), if d_u ≠ 1;  b_v(u) = 0, if d_u = 1   (4)
where L(v) is the label of v, B_v(k) is the score of the community with label k, b_v(u) is the efficiency score, ∩_uv is the intersection of the neighbor sets of u and v, Γ(v) is the set of neighboring nodes of node v, and degree(u) is the degree of node u. The specific implementation steps are as follows:
1. Randomly select a node i from the set of boundary nodes S;
2. Determine whether to update the label according to Eq. (2);
3. If node i updates its label, judge whether its neighbor nodes become boundary nodes. Return to step 1 until the boundary node set S is empty.
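A sketch of this update rule, with labels held in a dict and neighborhoods taken from a networkx graph (an assumed representation; the helper name is ours):

def best_label(G, labels, v):
    # Score each neighbouring community by Eqs. (3)-(4) and return the
    # argmax of Eq. (2); labels maps node -> current community label.
    nbrs_v = set(G[v])
    scores = {}
    for u in nbrs_v:
        if G.degree(u) == 1:
            continue                       # b_v(u) = 0 when d_u = 1
        common = len(nbrs_v & set(G[u]))   # number of common neighbors of u and v
        b = common + 1.0 / G.degree(u)
        scores[labels[u]] = scores.get(labels[u], 0.0) + b
    return max(scores, key=scores.get) if scores else labels[v]

A node keeps its current label when no neighbour contributes a score, which matches step 2 above.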
4 Experiments
4.1 Experimental Environment
The experimental environment is a 2.4 GHz Intel Core i5 CPU with 8 GB memory, and the operating system is macOS. All experiments were implemented in Python 3.
4.1.1 Datasets
1. LFR Benchmark Networks. An artificial synthetic dataset is randomly generated by the LFR benchmark program [20]. N, k, maxk, minc and maxc denote the number of nodes, the average node degree, the maximum node degree, the minimum community size and the maximum community size, and μ is a mixing parameter which indicates the complexity of the network structure. As μ increases, the network structure becomes more complex.

Table 1. Details of LFR artificial networks

No   N       k    maxk   minc   maxc   μ
L1   10000   20   100    30     150    0.1–0.8
2. Real-world Networks. Table 2 shows the real networks; V, E, dmax, davg and c denote the numbers of nodes and edges, the maximum node degree, the average node degree and the number of real communities.
4.1.2 Baselines
We compare the CBR-LPA algorithm with representative label propagation based community detection approaches and others, which include:
1. LPA: adopts the neighbor node label majority rule for label propagation;
2. G-CN: based on the idea of label propagation, each border node adopts the neighbor community with the highest score for community detection;
3. Infomap [21]: based on the ideas of random walk coding and information compression coding, nodes with greater similarity are assigned to the same community;
4. Louvain [22]: a method of multi-level optimization of modularity;
5. GN [3]: a hierarchical clustering algorithm based on splitting.
4.1.3 Evaluation Metrics
1. Normalized mutual information (NMI). NMI is used to compare the similarity between the real community division and the division found by the algorithm. The more similar the community structure obtained by the algorithm is to the real community structure, the larger the NMI value.
Table 2. Details of real-world networks

No   Datasets           V      E      dmax   davg   c
1    karate             34     78     17     4      2
2    Football [13]      115    613    12     10     12
3    Dolphin [14]       62     159    12     5      2
4    polBooks [15]      105    441    25     8      3
5    Email [16]         1K     5K     71     9      /
6    Netscience [17]    2K     3K     34     3      /
7    ca-HepPh [18]      12K    118K   491    19     /
8    ca-AstroPh [18]    19K    198K   504    21     /
9    ca-CondMat [18]    23K    93K    279    8      /
10   Amazon [19]        335K   926K   549    5      /
11   DBLP [19]          317K   1M     343    6      /
NMI(A, B) = [ −2 Σ_{i=1}^{N_A} Σ_{j=1}^{N_B} m_ij log( n·m_ij / (m_i·m_j) ) ] / [ Σ_{i=1}^{N_A} m_i log(m_i/n) + Σ_{j=1}^{N_B} m_j log(m_j/n) ]   (5)
where A and B respectively represent the real partition and the community partition determined by the algorithm, n represents the total number of nodes in the complex network, N_A represents the number of real communities, N_B represents the number of communities found by the algorithm, m_i represents the number of nodes in the i-th real community, m_j represents the number of nodes in the j-th detected community, and m_ij represents the number of common nodes between real community i and detected community j.
2. Modularity (Q). Modularity is used to evaluate the quality of algorithmic partitions. The larger the value of modularity, the higher the quality of the communities mined by the algorithm.

Q(C) = (1/2m) Σ_{k=1}^{|C|} Σ_{u,v∈C_k} (A_uv − k_u k_v / 2m)   (6)

where C represents the result of the algorithm's community detection, m represents the number of all edges in the network, C_k represents the k-th community divided by the algorithm, u and v are nodes belonging to the k-th community, A_uv represents whether nodes u and v are linked (if a link exists, A_uv = 1, otherwise A_uv = 0), and k_u represents the number of neighbor nodes of node u, that is, its degree.
3. Confusion matrix. We use a confusion matrix to describe the true and predicted results, as shown in Table 3.
Table 3. Confusion matrix

                            Predicted
Actual             Boundary nodes   Internal nodes   Total
Boundary nodes     BB               IB               AB
Internal nodes     BI               II               AI
Total              PB               PI               |V|
The definitions of Precision, Recall, Accuracy, and the harmonic mean F-score of Precision and Recall are as follows:

Precision = BB / PB   (7)

Recall = BB / AB   (8)

Accuracy = (BB + II) / |V|   (9)

F-score = 2 × (Precision × Recall) / (Precision + Recall)   (10)
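For concreteness, a direct (unoptimized) implementation of the modularity of Eq. (6), assuming an undirected networkx graph and a partition given as a list of node sets; NMI and the confusion-matrix metrics of Eqs. (5) and (7)–(10) are computed analogously from their counts:

def modularity(G, communities):
    # Eq. (6): Q = (1/2m) * sum_k sum_{u,v in C_k} (A_uv - k_u * k_v / 2m)
    two_m = 2.0 * G.number_of_edges()
    Q = 0.0
    for C in communities:
        for u in C:
            for v in C:
                a_uv = 1.0 if G.has_edge(u, v) else 0.0
                Q += a_uv - G.degree(u) * G.degree(v) / two_m
    return Q / two_m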
4.2 Influence of the Parameter
The parameter α controls the number of initial boundary nodes, and the node closeness score is calculated by Eq. (1). The closer a node is to its surrounding nodes, the closer the score is to 1; the closer the score is to 0, the more likely the node is to become a boundary node. Therefore, the
Fig. 5. Comparison of execution time as parameter α changes, for mixing parameter μ in the range 0.1–0.7
value range of α is [0, 1], and experiments were performed on the synthetic networks with α varying from 0 to 1, as shown in Fig. 5 and Table 4 below. Figure 5 and Table 4 show the comparison results of different α values on the L1 group of synthetic networks, whose mixing parameter μ changes from 0.1 to 0.7. On these seven datasets with different mixing parameters, the network structures are different, but the running time and NMI trends are similar, so the parameter α is not affected by the dataset. The running time shows an overall increasing trend as the parameter α increases. In the range of α from 0.1 to 0.3, the running time is relatively stable; between 0.3 and 1.0, the running time first increases and then gradually remains unchanged. NMI generally decreases as α increases, but the change is small.

Table 4. NMI comparison as parameter α changes, for mixing parameter μ in the range 0.1–0.7
μ \ α   0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
0.1     0.997   0.998   1.000   1.000   1.000   1.000   0.999   1.000   1.000   1.000
0.2     1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
0.3     0.999   1.000   1.000   1.000   1.000   0.999   1.000   1.000   1.000   0.999
0.4     0.998   0.998   0.998   0.998   0.998   0.998   0.998   0.997   0.998   0.997
0.5     0.991   0.992   0.988   0.990   0.990   0.989   0.989   0.989   0.989   0.989
0.6     0.970   0.970   0.921   0.922   0.970   0.966   0.965   0.965   0.965   0.964
0.7     0.859   0.725   0.855   0.855   0.855   0.856   0.857   0.853   0.854   0.854
From these experimental results we conclude that a suitable value range for α is 0.1 to 0.3, and that the value of α mainly affects the running time.

4.3 Effectiveness of Initial Boundary Nodes

We use Precision, Recall, Accuracy, and the F-score (the harmonic mean of Precision and Recall) [12] to measure how effectively the boundary nodes are found, taking BFT [9] and G-CN as comparison algorithms; BFT discovers overlapping communities. Table 5 compares CBR-LPA with G-CN and BFT on four real datasets, where bold font indicates the best value of a metric on that dataset, and Table 6 gives the statistics of the four datasets. On the karate and dolphin datasets, CBR-LPA found the boundary nodes more accurately. However, G-CN outperformed CBR-LPA on football and polbooks, even reaching 1.0 for each evaluation
Table 5. The experiments comparing CBR-LPA with the G-CN and BFT on four real datasets.

Datasets   Algorithms   PB    BB    Precision   Recall   F-score   Accuracy
Karate     G-CN         34    13    0.382       1.000    0.553     0.382
           BFT          2     1     0.500       0.077    0.133     0.618
           CBR-LPA      12    9     0.750       0.692    0.720     0.794
Football   G-CN         115   115   1.000       1.000    1.000     1.000
           BFT          3     3     1.000       0.026    0.051     0.026
           CBR-LPA      96    96    1.000       0.835    0.910     0.765
Dolphin    G-CN         62    9     0.145       1.000    0.254     0.145
           BFT          6     1     0.167       0.111    0.133     0.790
           CBR-LPA      18    6     0.333       0.667    0.444     0.758
Polbooks   G-CN         105   84    0.800       1.000    0.888     0.800
           BFT          4     4     1.000       0.048    0.091     0.238
           CBR-LPA      97    77    0.794       0.917    0.851     0.743
Average    G-CN         –     –     0.582       1.000    0.674     0.582
           BFT          –     –     0.667       0.065    0.102     0.418
           CBR-LPA      –     –     0.719       0.778    0.731     0.765
Table 6. Information on datasets: the number of true boundary nodes and the ratio of boundary nodes to total nodes.

Datasets   Karate   Football   Dolphin   Polbooks
|V|        34       115        62        105
AB         13       115        9         84
AB/|V|     38.24%   100%       14.52%    80%
metric on football. This is because G-CN directly treats all nodes of the network as boundary nodes, without analyzing the structural properties of boundary nodes, and these two datasets happen to have a high AB/|V| ratio, so G-CN appears more accurate on them. Averaged over the four datasets, CBR-LPA achieves better metrics than G-CN and BFT: it identifies boundary nodes with high accuracy when AB/|V| is low and still performs well when AB/|V| is high. On the average of the metrics over these four datasets, CBR-LPA performs best on all measures except recall. When the proportion of boundary nodes to the total number of nodes in the network is low, CBR-LPA is able to find the true boundary nodes more accurately.
4.4 Community Detection

To verify the effectiveness of the CBR-LPA community detection results on real and synthetic datasets, we compare CBR-LPA with the other algorithms. Since the GN algorithm exceeds the two-hour time limit on all but the small real datasets, it is evaluated only on the small real datasets.

4.4.1 Real-World Networks

In this subsection, we divide the real datasets into three groups according to their characteristics. On the small datasets we measure Q and NMI (Table 7); on the medium-sized datasets we measure running time and Q (Table 8); and on the large datasets we measure running time and modularity (Table 9). Table 7. Experiment results on small real-world networks with ground-truth communities.
Algorithms         Karate   Football   Dolphin   Polbooks   Average
CBR-LPA    Q       0.371    0.553      0.518     0.526      0.492
           NMI     1.00     0.902      0.515     0.430      0.712
G-CN       Q       0.364    0.564      0.501     0.510      0.485
           NMI     0.800    0.905      0.538     0.401      0.661
LPA        Q       0.334    0.586      0.485     0.505      0.477
           NMI     0.603    0.893      0.531     0.402      0.607
Infomap    Q       0.402    0.579      0.522     0.524      0.507
           NMI     0.691    0.901      0.481     0.39       0.616
Louvain    Q       0.418    0.582      0.502     0.52       0.506
           NMI     0.587    0.812      0.507     0.404      0.577
GN         Q       0.401    0.592      0.502     0.517      0.503
           NMI     0.401    0.830      0.554     0.433      0.555
Table 7 shows the experimental results on small real-world networks with ground-truth communities. The Infomap algorithm has the highest average modularity. Among the three algorithms based on the idea of label propagation (LPA, G-CN, and CBR-LPA), CBR-LPA has the best modularity. In particular, on the karate dataset the NMI of CBR-LPA is 1.0, which means it finds exactly the correct community division. CBR-LPA also has the best average NMI, because it uses the common neighborhood and node degree for label propagation, so the discovered community structure has high accuracy. Table 8 shows the experimental results on real-world networks without ground-truth communities, where the execution time T is in seconds. The Louvain algorithm has
the highest modularity, because Louvain performs community detection by directly maximizing modularity. Among the algorithms based on the label propagation idea, CBR-LPA achieves higher modularity than G-CN and LPA, and it has the fastest running time on all datasets. Table 8. Experiment results on real-world networks without ground-truth communities.
Algorithms          Email   Netscience   ca-HepPh   ca-AstroPh   ca-CondMat   Average
CBR-LPA    T(s)     0.029   0.040        2.184      5.296        4.607        2.431
           Q        0.515   0.902        0.556      0.555        0.623        0.630
G-CN       T(s)     0.070   0.050        8.400      16.347       3.576        5.689
           Q        0.479   0.864        0.545      0.539        0.609        0.607
LPA        T(s)     0.169   0.053        4.190      20.750       4.010        5.834
           Q        0.230   0.880        0.490      0.315        0.578        0.499
Infomap    T(s)     0.780   0.320        81.980     180.260      163.11       85.29
           Q        0.509   0.875        0.597      0.546        0.632        0.632
Louvain    T(s)     0.674   0.279        49.44      68.099       234.004      70.499
           Q        0.509   0.942        0.521      0.612        0.671        0.651
Table 9 shows the experimental comparison on two real large-scale networks, where the execution time is in seconds. The experiment set a two-hour time limit, and both Infomap and Louvain ran for more than two hours. CBR-LPA spends most of its time in Algorithm 1, which computes the initial boundary nodes. However, because the boundary nodes are found accurately and an optimized label propagation strategy is used, the node labels quickly converge to a stable state, and CBR-LPA performs best on Q. Table 9. Experimental results on big real-world networks.
                                        CBR-LPA             G-CN                LPA
                                        Amazon    DBLP      Amazon    DBLP     Amazon   DBLP
Execution time (s)                      43.160    45.390    32.780    44.210   65.52    82.96
Algorithm 1's time (s)                  34.382    29.248    —         —        —        —
Label propagation's time (s)            7.187     7.245     23.585    34.787   63.90    80.40
Label propagation's iteration number    327663    312097    576720    589556   —        —
Q                                       0.743     0.676     0.724     0.659    0.709    0.625
4.4.2 LFR Benchmark Networks

Figure 6 shows the NMI results on the L1 group of synthetic networks. The NMI of all algorithms decreases as the mixing parameter μ, i.e., the complexity of the network structure, increases. When μ is in the range 0.1 to 0.7, CBR-LPA ranks behind Infomap but is still better than the other comparison algorithms. When μ > 0.6, the NMI of most algorithms drops sharply, and when μ > 0.7, CBR-LPA performs best and still retains high accuracy. Figure 7 shows the running times on the L1 synthetic networks. When μ is in the range 0.1 to 0.5, LPA has the lowest running time; when μ > 0.5, CBR-LPA is more time-efficient than the other algorithms. As the network structure gradually becomes more complex, the initial boundary nodes found by CBR-LPA remain accurate, and labels are propagated only to the boundary nodes instead of all nodes of the network, which reduces the time complexity. (A sketch for generating such LFR networks is given after Figs. 6 and 7.)
Fig. 6. NMI scores of our method and known algorithms on LFR benchmark network datasets.
Fig. 7. Execution times of our method and known algorithms on LFR benchmark network datasets.
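LFR benchmark graphs such as the L1 group can be generated, for example, with NetworkX's built-in generator. The sketch below is illustrative only: the parameter values follow the NetworkX documentation example, not the paper's L1 settings, which are not fully specified in this section.

```python
# Generating an LFR benchmark graph with NetworkX (parameter values are
# illustrative, taken from the NetworkX documentation example).
import networkx as nx

g = nx.LFR_benchmark_graph(
    n=250, tau1=3, tau2=1.5, mu=0.1,
    average_degree=5, min_community=20, seed=10,
)

# Each node stores its planted (ground-truth) community in the "community"
# attribute, which can be used to compute NMI against a detected partition.
communities = {frozenset(g.nodes[v]["community"]) for v in g}
print(len(communities), "planted communities")
```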
4.4.3 Algorithm Stability

Due to the random nature of label propagation algorithms, we ran each algorithm 50 times on the karate dataset and use the variance of the results to measure the stability of the label propagation idea, as shown in Table 10; the smaller the variance, the more stable the algorithm. LPA has the highest randomness, while our proposed algorithm has lower randomness in the label propagation phase, so it has the highest stability among the algorithms based on the label propagation idea. (A sketch of this stability test is given after Table 10.)
Table 10. Stability test of the algorithm, with variance as a measure

        CBR-LPA    G-CN       LPA
D(X)    0.000025   0.000155   0.007564
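As a rough illustration of this stability test (not the authors' code), the sketch below runs NetworkX's asynchronous LPA 50 times with different seeds on the karate network and reports the variance of the resulting modularity values; the choice of modularity as the measured quantity is our assumption.

```python
# A minimal sketch of the stability experiment; measuring the variance of
# the modularity over 50 randomized runs is an assumption on our part.
import statistics
import networkx as nx
from networkx.algorithms.community import asyn_lpa_communities, modularity

g = nx.karate_club_graph()
qs = []
for seed in range(50):
    # asyn_lpa_communities is randomized; the seed changes the update order.
    communities = list(asyn_lpa_communities(g, seed=seed))
    qs.append(modularity(g, communities))

# D(X): variance over the 50 runs; smaller means more stable results.
print("variance of Q over 50 runs:", statistics.pvariance(qs))
```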
5 Conclusion

Aiming at the problems of unclear boundary node detection and unstable results in existing label propagation algorithms, we propose a community detection algorithm based on node closeness and boundary node re-checking. It includes two phases. First, the initial boundary nodes are obtained according to the node closeness proposed in this paper, which finds more definite community boundaries at the beginning. Then, the labels of the set of boundary nodes are updated. Our algorithm determines the boundary nodes precisely while propagating labels only to the boundary nodes, which shortens the execution time and improves the stability of the algorithm, and it performs well on both small and large community structures. However, the proposed algorithm handles only non-overlapping community detection; in future work we will consider the boundary node and stability issues of overlapping community detection.

Acknowledgments. This research has been supported by the National Natural Science Foundation of China under Grant (No. 61873089) and the Natural Science Foundation of Hunan Province of China (No. 2021JJ30134).
References

1. Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Physical Review E 76(3 Pt 2), 036106 (2007)
2. Tasgin, M., Bingol, H.O.: Community detection using boundary nodes in complex networks. Physica A: Statistical Mechanics and its Applications 513 (2018)
3. Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 452–473 (1977)
4. Zhang, Y., Liu, Y., Jin, R., et al.: GLLPA: a graph layout based label propagation algorithm for community detection. Knowl.-Based Syst. 206, 10636 (2020)
5. Ma, Y., Chen, G.: Label propagation community detection algorithm based on density peak optimization. In: Proc. of 2021 17th International Conference on Computational Intelligence and Security (CIS) (2021)
6. Li, C., Tang, Y., Tang, Z., et al.: Motif-based embedding label propagation algorithm for community detection. Int. J. Intell. Syst. 37(3), 1880–1902 (2022)
7. Zarezadeh, M., Nourani, E., Bouyer, A.: DPNLP: distance based peripheral nodes label propagation algorithm for community detection in social networks. World Wide Web 25(1), 73–98 (2022)
8. Li, C., Chen, H., Li, T., Yang, X.: A stable community detection approach for complex network based on density peak clustering and label propagation. Applied Intelligence (2021)
9. Wujian, Z.W., Yichen, H.S.: Local community detection algorithm based on graph traversal. Appl. Res. Comput. 36(09), 2636–2670 (2019)
10. Zarezade, M., Nourani, E., Bouyer, A.: Community detection using a new node scoring and synchronous label updating of boundary nodes in social networks. J. Artificial Intelligence and Data Mining 8(2), 201–212 (2020)
11. Xie, J., Szymanski, B.K.: Community detection using a neighborhood strength driven label propagation algorithm. IEEE Computer Society, pp. 188–195 (2011)
12. Ma, L., Huang, H., He, Q., et al.: GMAC: a seed-insensitive approach to local community detection. Springer, Berlin Heidelberg (2013)
13. Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. U.S.A. 99(12), 7821–7826 (2002)
14. Lusseau, D., Schneider, K., Boisseau, O.J., et al.: The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology & Sociobiology 54(4), 396–405 (2003)
15. Krebs, V.: http://www.orgnet.com/ (unpublished)
16. Guimerà, R., Danon, L., Díaz-Guilera, A., et al.: Self-similar community structure in a network of human interactions. Physical Review E 68(6 Pt 2), 065103 (2004)
17. Newman, M.: The structure and function of complex networks. SIAM Review (2003)
18. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD) 1(1) (2007)
19. Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. In: ICDM (2012)
20. Lancichinetti, A., Fortunato, S.: Community detection algorithms: a comparative analysis. Phys. Rev. E 80(2), 056117 (2009)
21. Rosvall, M., Bergstrom, C.T.: An information-theoretic framework for resolving community structure in complex networks. Proc. Natl. Acad. Sci. U.S.A. 104(18), 7327–7331 (2006)
22. Blondel, V.D., Guillaume, J.L., Lambiotte, R., et al.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory & Experiment (2008)
An Effective Method for Yemeni License Plate Recognition Based on Deep Neural Networks Hamdan Taleb1,2(B) , Zhipeng Li1 , Changan Yuan3,4 , Hongjie Wu5 , Xingming Zhao6 , and Fahd A. Ghanem7,8 1 Institute of Machine Learning and Systems Biology, School of Electronics and Information
Engineering, Tongji University, Shanghai 201804, China [email protected] 2 Department of Information Technology, College of Engineering and Information Technology, Aljanad University for Science and Technology, Taiz, Yemen 3 Guangxi Academy of Science, Nanning 530007, China 4 Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy Sciences, Guangxi, China 5 School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China 6 Institute of Science and Technology for Brain Inspired Intelligence (ISTBI), Fudan University, Shanghai 200433, China 7 Department of Computer Science and Engineering, PES College of Engineering, University of Mysore, Mandya, India 8 Department of Computer Science, College of Education – Zabid, Hodeidah University, Hodeidah, Yemen
Abstract. The publicly available license plate recognition (LPR) datasets for training are limited, and in some countries, for example some developing countries, they are almost non-existent. In this paper, we first present the first Yemeni license plate dataset (Y-LPR dataset), which includes vehicle and license plate images for Yemeni license plate detection and recognition. Second, we propose a new LPR method for license plate detection and recognition. It consists of two key stages: first, license plate detection from images based on the latest state-of-the-art deep learning detector, YOLOv5; second, Yemeni character and number recognition based on the CRNN model. Experimental results show that our method is effective in detecting and recognizing license plates. Keywords: License plate detection · Character recognition · Deep neural networks
1 Introduction

Automatic license plate detection and recognition (ALPR) plays an important role in today's intelligent transportation systems (ITS) [1–3], with applications such as entrance and exit management in car parks, toll collection, traffic control systems, enforcement of traffic laws, and security control in military and protected areas.
With the advances in machine learning and deep learning [4–8] based on artificial neural networks [9–11] over the past years, many researchers have developed methods for license plate detection and recognition [12–18]. For example, Hui Li et al. [19] proposed a system to detect car license plates using a promising CNN method. ALPR systems still face several restrictions. For example, each country has its own LP format with specific features such as size, color, and the positions of letters and numbers. Moreover, the publicly available LPR datasets for training are limited and almost non-existent in some countries, for example some developing countries. In this paper, we focus on a Yemeni license plate recognition dataset: vehicle license plates were introduced in Yemen in 1993, and the current version has been in use since 2007. We introduce an ALPR framework with good real-time performance dedicated to the Yemeni license plate, which has not been explored before. Our first contribution concerns the dataset: we collected data from a variety of real scenarios in different cities in Yemen under different weather conditions, in daylight and in the dark, and then applied data augmentation to these images; the final dataset is called the Y-LPR dataset. The second contribution is the ALPR method, which includes two sub-tasks. The first is license plate detection, for which we use the popular CNN model YOLOv5. The second is character and number recognition on license plates, for which we use the CRNN model [20], widely adopted in optical character recognition (OCR), together with Spatial Transformer Networks (STN) [21]. The main contributions of this paper are summarized below: (1) We propose a new LPR method for license plate detection and recognition. It consists of two key stages: 1) license plate detection from images based on the latest state-of-the-art deep learning detector, YOLOv5; 2) Yemeni character and number recognition. (2) We present the first Yemeni license plate dataset (Y-LPR dataset), which includes vehicle and license plate images for Yemeni license plate detection and recognition. This paper is arranged as follows: Sect. 2 discusses work related to LPR. Section 3 presents our methodology, whose two parts describe the proposed dataset (Y-LPRD dataset) and the proposed ALPR method. Section 4 discusses the experimental results. Conclusions and future work are summarized in Sect. 5.
2 Related Work

According to previous works in the literature [22, 23], the task of ALPR can be divided into three subtasks: license plate (LP) detection, LP segmentation [24, 25], and character and number recognition. Some works [26, 27] instead categorize ALPR into two subtasks: license plate detection (LPD) and character recognition. In this section, we review several previous deep neural network-based works on ALPR.
2.1 License Plate Detection

In this step, the location of the license plate (LP) is determined in the raw image or video by placing a bounding box around the plate. In the last decade, researchers have applied many computer vision and image processing techniques to detect vehicle license plates, as well as artificial neural networks, deep learning, and machine learning methods [12, 13, 28]. Several CNN-based object detectors [23, 29–31], such as SSD [32], R-CNN [33], Faster R-CNN [34], YOLO [35], and FCOS, have been used for LP detection with efficiency and economy. Qiuying Huang et al. [36] proposed the ALPR-Net method based on the FCOS [37] detector for license plate detection and character recognition. Many real-time ALPR methods apply the YOLO object detection model or its modified versions to extract the license plate. Lele Xie et al. [29] proposed the CNN-based MD-YOLO system for license plate detection, and S. M. Silva et al. [22] used the FAST-YOLO model to detect LPs in a Brazilian LP dataset.

2.2 License Plate Recognition

After the vehicle's license plate is obtained, the second stage of an LPR system is the recognition of the characters and numbers on the plate. Many methods have been used for this task in the literature. Ting Tao et al. [38] used a sliding-window image segmentation method to extract LP features and then used a CNN model to recognize numbers and characters. Jiafan Zhuang et al. [39] applied a semantic segmentation method. Rayson Laroca et al. [40] proposed the CR-NET model, based on the YOLO object detector, to detect and recognize LPs; the CR-NET network includes the first 11 layers of YOLO, improved by adding 4 convolutional layers.
3 Methodology

In this paper, we first describe our proposed dataset (Y-LPRD dataset) in Sect. 3.1, and then present the proposed ALPR method in Sect. 3.2.

3.1 The Proposed Dataset

The first important contribution of this paper is a new dataset for Yemeni license plate detection and recognition. Due to the lack of available data, preparing this dataset took considerable time and effort. We collected and captured vehicle images in the real world and then manually validated them. Figure 1 shows examples of images from the dataset, which was captured and aggregated under various weather conditions during the day and night, covering different types of vehicles: small and large cars, buses, and trucks. The dataset contains 1200 Yemeni national LP images collected from August 2019 to May 2021 under different weather conditions; some license plate images were also collected from government or private websites on the Internet. Each image in the dataset contains at least one vehicle, with at least one license plate clearly visible.
Table 1. Features of the Yemeni license plate format (the license plate image column is omitted)

Vehicle types                                           Background color   Description in Arabic   Description in English
Police cars                                             Blue               الشرطة                  Police
Public private vehicles                                 Blue               اليمن - خصوصي           Yemen - Private
Taxies                                                  Yellow             اليمن - أجره            Yemen - Taxi
Vehicles of transport goods, trucks, tractor-trailers   Red                اليمن - نقل             Yemen - Transport
The features of the national Yemeni license plate format are shown in Table 1, and the governorate symbols and codes on Yemeni LPs are shown in Table 2. The Yemeni license plate is a complex type, as it contains two lines. The first line shows the description of the license, for example government, private, or transport. The second line contains the governorate or transport code on the left and the license plate number on the right.
Fig. 1. Examples of Yemeni license plate dataset
The new Yemeni license plate dataset is named the Y-LPRD dataset and is used for end-to-end license plate detection and recognition. The Yemeni license plates we collected include license plate information for public private vehicles, government cars, police cars, taxies, vehicles of transport goods, trucks, and tractor-trailers. Table 2. Some of the governorate symbols and codes on the Yemeni LP.
3.2 Proposed ALPR Method In this section, we present our proposed method for license plate detection and recognition (Fig. 2).
Fig. 2. Flowchart of proposed ALPR method
3.2.1 License Plate Detection

The main aim of this stage is to detect and localize the license plate on the vehicle. Many machine learning and computer vision techniques have been used for this task. In this work, we annotate the license plates in the images of our Y-LPRD dataset with LabelImage, and then, inspired by previous works [2, 40–42], we use the YOLOv5 [43] model to detect vehicle license plates. YOLOv5, the latest version of the YOLO [35] family, is among the fastest state-of-the-art real-time object detection models.

3.2.2 License Plate Recognition

After the vehicle's license plate is obtained, the next stage of the LPR task is number and character recognition. In this stage we split the LP horizontally into two parts; the upper part is ignored, and the LP numbers are extracted from the lower part. LP recognition proceeds as follows. First, the detected LP image from the previous stage is fed into a spatial transformer network (STN) to align the LP against geometric transformations such as rotation, scaling, and translation. Then, following the structure of [9], the CNN and CRNN networks are designed: the CNN includes 9 convolution layers with 3 × 3 kernels and three max pooling layers of size 2 × 2, and the numbers of filters of the convolutional layers are 32, 64, 128, 256, and 512. As shown in Fig. 3, the CNN model is used
Fig. 3. The pipeline of deep CRNN model for character recognition
to extract features from images, and an RNN with long short-term memory (LSTM) units is used to capture context information; the predicted recognition result is obtained from the last layer of the RNN. The LSTM structure used in the CRNN model consists of two bidirectional LSTM layers with 256 hidden units and one Connectionist Temporal Classification (CTC) output layer. As a special kind of RNN, the LSTM can handle the gradient vanishing problem through its more complex internal structure (input/output gates, forget gate, and memory cell) [44]. A bidirectional RNN processes the forward and backward time series respectively, so that the network can learn the context of the sequence well [45]. The loss of the whole improved CNN and CRNN model is given by the following formula:

l_whole = l_CNN + l_CRNN    (1)
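To make the pipeline concrete, the sketch below shows one possible PyTorch implementation of the two stages: license plate detection with a YOLOv5 model loaded from the Ultralytics hub, and a compact CRNN (convolutional feature extractor, bidirectional LSTM, and a per-timestep classifier for a CTC loss) in the spirit of the architecture described above. The layer sizes loosely follow Table 3, but the exact configuration, the omitted STN, and all variable names are our assumptions rather than the authors' released code.

```python
# A minimal sketch (not the authors' code) of the detection + recognition pipeline.
import torch
import torch.nn as nn

# Stage 1: license plate detection with a pretrained YOLOv5 model from the
# Ultralytics hub (the custom Yemeni-plate weights are not public, so the
# generic yolov5s checkpoint is used here for illustration).
detector = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

class CRNN(nn.Module):
    """Conv feature extractor + bidirectional LSTM + per-timestep classifier (CTC head)."""
    def __init__(self, num_classes):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32xW -> 16xW/2
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 8xW/4
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # -> 4xW/8
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
        )
        # Two bidirectional LSTM layers with 256 hidden units, as in the text.
        self.rnn = nn.LSTM(input_size=256 * 4, hidden_size=256,
                           num_layers=2, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, num_classes)  # logits per time step for nn.CTCLoss

    def forward(self, x):          # x: (batch, 1, 32, W), a cropped plate strip
        f = self.cnn(x)            # (batch, 256, 4, W/8)
        f = f.permute(0, 3, 1, 2)  # (batch, W/8, 256, 4): width becomes the time axis
        f = f.flatten(2)           # (batch, W/8, 1024)
        out, _ = self.rnn(f)
        return self.fc(out)        # (batch, W/8, num_classes)

crnn = CRNN(num_classes=20)       # e.g. 10 digits + a few symbols + CTC blank
dummy = torch.randn(1, 1, 32, 128)
print(crnn(dummy).shape)          # torch.Size([1, 16, 20])
```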
4 Experimental Results

All experiments were implemented in Python 3.8 on a computer with an NVIDIA GeForce GTX 1650 GPU. For LPD on the Y-LPR dataset we used YOLOv5, and to extract the characters and numbers from the license plates we used the improved CRNN network; Table 3 shows the structure of its layers. To verify the effectiveness of the proposed method for both LPD on Yemeni vehicles and Yemeni character/number recognition, we used the Y-LPR dataset described in Sect. 3.1, splitting the detected LP data as follows: 85% for training and testing and the remaining 15% for verification. The detected LP images are then fed into the improved CRNN network. Table 3. The structure of our proposed CRNN layers for LPR.
Layer   Type              Filter   Kernel
1       Input (32×32)     –        –
2       Conv              32       3×3
3       Max pooling       32       2×2
4       Conv              64       3×3
5       Max pooling       64       2×2
6       Conv              128      3×3
7       Max pooling       128      2×2
8       Conv              256      3×3
9       Fully connected   512      –
10      Fully connected   128      –
11      Recognition       –        –
Table 4 shows the processing speed of our method for LP detection and LP recognition. The system can detect license plates and recognize their characters and numbers in a short time, reaching 0.089 s per image.

Table 4. The performance of processing speed for our method of LPD and LPR

                   Time in seconds              Accuracy (%)
                   LPD      LPR      Total
Our ALPR method    0.014    0.075    0.089     92.01
We compared our proposed method with state-of-the-art methods for license plate detection and recognition: ALPR [46], CDLP-LPR [47], DELP-DAR [13], and Iranian-ALPR [48]. Table 5 shows the performance comparison.

Table 5. The precision of our proposed method on the Y-LPR dataset for LPD and LPR

Method                 Precision of LPD (%)   Precision of LPR (%)
ALPR [46]              86.20                  84.20
CDLP-LPR [47]          87.5                   85.30
DELP-DAR [13]          88.40                  86.50
Iranian-ALPR [48]      89.10                  88.05
Our proposed method    92.5                   91.01
To evaluate the performance of our method, we have utilized the Precision measure, which is defined as follows:

Precision = TP/(TP + FP)    (2)
where TP denotes true positives and FP denotes false positives. Precision is calculated by dividing the number of correctly classified samples (LPs in license plate detection, and characters/numbers in character recognition) by the total number of classified instances.
5 Conclusions and Future Work

In this paper, we first presented the first Yemeni license plate dataset (Y-LPR dataset), which includes vehicle and license plate images for Yemeni license plate detection and recognition. Second, we proposed a novel license plate recognition framework for various types of Yemeni LPs. It consists of two key stages: license plate detection from images based on the latest state-of-the-art deep learning detector, YOLOv5, followed by character segmentation and recognition. In future research, we will improve the recognition method to handle the entire license plate, including the Arabic description, and to identify the license type; we will also extend the Y-LPRD dataset with more license plate images.

Acknowledgements. This work was supported by the grant of the National Key R&D Program of China (No. 2018AAA0100100 & 2018YFA0902600) and partly supported by the National Natural Science Foundation of China (Grant nos. 61732012, 62002266, 61932008, and 62073231), the Introduction Plan of High-end Foreign Experts (Grant no. G2021033002L), and, respectively, supported by the Key Project of Science and Technology of Guangxi (Grant no. 2021AB20147), Guangxi Natural Science Foundation (Grant nos. 2021JJA170204 & 2021JJA170199) and Guangxi Science and Technology Base and Talents Special Project (Grant nos. 2021AC19354 & 2021AC19394).
References 1. Shashirangana, J., et al.: Automated license plate recognition: a survey on methods and techniques. IEEE Access 9, 11203–11225 (2021) 2. Wang, W., et al.: A light CNN for end-to-end car license plates detection and recognition. IEEE Access 7, 173875–173883 (2019) 3. Hoang, V.-T., et al.: 3D facial landmarks detection for intelligent video systems. IEEE Trans. Industr. Inf. 17(1), 578–586 (2021) 4. Shen, Z., et al.: A deep learning model for RNA-protein binding preference prediction based on hierarchical LSTM and attention network. In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, p. 1 (2020) 5. Wu, Y., et al.: Person Re-identification by Multi-scale Feature Representation Learning with Random Batch Feature Mask. IEEE Trans. Cogn. Dev. Syst. 13(4), 865–874 (2020) 6. Wu, D., et al.: Attention deep model with multi-scale deep supervision for person reidentification. IEEE Trans. Emerging Topics Comput. Intell. 5(1), 70–78 (2021) 7. Wu, D., et al.: Deep learning-based methods for person re-identification: a comprehensive review. Neurocomputing 337, 354–371 (2019) 8. Wu, D., et al.: A novel deep model with multi-loss and efficient training for person reidentification. Neurocomputing 513, 662–674 (2019)
9. Wang, H., et al.: Robust Korean license plate recognition based on deep neural networks. Sensors (Basel) 21(12), 4140 (2021) 10. Huang, D.S.: Radial basis probabilistic neural networks: model and application. Int. J. Pattern Recogn. Artif. Intell. 13(7), 1083–1101 (1999) 11. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015) 12. Li, Z., et al.: License Plate Detection and Recognition Technology for Complex Real Scenarios. In: Huang, D.-S., Bevilacqua, V., Hussain, A. (eds.) ICIC 2020. LNCS, vol. 12463, pp. 241–256. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60799-9_21 13. Selmi, Z., et al.: DELP-DAR system for license plate detection and recognition. Pattern Recogn. Lett. 129, 213–223 (2020) 14. Zherzdev, S., Gruzdev, A.: LPRNet: license plate recognition via deep neural networks. arXiv: 1806.10447 (2018) 15. Zhang, M., et al., Chinese license plates recognition method based on a robust and efficient feature extraction and BPNN algorithm. In: Journal of Physics: Conference Series, vol. 1004 (2018) 16. Xu, Z., et al.: Towards end-to-end license plate detection and recognition: a large dataset and baseline. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 261–277. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-012618_16 17. Yang, Y., Li, D., Duan, Z.: Chinese vehicle license plate recognition using kernel-based extreme learning machine with deep convolutional features. IET Intel. Transport Syst. 12(3), 213–219 (2018) 18. Zhen-Xue, C., et al.: Automatic license-plate location and recognition based on feature salience. IEEE Trans. Veh. Technol. 58(7), 3781–3785 (2009) 19. Li, H., et al.: Reading car license plates using deep neural networks. Image Vis. Comput. 72, 14–23 (2018) 20. Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2298–2304 (2017) 21. Jaderberg, M., et al.: Spatial transformer networks. In: Advances in Neural Information Processing Systems (NIPS 2015), vol. 28 (2015) 22. Silva, S.M., Jung, C.R.: Real-time license plate detection and recognition using deep convolutional neural networks. J. Vis. Commun. Image Represent. 71, 102773 (2020) 23. Selmi, Z., Ben Halima, M., Alimi, A.M.: Deep learning system for automatic license plate detection and recognition. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pp. 1132–1138 (2017) 24. Liang, X., Wu, D., Huang, D.-S.: Image co-segmentation via locally biased discriminative clustering. IEEE Trans. Knowl. Data Eng. 31(11), 2228–2233 (2019) 25. Lianga, X., Zhua, L., Huang, D.-S.: Multi-task ranking SVM for image cosegmentaiton. Neurocomputing 247, 126–136 (2017) 26. Hendry, Chen, R.-C.: Automatic license plate recognition via sliding-window darknet-YOLO deep learning. Image Vis. Comput. 87, 47–56 (2019) 27. Hsu, G.-S., Chen, J.-C., Chung, Y.-Z.: Application-oriented license plate recognition. IEEE Trans. Veh. Technol. 62(2), 552–561 (2013) 28. Chen, S.-L., et al.: Simultaneous end-to-end vehicle and license plate detection with multibranch attention neural network. IEEE Trans. Intell. Transp. Syst. 21(9), 3686–3695 (2020) 29. Xie, L., et al.: A new CNN-based method for multi-directional car license plate detection. IEEE Trans. Intell. Transp. Syst. 19(2), 507–517 (2018) 30. 
Lu, Q., Liu, Y., Huang, J., Yuan, X., Hu, Q.: License plate detection and recognition using hierarchical feature layers from CNN. Multimedia Tools Appl. 78(11), 15665–15680 (2018). https://doi.org/10.1007/s11042-018-6889-1
31. Wang, Q.: License plate recognition via convolutional neural networks. In: 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, pp. 926–929. IEEE (2017)
32. Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
33. Girshick, R., et al.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, pp. 580–587. IEEE (2014)
34. Girshick, R.: Fast R-CNN. In: ICCV, pp. 1440–1448. IEEE (2015)
35. Redmon, J., et al.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788 (2016)
36. Huang, Q., Cai, Z., Lan, T.: A single neural network for mixed style license plate detection and recognition. IEEE Access 9, 21777–21785 (2021)
37. Tian, Z., et al.: FCOS: fully convolutional one-stage object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9626–9635 (2019)
38. Tao, T., et al.: Object detection-based license plate localization and recognition in complex environments. Transp. Res. Record: J. Transp. Res. Board 2674(12), 212–223 (2020)
39. Zhuang, J., Hou, S., Wang, Z., Zha, Z.-J.: Towards human-level license plate recognition. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 314–329. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_19
40. Laroca, R., et al.: An efficient and layout-independent automatic license plate recognition system based on the YOLO detector. IET Intel. Transport Syst. 15(4), 483–503 (2021)
41. Laroca, R., et al.: A robust real-time automatic license plate recognition based on the YOLO detector. In: 2018 International Joint Conference on Neural Networks (IJCNN). IEEE, Rio de Janeiro, Brazil (2018)
42. Kessentini, Y., et al.: A two-stage deep neural network for multi-norm license plate detection and recognition. Expert Syst. Appl. 136, 159–170 (2019)
43. Ultralytics: YOLOv5 (2021). https://github.com/ultralytics/yolov5
44. Shu, X., Tang, J.: Hierarchical long short-term concurrent memory for human interaction recognition. IEEE Trans. Pattern Anal. Mach. Intell. 43(3), 1110–1118 (2021)
45. Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015)
46. Henry, C., Ahn, S.Y., Lee, S.-W.: Multinational license plate recognition using generalized character sequence detection. IEEE Access 8, 35185–35199 (2020)
47. Omar, N., Sengur, A., Al-Ali, S.G.S.: Cascaded deep learning-based efficient approach for license plate detection and recognition. Expert Syst. Appl. 149, 113280 (2020)
48. Tourani, A., et al.: A robust deep learning approach for automatic Iranian vehicle license plate detection and recognition for surveillance systems. IEEE Access 8, 201317–201330 (2020)
Topic Analysis of Public Welfare Microblogs in the Early Period of the COVID-19 Epidemic Based on LDA Model Ji Li1(B) and Yujun Liang2(B) 1 Department of Network and New Media, School of Journalism and Communication, Guangdong University of Foreign Studies, Guangzhou 510006, People’s Republic of China [email protected] 2 School of Journalism and Communication, Guangdong University of Foreign Studies, Guangzhou 510006, People’s Republic of China [email protected]
Abstract. Due to the outbreak of COVID-19 in early 2020, a flood of information and rumors about the epidemic filled the Internet, causing panic in people's lives. During the early period of the epidemic, public welfare information with positive energy played a key role in influencing online public opinion, alleviating public anxiety, and mobilizing the entire society to fight against the epidemic. Analyzing the characteristics of public welfare communication in that period can therefore help us develop better strategies for public welfare communication in the post-epidemic era. In China, Sina Weibo is a microblog platform based on user relationships and is widely used by Chinese people. In this paper, we take the public welfare microblogs released by the Weibo public welfare account "@微公益" (Micro public welfare) in the early period of the epidemic as the research object. First, we collected a total of 1863 blog posts from this account from January to April 2020 and divided them into four stages by combining the Life Cycle Theory. Then the top 10 keywords of the blog posts of each stage were extracted using word frequency statistics. Finally, the LDA topic model was utilized to find the topics of each stage, and the characteristics of public welfare communication at each stage were analyzed in detail. Keywords: Text mining · The LDA topic model · COVID-19 · Micro public welfare · Weibo
1 Introduction In the early period of the COVID-19 epidemic, it was easy for people to feel anxious about a lot of epidemic-related information. At this time, public welfare information with positive energy played an important role in spreading epidemic prevention information and alleviating people’s anxiety. In China, Sina Weibo1 is a widely-used microblog platform where people can not only share and disseminate information in real time in 1 Sina Weibo: https://weibo.com/
the form of multimedia such as text, pictures, and videos, but can also obtain various kinds of information from others. Therefore, it provides an important channel for the popularization of micro public welfare. In this paper, we explore the hot topics of public welfare information on Weibo in the early period of the epidemic, so as to grasp the characteristics of public welfare dissemination on Weibo. This can help the media formulate better public welfare dissemination strategies and better unleash the social value of public welfare dissemination in the post-epidemic era. With the advancement of computational communication and social media computing, an increasing number of scholars are using text mining technologies to analyze data on social media platforms such as Weibo and Facebook. For example, Wu et al. used sentiment analysis and the LDA topic model on user data collected from Sina Weibo to discover Chinese residents' attitudes and emotional tendencies towards garbage sorting policies [1]. Koh and Liew explored the potential effect of social media on mental health issues by applying topic modeling to the loneliness expressed by users on Twitter during the COVID-19 epidemic [2]. This paper uses text mining technologies to analyze the topics of public welfare communication on Weibo in the early period of the COVID-19 epidemic. Firstly, we collect the microblog posts published by the Weibo account "@微公益"2 (Micro public welfare) from January to April 2020, and divide them into four stages (the initial stage, the outbreak stage, the fluctuation stage, and the long tail stage) according to the Life Cycle Theory. Secondly, word frequency statistics are used to obtain the high-frequency words of each stage for analysis. Thirdly, the LDA topic model is adopted to mine the topics of each stage. Finally, we explore the characteristics of the periodic topics of the public welfare microblogs, and summarize the communication strategies of Weibo public welfare at different stages.
2 Related Work

Text mining, as an important method of social media computing, is free from the interference of subjective will, and can thus reveal the relationships between human language and people's emotions, opinions, attitudes, and behaviors more scientifically and objectively. The dictionary approach, unsupervised machine learning, and supervised machine learning are common methods of text mining [3]. For example, Wang et al. used text mining to collect Chinese people's views on electronic cigarettes from social media [4]. Liu et al. used sentiment analysis to classify brand-related texts published by users on Twitter, exploring the emotional differences and rankings between different types of mainstream product brands [5]. In unsupervised machine learning, the Latent Dirichlet Allocation (LDA) topic model is commonly used for topic clustering. For example, Han et al. analyzed the early public opinion about COVID-19 on Sina Weibo with the LDA topic model and a random forest algorithm, and found that different development stages correspond to different topics and sub-topics of the events [6]. Xu et al. took comment data on Douban as the study object and used the LDA topic model to analyze the Chinese texts, helping film creators understand the public's viewing needs [7]. 2 @微公益: https://m.weibo.cn/u/2089358175.
Applying Life Cycle Theory to the topic evolution of Weibo hot events has become popular in recent years, and analyzing the evolution of the topic distribution can clearly reveal the focus of issues at different life cycle stages [8]. For example, Zhang et al. analyzed the Hurricane Irma event based on the changes in netizens' views over the life cycle, so as to understand the impact of various emotions on information dissemination [9]. Based on Life Cycle Theory, Chen et al. took the Sanlu milk powder incident as an example, divided the life cycle of food safety emergencies into five phases, and discussed identification and assessment methods for each phase [10]. In conclusion, the LDA topic model is very useful in text mining practice. Therefore, this paper uses word frequency statistics and the LDA model, combined with Life Cycle Theory, to analyze the topic characteristics of the different stages of Weibo public welfare dissemination in the early period of the epidemic, and tries to provide some strategies for public welfare communication in the post-epidemic era.
3 The LDA Topic Model

Blei et al. introduced a Dirichlet prior distribution into Probabilistic Latent Semantic Indexing (PLSI) to create the Latent Dirichlet Allocation (LDA) topic model, a three-level Bayesian probabilistic model over documents, topics, and words [11]. In short, every word in a document is obtained by first picking a topic with a certain probability, and then choosing a word from that topic with a certain probability. Figure 1 depicts the structure of the LDA topic model. The shaded circle indicates the observable variable, while the unshaded circles indicate hidden variables [12]. α is the prior parameter of the document-specific topic distribution, and β is the prior parameter of the topic-specific word distribution; both define Dirichlet prior distributions from which the respective posterior distributions are computed. N is the total number of words in a document, M is the total number of documents, and K is the number of topics. The generation process of a document is as follows [12]: First, the model determines the topic distribution θ of the document according to prior knowledge. Second, a topic Z is drawn from the document-specific topic multinomial distribution θ. Third, prior knowledge determines the word distribution ϕ of the current topic. Fourth, a word W is drawn from the word multinomial distribution ϕ of topic Z. Finally, this procedure is repeated for each word to generate the document.
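As an illustration of this generative story (our own sketch, not from the paper), the following NumPy snippet draws a toy document from an LDA model with symmetric priors; the vocabulary size and all parameter values are made up for demonstration.

```python
# A toy simulation of LDA's generative process with NumPy (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 8, 12          # topics, vocabulary size, words per document
alpha, beta = 0.1, 0.01     # Dirichlet priors for theta and phi

phi = rng.dirichlet([beta] * V, size=K)   # word distribution of each topic
theta = rng.dirichlet([alpha] * K)        # topic distribution of one document

doc = []
for _ in range(N):
    z = rng.choice(K, p=theta)            # pick a topic from theta
    w = rng.choice(V, p=phi[z])           # pick a word from that topic's phi
    doc.append(w)
print("generated word ids:", doc)
```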
Fig. 1. The LDA topic model structure [12]
4 Process of Experiment

The experimental process of this study can be broken down into seven steps, which are exhibited in Fig. 2.
Fig. 2. The experiment framework
The first step is to collect data of public welfare microblogs. The second step is to preprocess the data. The third step is to divide the communication cycle into four stages. The fourth step is to extract keywords at each stage based on word frequencies. The fifth step is to extract TF-IDF features of the microblogs. The sixth step is to establish the LDA topic model for clustering topics. The seventh step is to analyze the characteristics of the topics mined at each stage.
5 Experiments and Discussion

5.1 Data Collection

This study uses GooSeeker3, a web crawler, to collect data from the microblog account "@微公益" (Micro public welfare) from January to April 2020. A total of 1,863 records are collected as the corpus, including posting time, blog content, the number of reposts, the number of comments, the number of likes, etc.

5.2 Data Preprocessing

We used jieba4 for Chinese word segmentation and then removed stop words with no real meaning, obtaining clean data for the experiment.

5.3 Life Cycle Division

Based on the trends of reposts and comments of the microblog posts shown in Fig. 3, we divide the life cycle of public health emergencies on Weibo using the method proposed by Cao and Yue [13]. The life cycle of Weibo public welfare communication in the early period of the COVID-19 epidemic was separated into four phases: the initial stage (January 1 to January 27), the outbreak stage (January 28 to February 27), the fluctuation stage (February 28 to April 3) and the long tail stage (April 4 to April 30).
Fig. 3. Life cycle division of four stages
At the initial stage, there were very few topics related to the epidemic, and the numbers of comments and reposts of the Weibo posts were very small, which could not attract wide public attention. At the outbreak stage, the number of epidemic-related topics increased rapidly in a short time, and the comments and reposts also surged
3 GooSeeker: https://www.gooseeker.com/
4 Jieba: https://github.com/fxsjy/jieba.
and fluctuated rapidly, which drew great public attention. At the fluctuation stage, the fluctuation became slower than at the outbreak stage: the numbers of comments and reposts decreased at the beginning but increased again due to the emergence of some new topics, fluctuating for a long time with an overall upward trend, and public attention to some topics reached its peak. At the long tail stage, public attention declined sharply and the numbers of comments and reposts remained low, although some people still paid constant attention to certain topics.

5.4 Word Frequency Statistics

We use word frequency statistics to extract keywords from the Weibo data of the four stages respectively. The top 10 keywords with the highest frequencies are obtained for each stage, 40 keywords in total, as shown in Table 1; the word clouds of these keywords for the four periods are shown in Fig. 4. According to Table 1 and Fig. 4, the focus differs across the stages. At the initial stage, the action to fight the epidemic was the focus of attention, and the Weibo posts covered the supply and verification of epidemic prevention materials in hospitals. At the outbreak stage, the account focused on a series of anti-epidemic public welfare actions and also encouraged people to participate in the punch-card campaign of fighting against the epidemic. At the fluctuation stage, the account paid attention not only to anti-epidemic public welfare actions but also to women, civilian heroes, and other groups. At the long tail stage, there were many activities related to Earth Day, while the account continued to follow epidemic-related topics such as epidemic maps and civilian heroes. Table 1. Keywords of top 10 highest frequencies
Outbreak stage
Fluctuation stage
long tail stage
fight (抗击)
295
action (行动)
903
public welfare (公益)
388
earth (地球)
119
medical (医用)
294
fight (抗击)
745
action (行动)
358
knowledge (知识)
102
COVID-19 (新型肺炎)
185
COVID-19 (新型肺炎)
505
fight (抗击)
292
public welfare (公益)
78
action (行动)
183
punching (打卡)
444
epidemic (疫情)
192
Contest (大赛)
68
standard (标准)
176
Fight the epidemic (战疫)
342
COVID-19 (新型肺炎)
145
epidemic (疫情)
59
help (求助)
159
public welfare (公益)
341
anti-pneumonia (抗疫)
135
action (行动)
39
(continued)
Topic Analysis of Public Welfare Microblogs
321
Table 1. (continued) Initial stage
Outbreak stage
Fluctuation stage
long tail stage
mask (口罩)
130
epidemic (疫情)
272
female (女性)
124
Fight ( 抗击)
39
public welfare (公益)
106
enterprises (企业)
248
project (项目)
102
world (世界)
34
hospital (医院)
101
cheer up (加油)
231
civilian heroes (平民英雄)
96
civilian heroes (平民英雄)
28
medical backup forces (医护后盾)
179
Wuhan (武汉)
94
Epidemic map (疫情地图)
21
verification (核实)
93
Fig. 4. Word cloud at four stages
5.5 TF-IDF Features Extraction Before building the LDA topic model, TF-IDF features extraction is carried out and Weibo blogs are vectorized.
322
J. Li and Y. Liang
5.6 LDA Topic Model Establishment In order to establish the LDA topic model, the following parameters are used in the study. At first, we set α = 0.1 for the prior parameter of topic-document distribution and β = 0.01 for the prior parameter of word-topic distribution. Secondly, we set the parameters learning_offset = 50, max_iter = 50, and random_state = 0. learning_offset is a positive parameter that reduces the early iterations in online learning, max_iter is the maximum number of passes over the training data, and random_state is a parameter to pass an integer for reproducible results across multiple function calls. By calculating the topic perplexities and visualizing the topic distributions at different stages, we find that when the number of topic K for each stage is 4, 4, 5 and 5 respectively, the model has a good clustering effect. Figure 5 shows the visualization of clustering result.
Fig. 5. Visualization of topic distributions at the four stages
Topic Analysis of Public Welfare Microblogs
323
5.7 Analysis of Results The Results of the Initial Stage. In this paper, we select the microblog text from January 1 to January 27, 2020 as the initial stage. After the text set is analyzed by the LDA topic model, we get 4 topics finally. The 5 most relevant keywords are shown for each topic in Table 2.
Table 2. Topics and keywords at the initial stage Topic Keywords 1
medical(医用), standards(标准), help(求助), mask(口罩), pneumonia(肺炎)
2
Wuhan(武汉), public welfare(公益), epidemic(疫情), supplies(物资)
3
children(孩子), starlight charity(星光公益), hope(希望), challenge(挑战), dance(舞蹈)
4
rural(乡村), teachers(教师), Jack Ma(马云), attention(关注), children(孩子)
We summarize the topic characteristics of the initial stage as follows. Topic 1: Due to the lack of medical materials during the pneumonia epidemic, the hospital launched requests for help which included the information on medical consumables, such as medical surgical masks (not less than YY0469-2011 standard), etc. Topic 2: The enterprises (such as Midea Group, Lenovo Group, and Yili Group) donated medical supplies to Wuhan for public welfare in order to help the hospitals in the epidemic prevention and control work. Topic 3: There are some public welfare projects full of humanistic care including the project of “Hope relay hip-hop challenge” to help rural children realize their street dance dreams and the Project of “Starlight Public Welfare” to expand the transmission of public welfare information through the participation and interaction of celebrities. The theme objects of care included vulnerable groups, occupational health, natural creatures, etc. Topic 4: In order to raise people’s attention to rural children’s education, Jack Ma launched the activities of “Jack Ma Rural Teacher Award” where many rural teachers were awarded prizes. At the initial stage, the mismatch between the delay of information and the rapid spread of the epidemic resulted in the shortage of epidemic prevention and control materials in hospitals. Therefore, the microblog posts focused on the supply and verification of epidemic prevention and control materials including requests from hospitals and donations from corporates. Secondly, in addition to some daily rural public welfare activities, there were also some activities that cooperated and interacted with celebrities to raise public attention to the prevention and control of the epidemic, such as inviting celebrities to call on people not to eat wild animals. The Results of the Breakout Stage. In this paper, we select the microblog text from January 28 to February 27, 2020 as the outbreak stage. After the text set is analyzed by the LDA topic model, we get 4 topics finally. The 5 most relevant keywords are shown for each topic in Table 3.
324
J. Li and Y. Liang Table 3. Topics and keywords at the breakout stage
Topic
Keywords
1
COVID-19(新型肺炎), public welfare(公益), enterprises(企业), epidemic(疫情), pneumonia(肺炎)
2
punching(打卡), fight against the epidemic(战疫), join(加入), persistence(坚持), commitment(承诺)
3
medical backup forces (医护后盾), cheer up(加油), COVID-19(新型肺炎), antipneumonia(抗疫), front line(一线)
4
medical(医用), pneumonia(肺炎), help(求助), hospital(医院), standard(标准)
We summarize the topic characteristics of the outbreak stage as follows. Topic 1: Enterprises (such as Suning Group, Intel, and Hongxing Erke) donated medical supplies to Wuhan for public welfare in order to help the hospitals in the epidemic prevention and control work. Topic 2: The punch-card campaign of fighting against the epidemic encouraged users to join the action, insisting on checking in and promising to help prevent the epidemic. Topic 3: On the front line, the medical backup forces struggled to combat the pneumonia, and society cheered them on. Topic 4: Due to the lack of medical materials during the pneumonia epidemic, hospitals launched requests for help that included information on medical consumables, such as medical surgical masks (not below the YY0469-2011 standard).
During the outbreak stage, the epidemic broke out one after another across the country. On the one hand, some Weibo posts focused on the supply of epidemic prevention materials and on the medical backup forces fighting the epidemic. On the other hand, in order to raise the public's attention to self-prevention, the Weibo account launched the "Combating Epidemic Punching Action" activity, which inspired the public's enthusiasm for battling the epidemic and reinforced their will and confidence in overcoming it.
The Results of the Fluctuation Stage. In this paper, we select the microblog text from February 28 to April 3, 2020 as the fluctuation stage. After the text set is analyzed by the LDA topic model, we finally obtain 5 topics. The 5 most relevant keywords for each topic are shown in Table 4.
Table 4. Topics and keywords at the fluctuation stage

Topic | Keywords
1 | COVID-19(新型肺炎), donation(捐赠), medical backup forces(医护后盾), cheer up(加油), special fund(专项基金)
2 | children(孩子), world(世界), women(女性), program(计划), care(关爱)
3 | anti-pneumonia(抗疫), civilian heroes(平民英雄), salute(致敬), front line(一线), gratitude(感谢)
4 | washing hands(洗手), punching(打卡), fight the epidemic(战疫), challenge(挑战), protection(防护)
5 | COVID-19(新型肺炎), epidemic map(疫情地图), districts and counties(区县), environment(环境), diagnosed(确诊)
We summarize the topic characteristics of the fluctuation stage as follows. Topic 1: The medical backup forces fought against the epidemic and received donations and support from the Sina Anti-Pneumonia Special Fund. Topic 2: There were some public welfare activities that paid attention to women, children and nature, such as the Women's Public Welfare Festival, World Autism Day and World Water Day. Topic 3: Society looked for civilian heroes in the fight against the pneumonia, especially those who struggled on the front line, and showed respect and expressed gratitude to them. Topic 4: Interactive activities like the Hand-washing Challenge and the punch-card campaign of fighting the epidemic encouraged people to take personal protection. Topic 5: The national epidemic map of counties and districts, with the numbers of cured and newly diagnosed patients, demonstrated how the epidemic environment was changing.
During the fluctuation stage, the national epidemic situation eased to a certain extent. However, the epidemic still occurred sporadically in localized areas. The persistence and protection of the medical backup forces still played an important role. As a result, during this period, the Weibo account continued to pay attention to and encourage the work of the medical backup forces, and posted epidemic maps showing the development trend of the epidemic. On the other hand, the account focused on thematic public welfare activities in conjunction with popular festivals, such as the Women's Public Welfare Festival, World Autism Day and World Water Day. It also carried out the Hand-washing Challenge in response to the World Health Organization. Some of the public welfare programs for women and children with an anti-epidemic character reflected humanistic care during the epidemic.
The Results of the Long Tail Stage. In this paper, we select the microblog text from April 4 to April 30, 2020 as the long tail stage. After the text set is analyzed by the LDA topic model, we finally obtain 5 topics. The 5 most relevant keywords for each topic are shown in Table 5.
Table 5. Topics and keywords at the long tail stage

Topic | Keywords
1 | knowledge(知识), contest(竞赛), protection(保护), guard(守护), world(世界)
2 | fight(抗击), action(行动), national(全国), epidemic map(疫情地图), environment(环境)
3 | help(帮助), action(行动), against(抗击), medical backup forces(医护后盾), guardians(守护者)
4 | anti-epidemic(抗疫), civilian heroes(平民英雄), look for(寻找), gratitude(感恩), salute(致敬)
5 | thanks(感谢), love(爱心), children(孩子), support(支持), project(项目)
We summarize the topic characteristics of the long tail stage as follows. Topic 1: Public welfare activities related to cosmopolitan festivals like World Earth Day were held, such as the Earth knowledge contest. Topic 2: The national epidemic map of counties and districts, with the numbers of healed and newly diagnosed people, showed the current trend of changes in the epidemic environment. Topic 3: There were many activities of the medical backup forces and guardians fighting the epidemic and guarding lives, and society provided support and assistance for them. Topic 4: Society looked for the civilian heroes of the epidemic and expressed gratitude to them. Topic 5: Some charity projects were devoted to helping children and thanked the public for their support.
At the long tail stage, the epidemic situation had stabilized. Therefore, the Weibo account shifted to disseminating daily public welfare activities such as World Earth Day, and organized related knowledge contests to stimulate public participation. In addition, it continued to focus on the medical backup forces, civilian heroes, vulnerable groups and the changes in the epidemic environment.
5.8 Discussion
In general, the topics related to the epidemic ran through all four stages of public welfare dissemination, but the proportion of epidemic-related topics differed at each stage. At the initial stage, people had paid attention to the supply of medical materials since the emergence of the COVID-19 epidemic, while daily public welfare activities still accounted for a considerable proportion of the microblog posts. When the epidemic broke out, the Weibo account began to pay full attention to the epidemic, including the situation of material assistance and the medical backup forces, and also encouraged people to participate in anti-epidemic actions. When it came to the fluctuation stage, the epidemic eased to a certain extent, while the account still paid much attention to epidemic information, especially the medical backup forces and civilian heroes in
the epidemic, and focused on public welfare activities in conjunction with popular festivals. At the long tail stage, the epidemic gradually subsided, and the Weibo posts paid less attention to the epidemic and more attention to worldwide public welfare activities. At the same time, the epidemic map remained a focus of attention for understanding the dynamic changes of the epidemic.
Therefore, public welfare dissemination in the early period of the epidemic showed different characteristics at different stages. Grasping these characteristics can suggest strategies for public welfare dissemination in the post-epidemic era. First of all, for the spread of epidemic-related topics, the media should release information in time. Authoritative and accurate information can alleviate people's anxiety during the epidemic, and also helps to collect important information in time so as to benefit more people. Secondly, it is effective to cooperate with celebrities and to disseminate public welfare information in combination with popular festivals. Using celebrity effects and popular festivals can increase the audience's attention to and participation in public welfare information. Thirdly, paying attention to interaction with the public is also an important strategy for the media, for example by holding punch-card activities and knowledge contests, which can not only raise people's confidence in the fight against the epidemic but also enable people to learn some interesting knowledge. Finally, the media should pay attention not only to the epidemic itself but also to the common people, especially medical staff, women and children, so as to show basic humanistic care. Only by grasping the stage characteristics of public welfare dissemination and carrying out targeted strategies can we unleash the social value of public welfare information and improve the communication effect.
6 Conclusion and Future Work
In this study, we collected the Weibo posts on public welfare in the early period of the epidemic and first divided them into four stages. We then extracted the keywords with the highest frequencies at each stage and mined the key topics of each stage using the LDA topic model. Based on the results, we analyzed the topic characteristics of public welfare microblogs at different stages in the early period of the epidemic. In the future, we will further analyze users' comments and emotions in order to figure out how the media and the public interact with each other on topics of public welfare.
Acknowledgements. The research was supported by the National Natural Science Foundation of China (Grant No. 62077005), the National Social Science Fund Key Project (Grant No. 20AZD057), the Guangzhou Urban Public Opinion Governance and International Image Communication Research Center (Guangzhou Philosophy and Social Science Development "the Fourteenth Five-year Plan" 2022 Joint Project, No. 2022JDGJ02), and the School Project "School Innovation Team of Cross-culture Communication and International Communication" of the School of Journalism and Communication, Guangdong University of Foreign Studies, P. R. China.
Intelligent Computing in Computer Vision
Object Detection Networks and Mixed Reality for Cable Harnesses Identification in Assembly Environment Yixiong Wei1,2 , Hongqi Zhang1,2 , Hongqiao Zhou1,2 , Qianhao Wu1,2(B) , and Zihan Niu3 1 Department of Engineering and Technology Center, No.38 Research Institute of CETC,
Hefei 230088, China [email protected] 2 Anhui Technical Standard Innovation Base (Intelligent Design and Manufacturing, Intelligence Institute, Civil-military Integration), Hefei 230088, China 3 National Engineering Research Center for Agro-Ecological Big Data Analysis and Application, School of Internet, Anhui University, Hefei 230601, China
Abstract. This paper proposes a smart cable assembly assistance system which is composed of an MR (mixed reality) cable assembly guidance sub-system and a cable type inspection sub-system. In the cable type inspection sub-system, a deep object detection algorithm called Cable-YOLO is developed. The proposed algorithm integrates multi-scale features and an anchor-free mechanism. At the same time, a visual guidance sub-system is deployed on the Hololens MR device to aid the cable assembly process. In order to evaluate the Cable-YOLO algorithm, a dataset containing 3 types of aviation cables is collected. The experimental results show that Cable-YOLO can recognize the cable types and locations, achieving the best detection accuracy and inference time compared with four baseline algorithms. With the cable type detection results, the visual guidance sub-system can assist the assembly worker in installing the cable at the corresponding location. The system can provide quick assembly guidance for the worker in a real environment. Keywords: Object detection · Mixed reality · Cable harnesses identification · YOLO
1 Introduction
Cable harnesses are employed to connect electrical components, equipment, or control devices for power and signal transmission [1]. The cable assembly process is a time-consuming and labor-intensive job that usually involves many components in a complicated environment [2]. The main steps include the inspection of the cable type and the laying of the cables. However, the inspection process is complicated: cables come in many varieties and large quantities, and the assembly workers need to identify the correct cable type manually and determine where to lay or install each cable according to the assembly
manual. The experience and proficiency of the assembly workers are the key factors affecting the reliability of the cable assembly and the final quality of these electronic products. In addition, workers are prone to mistakes under such high-intensity labor. In order to improve assembly performance, some researchers attempt to simulate cable assembly in virtual environments with virtual reality (VR) technology [3–5]. However, VR technology is usually used for assembly worker training, since the models in a VR environment are not real. On the other hand, Augmented Reality (AR) devices are a promising solution to aid assembly in industry [6, 7]. At an assembly site, the worker can follow the assembly instructions while wearing an AR device. In this situation, the assembly process depends less on the worker's experience and the quality of assembly improves significantly.
In this research, the target is to develop a method which can guide the cable assembly process using wearable AR devices. In order to satisfy the requirements for aided cable installation, it is necessary to recognize the type of cable to confirm that it is installed in the right position. Considering that deep learning techniques have been developed for many computer vision applications [8, 9], we attempt to use deep learning combined with AR devices to aid the cable assembly process. First, the deep learning technique is used to detect the type of cable; then the inspection results are transferred to the AR device in real time for visual guidance of the assembly process.
In deep learning-based object detection, many typical techniques have been developed. Among these techniques, the one-stage object detector YOLO [10–12] and the two-stage object detector Faster R-CNN [13, 14] are representative methods. However, two major challenges still exist in cable harness detection. The first challenge is that deep learning is a data-driven method, and there is no public cable harness dataset available. The second is that cable harnesses are soft objects, which differ from ordinary objects, so detection performance decreases when an ordinary object detector is employed. In this research, we attempt to solve these two challenges and develop a system to guide the entire cable assembly process. The contributions of this research are summarized below.
First, we propose a novel object detection algorithm called Cable-YOLO for cable type detection. The proposed algorithm uses multi-scale features and an anchor-free mechanism. Compared to existing object detection algorithms, the proposed algorithm can achieve better performance with less image data and fewer computational resources. Second, we develop a cable assembly guidance system consisting of a Hololens and a computer with a 1080Ti graphics processing unit (GPU). The proposed model is deployed on the computer as the cable type detection server, and the Hololens is used as the client to guide the worker's operation when it receives the detection results.
2 The Proposed System Framework
The proposed system framework is a typical client-server architecture, which combines a Hololens 2 as the client and a personal computer (PC) as the visual inspection server. The client is the AR cable assembly guidance sub-system, and the server is the cable type inspection sub-system. In the sub-system for cable type inspection, the Hololens is also used as the acquisition device. The cable images are taken from the real environment by the Hololens and transferred
to the PC server in real time. Then, these images are processed by a deep learning-based object detection model deployed on a PC server with a GPU. The AR cable assembly guidance sub-system is developed with Unity and C#; it determines the final cable laying position and gives the guidance information according to the cable type detection results. The diagram of the proposed framework is shown in Fig. 1. The worker can assemble the cable by following the guidance displayed in the Hololens without prior experience.
Fig. 1. Diagram of the proposed framework, which is a client-server architecture (Hololens client: transfer image, obtain the cable position, give the user guidance; GPU server: Cable-YOLO cable type detection, return cable coordinates and type)
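The client-server exchange in Fig. 1 can be sketched as a simple image-in/detections-out loop. The following Python sketch is an illustrative assumption: the length-prefixed TCP framing, port number, and the `detect` callback are hypothetical choices, not the protocol actually used between the Hololens and the server.

```python
import socket
import struct
import numpy as np
import cv2  # assumption: OpenCV is available for JPEG decoding


def _recv_exact(conn, n):
    """Read exactly n bytes from the socket."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("client disconnected")
        buf += chunk
    return buf


def serve(detect, host="0.0.0.0", port=9000):
    """Receive length-prefixed JPEG frames; reply with detections.

    `detect` maps a BGR image to a list of (x, y, w, h, class_id) tuples.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen(1)
    conn, _ = srv.accept()
    try:
        while True:
            size = struct.unpack(">I", _recv_exact(conn, 4))[0]
            payload = _recv_exact(conn, size)
            img = cv2.imdecode(np.frombuffer(payload, np.uint8), cv2.IMREAD_COLOR)
            reply = repr(detect(img)).encode()  # detections back to the client
            conn.sendall(struct.pack(">I", len(reply)) + reply)
    finally:
        conn.close()
        srv.close()
```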
2.1 Cable Type Inspection Sub-system
It is not easy to use traditional image processing algorithms for cable type detection at an assembly site because of the complicated environment and lighting changes. Therefore, deep learning-based object detection is employed in this research. Considering that deep learning requires heavy computational resources, a PC server with a GPU is used to deploy the deep object detection model.
Given the various cable types and corresponding laying positions in complicated assembly environments, the object detection algorithm for the cable type at the assembly site is very important. In general, cables are soft linear objects that look very similar, which makes the differences between cables subtle. These factors make cable type detection a challenging task at the assembly site. First, the assembly process needs real-time detection, which requires the deep model to use as few computational resources as possible. Second, compared to the field of view, a cable in the assembly scenario is relatively small, so small object detection is very important for cable type detection. Third, deep learning usually requires a large dataset to train the model, and no public dataset for cable type detection is available; therefore, we need to collect one. In order to solve these three challenges, we integrated multiple methods into one object detection framework.
2.1.1 Recognition of Cable Type
In the deep object detection research community, the YOLO series always achieves the optimal speed-accuracy trade-off for real-time applications. In contrast, other deep object detection algorithms usually require more computational resources and cannot run in real
time. In the cable assembly environment, real-time inspection of the cable type is required. Therefore, we choose the YOLO-V3 algorithm as our baseline in this investigation.
In recent years, researchers have focused on anchor-free and non-maximum suppression (NMS)-free detection, and many anchor-free detectors [15–17] have been developed. The anchor-free mechanism can reduce the number of network parameters and improve the detection performance significantly, and network training becomes much faster and simpler. However, the YOLO series has not employed an anchor-free mechanism, even though some advanced detection technologies have been integrated into the latest YOLO version. In this research, a very simple anchor-free mechanism is integrated into the original YOLO algorithm. First, we set a predefined scale range and the center location of each object, and specify the FPN level for each object. For each location we predict four values: two offsets from the top-left corner of the grid cell and the height and width of the predicted box (see the decoding sketch after Fig. 2).
To detect the cable type more accurately, two further mechanisms are employed. Considering that cables are soft and small objects, the multi-scale features obtained from different network layers are fused. The network is composed of two subnetwork modules: the first is a top-down architecture and the second is a bottom-up architecture. Each subnetwork uses ResNet blocks. The image is input into the bottom-up subnetwork, and the feature maps are transmitted to the next subnetwork. In the top-down subnetwork, the feature maps of all layers are combined to predict the object class and the corresponding candidate bounding box. The schematic diagram of the network architecture is shown in Fig. 2.
Fig. 2. The proposed network architecture for cable type detection
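The "four values per location" prediction described above can be turned into boxes with a few lines of array arithmetic. The sketch below is a minimal illustration, assuming the offsets and sizes are expressed in stride units; the exact parameterization used by Cable-YOLO is not specified in the text.

```python
import numpy as np

def decode_anchor_free(pred, stride):
    """Decode a dense anchor-free head output into boxes.

    pred: array (H, W, 4) holding, per grid cell, the two offsets of the
    box centre from the cell's top-left corner plus the box width and
    height. Returns (H*W, 4) boxes as (cx, cy, w, h) in image pixels.
    """
    h, w, _ = pred.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx = (xs + pred[..., 0]) * stride     # centre x in pixels
    cy = (ys + pred[..., 1]) * stride     # centre y in pixels
    bw = pred[..., 2] * stride            # box width
    bh = pred[..., 3] * stride            # box height
    return np.stack([cx, cy, bw, bh], axis=-1).reshape(-1, 4)
```

Because no anchor shapes are involved, the head carries fewer parameters, which is the training-speed benefit the paragraph refers to.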
Another mechanism is data augmentation (DA). DA is an effective method for improving detection performance. Although the YOLO series has employed some DA strategies, Mosaic and Mixup are not integrated into YOLOv3. In this research, we use these two strategies in a single-stage object detector. Through these strategies, the recognition rate increases noticeably (Fig. 3); a sketch of Mixup follows Fig. 3.
Fig. 3. Flowchart for cable type detection from captured images.
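As a concrete reference for one of the two DA strategies, here is a minimal detection-style Mixup sketch. It assumes both images have the same size; the Beta parameter and the convention of keeping both label sets are common choices, not necessarily the exact ones used in this work.

```python
import numpy as np

def mixup(img_a, boxes_a, img_b, boxes_b, alpha=8.0):
    """Blend two training images and keep the boxes of both.

    Pixel-wise convex combination with a Beta-distributed weight `lam`;
    both label sets are retained, and their loss contributions are
    typically scaled by lam and 1 - lam downstream.
    """
    lam = np.random.beta(alpha, alpha)
    mixed = lam * img_a.astype(np.float32) + (1.0 - lam) * img_b.astype(np.float32)
    return mixed.astype(img_a.dtype), np.concatenate([boxes_a, boxes_b], axis=0), lam
```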
2.2 Assembly Process Guidance Based on AR
Currently, cable assembly depends on the worker's experience, which leads to significant limitations. AR can address this problem effectively. In the case of cable assembly, visualization of the assembly process is needed. The visually aided assembly contains two major parts: cable type detection and cable assembly guidance. Section 2.1 elaborates the cable type detection process in detail. The detection results are exported to a wearable AR device (a Hololens in this research) for display in real time.
The content of the developed assistance system is managed by a control panel displayed in the wearable device. After the captured scene image changes, the cable type inspection sub-system begins working, and the AR device obtains the cable location and type. The final interface of this AR-assisted assembly sub-system is shown in Fig. 4; for security, the device is shielded in the figure. In the assembly process, the cable is marked and displayed in the AR device. Most importantly, the corresponding assembly position is also marked in the actual industrial scene displayed in the AR device according to the detected type. The product and process information are displayed in the upper left corner of the screen. A control panel, which can be moved by the user to any position on the screen, is used to schedule assembly instructions.
Fig. 4. Interface of AR assisted assembly system (the device was shielded for security).
3 Experiments and Discussions
Since there is no public cable type dataset available, a dataset was collected for this research. The images of the different cables were taken from a real working environment. All experiments are conducted on the collected dataset.
3.1 Dataset and Experimental Parameters
Our dataset is taken from real factory scenarios involving 3 cable types. For Types 1, 2 and 3, the numbers of images taken from many different angles and shapes are 686, 653, and 669, respectively. To validate the proposed method, the dataset was split into training and test sets, and each image was manually annotated with ground truth regions (bounding boxes). Some examples of the dataset are shown in Fig. 4. The details of the dataset are shown in Table 1, where class labels 1–3 represent the different cable types. Considering that workers only assemble one cable at a time in a real working environment, only one cable appears in each image.

Table 1. The distribution of the collected dataset

Type 1 | Type 2 | Type 3 | Total
686 | 653 | 669 | 2008
Network training uses the stochastic gradient descent (SGD) optimization algorithm. The initial learning rate is 0.001, the learning rate decay coefficient is 0.9 (the learning rate is updated every 2000 iterations), the total number of iterations is 70000, and the batch size is 8. The experimental platform is Windows 10, using an Intel i7-8700 processor and an NVIDIA GeForce GTX 1080Ti.
3.2 Baselines and Evaluation Metric
With the collected dataset, we were able to thoroughly test four deep learning object detection networks as baselines, specifically SSD, Faster R-CNN, YOLOv3, and RetinaNet, and evaluate their performance on the cable type detection task against the proposed method. For all experiments, the object detection networks were pre-trained on ImageNet and then fine-tuned on the collected dataset for our application.
To evaluate the recognition performance, the mean average precision (mAP) is used as the evaluation metric. Average precision (AP) summarizes the Precision and Recall of one cable class, while mAP is the average over all cable classes. The definitions of Precision and Recall are given below:

Precision = \frac{TP}{TP + FP}, \quad Recall = \frac{TP}{TP + FN} \quad (1)
where TP is the number of cables detected correctly, FP is the number of cables detected incorrectly, and FN is the number of cables that are not detected. The average precision (AP) and mAP are calculated as:

AP = \int_0^1 P \, dR, \quad mAP = \frac{1}{|Q_R|} \sum_{q \in Q_R} AP(q) \quad (2)
where P is Precision, R is Recall, and Q_R is the set of cable types. Considering that cable type detection is needed to aid assembly in real time, the inference time and frames per second (FPS) are also used as evaluation metrics.
3.3 Experimental Results and Discussion
In order to verify the effectiveness of the proposed method, four other object detection models, including SSD [12], Faster R-CNN, RetinaNet [13], and YOLOv3, are used in comparison experiments. To ensure the fairness of the experiments, the training data and test data used by each network model are exactly the same. Through visual inspection, the predictions made by each of the networks on the test data were highly accurate with respect to the ground-truth regions created by manual annotation of the dataset. Due to cable bending deformation and lighting changes, YOLOv3 and Faster R-CNN produce more missed and false detections, while the proposed method is greatly improved. A detection example is shown in Fig. 5; it is evident that the proposed model has fewer missed and false detections.
Fig. 5. The diagram of detection comparison results for different networks
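To make the metrics in Eqs. (1)–(2) concrete, the following sketch computes AP by numerical integration of the precision–recall curve and mAP as the per-class mean. The monotone precision envelope is a common (VOC-style) convention and is an assumption here.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under the precision-recall curve, Eq. (2), one class."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # monotone precision envelope
    idx = np.where(r[1:] != r[:-1])[0]         # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class AP values over the cable types Q_R."""
    return float(np.mean(ap_per_class))
```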
The multi-scale feature strategy usually increases the computational time, so the real-time performance of the algorithm also needs to be considered. In this research, FPS and inference time are used to evaluate the proposed model; the higher the FPS value, the better the real-time performance. Table 2 shows the precision, recall and mAP values obtained by the different scaling strategies, as well as the corresponding FPS values.
Table 2. Results of the multiscale strategy

Scale | Precision | Recall | mAP | FPS
1+3+5 | 88.36 | 91.73 | 0.90 | 49
1+3+7 | 88.44 | 92.74 | 0.909 | 51
1+5+7 | 89.14 | 92.37 | 0.91 | 49
3+5+7 | 89.76 | 92.89 | 0.905 | 50
1+3+5+7 | 90.47 | 93.56 | 0.913 | 46
Table 2 shows that when the four-scale feature combination 1+3+5+7 is employed, the performance is higher than that of the three-scale combinations. On the other hand, from the perspective of computational efficiency, the three-scale combination 1+3+5 achieved the best real-time performance and the model can process 8.6 images per second, but its Precision and mAP values are significantly lower than those of the four-scale combination. In this research, we finally choose the structure that achieves the best detection performance.
To figure out the contribution of our improved Cable-YOLO, we also conducted comparison experiments with the other baselines. The training set and the test set are the same for all models. The evaluation metrics of the different models are calculated on the test set, and the quantitative experimental results are shown in Table 3.

Table 3. Experimental results and comparison

Method | Precision | Recall | mAP | Inference time (ms)
SSD | 84.36 | 85.42 | 0.87 | 30
Faster R-CNN | 86.79 | 90.63 | 0.89 | 120
Retina-net | 87.21 | 90.33 | 0.86 | 100
YOLO-V3 | 88.91 | 90.82 | 0.89 | 23
The proposed | 90.47 | 93.56 | 0.91 | 22
By observing the experimental results, it can be found that the precision, recall and mAP obtained by SSD are low, indicating its poor detection performance for cable types. Faster R-CNN and Retina-net have similar performance on the three indicators, but these two methods still cannot compete with YOLO-V3 and the proposed method. Compared with YOLO-V3, the proposed method achieves better performance, which proves that multi-scale features are an effective strategy for improving the performance of object detection tasks. In terms of FPS and inference time, YOLO-V3, SSD and the proposed Cable-YOLO can run in real time, but Faster R-CNN and Retina-net cannot. Therefore, the proposed method is very suitable for real-time cable type detection. Figure 6 shows the details of the training loss and testing accuracy for the proposed Cable-YOLO and the four baselines (SSD, Faster R-CNN, Retina-Net and YOLO-V3). It is clear that the convergence process of our method is more rapid and stable than that
of the other four baselines, and a similar accuracy curve is observed on the test set. Therefore, the proposed Cable-YOLO outperforms the other four baselines in terms of model stability, prediction accuracy, and inference time.
Fig. 6. The training loss and test accuracy on the collected dataset.
4 Conclusions
In order to aid cable assembly in an industrial environment, a smart assistance system that combines a wearable AR device with a cable type detection algorithm is proposed. The proposed system consists of two sub-systems: one for cable type detection based on deep learning, and the other for assembly process guidance based on AR.
In this research, a dedicated deep learning-based object detection model is designed for the cable type detection sub-system. To make detection run in real time, we propose the Cable-YOLO algorithm, which uses multi-scale features and an anchor-free mechanism; the multi-scale features are fused to obtain the final detection results. To validate the proposed algorithm, a dataset consisting of 3 types of cables is collected from an industrial environment. The experimental results show that the Cable-YOLO algorithm detects the cable type effectively, the mAP reaches 0.91, and the detection speed is improved. After the cable type is detected, the AR guidance sub-system shows the assembly location and a guidance arrow in the real scene. The results show that the proposed system can assist workers to quickly inspect the cable type and readily shows the installation path of cables in an industrial environment, improving the efficiency and quality of the cable assembly process.
References
1. Mora, N., et al.: Numerical simulation of the overall transfer impedance of shielded spacecraft harness cable assemblies. IEEE Trans. Electromagn. Compat. 57(4), 894–902 (2015)
2. Geng, J., Zhang, S., Yang, B.: A publishing method of lightweight three-dimensional assembly instruction for complex products. J. Comput. Inf. Sci. Eng. 15(3), 031004.1–031004.12 (2015)
3. Ng, F.M., et al.: Designing cable harness assemblies in virtual environments. J. Mater. Process. Technol. 107(1–3), 37–43 (2000)
4. Xia, P.J., et al.: A new type haptics-based virtual environment system for assembly training of complex products. Int. J. Adv. Manuf. Technol. 58(1–4), 379–396 (2012)
5. Ritchie, J.M., et al.: Cable harness design, assembly and installation planning using immersive virtual reality. Virtual Reality 11(4), 261–273 (2007)
6. O'B Holt, P., et al.: Immersive virtual reality in cable and pipe routing: design metaphors and cognitive ergonomics. J. Comput. Inf. Sci. Eng. 4(3), 161–170 (2004)
7. Erkoyuncu, J.A., et al.: Improving efficiency of industrial maintenance with context aware adaptive authoring in augmented reality. CIRP Ann. 66(1), 465–468 (2017)
8. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
9. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
10. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
11. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
12. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
13. Ren, S., et al.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28, 91–99 (2015)
14. Dai, J., et al.: R-FCN: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems (2016)
15. Law, H., Deng, J.: CornerNet: detecting objects as paired keypoints. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 765–781. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_45
16. Tian, Z., et al.: FCOS: fully convolutional one-stage object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2019)
17. Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. arXiv preprint arXiv:1904.07850 (2019)
Improved YOLOv5 Network with Attention and Context for Small Object Detection Tian-Yu Zhang1,2 , Jun Li3 , Jie Chai3 , Zhong-Qiu Zhao1,2,4(B) , and Wei-Dong Tian1,2 1 College of Computer and Information, Hefei University of Technology, Hefei, Anhui, China
[email protected], [email protected] 2 Intelligent Manufacturing Institute of HFUT, Hefei, China 3 Anhui Fiber Inspection Bureau, Hefei, China 4 Guangxi Academy of Science, Guangzhou, China
Abstract. Object detection is one of the most important and challenging branches of computer vision. Although impressive progress has been achieved on large- and medium-scale objects, detecting small objects in images is still difficult due to the limited image size and feature information. To deal with the small object detection problem, we explore how the popular YOLOv5 object detector can be modified to improve its performance on small objects. To achieve this, we integrate Coordinate Attention (CA) and a Context Feature Enhancement Module (CFEM) into the YOLOv5 network. Coordinate Attention is based on the attention mechanism and embeds positional information into channel attention, which enables a deep neural network to augment the representations of the objects of interest. The Context Feature Enhancement Module explores rich context information from multiple receptive fields and contains only a few additional layers. Extensive experimental results on the VisDrone-Detection dataset demonstrate that our approach improves small object detection performance. Keywords: Small object detection · Attention mechanism · Context information
1 Introduction
Object detection is a fundamental task of many advanced computer vision problems, such as instance segmentation [1] and image captioning [2]. Over the past few years, the emergence of deep convolutional neural networks [3, 4] has boosted the performance of object detection, which mainly includes two-stage object detection [5–7] and one-stage object detection [9–11]. Although these general object detection methods have improved accuracy and efficiency, the limited resolution and context information are not enough for a model, so detecting small objects in images can still be difficult.
Efforts have been made to improve small object detection. The feature pyramid network (FPN) [8] was the first method to enhance features by fusing features from different levels and constructing feature pyramids. Another approach to small object detection is to provide high-resolution features to the detection model. Li et al. [12] propose
Perceptual GAN to enhance the features of small objects with the characteristics of large objects. Leveraging the relationship between an object and its coexisting environment in the real world, context information is another avenue for improving small object detection. Many methods [13–15] employ additional layers to build context information from multiple layers. Augmented RCNN [16] proposes a novel region proposal network (RPN) to encode the context information around a small object proposal; a context module consisting of three sub-networks is designed to obtain the context information around the proposal.
As a one-stage object detector, the YOLOv5 network [17] is widely used in academia and industry for its excellent detection accuracy and speed. Compared with other one-stage object detectors, the YOLOv5 network has a lightweight model size and is easier to train, so many systems are built on it and further improved. However, it is designed to be a general-purpose object detector and is not optimized to detect small objects. In this paper, an improved YOLOv5 network is proposed. We take YOLOv5 as the main network of our model, then improve it with an attention module to capture key visual information and add a context feature enhancement module. The main contributions of this paper are summarized as follows:
(1) In order to capture more visual information for the detection model, we use Coordinate Attention (CA), which embeds positional information into channel attention so that a deep neural network can augment the representations of the objects of interest.
(2) A Context Feature Enhancement Module (CFEM) is proposed, which captures rich context information from different receptive fields by using multi-path dilated convolutional layers. Furthermore, it merges the layers with different receptive fields by concatenation to fuse coarse- and fine-grained features.
(3) We evaluate our method on the VisDrone-Detection dataset. The results demonstrate that the improved YOLOv5 network achieves better performance than the baseline method (YOLOv5).
2 Related Work
2.1 CNN-Based Object Detection
CNN-based object detection can be mainly divided into two categories: 1) two-stage detectors and 2) one-stage detectors. The former generate a large number of region proposals and then classify each proposal into different object categories, while the latter regard object detection as a regression or classification problem and use a unified network to produce the final detection results directly.
Two-stage Object Detection: In 2014, R. Girshick et al. [18] proposed Regions with CNN features (RCNN) for object detection. It generates 2000 candidate proposals by Selective Search [19], feeds these proposals into a CNN model to extract features, and finally predicts the presence of objects and the object categories with linear SVM classifiers. One of the major issues with RCNN was the need to train multiple systems separately. Fast R-CNN [5] solved this problem by creating a single end-to-end trainable system. Moreover, in Faster R-CNN [6], the Region Proposal Network (RPN) integrates
proposal generation with the classifier in a single convolutional network. Besides, many other two-stage object detection methods have been proposed, such as FPN [8], R-FCN [7], Mask R-CNN [1], and Cascade R-CNN [20].
One-stage Object Detection: In 2015, R. Joseph et al. proposed YOLO [9], the first one-stage object detector of the deep learning era in computer vision. The core idea of YOLO is to use the whole feature map to directly predict the locations and categories of the bounding boxes. SSD [10] was then proposed by Liu et al. in 2015; its main contribution is the introduction of multi-reference and multi-resolution detection techniques, which significantly improve the detection accuracy of a one-stage detector, especially for small objects. There are also many other one-stage object detection methods that enhance the detection process in the prediction objectives or the network architectures, such as YOLOv4 [30], RetinaNet [11], and CenterNet [21].
2.2 Attention Mechanism
The attention mechanism is a data processing method in machine learning. Its basic idea in computer vision is to let the model learn to focus on key information and ignore unimportant information. SENet [22] was the first to use channel attention; its core is a Squeeze-and-Excitation block, which is used to collect global information, capture channel-wise relationships and improve representation ability. In 2018, Woo et al. [23] proposed the convolutional block attention module (CBAM), which stacks channel attention and spatial attention. It decouples the channel attention map and the spatial attention map for computational efficiency, and leverages spatial global information by introducing global pooling. For object detection, Cao et al. [24] use an attention-guided module to adaptively extract the useful information around the salient object. Moreover, in recent years, non-local neural networks [25] and self-attention [26] have become very popular due to their capability of building spatial or channel-wise attention. However, due to the large amount of computation inside self-attention modules, they are often adopted in large models but are not suitable for mobile networks.
2.3 Context Information
Many studies have proved that context information can improve the performance of object detection and image classification. The features from the top layers of generic object detectors are sufficient to capture large objects, but their information is greatly limited for small objects; the features from the bottom layers contain overly specific information that is not useful for detecting large objects but is useful for small ones. Accordingly, some context-based detection methods have been proposed that exploit the relationship between small objects and other objects or the background. Oliva et al. [27] illustrate that the region around a small object can provide useful context information to help detect it. Moreover, the experimental results in [28] also demonstrate that adding a special context module can significantly improve detection accuracy. Some studies [28, 29] also propose using dilated convolution layers to better segment small
2.2 Attention Mechanism Attention mechanism is a data processing method in machine learning. The basic idea of attention mechanism in computer vision is to let the model learn to focus on key information and ignore unimportant information. SENet [22] is the first to use channel attention. The core of SENet is a squeeze-and-Excitation block which is used to collect global information, capture channel-wise relationships and improve representation ability. In 2018, Woo et al. [23] proposed the convolutional block attention module (CBAM) which stacks channel attention and spatial attention. It decouples the channel attention map and spatial attention map for computational efficiency, and leverages spatial global information by introducing global pooling. For object detection, Cao et al. [24] use an attention-guided module to adaptively extract the useful information around the salient object through the attention mechanism. Moreover, in recent years, Non-local neural network [25] and Self-attention [26] have become very popular due to their capability of building spatial or channel-wise attention. However, due to the large amount of computation inside the self-attention modules, they are often adopted in large models but not suitable for mobile networks. 2.3 Context Information Many Studies have proved that the context information can improve the performance of object detection and image classification. The feature from the top layers in generic object detectors are enough to capture large objects but the information is greatly limited for small objects. While the feature from the bottom layers contain too specific information which is not useful for detecting large objects but useful for small objects. Then, some detection methods based on context information were proposed to use the relationship between small objects and other objects or background. Oliva et al. [27] illustrate that the around region of small object could provide useful context information to help detect small object. Moreover, the experimental result in [28] also demonstrate that adding a special context module can significantly improve the detection accuracy. Some studies [28, 29] also propose to use dilated convolution layer to better segment small
344
T.-Y. Zhang et al.
objects because dilated convolution layer convers larger receptive fields without losing resolution. In this paper, we also use dilated convolution layer to extract features with different receptive fields.
3 Proposed Method 3.1 Network Architecture Generally, YOLOv5 network can be divided into three parts: the architecture of CSPDarknet53 as backbone, SPP layer and PANet as Neck and YOLO detection head [9]. To further optimize the whole architecture, bag of freebies and specials [30] are provided. In order to balance the influence of detection performance and computing resources, we select YOLOv5-l as our baseline model. In this paper, the Coordinate Attention and the Context Feature Enhancement Module are introduced to improve the performance of small object detection on the original YOLOv5 network. In order to extract more feature for small objects, we add one more prediction head for small object detection. The structure of four prediction heads can ease the negative influence caused by violent object scale variance and can be more helpful to detect small objects. The improved YOLOv5 network structure is shown in Fig. 1.
Fig. 1. The complete structure of the improved YOLOv5 network.
Improved YOLOv5 Network with Attention and Context
345
3.2 Coordinate Attention Coordinate Attention [31] can be viewed as a computational unit that aim to enhance the expressive power of the learned features for CNN-based network. The structure of Coordinate Attention module is shown in Fig. 2. Coordinate attention is based on the Squeeze-and-Excitation (SE) network [20], which can be divided into two steps: squeeze and excitation. The squeeze step is designed for global information embedding and excitation step is used for adaptive recalibration of channel relationships. The squeeze step can be formulated as follows: zc =
W H 1 xc (i, j) H ×W
(1)
i=1 j=1
where xc is the c - th channel for the input X and zc is the output related to the c - th channel. The excitation step aims to fully capture channel-wise dependencies, which can be formulated as. Xˆ = X · σ (T2 (ReLU(T1 (z))))
(2)
where · refers to channel-wise multiplication, σ is the sigmoid function and T1 and T2 are two linear transformations which can be learned to capture the importance of each channel.
Fig. 2. (a) The architecture of SE network. (b) The architecture of coordinate attention module
Compared to SE network, Coordinate Attention takes into account both inter-channel relationships and positional information. It can be decomposed into two steps: coordinate information embedding and coordinate attention generation. First of all, in order to encourage attention blocks to capture long-range interactions spatially with precise positional information, coordinate attention factorizes the global pooling into a pair of 1D feature encoding operations, which encode each channel along
346
T.-Y. Zhang et al.
the horizontal coordinate and the vertical coordinate through two spatial extents of pooling kernels (H , 1) and, (1, W ) respectively. The coordinate information embedding step can be formulated as follows: zch (h) =
1 xc (h, i) W
(3)
1 xc (j, w) H
(4)
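The directional pooling of Eqs. (3)–(4) amounts to averaging over one spatial axis at a time; a minimal sketch (tensor layout assumed to be (B, C, H, W)):

```python
import torch
import torch.nn as nn

class CoordinatePooling(nn.Module):
    """Directional pooling of Eqs. (3)-(4): average over one spatial axis."""
    def forward(self, x):                  # x: (B, C, H, W)
        z_h = x.mean(dim=3)                # Eq. (3): pool along width  -> (B, C, H)
        z_w = x.mean(dim=2)                # Eq. (4): pool along height -> (B, C, W)
        return z_h, z_w
```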
|\tilde m_{ij}| \le \Delta m_{ij}, \; m^n_{ij} + \tilde m_{ij} > 0, \; |\tilde c_{ij}| \le \Delta c_{ij}, \; |\tilde d_{1,j}| \le \Delta d_{1,j}, \; d^n_{1,j} + \tilde d_{1,j} > 0, \; |\tilde d_{2,j}| \le \Delta d_{2,j}, \; d^n_{2,j} + \tilde d_{2,j} > 0, \; |\tilde g_i| \le \Delta g_i, \quad j = \overline{1,6} \quad (6)
where \Delta m_{ij}, \Delta d_{1,j}, \Delta d_{2,j}, \Delta c_{ij}, \Delta g_i are estimates of the maximum deviations of the AUV parameters and of the entries of the corresponding matrices from their nominal values. The vector τ defines the resultant force and torque produced by all AUV thrusters. Each of these thrusters has a typical control system [16], and the dynamics of the vector τ can be described by the following equation:

T_t \dot\tau + \tau = \tau_d
(7)
where \tau_d = [\tau_{dx}, \tau_{dy}, \tau_{dz}, M_{dx}, M_{dy}, M_{dz}]^T \in R^6 is the desired value of the force and torque vector, and T_t is the time constant, assumed equal for all thrusters.
3 The Synthesis of the AUV Control System
The approach described in [13, 17] is used for the synthesis of the AUV control system. This approach assumes a control system consisting of two loops: the velocity control loop and the position control loop (see Fig. 1). The inner AUV velocity control loop decouples all control channels, compensates for uncertain and varying AUV parameters, and ensures the desired dynamic properties of the AUV. The external AUV position control loop provides independent control of all linear and rotational degrees of freedom. Splitting the control system into two loops simplifies the treatment of the AUV dynamics, since only the first equation in (1) is needed, in contrast to single-loop systems where the dynamic and kinematic properties must be considered simultaneously.
Fig. 1. The structure of the AUV control system (SVC – sliding velocity controller; NVC – nonlinear velocity controller; NPC – nonlinear position controller; NTC – nonlinear thruster controller)
The velocity control loop consists of the nonlinear velocity controller (NVC), which ensures the desired dynamic properties when the AUV parameters have nominal values, and the additional sliding controller (SVC), which compensates for uncertain or changing AUV parameters. The nonlinear thruster controller (NTC) is used to ensure the desired dynamical properties of the AUV thrusters [17]. It allows the dynamic properties of the AUV thrusters to be kept constant when various defects arise, and thus maintains a high accuracy of AUV movement control.
3.1 The Synthesis of the AUV Velocity Control Loop
The model (1), together with the thruster dynamics (7), is used for the synthesis of the AUV velocity control loop. First, we combine these two equations into one: Eq. (1) is differentiated and substituted into (7). As a result we obtain:
T_t M \ddot\upsilon + (M + T_t(C + D))\dot\upsilon + (T_t\dot C + C + T_t\dot D + D)\upsilon + T_t\dot g + g = \tau_d \quad (8)
(9)
Development of AUV Two-Loop Sliding Control System
509
The part τn ∈ R6 is the output signal of nonlinear velocity controller which ensures the desired dynamical properties when the AUV parameters have nominal values, and the part τs ∈ R6 is additional control signal to compensate possible deviation of AUV parameters from its nominal values. Such separation allows to form main part of control signal as continuous signal and additional part as discrete signal with reduced amplitude. The Model Based Control (MBC) method [18] is used for synthesis of the nonlinear velocity controller. The signal τn is formed in following view:
˙ n + Cn + Dn )υ + Tt g˙ n + gn , τn = Tt M n α˙ + (Mn + Tt (Cn + Dn ))α + (Tt C˙ n + D α = υ˙ d − e, e = υ − υd ,
(10)
where υd ∈ R6 is the desired value of AUV velocity vector in BCF; ∈ R6×6 is the diagonal matrix of positive coefficients. The compensation of uncertain parameters is provided by means of the sliding controller which forms the additional control signal by following expressions: τs = −K s |e|sign(s), s = e˙ + e,
(11)
where Ks is the diagonal matrix of positive coefficients. Herewith the term |e| allows to decrease amplitude of the discrete signal when AUV parameters close to nominal value. In this case nonlinear controller (10) provides closeness to zero of velocity error. Let consider the Lyapunov function V = sT Tt Ms for defining parameters of controller (11). The derivative of Lyapunov function has following view: ¨ V˙ = sT Tt M s˙ = sT (Tt M υ¨ − Tt M υ¨ d + ˙e) = sT (−Tt M α + Tt M υ).
(12)
Substituting Eq. (10) into (12) we obtain: ˙ V˙ = sT (−Tt M α − (M + Tt (C + D))υ−
˙ + C + D υ − Tt g˙ − g + τd ). Tt C˙ + D
(13)
The expression (13) has the following view with taking into account of the expressions (5), (9)–(11):
˙ + C + D υ − Tt g˙ − g + Tt M n α˙ V˙ = sT −Tt M α − (M + Tt (C + D))υ˙ − Tt C˙ + D
˙ n + Cn + Dn υ + Tt g˙ n + gn − Ks |e|sign(s) + (Mn + Tt (Cn + Dn ))α + Tt C˙ n + D = sT (−Tt M α˙ − (Mn + Tt (Cn + Dn ))s − (M + Tt (C + D ))υ˙
˙ + C + D υ − Tt g˙ − g − Ks |e|sign(s) − Tt C˙ + D
˙ + C + D υ = sT −Tt M α˙ − (M + Tt (C + D ))υ˙ − Tt C˙ + D − Tt g˙ − g − Ks |e|sign(s)) − sT (Mn + Tt (Cn + Dn ))s.
(14)
Since M_n > 0, the matrix D_n is diagonal with positive entries (see expression (3)), and the matrix C_n is skew-symmetric [15] so that s^T C_n s \equiv 0, the following inequality holds for the last term in (14):

-s^T(M_n + T_t(C_n + D_n))s < 0
(15)
Taking (15) into account, the coefficients of the matrix K_s must satisfy the following condition to ensure that \dot V < 0:

K_{s,ii} > \max\big(-T_t\tilde M\dot\alpha - (\tilde M + T_t(\tilde C + \tilde D))\dot\upsilon - (T_t\dot{\tilde C} + \tilde C + T_t\dot{\tilde D} + \tilde D)\upsilon - T_t\dot{\tilde g} - \tilde g\big)_i / |e_{ai}| \quad (16)

where e_{ai} is an acceptable value of the AUV velocity error. The values of the coefficients K_{s,ii} should be calculated using the desired velocities and accelerations that arise when the AUV moves along typical trajectories. Therefore, the AUV velocity control system (9)–(11) allows the AUV dynamics to be described by the linear differential equation:

\dot e + \Lambda e = 0
(17)
while the AUV parameters change within the ranges (6).
3.2 The Synthesis of the AUV Position Control Loop
The synthesis of the AUV position loop is carried out after the synthesis of the velocity loop. When the control law (10), (11), (16) is used, the AUV spatial movement is described by the following equations:

\dot\upsilon + \Lambda\upsilon = \dot\upsilon_d + \Lambda\upsilon_d, \quad \dot\eta = J(\eta)\upsilon
(18)
We rewrite the system (18) as a single equation, which will be used for the synthesis of the position control system [13]:
\ddot\eta = J(\dot\upsilon_d + \Lambda\upsilon_d) + (\dot J - J\Lambda)J^{-1}\dot\eta \quad (19)
(20)
where ηd ∈ R6 is the vector of desired position and orientation of AUV in ACF; δ ∈ R6×6 is the diagonal matrix with positive coefficients which are reasonable choose as δ = 2 /4 [13]. Such a choice of values δ provide the AUV movement to desired position and orientation without overshooting.
Development of AUV Two-Loop Sliding Control System
511
Express variable η¨ from (20) and set given expression to right part of Eq. (19) we obtain:
˙ d − η) J (υ˙ d + υd ) + J˙ − J J −1 η˙ = (η ˙ + δ(ηd − η). (21) From Eq. (21) with taking into account of (16) we obtain:
υ˙ d = J −1 δ(ηd − η) + υ − υd + J −1 η˙ d − J −1 J + J˙ υ.
(22)
It is obvious that the expression (22) which considering AUV kinematic provides AUV movement in accordance with the desired model (20). For implementation of control system (10), (11), (16), (22) it is necessary to set the value of nominal AUV parameters, ranges of their possible changing and matrix defined the desired dynamic properties of AUV. The enter to saturation of AUV thrusters can be excepted by using additional system for automatic formation of the AUV desired velocity [18].
4 The Research of AUV Control System

A simulation was carried out in Matlab to check the efficacy of the offered approach. An AUV with the following parameters is considered:

M_R = diag(170 kg, 170 kg, 170 kg, 10.2 kg·m², 23.4 kg·m², 23.4 kg·m²),
d_1 = (18 Ns/m, 105 Ns/m, 105 Ns/m, 20 Nms, 80 Nms, 80 Nms),
d_2 = (18 Ns²/m², 105 Ns²/m², 105 Ns²/m², 20 Nms², 80 Nms², 80 Nms²),
x_G = 0.0 m, y_G = 0.0 m, z_G = −0.01 m, x_B = 0.0 m, y_B = 0.0 m, z_B = 0.0 m,
M_A = diag(15 kg, 185 kg, 185 kg, 5 kg·m², 19.6 kg·m², 19.6 kg·m²).

During the control system synthesis it is supposed that the thruster dynamics are described by the equation \dot{\tau} = 10(\tau_d - \tau), where \tau_d is the signal formed by the AUV control system. The saturation of thrusts and torques has the following values: \tau_{max} = 150 N, M_{max} = 150 Nm. The ranges of variation of the AUV parameters were the following:

\Delta M = diag(140 kg, 140 kg, 140 kg, 4 kg·m², 7 kg·m², 7 kg·m²),
\Delta d_{1max} = (9 Ns/m, 50 Ns/m, 50 Ns/m, 10 Nms, 30 Nms, 30 Nms),
\Delta d_{2max} = (9 Ns²/m², 50 Ns²/m², 50 Ns²/m², 10 Nms², 30 Nms², 30 Nms²),
\Delta x_B = 0.02 m, \Delta y_B = 0.0 m, \Delta z_B = 0.05 m.   (23)
The matrix defining the desired dynamical properties of the AUV has the following form: \Lambda = diag(0.6, 0.6, 0.6, 3, 3, 3), K_s = diag(1000, 200, 200, 200, 1000, 400). Two modes of AUV movement are investigated in the simulation. In the first mode the AUV starts from the point (0 m, 0 m, 0 m) in the ACF and finishes at the point (5 m, 5 m, 5 m); in the second mode the AUV starts from the same point and then moves along the trajectory (1.0t, 10cos(0.002t) − 10, 5). The desired values of the AUV orientation angles are calculated such that the AUV is directed towards the target point. The simulation results for the control system consisting of the nonlinear velocity controller (10), (11), (16) and the position controller (22) when the AUV parameters have nominal values are shown in Figs. 2 and 3.
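For reproducibility, the second motion mode's reference trajectory can be generated as in the sketch below; the function name is illustrative and the body only restates the trajectory given in the text (coordinates in meters, time in seconds).

import numpy as np

def reference_trajectory(t):
    """Second simulated mode: slow cosine sweep at constant depth in the ACF,
    (1.0*t, 10*cos(0.002*t) - 10, 5)."""
    return np.array([1.0 * t,
                     10.0 * np.cos(0.002 * t) - 10.0,
                     5.0])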
Fig. 2. Process of changing the coordinates (a), the AUV dynamical errors (b), the control thrusts and torque (c) when AUV moves to the desired position and its parameters have nominal values
Fig. 3. The process of changing the AUV dynamical errors (a), the control thrusts and torques (b) when the AUV moves along the desired trajectory and its parameters have nominal values
These figures show the processes of changing the vector of dynamic errors \varepsilon_\eta = \eta_d - \eta = (\varepsilon_x, \varepsilon_y, \varepsilon_z, \varepsilon_\varphi, \varepsilon_\theta, \varepsilon_\psi)^T. As one can see, if the AUV parameters have nominal values the AUV reaches the desired position in accordance with the desired dynamic model. The dynamic errors do not exceed 0.2 m when the AUV moves along the trajectory. Moreover, the discrete control signal has a small amplitude that does not cause reversals of the AUV thrusters. The same processes are shown in Figs. 4 and 5 for the case when the AUV parameters change within the ranges (23). As one can see, the quality of the control process does not change and still corresponds to the desired model. This is achieved by means of the larger amplitude of the signal formed by the velocity sliding controller; this amplitude increases because the error of the nonlinear velocity controller increases. One can also see that the presence of the high-frequency signal does not cause reversals of the AUV thrusters, which means that the proposed sliding controller does not lead to wear of the mechanical parts of the AUV thrusters.
Fig. 4. The process of changing the coordinates (a), the AUV dynamical errors (b), the control thrusts and torque (c) when the AUV moves to desired position and its parameters change in ranges (23)
For comparison, Figs. 6 and 7 show the processes of changing the AUV coordinates when typical PID controllers are used in each control channel. PID controllers are selected for comparison because they are the most widely used controllers for underwater robots. The parameters of the PID controllers are tuned so that the transient process does not exceed 30 s in the linear coordinates and 5 s in the rotational coordinates, without overshoot.
Fig. 5. The process of changing the coordinates (a), the AUV dynamical errors (b), the control thrusts and torques (c) when the AUV moves along desired trajectory and its parameters change in ranges (23)
The process of changing the dynamic errors when the AUV parameters have nominal values is shown in Fig. 6, and Fig. 7 shows the same process when the AUV parameters change within the ranges (23). In these figures the AUV moves from the point (0 m, 0 m, 0 m) in the ACF to the point (5 m, 5 m, 5 m). The desired values of the AUV orientation angles are calculated such that the AUV is directed towards the target point. In Figs. 6 and 7 one can see that changing the AUV parameters leads to overshooting and increases the duration of the transient processes. This can be unacceptable when the AUV maneuvers near an underwater object.
Fig. 6. The process of changing the AUV dynamical errors when the AUV moves to desired position by means of PID-controllers
Fig. 7. The process of changing the AUV dynamical errors when the AUV moves to desired trajectory by means of PID-controllers
5 Conclusions

The paper proposes a new method for synthesizing high-precision control systems for AUV spatial motion under conditions of parameter uncertainty, with the dynamic properties of the thrusters taken into account. The proposed control system consists of two nested loops: a velocity control loop and a position control loop. The inner velocity control loop decouples all control channels, compensates for uncertain and variable AUV parameters, and ensures the desired dynamic properties of the AUV while accounting for the thruster dynamics. The external position control loop provides independent control of all linear and rotational degrees of freedom. The simulation results confirmed the operability and high quality of the synthesized systems in various operating modes of the AUV.

Acknowledgements. This work is supported by the Russian Science Foundation (grant 22-29-01156).
References 1. Yuh, J., Marani, G., Blidberg, R.: Applications of marine robotic vehicles. Intell. Serv. Robot. 2, 221–321 (2011) 2. Yu, L., et al.: Inspection robots in oil and gas industry: a review of current solutions and future trends. In: 2019 25th International Conference on Automation and Computing (ICAC), Lancaster, United Kingdom, pp. 1–6 (2019) 3. Lei, M.: Nonlinear diving stability and control for an AUV via singular perturbation, Ocean Eng. 1(197) (2020). https://doi.org/10.1016/j.oceaneng.2019.106824 4. Juul, D.L., McDermott, M., Nelson, E.L., Barnett, D.M., Williams, G.N.: Submersible control using the linear quadratic Gaussian with loop transfer recovery method. In: IEEE Symposium on Autonomous Underwater Vehicle Technology (AUV’94), Cambridge, MA, USA, pp. 417– 425 (1994)
5. Lakhwani, D.A., Adhyaru, D.M.: Performance comparison of PD, PI and LQR controller of autonomous under water vehicle. In: 2013 Nirma University International Conference on Engineering (NUiCONE), Ahmedabad, pp. 1–6 (2013) 6. Gonzalez, J., Benezra, A., Gomariz, S., Sarriá, D.: Limitations of linear control for Cormoran-AUV. In: 2012 IEEE International Instrumentation and Measurement Technology Conference Proceedings, Graz, pp. 1726–1729 (2012) 7. Wu, H., Song, S., You, K., Wu, C.: Depth control of model-free AUVs via reinforcement learning. IEEE Trans. Syst. Man Cybern. Syst. 12(49), 2499–2510 (2019) 8. Liu, X., Zhang, M., Rogers, E.: Trajectory tracking control for autonomous underwater vehicles based on fuzzy re-planning of a local desired trajectory. IEEE Trans. Veh. Technol. 12(68), 11657–11667 (2019) 9. Narasimhan, M., Singh, S.N.: Adaptive optimal control of an autonomous underwater vehicle in the dive plane using dorsal fins. Ocean Eng. 33, 404–416 (2006) 10. Koofigar, H.R.: Adaptive control of underwater vehicles with unknown model parameters and unstructured uncertainties. In: The 2012 Proceedings of SICE Annual Conference (SICE), Akita, pp. 192–196 (2012) 11. Lebedev, A.V., Filaretov, V.F.: The synthesis of multi-channel adaptive variable structure system for the control of AUV. In: The 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, pp. 2834–2839 (2008) 12. Xu, J., Wang, M., Qiao, L.: Dynamical sliding mode control for the trajectory tracking of underactuated unmanned underwater vehicles. Ocean Eng. 105, 54–63 (2015) 13. Filaretov, V., Yukhimets, D.: Synthesis method of control system for spatial motion of autonomous underwater vehicle. Int. J. Ind. Eng. Manage. (IJIEM) 3(3), 133–141 (2012) 14. Dai, P., Lu, W., Le, K., Liu, D.: Sliding mode impedance control for contact intervention of an I-AUV: simulation and experimental validation. Ocean Eng. 196 (2020). https://doi.org/10.1016/j.oceaneng.2019.106855 15. Fossen, T.: Handbook of Marine Craft Hydrodynamics and Motion Control. Wiley, Chichester (2011) 16. Insartsev, A.V., Kisilev, L.V., Kostenko, V.V., et al.: Underwater robotic complexes: systems, technology, application. IMPT FEB RAS, Vladivostok (2018). (in Russian) 17. Filaretov, V.F., Lebedev, A.V., Yukhimets, D.A.: The Devices and Control Systems of Underwater Robots. Nauka, Moscow (2005). (in Russian) 18. Slotine, J.: Applied Nonlinear Control. Prentice-Hall (1991)
An Advanced Terminal Sliding Mode Controller for Robot Manipulators in Position Tracking Problem

Anh Tuan Vo1, Thanh Nguyen Truong1, Hee-Jun Kang1(B), and Tien Dung Le2

1 Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan 44610, South Korea
[email protected]
2 The University of Danang – University of Science and Technology, 54 Nguyen Luong Bang Street, Danang 550000, Vietnam
Abstract. Scientists have always been drawn to robot manipulators because of their essential role in real applications, and new control algorithms are continually proposed to improve their performance. Following this trend, we develop an advanced terminal sliding mode control (TSMC) method for robot manipulators to address position tracking problems. The proposed controller is based on a modified sliding mode surface (SMS) and a super-twisting control algorithm (STCA). The result of this combination is a worthy performance improvement, including higher tracking accuracy, stronger robustness against uncertain components, and faster convergence and stabilization towards equilibrium. In particular, the proposed controller achieves convergence and stability in finite time. Simulation results for a 3-degrees-of-freedom (DOF) industrial robotic manipulator verify the effectiveness of the proposed method.

Keywords: Terminal sliding mode control · Industrial robot manipulators · Super twisting control algorithm
1 Introduction

In the literature pertaining to the control of industrial robots, several control schemes for position tracking have been proposed to achieve the desired performance under a variety of uncertainties, including external disturbances and dynamical uncertainties. For nonlinear systems such as robot manipulators, inverted pendulums, maglev systems, underwater vehicles, etc., sliding mode control (SMC) has proven its effectiveness in dealing with uncertainties and disturbances [1, 2]. Because of its simplicity and robustness, SMC is often applied in practical applications. Despite these advantages, SMC still has a few limitations to overcome, including only achieving asymptotic stability, extreme chattering, and low performance under uncertain nonlinear situations. In this sense, for systems requiring higher performance
control, smoother control signals, finite-time convergence, and finite-time stability are expected. Researchers have expended considerable effort to overcome the weaknesses of traditional SMC and achieve these expectations. Some TSMCs can achieve finite-time convergence but only asymptotic stability, which means that the control errors reach the equilibrium exactly only as time elapses to infinity. Singularity problems have also occurred with other TSMCs [4, 5], the chattering problem is not thoroughly solved, and the convergence and stabilization to equilibrium may not be fast enough. To speed up the convergence of TSMCs or to solve the singularity problem, variants of TSMC have been introduced, such as fast terminal sliding mode control (FTSMC) [6, 7] and non-singular terminal sliding mode control (NTSMC) [8]. In addition, a modified method, non-singular fast terminal sliding mode control (NFTSMC) [3, 9], was proposed to address both the fast convergence and singularity problems. In all three types of methods above, a high-frequency reaching law generated from first-order sliding mode control (FOSMC) is used to achieve fast convergence and robustness, so a chattering problem still exists in all of them. In addition, the maximum possible sliding accuracy obtainable by such methods may be limited to two-sliding accuracy [10]. As a result, improving the performance of the controllers mentioned above remains an important topic. It has been shown that conventional SMC and TSMC can each solve only separate problems while ignoring others, but this research aims to simultaneously achieve chattering removal, non-singularity, finite-time convergence, and improved tracking accuracy. From the above analysis, our objective is to design an advanced TSMC for robot manipulators to address position tracking problems. The development of the proposed controller is based on a modified SMS and a STCA [11, 12]. The modified integral SMS not only avoids the singularity problem but also provides faster convergence in finite time. The combination of the proposed SMS and the STCA provides a worthy performance improvement, including higher tracking accuracy, stronger robustness against uncertain components, and state errors that converge and stabilize more quickly towards equilibrium in finite time. In addition, the control input signals are smooth and free of the chattering phenomenon. The rest of the work is organized as follows. After the introduction, the preliminaries and problem formulations are introduced, followed by the design procedure of the proposed algorithm. To evaluate the proposed control system, simulation results are obtained for a 3-DOF industrial robotic manipulator and its performance is compared with that of TSMC and FTSMC. Finally, concluding remarks are provided.
2 Preliminary Concepts and Problem Formulations

2.1 Preliminary Concepts

Let us use the following notation: [x]^0 = \mathrm{sign}(x) and [x]^\varphi = |x|^\varphi \mathrm{sign}(x) with \varphi > 0, where

\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{otherwise.} \end{cases}
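For readers who wish to experiment, this bracket operator can be implemented in a few lines of Python; the helper name sig is an assumption of this sketch, not notation from the paper.

import numpy as np

def sig(x, phi):
    """Elementwise [x]^phi = |x|**phi * sign(x). With phi = 0 this reduces
    to sign(x) (including sig(0, 0) = 0), matching [x]^0 = sign(x)."""
    x = np.asarray(x, dtype=float)
    return np.abs(x) ** phi * np.sign(x)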
Consider the following nonlinear system: x˙ (t) = f (x, t), x(0) = x0 ,
(1)
with f : R^n \times R^+ \to R^n, and assume that f(0, t) = 0. The solutions of (1) are understood in the sense of Filippov.

Definition 1 [13]. If the origin of the system (1) is globally asymptotically stable and any solution x(x_0, t) of Eq. (1) reaches the equilibrium at some finite time moment, i.e. x(x_0, t) = 0, \forall t \ge T(x_0), where T : R^n \to R^+ \cup \{0\} is the settling-time function, the nonlinear system (1) is termed globally finite-time stable around its equilibrium point at the origin.

Lemma 1 [14]. The differential equation

[Q^{(j)}]^{\frac{\beta}{h-j}} + \lambda_{j-1}[Q^{(j-1)}]^{\frac{\beta}{h-j+1}} + \cdots + \lambda_2[\ddot{Q}]^{\frac{\beta}{h-2}} + \lambda_1[\dot{Q}]^{\frac{\beta}{h-1}} + \lambda_0[Q]^{\frac{\beta}{h}} = 0   (2)
is finite-time stable for each j = 1, \ldots, h-1, provided that the coefficients \lambda_k (k = 0, \ldots, h-1) are chosen appropriately, h \ge 2 is an integer, and \beta is a positive scalar.

Lemma 2 [13]. Consider the system below:

\dot{\omega} = -v_1(t)[\omega]^{1/2} - v_2(t)\omega + \gamma,
\dot{\gamma} = -v_3(t)[\omega]^{0} - v_4(t)\omega + \chi(t).   (3)

An unknown scalar \delta_\chi \ge 0 is supposed to exist such that |\chi(t)| \le \delta_\chi. Let \rho_0(t) be a positive function whose time derivative is given as:

\dot{\rho}_0(t) = \varepsilon \text{ if } |\omega| \ge \delta_\omega, \text{ and } \dot{\rho}_0(t) = 0 \text{ otherwise},   (4)

where \delta_\omega denotes a positive constant, and the time-varying gains v_m(t) (m = 1, 2, 3, 4) are obtained as:

v_1(t) = v_{10}\sqrt{\rho_0(t)}; \quad v_2(t) = v_{20}\rho_0(t); \quad v_3(t) = v_{30}\rho_0(t); \quad v_4(t) = v_{40}\rho_0^2(t),   (5)

with positive constants v_{m0} that satisfy the constraint 4v_{30}v_{40} \ge (8v_{30} + 9v_{10}^2)v_{20}^2. Accordingly, the state variables of Eq. (3) converge to the exact origin in finite time.
2.2 Dynamic Equation of n-DOF Manipulators

The dynamic equation of an n-DOF manipulator is described as:

M(q)\ddot{q} + C(q, \dot{q})\dot{q} + G(q) + f_r(\dot{q}) + \tau_d = \tau,
(6)
where q, \dot{q}, \ddot{q} \in R^{n\times1} denote the vectors of joint angular position, velocity, and acceleration, respectively; M(q) \in R^{n\times n} denotes the inertia matrix; C(q, \dot{q}) \in R^{n\times n} denotes the matrix of Coriolis and centripetal forces; \tau \in R^{n\times1} indicates the vector of control input torques; and G(q) \in R^{n\times1}, f_r(\dot{q}) \in R^{n\times1}, \tau_d \in R^{n\times1} denote the vectors of gravitational forces, friction forces, and external disturbances, respectively. A precise model of the robot dynamics is not easily achieved. Thus, it is assumed that M(q) = \hat{M}(q) + \Delta M(q), C(q, \dot{q}) = \hat{C}(q, \dot{q}) + \Delta C(q, \dot{q}), and G(q) = \hat{G}(q) + \Delta G(q), where \hat{M}(q) \in R^{n\times n}, \hat{C}(q, \dot{q}) \in R^{n\times n}, and \hat{G}(q) \in R^{n\times1} are the estimated values of M(q), C(q, \dot{q}), and G(q), and \Delta M(q) \in R^{n\times n}, \Delta C(q, \dot{q}) \in R^{n\times n}, and \Delta G(q) \in R^{n\times1} are the uncertain dynamics. Let us define x = [x_1, x_2]^T = [q, \dot{q}]^T and u = \tau; henceforth, the model of the robot dynamics (6) is described in the following form:
\dot{x}_1 = x_2,
\dot{x}_2 = a(x)u + b(x) + \delta(x, \Delta, \tau_d),   (7)

where b(x) = -\hat{M}^{-1}(q)(\hat{C}(q, \dot{q})\dot{q} + \hat{G}(q)) represents all known terms of the robot, a(x) = \hat{M}^{-1}(q) is a smooth function, and \delta(x, \Delta, \tau_d) = -\hat{M}^{-1}(q)(f_r(\dot{q}) + \Delta M(q)\ddot{q} + \Delta C(q, \dot{q})\dot{q} + \Delta G(q) + \tau_d) represents all unknown uncertainties. For the sake of brevity, all unknown uncertainties, including dynamical uncertainties, friction, and external disturbances, are collectively referred to as uncertainty. Our objective is to design an advanced TSMC for robot manipulators to address the position tracking problem under the effects of uncertainties. In addition to the absence of singularity and chattering, the proposed controller achieves convergence and stability to the equilibrium point in finite time. The position, velocity, and acceleration errors are defined respectively as:

e_1 \triangleq q - q_d, \quad e_2 \triangleq \dot{q} - \dot{q}_d, \quad \dot{e}_2 \triangleq \ddot{q} - \ddot{q}_d.   (8)

Noting Eq. (8), Eq. (7) is rewritten as:
\dot{e}_1 = e_2,
\dot{e}_2 = b(e) + a(x)u + \delta(e, \Delta, \tau_d),   (9)

where e = [e_1, e_2]^T indicates the vector of control errors and b(e) = -\hat{M}^{-1}(q)(\hat{C}(q, \dot{q})\dot{q} + \hat{G}(q)) - \ddot{q}_d is a smooth nonlinear function.
A. T. Vo et al.
Assumption 1. Uncertainty is assumed by equality below: δ(e, ˙ , τd ) < δu ,
(10)
where δu is the positive constant.
3 Proposed Controller Scheme 3.1 Proposed Sliding Mode Surface To obtain non-singularity and guarantee a finite-time sliding mode motion, a modification of integral SMS is constructed as h−2 β β β s = e2 − e2 (0) + ∫t0 σ1 [e2 ] h−1 + σ0 [e1 ] h d ι, (11) where s is a SMS, ι denotes the time variable, and σ0 , σ1 are designed coefficient. If s = 0 and s˙ = 0 in the proposed system, it operates in sliding mode. In this case, Eq. (11) yields: h−2 β β β . (12) e˙ 2 = − σ1 [e2 ] h−1 + σ0 [e1 ] h Then, Eq. (12) can be rearranged in the below form: e˙ 1 = e2 β β . [¨e1 ] h−2 + σ1 [e2 ] h−1 + σ0 e1 = 0
(13)
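A minimal numerical sketch of the surface (11) is given below. It assumes the reconstructed exponent (h − 2)/β on the integrand; the Euler discretization, the helper sig, and all names are illustrative.

import numpy as np

def sig(x, phi):
    x = np.asarray(x, dtype=float)
    return np.abs(x) ** phi * np.sign(x)

def sliding_surface_step(s_int, e1, e2, e2_0, sigma0, sigma1, beta, h, dt):
    """One Euler step of the integral SMS (11). s_int accumulates the
    integral of [sigma1*[e2]^(beta/(h-1)) + sigma0*[e1]^(beta/h)]^((h-2)/beta);
    s = e2 - e2(0) + s_int, so the e2(0) offset makes s(0) = 0."""
    integrand = sig(sigma1 * sig(e2, beta / (h - 1))
                    + sigma0 * sig(e1, beta / h), (h - 2) / beta)
    s_int = s_int + integrand * dt
    s = e2 - e2_0 + s_int
    return s, s_int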
It can be observed that the differential equation (13) has the same form as Eq. (2) in Lemma 1 with j = 2. According to Lemma 1, the error states e(t) exactly reach the origin within a finite time from any initial state vector e_0.

3.2 Proposed Control Method

To design the proposed control law, we proceed as follows. Using the error dynamics (9), the time derivative of the SMS (11) is obtained as:

\dot{s} = b(e) + a(x)u + \delta(e, \Delta, \tau_d) + \big[\sigma_1[e_2]^{\frac{\beta}{h-1}} + \sigma_0[e_1]^{\frac{\beta}{h}}\big]^{\frac{h-2}{\beta}}.   (14)

The control torque is constructed as:

u = -a^{-1}(x)\big(u_{eq} + u_{stw}\big).   (15)

The term u_{eq} is designed as:

u_{eq} = b(e) + \big[\sigma_1[e_2]^{\frac{\beta}{h-1}} + \sigma_0[e_1]^{\frac{\beta}{h}}\big]^{\frac{h-2}{\beta}}.   (16)

Based on Lemma 2, known as the STCA, the reaching term is formulated below:

u_{stw} = v_1(t)[s]^{\frac{1}{2}} + v_2(t)s + \int_0^t \big(v_3(t)[s]^0 + v_4(t)s\big) d\iota.   (17)

An overview of the control design process is stated in the theorem below.
Theorem 1. For the robot system (6) with the modified integral nonlinear SMS (11) and the reaching control law (17), the resulting controller guarantees that the sliding mode motion s = 0 occurs in finite time.
3.3 Stability Analysis

Substituting the proposed control laws (15)–(17) into Eq. (14), one has

\dot{s} = \delta - u_{stw} = \delta - v_1(t)[s]^{\frac{1}{2}} - v_2(t)s - \int_0^t \big(v_3(t)[s]^0 + v_4(t)s\big) d\iota.   (18)

Consequently, we obtain the same form as Eq. (3) stated in Lemma 2:

\dot{s} = -v_1(t)[s]^{\frac{1}{2}} - v_2(t)s + \gamma,
\dot{\gamma} = -v_3(t)[s]^0 - v_4(t)s + \dot{\delta},   (19)

where \gamma = -\int_0^t (v_3(t)[s]^0 + v_4(t)s) d\iota + \delta. Based on Lemma 2, it follows that s = 0 and \gamma = 0 will be obtained in finite time. This completes the proof of Theorem 1.
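To make the structure of (15)–(17) concrete, the following Python sketch performs one update of the super-twisting reaching term with the time-varying gains (5). It assumes the adaptive gain rho0 is maintained separately according to (4); the helper sig and all names are illustrative.

import numpy as np

def sig(x, phi):
    x = np.asarray(x, dtype=float)
    return np.abs(x) ** phi * np.sign(x)

def stca_step(s, g_int, rho0, v10, v20, v30, v40, dt):
    """One Euler step of the super-twisting reaching term (17) with the
    gain schedule (5); g_int integrates v3(t)*[s]^0 + v4(t)*s."""
    v1 = v10 * np.sqrt(rho0)
    v2 = v20 * rho0
    v3 = v30 * rho0
    v4 = v40 * rho0 ** 2
    g_int = g_int + (v3 * sig(s, 0.0) + v4 * s) * dt
    u_stw = v1 * sig(s, 0.5) + v2 * s + g_int
    return u_stw, g_int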
4 Control Performance

A 3-DOF robot manipulator was used to simulate trajectory tracking via the proposed scheme. The reader can find details about the robot and its specifications in [15]. In addition, a comparison of control performance among the proposed controller, TSMC [16], and FTSMC [16] was performed to investigate their effectiveness in motion tracking problems. The control torque of TSMC is given by:
s = \dot{e} + c[e]^{\kappa},
u = -a^{-1}(x)\big(b(e) + c\kappa|e|^{\kappa-1}\dot{e} + \Phi s + \delta\,\mathrm{sign}(s)\big),   (20)

where s is a nonlinear SMS and c, \kappa, \Phi, \delta are positive constants. The control torque of FTSMC is given by:

s = \dot{e} + de + c[e]^{\kappa},
u = -a^{-1}(x)\big(b(e) + c\kappa|e|^{\kappa-1}\dot{e} + d\dot{e} + \Phi s + \delta\,\mathrm{sign}(s)\big),   (21)

where s is a nonlinear SMS and c, d, \kappa, \Phi, \delta are positive constants. The tracking trajectory is established as:

x = 0.85 - 0.01t, \quad y = 0.2 + 0.2\sin(0.5t), \quad z = 0.7 + 0.2\cos(0.5t) \; (\mathrm{m}).   (22)
Fig. 1. SOLIDWORKS model of a 3-DOF manipulator.
The manipulator is configured with the following initial state values: q_1 = -0.05 (rad), q_2 = 1 (rad), and q_3 = -1.2 (rad) (Fig. 1). To verify robustness in dealing with uncertainty, the following assumptions are made for all simulation cases:
\Delta M(q) = 0.3M(q), \quad \Delta C(q, \dot{q}) = 0.3C(q, \dot{q}), \quad \Delta G(q) = 0.3G(q),

f_r(\dot{q}) = \big[0.01[\dot{q}_1]^0 + 2\dot{q}_1,\; 0.01[\dot{q}_2]^0 + 2\dot{q}_2,\; 0.01[\dot{q}_3]^0 + 2\dot{q}_3\big]^T (N·m), and

\tau_d(t) = \big[6\sin(2t) + 2\sin(t) + 4\sin(t/2) + 3[q_1]^{0.8};\; 5\sin(2t) + 2\sin(t) + 1\sin(t/2) + 2[q_2]^{0.8};\; 7\sin(2t) + 2\sin(t) + 3\sin(t/3) + 3[q_3]^{0.8}\big]^T (N·m).
Table 1 reports the selected control parameters for the three different controllers.

Table 1. Control parameter selection for the three control methodologies.

Description      | Symbol                                                    | Value
TSMC             | c, \kappa, \Phi, \delta                                   | 5, 0.8, 5, 16.1
FTSMC            | c, d, \kappa, \Phi, \delta                                | 5, 5, 0.8, 5, 16.1
Proposed Method  | \sigma_1, \sigma_0, \beta, h, \varepsilon, \delta_\omega  | 400, 10, 3, 3, 30, 0.01
                 | v_{10}, v_{20}, v_{30}, v_{40}                            | 10, 10, 30, 200
Figure 2 shows sliding mode surfaces from different control methods. Using an integral SMS (11), we can see that the initial point of the proposed SMS starts at zero,
s = 0. Moreover, the SMS of the proposed algorithm has a smaller value than TSMC and FTSMC, and it achieves convergence and stability in finite time, while TSMC and FTSMC achieve asymptotic stability only.
Fig. 2. Time histories of sliding mode surfaces from three different control methodologies.
Tracking results, including trajectory tracking and tracking errors, are displayed in Figs. 3 and 4. It is seen that the proposed control provides outstanding performance compared to TSMC and FTSMC. In particular, the proposed controller converges fastest and tracks the trajectory with the highest accuracy of the three methods. Regarding robustness under the effects of uncertainty, the proposed controller is the best of the three; therefore, its tracking accuracy is always maintained at a high level throughout the simulation. Looking at Fig. 5, only the suggested controller provides a smooth control signal without chattering behavior, owing to the benefits of the STCA. Meanwhile, the control signals of the other two controllers exhibit severe chattering because they use the high-frequency reaching control law.
Fig. 3. Desired path and real path under three control methodologies.
Fig. 4. Time histories of X-axis, Y-axis, and Z-axis errors.
Fig. 5. Control input signals from three different control methodologies.
5 Conclusion

This paper has proposed an advanced TSMC methodology for robot manipulators to address position tracking problems. The combination of the proposed SMS and the STCA provides a worthy performance improvement, including higher tracking accuracy, stronger robustness against uncertain components, and state errors that converge and stabilize more quickly towards equilibrium in finite time. In addition, the control input signals are smooth and free of chattering. Based on the simulated comparison of the three controllers, it is concluded that the proposed controller provides the best control performance in terms of robustness against uncertainty, tracking precision, chattering elimination, and small steady-state error in finite time.

Acknowledgement. This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2019R1D1A3A03103528).
References 1. Shtessel, Y., Edwards, C., Fridman, L., Levant, A.: Sliding Mode Control and Observation, pp. 1–356. Springer, New York (2014). https://doi.org/10.1007/978-0-8176-4893-0
2. Truong, T.N., Vo, A.T., Kang, H.-J.: Implementation of an adaptive neural terminal sliding mode for tracking control of magnetic levitation systems. IEEE Access 8, 206931–206941 (2020) 3. Truong, T.N., Vo, A.T., Kang, H.-J., Van, M.: A novel active fault-tolerant tracking control for robot manipulators with finite-time stability. Sensors 21(23), 8101 (2021) 4. Zhao, D., Li, S., Gao, F.: A new terminal sliding mode control for robotic manipulators. Int. J. Control 82(10), 1804–1813 (2009) 5. Li, S., Zhou, M., Yu, X.: Design and implementation of terminal sliding mode control method for PMSM speed regulation system. IEEE Trans. Industr. Inf. 9(4), 1879–1891 (2013) 6. Truong, T.N., Vo, A.T., Kang, H.-J.: A backstepping global fast terminal sliding mode control for trajectory tracking control of industrial robotic manipulators. IEEE Access 9, 31921–31931 (2021) 7. Doan, Q.V., Vo, A.T., Le, T.D., Kang, H.-J., Nguyen, N.H.A.: A novel fast terminal sliding mode tracking control methodology for robot manipulators. Appl. Sci. 10(9), 3010 (2020) 8. Feng, Y., Yu, X., Han, F.: On nonsingular terminal sliding-mode control of nonlinear systems. Automatica 49(6), 1715–1722 (2013) 9. Vo, A.T., Truong, T.N., Kang, H.J., Van, M.: A robust observer-based control strategy for n-DOF uncertain robot manipulators with fixed-time stability. Sensors 21(21), 7084 (2021) 10. Levant, A.: Higher-order sliding modes, differentiation and output-feedback control. Int. J. Control 76(9–10), 924–941 (2003) 11. Laghrouche, S., Liu, J., Ahmed, F.S., Harmouche, M., Wack, M.: Adaptive second-order sliding mode observer-based fault reconstruction for PEM fuel cell air-feed system. IEEE Trans. Control Syst. Technol. 23(3), 1098–1109 (2015) 12. Tuan, V.A., Kang, H.-J.: A new finite-time control solution to the robotic manipulators based on the nonsingular fast terminal sliding variables and adaptive super-twisting scheme. J. Comput. Nonlinear Dyn. 14, 031002 (2018) 13. Tran, X.T., Oh, H.: Prescribed performance adaptive finite-time control for uncertain horizontal platform systems. ISA Trans. 103, 122–130 (2020) 14. Ding, S., Levant, A., Li, S.: Simple homogeneous sliding-mode controller. Automatica 67, 22–32 (2016) 15. Vo, A.T., Truong, T.N., Kang, H.-J.: A novel prescribed-performance-tracking control system with finite-time convergence stability for uncertain robotic manipulators. Sensors 22(7), 2615 (2022) 16. Yu, X., Zhihong, M.: Fast terminal sliding-mode control design for nonlinear dynamical systems. IEEE Trans. Circ. Syst. I Fundam. Theory Appl. 49(2), 261–264 (2002)
An Observer-Based Fixed Time Sliding Mode Controller for a Class of Second-Order Nonlinear Systems and Its Application to Robot Manipulators

Thanh Nguyen Truong1, Anh Tuan Vo1, Hee-Jun Kang1(B), and Tien Dung Le2

1 Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan 44610, South Korea
[email protected]
2 The University of Danang – University of Science and Technology, 54 Nguyen Luong Bang Street, Danang 550000, Vietnam

Abstract. An observer-based fixed-time sliding mode controller for a class of second-order nonlinear systems under matched uncertainties and disturbances is proposed and applied to robot manipulators in this paper. First, a fixed-time disturbance observer (FxDO) based on a uniform robust exact differentiator (URED) proactively addresses external disturbances and uncertain terms. With the designed observer, uncertain terms can be precisely approximated within a fixed time, which helps reduce the chattering and improve the tracking performance of traditional sliding mode controllers. Second, on the basis of phase-plane analysis and Lyapunov theory, we construct a modified fixed-time non-singular terminal sliding surface with a guaranteed closed-loop convergence time independent of the initial states. Consequently, an observer-based fixed-time sliding mode controller is developed by combining the designed fixed-time disturbance observer with a fixed-time sliding mode method. Finally, the proposed controller is applied to a 3-DOF FARA robot manipulator to demonstrate its effectiveness.

Keywords: Fixed time observer · Second-order nonlinear systems · Fixed time sliding mode control · Industrial robot manipulators
1 Introduction

Due to its computational simplicity and, in particular, its robustness against matched uncertainties and disturbances, sliding mode control (SMC) has been widely used in industrial applications such as robots, aircraft, and power systems. SMC involves determining control laws so that the system trajectory intersects a sliding surface. It is well known that conventional SMC laws are discontinuous, and high-frequency control switching may lead to chattering. This obvious drawback of conventional SMC can be overcome by many techniques, such as the boundary layer approach [1], the higher-order sliding mode [2, 3], the disturbance observer [4–7], the super-twisting technique [8],
neural-network-based methods [9, 10], among others. Conventional SMC is normally prescribed as a switching function for a linear sliding surface. A concept in the SMC community known as terminal sliding mode control (TSMC) uses a nonlinear sliding mode surface to achieve fast or finite-time convergence. Unfortunately, the singularity problem hinders the TSMC's application in real-life scenarios. There have been several proposals to eliminate or avoid singularities to some extent. For higher-order nonlinear systems, the TSMC in [11] used a saturation function in the control design to overcome the singularity. A finite-time disturbance observer was used in [12] to solve the singularity problem for dynamic systems subjected to unmatched disturbances. However, explicit settling-time calculations were not derived for the non-singular TSMCs above. Besides finite-time stability, SMC also has a property called fixed-time stability [13], for which the bound of the settling time can be calculated regardless of the initial conditions. A survey of fixed-time stability and convergence analysis of advanced SMC can be found in [13]. The work [14] considers an important application of the concept of fixed-time stability to design a URED based on a super-twisting algorithm (STA). Moreover, [15] shows that any finite-time convergent homogeneous SMC can be transformed into a fixed-time convergent one through a discrete dynamic extension. The existence of uncertain terms is a universal property of nonlinear systems. It is therefore necessary that controllers be designed as robustly as possible to resist these effects. To reduce the computational burden, approximate the uncertain components, and improve control performance, several observer-based controllers have been introduced. For instance, fault-tolerant controllers for robot manipulators have been proposed, such as the extended-observer-based synchronous SMC scheme [16] and an FTC based on a higher-order observer [17]. Several more observers have also been proposed, including the high-gain observer (HGO) [18] and the third-order sliding mode observer (TOSMO) [19]. Nevertheless, they only provide asymptotic or finite-time stability. From the above discussion, an observer-based controller for a class of second-order nonlinear systems under matched uncertainties and disturbances is proposed and applied to robot manipulators in this paper; it provides a predefined convergence time independent of initial conditions. Our paper has the following specific contributions:

• by designing the FxDO, uncertain terms within nonlinear systems can be accurately approximated within a fixed time and chattering can be greatly decreased in the proposed controller's inputs;
• the global fixed-time stability is guaranteed by the design of the proposed method;
• given the control parameters, the proposed method's convergence time, independent of the initial states, can be estimated a priori;
• high tracking accuracy, chattering reduction, strong anti-uncertainty ability, and fast convergence of both tracking errors and approximation errors are achieved by the proposed controller;
• due to its simple design, the controller can be extended to robotics, aircraft, power systems, and so on;
• the proposed controller is proven, using Lyapunov theory, to obtain global stability in fixed time.
Following is the remainder of this paper. Section 2 presents the preliminary concepts and problem statement. Next, Sect. 3 discusses the main results. Following that, Sect. 4 presents numerical simulations. Section 5 concludes this work.
2 Preliminary Concepts and Problem Formulation

2.1 Preliminary Concepts

This subsection introduces some of the concepts and lemmas used in this paper. Consider the autonomous system

\dot{x}(t) = f(x(t)), \quad x(0) = \varepsilon_0,   (1)

where x \in R^n and f : R^n \to R^n is a nonlinear function. Assume that the origin is an equilibrium point of Eq. (1).
Definition 1 [20]. The origin of Eq. (1) is said to be a finite-time stable equilibrium if the equilibrium point of Eq. (1) is Lyapunov stable and any solution x(t) starting from \varepsilon_0 satisfies x(t, \varepsilon_0) = 0 for all t \ge T(\varepsilon_0), where T : R^n \to R^+ is called the settling-time function.

Definition 2 [20]. The origin is considered to be a fixed-time stable equilibrium if it is globally finite-time stable and its convergence time is bounded by T(\varepsilon_0) < T_{max}, where T_{max} > 0 is a positive number.

Lemma 1 [20]. Consider the following scalar differential equation:

\dot{y} = -z_1[y]^m - z_2[y]^n,   (2)

where z_1, z_2 > 0, m > 1, and 0 < n < 1. Then the system (2) is globally stable in fixed time with the following settling time:

T_1 < T_{max} = \frac{1}{z_1}\frac{1}{m-1} + \frac{1}{z_2}\frac{1}{1-n}.   (3)

To simplify the presentation, we use the notation [y]^k = |y|^k \mathrm{sign}(y), \forall k > 0, y \in R.
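As a small worked example, the bound (3) can be evaluated directly; the function below is an illustrative sketch under the stated conditions, not part of [20].

def fixed_time_bound(z1, z2, m, n):
    """Settling-time bound (3) of Lemma 1: Tmax = 1/(z1*(m-1)) + 1/(z2*(1-n)),
    valid for z1, z2 > 0, m > 1, 0 < n < 1, independent of the initial state."""
    assert z1 > 0 and z2 > 0 and m > 1 and 0 < n < 1
    return 1.0 / (z1 * (m - 1)) + 1.0 / (z2 * (1 - n))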
2.2 Problem Formulation

A general nonlinear second-order system is described as follows:

\dot{x}_1 = x_2,
\dot{x}_2 = H(x, t) + B(x, t)u + L(x, t),   (4)

where x = [x_1^T, x_2^T]^T \in R^{2n} is the system state vector, H(x, t) \in R^n and B(x, t) \in R^{n\times n} are smooth nonlinear functions with H(0) = 0, L(x, t) represents the whole of the disturbances and uncertainties, and u is the control input. The following assumptions are made for the design procedure of the suggested control algorithm.
Assumption 1: The whole of the disturbances and uncertainties is bounded by:

\|L(x, t)\|_\infty \le \Upsilon,   (5)

where \Upsilon is a positive constant.

Assumption 2: The first derivative of the whole of the disturbances and uncertainties is also bounded by:

\|\dot{L}(x, t)\|_\infty \le \Xi,   (6)

where \Xi is a positive constant. The objective of this article is to develop a new control input law u that guarantees that the variable x_1 precisely tracks the reference trajectory despite the influence of disturbances and uncertainties.
3 Design of the Proposed Controller

3.1 Design of a Fixed-Time Disturbance Observer

The whole of the disturbances and uncertainties is accurately approximated in a fixed time by a FxDO. This observer is constructed based on a URED [14], as follows:

\omega_0 = \hat{x}_2 - x_2,
\dot{\hat{x}}_2 = H(x, t) + B(x, t)u + \hat{L} - \alpha_1\psi_1(\omega_0),
\dot{\hat{L}} = -\alpha_2\psi_2(\omega_0),   (7)

where \hat{x}_2 and \hat{L} respectively indicate the estimated values of x_2 and L, and \alpha_1 > 0 and \alpha_2 > 0 are observer gains. Based on the URED in [14], the terms \psi_1(\omega_0) and \psi_2(\omega_0) are designed as follows:

\psi_1(\omega_0) = [\omega_0]^{\frac{1}{2}} + \vartheta[\omega_0]^{\frac{3}{2}},
\psi_2(\omega_0) = \frac{1}{2}\mathrm{sign}(\omega_0) + 2\vartheta\omega_0 + \frac{3}{2}\vartheta^2[\omega_0]^2,   (8)
ω˙ 0 = −α1 ψ1 (ω0 ) + L˜ (9) L˙˜ = −α2 ψ2 (ω0 ) − L˙
˜ ∞ ≤ ξ0 , ξ0 > 0. where L˜ L − L is the approximation error of L, L˜ is bounded by L
The gains \alpha_1 and \alpha_2 are selected in the following set:

\big\{(\alpha_1, \alpha_2) \in R^2 \,\big|\, 0 < \alpha_1 \le 2\sqrt{\Xi},\; \alpha_2 > \frac{\alpha_1^2}{4} + \frac{4\Xi^2}{\alpha_1^2}\big\} \cup \big\{(\alpha_1, \alpha_2) \in R^2 \,\big|\, \alpha_1 > 2\sqrt{\Xi},\; \alpha_2 > 2\Xi\big\}.   (10)
By referring to the URED [14], one can see that the estimation error dynamics of the designed observer in Eq. (9) and those of the URED have the same form. Therefore, it has been proved in [14] that Eq. (9) is fixed-time stable. By selecting appropriate gains \alpha_1 and \alpha_2 (following Eq. (10)), we can determine that \omega_0 and \tilde{L} will converge to zero in a fixed time T_o (see Eq. (12) in [14]). As a result, after the fixed time T_o, \hat{x}_2 = x_2 and \hat{L} = L.
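A minimal Euler-discretized sketch of the observer (7)–(8) is shown below. The sign conventions follow the reconstruction above; the function name, the helper sig, and the argument layout are assumptions of this sketch.

import numpy as np

def sig(x, phi):
    x = np.asarray(x, dtype=float)
    return np.abs(x) ** phi * np.sign(x)

def fxdo_step(x2, x2_hat, L_hat, H, B, u, alpha1, alpha2, theta, dt):
    """One Euler step of the FxDO (7)-(8). w0 = x2_hat - x2 is the velocity
    estimation error; psi1, psi2 are the URED injection terms."""
    w0 = x2_hat - x2
    psi1 = sig(w0, 0.5) + theta * sig(w0, 1.5)
    psi2 = 0.5 * np.sign(w0) + 2.0 * theta * w0 + 1.5 * theta ** 2 * sig(w0, 2.0)
    x2_hat = x2_hat + (H + B @ u + L_hat - alpha1 * psi1) * dt
    L_hat = L_hat - alpha2 * psi2 * dt
    return x2_hat, L_hat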
3.2 Design of a Fixed-Time Sliding Surface (FxSS)

Let e \triangleq x_d - x_1 and \dot{e} \triangleq \dot{x}_d - x_2 be the tracking position error and the tracking velocity error, where x_d is the reference trajectory. The FxSS is designed as

s = \dot{e} + \mu_1[e]^{2-\beta} + \mu_2 e + \mu_3[e]^{\beta},
(11)
where s = [s_1, s_2, \cdots, s_n]^T \in R^n is a vector of sliding surfaces, \mu_1, \mu_2, and \mu_3 are positive coefficients, and 0 < \beta < 1. According to SMC theory, the following conditions must be satisfied when the tracking position error operates in the sliding mode [21]:

s = 0 \text{ and } \dot{s} = 0.
(12)
From Eqs. (11) and (12), the sliding mode dynamics are given by e˙ = −μ1 [e]2−β − μ2 e − μ3 [e]β
(13)
It is seen that Eq. (13) has the same form as Eq. (14) in [22]. Therefore, using the same proof procedure as Theorem 1 in [22], we obtain the same result as [22]: the error states converge to zero in fixed time with the following settling time:

T_s \le \frac{2}{(1-\beta)\sqrt{4\mu_1\mu_3 - \mu_2^2}}\Big(\frac{\pi}{2} - \arctan\frac{\mu_2}{\sqrt{4\mu_1\mu_3 - \mu_2^2}}\Big).
3.3 Design of the Observer-Based Fixed Time Sliding Mode Controller

The proposed controller is developed using the designed observer and the SMC approach as follows:

u = u_{eq} + u_{sw} + u_o.
(14)
The control term ueq plays the role to maintain the error states on the sliding surfaces. To obtain this term, we need to calculate the time derivative of the FxSS. Then, consider
it in the case \dot{s} = 0 along with the nominal system without the presence of uncertain components. The derivation of u_{eq} is described below. Equation (4) can be written in the error state-space as

\ddot{e} = \ddot{x}_d(t) - \dot{x}_2 = \ddot{x}_d(t) - H(x, t) - B(x, t)u - L(x, t).
(15)
Calculating the derivative of the FxSS in Eq. (11) and noting Eq. (15), we obtain

\dot{s} = \ddot{e} + (2-\beta)\mu_1|e|^{1-\beta}\dot{e} + \mu_2\dot{e} + \beta\mu_3|e|^{\beta-1}\dot{e}
   = \ddot{x}_d(t) - H(x, t) - B(x, t)u - L(x, t) + (2-\beta)\mu_1|e|^{1-\beta}\dot{e} + \mu_2\dot{e} + \beta\mu_3|e|^{\beta-1}\dot{e}.
(16)
Consequently, we obtain the control term u_{eq} from the constraint condition \dot{s} = 0:

u_{eq} = B^{-1}(x, t)\big(\ddot{x}_d(t) - H(x, t) + (2-\beta)\mu_1|e|^{1-\beta}\dot{e} + \mu_2\dot{e} + \beta\mu_3|e|^{\beta-1}\dot{e}\big).
(17)
In addition, a fixed-time switching control law (FxSCL) is constructed so that the error states converge to zero in fixed time during the reaching phase:

u_{sw} = B^{-1}(x, t)\big(\xi_0\mathrm{sign}(s) + \xi_1[s]^p + \xi_2[s]^q\big),   (18)

where \xi_0, \xi_1, \xi_2 are positive constants, p > 1, and 0 < q < 1. The term u_o is designed based on the output of the observer as

u_o = -B^{-1}(x, t)\hat{L}.   (19)

Finally, the proposed control law u becomes

u = u_{eq} + u_{sw} + u_o = B^{-1}(x, t)\big(\ddot{x}_d(t) - H(x, t) + (2-\beta)\mu_1|e|^{1-\beta}\dot{e} + \mu_2\dot{e} + \beta\mu_3|e|^{\beta-1}\dot{e} + \xi_0\mathrm{sign}(s) + \xi_1[s]^p + \xi_2[s]^q - \hat{L}\big)
(20)
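The composite law (20) can be assembled as in the following sketch, under the sign convention reconstructed above (the observer compensation enters with a minus sign so that (21) holds). All names, including the helper sig, are illustrative.

import numpy as np

def sig(x, phi):
    x = np.asarray(x, dtype=float)
    return np.abs(x) ** phi * np.sign(x)

def proposed_control(e, edot, s, xdd_d, H, B, L_hat, mu, beta, xi, p, q):
    """Composite control (20): equivalent term + fixed-time switching term
    + observer compensation. mu = (mu1, mu2, mu3); xi = (xi0, xi1, xi2)."""
    mu1, mu2, mu3 = mu
    xi0, xi1, xi2 = xi
    damping = ((2 - beta) * mu1 * np.abs(e) ** (1 - beta) * edot
               + mu2 * edot
               + beta * mu3 * np.abs(e) ** (beta - 1) * edot)
    sw = xi0 * np.sign(s) + xi1 * sig(s, p) + xi2 * sig(s, q)
    # u = B^{-1} (xdd_d - H + damping + sw - L_hat)
    return np.linalg.solve(B, xdd_d - H + damping + sw - L_hat)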
The structure of the proposed control method is shown as a block diagram in Fig. 1.

Theorem 2. For the dynamic system of Eq. (4), let the actuation control input be designed as Eq. (20), constructed from the output of the FxDO in Eq. (7), the FxSS in Eq. (11), and the FxSCL in Eq. (18). Then the system is globally fixed-time stable.

Proof of Theorem 2. By substituting the proposed control law (20) into Eq. (16), we obtain
\dot{s} = \hat{L} - L - \xi_0\mathrm{sign}(s) - \xi_1[s]^p - \xi_2[s]^q = \tilde{L} - \xi_0\mathrm{sign}(s) - \xi_1[s]^p - \xi_2[s]^q.   (21)
535
Fig. 1. The block diagram of the proposed control method.
From Eq. (21), a component of s˙ is expressed by s˙i = L˜ i − ξ0 sign(si ) − ξ1 [si ]p − ξ2 [si ]q , i = 1, 2, ..., n
(22)
To investigate the fixed-time stability of the proposed control system, a Lyapunov function is chosen as Vi = si2 , i = 1, 2, . . . , n. Next, we calculate the time derivative of the Lyapunov function in the following way: V˙ i = 2si s˙i = 2si L˜ i − ξ0 sign(si ) − ξ1 [si ]p − ξ2 [si ]q (23) ≤ 2|si | L˜ i − ξ0 − 2ξ1 |si |p+1 − 2ξ2 |si |q+1 ≤ −2ξ1 |si |p+1 − 2ξ2 |si |q+1 From Eq. (23), we can conclude that the proposed control system is globally stable because Vi > 0 and V˙ i < 0, fulfilling Lyapunov theory. To demonstrate a globally fixed-time stable of the proposed control system, Eq. (23) is rewritten in the following form: V˙ i ≤ −2ξ1 |Vi |
p+1 2
− 2ξ2 |Vi |
q+1 2
(24)
It is easily seen that Eq. (24) has the same form as the Lemma 1, so the reaching time is calculated as Tr ≤
1 2 2 1 + 2ξ1 p − 1 2ξ2 1 − q
(25)
We can conclude that the proposed control method can converge to zero in the following fixed time: T = Ts + Tr + To π 2 = 2 − arctan (1−β) 4μ1 μ3 −μ22 2 2 + 2ξ11 p−1 + 2ξ12 1−q + To
This proof is completed.
μ2
4μ1 μ3 −μ22
(26)
536
T. N. Truong et al.
4 Numerical Simulation Results and Discussion The proposed control method is applied to position tracking control of robot manipulators. The dynamic equation of an n-link robotic manipulator is given as (see [23]) M (φ)φ¨ + C φ, φ˙ φ˙ + G(φ) + F φ˙ + ΔD = τ (27) ˙ φ¨ ∈ Rn are defined as the vectors of joint angle position, joint angular where φ, φ, velocity, and joint angular acceleration, respectively. M (φ) = M (φ) + δM (φ) ∈ Rn×n represents a real inertia matrix, C φ, φ˙ = C φ, φ˙ +δC φ, φ˙ ∈ Rn×n represents a real matrix of Coriolis and centrifugal force, = G (φ)+δG(φ) ∈ Rn×1 represents a real G(φ) n×1 ˙ represents a friction matrix, D ∈ Rn×1 matrix of the gravitational force, F φ ∈ R represents an external disturbance matrix, τ ∈ Rn×1 is defined as a control torque matrix. M (φ), C φ, φ˙ , and G (φ) are approximated matrices of M (φ), C φ, φ˙ , and G(φ), respectively. δM (φ), δC φ, φ˙ , δG(φ) are uncertain terms of the robot dynamic model. We can rewrite Eq. (27) as follows: −1 φ¨ = M (φ) −C φ, φ˙ φ˙ − G (φ) + + τ (28)
in which = −δM (φ)φ¨ − δC φ, φ˙ φ˙ − δG(φ) − F φ˙ − ΔD . The robotic dynamic model in Eq. (28) can be transferred to the following secondorder state-space: x˙ 1 = x2 (29) x˙ 2 = H (x, t) + B(x, t)u + L(x, t) ˙ x = xT xT T ∈ R2n is the system state vector, H (x, t) = where x1 = φ, x2 = φ, 1 2 −1 −1 −1 M (φ) −C φ, φ˙ φ˙ − G (φ) , B(x, t) = M (φ), L(x, t) = M (φ), u = τ . As we can see, Eq. (29) is the same as Eq. (1), which is a general second-order nonlinear system equation. Therefore, the proposed controller can directly apply to the robotic system presented in Eq. (29). Tests are carried out using a SAMSUNG FARA ROBOT AT2 3-DOF robotic manipulator. Our simulations are performed utilizing SIMULINK/MATLAB with an ODE5 0.001s time step. The robot’s mechanical model is designed on SOLIDWORK software (as shown in Fig. 2) and embedded in the SIMULINK/MATLAB environment through the tool SIMSCAPE MULTIBODY LINNK. Consequently, the robot’s simulation model is not different from the robot’s actual mechanical model. The design parameters of the robot system are described in Table 1.
An Observer-Based Fixed Time Sliding Mode Controller
537
Fig. 2. 3D Model of FARA ROBOT AT2 3-DOF robotic mechanical system in SOLIDWORKS.
Table 1. The design parameters of the FARA ROBOT AT2 3-DOF robotic manipulator. Mass (kg)
Length (mm)
T Center of Mass lcx , lcy , lcz (mm)
T Inertia Ixx , Iyy , Izz (kg.m2 )
Link 1
33.429
250
[0, 0, −74.610]T
[0.7486, 0.5518, 0.5570]T
Link 2
34.129
700
[347.7, 0, 0]T
[0.3080, 2.4655, 2.3938]
Link 3
15.612
600
[314.2, 0, 0]T
[0.0446, 0.7092, 0.7207]
The friction forces at joints are assumed as T F φ˙ = 0.01sign φ˙ 1 + 2φ˙ 1 ; 0.01sign φ˙ 2 + 2φ˙ 2 ; 0.01sign φ˙ 3 + 2φ˙ 3 and the external disturbances are added to the joints as ⎤ ⎡ −5(1 − exp(−0.4t) − 0.3sin(0.8t)) D = ⎣ −3(1 − exp(−0.4t) − 0.1sin(0.5t)) ⎦ −1.8(1 − exp(−0.4t) − 0.1sin(1.6t)) The uncertain terms of the robot dynamic model are assumed below δM (φ) = 0.15M (φ); δC φ, φ˙ = 0.15C φ, φ˙ ; δG(φ) = 0.15G(φ) The desired trajectory of the robot end-effector is set as ⎤ ⎡ ⎤ ⎡ xr 0.43 + 0.01sin(0.5t) ⎦ ⎣ yr ⎦ = ⎣ 0.06sin(0.5t) 0.26 + 0.06cos(0.25t) zr
(30)
(31)
(32)
(33)
538
T. N. Truong et al.
To validate the effectiveness of the proposed control method, the tracking control performance is compared between the proposed controller and the NFTSMC of the recent work [24]. The comparison is conducted according to several criteria, such as convergence speed, position tracking accuracy, and control input signals. The NFTSM controller [24] has the control input

s_i = e_i + c(1 + e_i^2)^{\frac{q}{w}}\arctan(e_i)[\dot{e}_i]^{\frac{q}{w}},
u = B_n^{-1}(x, t)(u_{eq} + u_{sw}),
u_{eq_i} = \ddot{x}_{di}(t) - H_{ni}(x, t) + c\tfrac{q}{w}(1 + e_i^2)^{\frac{q}{w}-1}\big(1 + \tfrac{2q}{w}e_i\arctan(e_i)\dot{e}_i\big)[\dot{e}_i]^{2-\frac{q}{w}},
u_{sw_i} = \varphi_0\,\mathrm{sign}(s_i) + \varphi_1 s_i, \quad i = 1, 2, \ldots, n,   (34)

where q, w are positive odd integers with 1 < q/w < 2, and c, \varphi_0, \varphi_1 are positive constants. The parameters of the NFTSMC method and the proposed controller were selected experimentally to obtain the best performance; they are presented in Table 2.

Table 2. The selected parameters of both controllers.

Method                   | Symbols                                    | Value
NFTSMC                   | q, w, c, \varphi_0, \varphi_1              | 5, 3, 1, 6.1, 10
Proposed control method  | \alpha_1, \alpha_2                         | 2\sqrt{60}, 120
                         | \mu_1, \mu_2, \mu_3, \beta                 | 5, 5, 5, 0.8
                         | \xi_0, \xi_1, \xi_2, p, q                  | 0.1, 10, 10, 1.4, 0.6
A comparison over time between the assumed value of the whole disturbance and the output of the observer is shown in Fig. 3. From Fig. 3, we can clearly see that the FxDO provides fast convergence and high accuracy. By using the accurate approximation provided by the FxDO, the tracking performance of the proposed method can be greatly enhanced. The comparison of the desired trajectory and the real trajectory of the end-effector is depicted in Fig. 4. Both controllers provide good tracking performance even in the presence of the disturbances and uncertainties in the robot system, as shown in Fig. 4. For a more detailed analysis, Fig. 5 shows the tracking angular errors at the joints for the two controllers. Both controllers provide high control accuracy, as depicted in Fig. 5. With the accurately estimated information of the FxDO combined with the fixed-time SMC formed from the FxSS and the FxSCL, we can easily see that the proposed controller offers faster convergence and better tracking accuracy than the NFTSMC.
Fig. 3. The assumed value of whole disturbances and the FxDO’s output value.
Fig. 4. The desired trajectory and the real trajectory of the robot’s end-effector.
Fig. 5. The tracking errors at joints of two controllers.
A comparison of the control torque signals generated by the two controllers is shown in Fig. 6. As can be clearly seen, the NFTSMC generates control input signals with serious chattering because it uses the sign(·) function with a sliding gain large enough to counteract the whole uncertainty and disturbance. In contrast, the proposed controller produces smoother torque signals, since the entire disturbance and uncertainty has been accurately estimated by the FxDO, which allows us to use a small sliding gain to compensate for the estimation errors of the observer. Through the smoother control signals of the proposed control method, the longevity of the mechanical system and the electrical components of the robot will be significantly improved.
Fig. 6. The control torque inputs at joints of two controllers.
5 Conclusion

In this work, an observer-based fixed-time sliding mode controller has been developed for a class of second-order nonlinear systems. The proposed control method is based on a fixed-time disturbance observer and a fixed-time sliding mode method constructed from a fixed-time sliding surface and a fixed-time reaching law. Therefore, both the observer estimation error and the control tracking error quickly converge to zero within a predetermined fixed time. The stability and fixed-time convergence of the proposed control algorithm have been proven based on Lyapunov theory and fixed-time control theory. The proposed controller has been applied to a 3-DOF FARA industrial robotic manipulator and its control performance has been compared to NFTSMC. From the comparison results, the proposed controller provides better tracking performance, with higher tracking accuracy, faster convergence, and smoother control signals than NFTSMC. Furthermore, the proposed controller can also be applied to other systems, such as aircraft, magnetic levitation systems, and other nonlinear second-order systems.
Acknowledgement. This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF2019R1D1A3A03103528).
References 1. Boiko, I.M.: Chattering in sliding mode control systems with boundary layer approximation of discontinuous control. Int. J. Syst. Sci. 44(6), 1126–1133 (2013) 2. Van, M.: Higher-order terminal sliding mode controller for fault accommodation of Lipschitz second-order nonlinear systems using fuzzy neural network. Appl. Soft Comput. 104, 107186 (2021) 3. Truong, T.N., Vo, A.T., Kang, H.-J.: A backstepping global fast terminal sliding mode control for trajectory tracking control of industrial robotic manipulators. IEEE Access 9, 31921– 31931 (2021) 4. Vo, A.T., Truong, T.N., Kang, H.J.: A novel tracking control algorithm with finite-time disturbance observer for a class of second-order nonlinear systems and its applications. IEEE Access 9, 31373–31389 (2021) 5. Truong, T.N., Vo, A.T., Kang, H.-J., Van, M.: A novel active fault-tolerant tracking control for robot manipulators with finite-time stability. Sensors 21 (23), (2021) 6. Truong, T.N., Kang, H.-J., Vo, A.T.: An active disturbance rejection control method for robot manipulators. In: Huang, D.-S., Premaratne, P. (eds.) ICIC 2020. LNCS (LNAI), vol. 12465, pp. 190–201. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60796-8_16 7. Vo, A.T., Truong, T.N., Kang, H.-J.: A novel fixed-time control algorithm for trajectory tracking control of uncertain magnetic levitation systems. IEEE Access 9, 47698–47712 (2021) 8. Vo, A.T., Truong, T.N., Kang, H.-J.: A novel prescribed-performance-tracking control system with finite-time convergence stability for uncertain robotic manipulators. Sensors 22 (7) (2022) 9. Truong, T.N., Vo, A.T., Kang, H.-J.: Implementation of an adaptive neural terminal sliding mode for tracking control of magnetic levitation systems. IEEE Access 8, 206931–206941 (2020) 10. Nguyen Truong, T., Tuan Vo, A., Kang, H.-J., Le, T.D.: A neural terminal sliding mode control for tracking control of robotic manipulators in uncertain dynamical environments. In: Huang, D.-S., Jo, K.-H., Li, J., Gribova, V., Hussain, A. (eds.) ICIC 2021. LNCS, vol. 12837, pp. 207–221. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-84529-2_18 11. Feng, Y., Yu, X., Han, F.: On nonsingular terminal sliding-mode control of nonlinear systems. Automatica 49(6), 1715–1722 (2013) 12. Yang, J., Li, S., Su, J., Yu, X.: Continuous nonsingular terminal sliding mode control for systems with mismatched disturbances. Automatica 49(7), 2287–2291 (2013) 13. Polyakov, A., Fridman, L.: Stability notions and Lyapunov functions for sliding mode control systems. J. Franklin Inst. 351(4), 1831–1865 (2014) 14. Cruz-Zavala, E., Moreno, J.A., Fridman, L.M.: Uniform robust exact differentiator. IEEE Trans. Autom. Control 56(11), 2727–2733 (2011) 15. Levant, A.: On fixed and finite time stability in sliding mode control. In: 52nd IEEE Conference on Decision and Control, pp. 4260–4265 (2013) 16. Le, Q.D., Kang, H.-J.: Finite-time fault-tolerant control for a robot manipulator based on synchronous terminal sliding mode control. Appl. Sci. 10(9), 2998 (2020)
17. Nguyen, V.-C., Vo, A.-T., Kang, H.-J.: A finite-time fault-tolerant control using nonsingular fast terminal sliding mode control and third-order sliding mode observer for robotic manipulators. IEEE Access 9, 31225–31235 (2021) 18. Won, D., Kim, W., Tomizuka, M.: High-gain-observer-based integral sliding mode control for position tracking of electrohydraulic servo systems. IEEE/ASME Trans. Mechatron. 22(6), 2695–2704 (2017) 19. Nguyen, V.-C., Vo, A.-T., Kang, H.-J.: A non-singular fast terminal sliding mode control based on third-order sliding mode observer for a class of second-order uncertain nonlinear systems and its application to robot manipulators. IEEE Access 8, 78109–78120 (2020) 20. Zuo, Z.: Non-singular fixed-time terminal sliding mode control of non-linear systems. IET Control Theory Appl. 9 (4), 545–552 (2015) 21. Utkin, V.I.: Sliding modes in control and optimization. Springer Science & Business Media (2013) 22. Tran, X.-T., Kang, H.-J.: Continuous adaptive finite-time modified function projective lag synchronization of uncertain hyperchaotic systems. Trans. Inst. Meas. Control. 40(3), 853– 860 (2018) 23. Craig, J.J.: Introduction to robotics: mechanics and control. Pearson Educacion (2005) 24. Zhai, J., Xu, G.: A novel non-singular terminal sliding mode trajectory tracking control for robotic manipulators. IEEE Trans. Circuits Syst. II Express Briefs 68(1), 391–395 (2021)
A Robust Position Tracking Strategy for Robot Manipulators Using Adaptive Second Order Sliding Mode Algorithm and Nonsingular Sliding Mode Control

Tan Van Nguyen1, Cheolkeun Ha2, Huy Q. Tran3(B), Dinh Hai Lam1, and Nguyen Thi Hoa Cuc1
1 School of Engineering - Technique, Thu Dau Mot University, Thu Dau Mot, Binh Duong,
Vietnam 2 Robotics and Mechatronics Lab, University of Ulsan, Ulsan 44610, Republic of Korea 3 Faculty of Engineering and Technology, Nguyen Tat Thanh University, Ho Chi Minh City,
Vietnam [email protected]
Abstract. Nowadays, robot manipulators play an essential role in industrial environments. Along with the expansion of robot applications, robot control has attracted the attention of many researchers. This paper proposes a robust control method that handles the lumped uncertainty and unknown faults of robot manipulator systems. A non-singular sliding mode controller is applied to keep the robot tracking the desired trajectory under the adverse effects of the lumped uncertainty and unknown fault. In addition, we suggest an adaptive second-order sliding mode algorithm to minimize the controller's chattering phenomenon and eliminate the impact of the above uncertainty and fault. The effectiveness of the suggested controller is demonstrated via numerical simulations on a 2-link robot. The proposed controller creates a continuous control signal with high accuracy, less chattering, finite-time convergence, and greater robustness.

Keywords: Nonsingular sliding mode control · Second-order sliding mode algorithm · Adaptive control · 2-link robot
1 Introduction

Nowadays, along with the expansion of robot applications, control of robot manipulators has attracted many researchers' attention [1–3]. In this research field, there are two main challenges that need to be overcome. First, robot manipulators are complex dynamic systems; no matter how elaborate the modeling methods used, model uncertainty always exists. Second, faults frequently occur in the system due to the complex structure of modern industrial robots and long-term operations.

To solve the challenges of uncertainties in control systems, various methods have been studied and developed by researchers. The sliding mode control (SMC) [4–6], especially the non-singular sliding mode control (NSSMC), an improved version of SMC, is
a popular approach that has been broadly applied to robot manipulator control problems in recent years [7–9], thanks to its outstanding control properties, such as a simple design procedure, high control precision, finite-time convergence, singularity elimination, and robustness against the effects of the lumped uncertainty and unknown fault [10–13]. The design procedure of the NSSMC includes two steps: 1) constructing a sliding function to describe the desired dynamics, and 2) establishing a discontinuous control law to drive the control state variables onto the sliding function. However, the discontinuous control law leads to chattering, which is the main disadvantage of the NSSMC. In addition, the upper bound of the lumped uncertainty and unknown fault is also required in the design procedure.

To decrease the chattering phenomenon, reducing or even eliminating the discontinuous control element in the switching control law is necessary. A promising approach is to use an observer, such as an extended state observer [14, 15], sliding mode observer [16], or disturbance observer [17, 18], to approximate the lumped uncertainty and unknown fault in the system. After obtaining the estimated signals, a compensator is designed to reduce their effects. Using this strategy, the switching gain in the discontinuous control element can be chosen smaller, depending on the estimation accuracy of the observer, and the chattering is therefore decreased. However, using an extra observer makes the algorithm more complicated and thus increases the system's computational time. Another method to reduce chattering is to use a continuous control law instead of a discontinuous one, such as a saturating approximation [19] or the boundary layer technique [20]. These methods reduce the chattering problem; however, as a trade-off, the tracking performance of the robot manipulator is also decreased. In another approach, the super-twisting algorithm (STA) [21, 22] is a great way to overcome the chattering phenomenon. The STA has been widely applied thanks to its superior features, such as providing continuous control signals and high tracking accuracy. Similarly, an improved version of the STA, called the quadratic sliding mode (QSM) algorithm, is introduced in [23]. Compared with the STA, the QSM algorithm provides a faster convergence speed while maintaining tracking accuracy.

From the above issues, we propose a robust control approach that handles the lumped uncertainty and unknown fault of robot manipulator systems. An NSSMC is utilized to keep the robot tracking the desired trajectory under the effects of the lumped uncertainty and unknown fault. Furthermore, an adaptive QSM (AQSM) algorithm is suggested to provide continuous control signals to reduce the chattering phenomenon. An adaptive law also eliminates the requirement for the bound of the lumped uncertainty and unknown fault. The main contributions of this paper are summarized as follows:

• Proposing a control algorithm to handle the lumped uncertainty and unknown fault effects of a 2-link robot (2LR).
• Reducing the tracking error of the robot system by using continuous control signals.
• Presenting the system's finite-time stability (FTS) based on the Lyapunov function.

The remainder of this paper is arranged as follows. We present the dynamic equation of the n-link robot manipulator (n-LRM) in Sect. 2. After that, we design an AQSM algorithm based on NSSMC in Sect. 3. In Sect. 4, numerical simulations of the proposed control algorithm are executed on a 2LR. Finally, Sect. 5 gives some conclusions.
2 Problem Formulation

First, we consider an n-LRM under the effects of modelling uncertainty and an unknown fault as follows:

$$\ddot{\rho} = M^{-1}(\rho)\left[\tau - C(\rho,\dot{\rho})\dot{\rho} - G(\rho) - F(\rho,\dot{\rho}) - \tau_d\right] + \varphi(\xi) \tag{1}$$

where $\rho, \dot{\rho}, \ddot{\rho} \in \mathbb{R}^n$ are the position, velocity, and acceleration, respectively; $M(\rho) \in \mathbb{R}^{n\times n}$, $C(\rho,\dot{\rho}) \in \mathbb{R}^n$, $G(\rho) \in \mathbb{R}^n$, and $F(\rho,\dot{\rho}) \in \mathbb{R}^n$ are the inertia matrix, the centripetal and Coriolis forces, the gravitational force term, and the friction vector, respectively; $\tau_d \in \mathbb{R}^n$, $\varphi(\xi) \in \mathbb{R}^n$, and $\tau \in \mathbb{R}^n$ represent the disturbance vector, unknown fault, and control input signal, respectively; $\xi$ is the time variable.

For a simpler expression, the robot system (1) is rewritten as:

$$\ddot{\rho} = N(\rho,\dot{\rho}) + M^{-1}(\rho)\tau + \Delta(\rho,\dot{\rho},\xi) \tag{2}$$

where $\Delta(\rho,\dot{\rho},\xi) = M^{-1}(\rho)\left[-F(\rho,\dot{\rho}) - \tau_d\right] + \varphi(\xi)$ denotes the lumped uncertainty and unknown fault, and $N(\rho,\dot{\rho}) = M^{-1}(\rho)\left[-C(\rho,\dot{\rho})\dot{\rho} - G(\rho)\right]$ denotes the nominal model of the robot. Equation (2) can be converted into state-space form as:

$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = N(x) + M^{-1}(x_1)u(\xi) + \Delta(x,\xi) \tag{3}$$

where $x_1 = \rho$, $x_2 = \dot{\rho}$, $x = \left[x_1^T\ x_2^T\right]^T$, and $u(\xi) = \tau$. Our main purpose in this paper is to design a robust controller that is capable of handling the effects of the lumped uncertainty and the unknown fault with minimum tracking error. In addition, the requirement of knowing the lumped uncertainty and unknown fault is eliminated.
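A minimal numerical sketch of the state-space model (3) is given below for readers who wish to reproduce the simulations. It is not the authors' code: the callables M, C and G and the lumped term delta are placeholders to be supplied from a concrete manipulator model (for example, the 2-link model of [26]).

```python
import numpy as np

def robot_state_dot(x1, x2, tau, M, C, G, delta):
    # Eq. (3): x1_dot = x2,  x2_dot = N(x) + M^{-1}(x1) u + Delta(x, xi)
    Minv = np.linalg.inv(M(x1))
    N = Minv @ (-C(x1, x2) @ x2 - G(x1))   # nominal model N(x) of Eq. (2)
    return x2, N + Minv @ tau + delta
```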
3 Controller Design

3.1 Design of the NSSMC Algorithm

The tracking error can be expressed as follows:

$$\zeta = x_1 - x_d \tag{4}$$

where $x_d$ represents the expected trajectory. An NSSMC switching function is chosen as in [8]:

$$\delta = \dot{\zeta} + \int_0^t \left[\kappa_2 |\dot{\zeta}|^{\beta_2}\mathrm{sign}(\dot{\zeta}) + \kappa_1 |\zeta|^{\beta_1}\mathrm{sign}(\zeta)\right] dt \tag{5}$$

where the constants $\kappa_1, \kappa_2$ are selected such that the polynomial $\kappa_2 p + \kappa_1$ is Hurwitz, and $\beta_1, \beta_2$ can be selected as $\beta_1 \in (1-\varepsilon, 1)$, $\varepsilon \in (0,1)$, and $\beta_2 = \frac{2\beta_1}{1+\beta_1}$.
The NSSMC law is designed as follows:

$$u = -M(x_1)\left(u_{eq} + u_{sw}\right) \tag{6}$$

$$u_{eq} = N(x) + \kappa_2 |\dot{\zeta}|^{\beta_2}\mathrm{sign}(\dot{\zeta}) + \kappa_1 |\zeta|^{\beta_1}\mathrm{sign}(\zeta) - \ddot{x}_d \tag{7}$$

$$u_{sw} = (D + \mu)\,\mathrm{sign}(\delta) \tag{8}$$

where $\mu$ is a small positive constant and $D$ is a positive constant satisfying $|\Delta(x,\xi)| \le D$.

Theorem 1. For system (3), the system is stable and the tracking error converges to zero in finite time if the control signal is designed as in (6)–(8).

Proof. Taking the time derivative of the switching function (5), we obtain:

$$\dot{\delta} = \ddot{\zeta} + \kappa_2 |\dot{\zeta}|^{\beta_2}\mathrm{sign}(\dot{\zeta}) + \kappa_1 |\zeta|^{\beta_1}\mathrm{sign}(\zeta) = -\ddot{x}_d + N(x) + M^{-1}(x_1)u(\xi) + \Delta(x,\xi) + \kappa_2 |\dot{\zeta}|^{\beta_2}\mathrm{sign}(\dot{\zeta}) + \kappa_1 |\zeta|^{\beta_1}\mathrm{sign}(\zeta) \tag{9}$$

Substituting the control law (6)–(8) into (9), we achieve:

$$\dot{\delta} = -(D+\mu)\,\mathrm{sign}(\delta) + \Delta(x,\xi) \tag{10}$$

Consider the Lyapunov function:

$$V = \frac{1}{2}\delta^T\delta \tag{11}$$

Taking the time derivative of Eq. (11) and substituting the result from (10), we obtain:

$$\dot{V} = \delta^T\dot{\delta} = \delta^T\left(-(D+\mu)\,\mathrm{sign}(\delta) + \Delta(x,\xi)\right) = -(D+\mu)|\delta| + \Delta(x,\xi)\delta \le -\mu|\delta| = -\sqrt{2}\,\mu V^{1/2} \tag{12}$$

According to Theorem 4.2 in [24], we can conclude that system (3) is stable and the tracking error converges to zero in finite time. Therefore, Theorem 1 is proved.
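The control law (6)–(8) is straightforward to implement once the nominal model N(x) and the mass matrix M(x1) are available. Below is a hedged Python sketch of one control step; the per-joint gain arrays and the numerically accumulated integral of the switching function (5) are implementation choices of this sketch, not details given in the paper.

```python
import numpy as np

def sig(v, p):
    # elementwise |v|^p * sign(v), the nonsmooth term appearing in (5) and (7)
    return np.abs(v) ** p * np.sign(v)

def nssmc_step(x1, x2, xd, xd_dot, xd_ddot, N, M, gains, state, dt):
    k1, k2, b1, b2, D, mu = gains          # per-joint arrays / scalars
    zeta, zeta_dot = x1 - xd, x2 - xd_dot  # tracking error (4)
    # accumulate the integral term of the switching function (5)
    state['integ'] = state.get('integ', 0.0) + dt * (
        k2 * sig(zeta_dot, b2) + k1 * sig(zeta, b1))
    delta = zeta_dot + state['integ']      # sliding variable (5)
    u_eq = N + k2 * sig(zeta_dot, b2) + k1 * sig(zeta, b1) - xd_ddot  # (7)
    u_sw = (D + mu) * np.sign(delta)       # discontinuous switching law (8)
    return -M @ (u_eq + u_sw)              # control torque (6)
```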
3.2 Design of the AQSM Algorithm Based on NSSMC

With the NSSMC switching function chosen in (5), an AQSM algorithm based on NSSMC is proposed as follows:

$$u = -M(x_1)\left(u_{eq} + u_{a\text{-}sosm}\right) \tag{13}$$

$$u_{eq} = N(x) + \kappa_2 |\dot{\zeta}|^{\beta_2}\mathrm{sign}(\dot{\zeta}) + \kappa_1 |\zeta|^{\beta_1}\mathrm{sign}(\zeta) - \ddot{x}_d \tag{14}$$

$$u_{a\text{-}sosm} = \rho_1(\xi)|\delta|^{1/2}\mathrm{sign}(\delta) + \rho_3(\xi)\delta + \vartheta, \qquad \dot{\vartheta} = \rho_2(\xi)\,\mathrm{sign}(\delta) + \rho_4(\xi)\delta \tag{15}$$

where the time-varying parameters are defined as $\rho_1(\xi) = p_1\sqrt{L(\xi)}$, $\rho_2(\xi) = p_2 L(\xi)$, $\rho_3(\xi) = p_3 L(\xi)$, and $\rho_4(\xi) = p_4 L^2(\xi)$, with the update law:

$$\dot{L}(\xi) = \begin{cases} k_L, & \text{if } |\delta| \neq 0 \\ 0, & \text{else} \end{cases} \tag{16}$$

The parameters $p_1, p_2, p_3, p_4, k_L$ are designed positive constants and satisfy the following condition:

$$4p_2 p_4 > 8p_2 p_3^2 + 9p_1^2 p_3^2 \tag{17}$$

Theorem 2. For the robot system (3) with the control signal designed as in (13)–(15), the tracking error converges to zero and the system is stable in finite time.

Proof. The time derivative of the switching function (5) is rewritten as:

$$\dot{\delta} = -\ddot{x}_d + N(x) + M^{-1}(x_1)u(\xi) + \Delta(x,\xi) + \kappa_2 |\dot{\zeta}|^{\beta_2}\mathrm{sign}(\dot{\zeta}) + \kappa_1 |\zeta|^{\beta_1}\mathrm{sign}(\zeta) \tag{18}$$

Substituting the control law (13)–(15) into (18), we obtain:

$$\dot{\delta} = -\rho_1(\xi)|\delta|^{1/2}\mathrm{sign}(\delta) - \rho_3(\xi)\delta + \vartheta, \qquad \dot{\vartheta} = -\rho_2(\xi)\,\mathrm{sign}(\delta) - \rho_4(\xi)\delta + \frac{d}{dt}\Delta(x,\xi) \tag{19}$$

Equation (19) is in the form of the QSM algorithm, whose stability and finite-time convergence are proved in [25]. Therefore, it can be concluded that system (3) reaches a steady state and the control error converges to zero in finite time under the designed control signal (13)–(15). Thus, Theorem 2 is proved.
Remark 1. In practical applications, the sliding motion condition $|\delta| = 0$ in (16) cannot be reached exactly, so the parameter $L(\xi)$ would keep increasing continuously. To make the adaptive algorithm (16) practically implementable, the following adaptive law can be utilized:

$$\dot{L}(\xi) = \begin{cases} k_L, & \text{if } |\delta| \ge \eta \\ 0, & \text{else} \end{cases} \tag{20}$$

where $\eta$ is a small positive constant. The block diagram of the proposed algorithm is shown in Fig. 1.
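A sketch of how the AQSM switching term (15) and the practical adaptive law (20) can replace the discontinuous term (8) is given below. Testing the dead-band with the Euclidean norm of delta and integrating theta and L with a fixed step are simplifications made here for illustration, not prescriptions from the paper.

```python
import numpy as np

def aqsm_switch(delta, state, dt):
    """Continuous AQSM term of (15) with gains from (16)/(20); `state`
    carries p1..p4, kL, eta and the integrator values L and theta."""
    p1, p2, p3, p4 = state['p1'], state['p2'], state['p3'], state['p4']
    if np.linalg.norm(delta) >= state['eta']:   # adaptive law (20)
        state['L'] += state['kL'] * dt
    L = state['L']
    r1, r2, r3, r4 = p1 * np.sqrt(L), p2 * L, p3 * L, p4 * L ** 2
    state['theta'] = state.get('theta', 0.0) + dt * (
        r2 * np.sign(delta) + r4 * delta)       # integrator part of (15)
    return (r1 * np.sqrt(np.abs(delta)) * np.sign(delta)
            + r3 * delta + state['theta'])
```

The total torque is then u = -M(x1)(u_eq + u_a-sosm), exactly as in (13), with u_eq computed as in the NSSMC sketch above.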
Fig. 1. Block diagram of the proposed algorithm.
4 Simulation Results

In this section, we use numerical simulation to demonstrate the performance of the proposed controller on a 2LR, whose dynamic model is taken from [26]. The model of the robot is displayed in Fig. 2.
Fig. 2. Structure of 2LR.
The friction and disturbance are assumed as:

$$F(\rho,\dot{\rho}) = \begin{bmatrix} 2\cos(1.8\dot{\rho}_1) \\ 0.5\sin(\dot{\rho}_2 + \pi/3) \end{bmatrix}, \qquad \tau_d = \begin{bmatrix} \sin(3\rho_1 + \pi/2) - \cos(\xi) \\ -1.2\cos(2\rho_2) + 0.45\sin(\xi) \end{bmatrix} \tag{21}$$

The fault is assumed to happen at the first joint and the second joint at the 10th and 15th second, respectively:

$$\varphi(\xi) = \begin{bmatrix} -13\sin(\pi t/7) \\ 12.7\cos(\pi t/5 + \pi/2) \end{bmatrix} \tag{22}$$

The expected trajectory is assumed as:

$$x_d = \begin{bmatrix} 1.05\cos(\pi t/6) - 1 \\ 1.2\sin(\pi t/7 + \pi/2) - 1 \end{bmatrix} \tag{23}$$
In this part, the designed parameters of the NSSMC, the QSM algorithm, and the adaptive law are selected as $\kappa_1 = \mathrm{diag}(15, 15)$, $\kappa_2 = \mathrm{diag}(10, 10)$, $\beta_1 = 1/2$, $\beta_2 = 2/3$, $p_1 = 2$, $p_2 = 14$, $p_3 = 2.5$, $p_4 = 30$, $D = \mathrm{diag}(16, 16)$, $k_L = 10$, $\eta = 0.05$.

To evaluate the effectiveness of the proposed control algorithm, a comparison is performed between two controllers: 1) the NSSMC-1, which is designed with the discontinuous switching law in (6)–(8); and 2) the NSSMC-2, in which the QSM algorithm replaces the discontinuous switching law. The numerical simulation results are illustrated in Figs. 3, 4, 5 and 6. The position tracking and the tracking error signals of the controllers are shown in Fig. 3 and Fig. 4, respectively. As shown in the results, the NSSMC provides very high tracking performance. Thanks to the implementation of the QSM algorithm, the NSSMC-2 provides higher tracking performance with the same convergence speed compared to the NSSMC-1. The proposed controller maintains the tracking performance without prior knowledge of the lumped uncertainty and unknown fault. However, because of the adaptive law, its convergence speed is a little slower when an abrupt fault happens. The control input results at each joint are shown in Fig. 5. Thanks to the continuous control signals of the QSM algorithm and the AQSM algorithm, the chattering phenomenon is almost eliminated. The achieved adaptive gain is finally shown in Fig. 6.
Fig. 3. Position tracking at each joint.
Fig. 4. Tracking error of compared controllers
Fig. 5. Control input torque of compared controllers.
Fig. 6. Estimated gain of adaptive law.
5 Conclusions

In this paper, we proposed a robust control approach using an AQSM algorithm based on NSSMC for robot manipulators, without requiring knowledge of the lumped uncertainty and unknown fault. The NSSMC kept the robot manipulator tracking the desired trajectory with high accuracy under the harmful effects of the lumped uncertainty and unknown fault. The AQSM algorithm provided a continuous control signal, which helps to minimize the chattering phenomenon. In addition, we eliminated the requirement of knowing the lumped uncertainty and unknown fault. Finally, the finite-time stability of the system was proven using the Lyapunov stability theory. The numerical simulation results showed that the proposed solutions provide a continuous control signal with high accuracy, less chattering, finite-time convergence, and more robustness.

Acknowledgment. This research was supported by the Research Foundation funded by Thu Dau Mot University.
References

1. Elsisi, M., Mahmoud, K., Lehtonen, M., Darwish, M.M.F.: An improved neural network algorithm to efficiently track various trajectories of robot manipulator arms. IEEE Access 9, 11911–11920 (2021)
2. Brahmi, B., Driscoll, M., Laraki, M.H., Brahmi, A.: Adaptive high-order sliding mode control based on quasi-time delay estimation for uncertain robot manipulator. Control Theory Technol. 18(3), 279–292 (2020). https://doi.org/10.1007/s11768-020-9061-1
3. Van, M., Ceglarek, D.: Robust fault tolerant control of robot manipulators with global fixed-time convergence. J. Franklin Inst. 358(1), 699–722 (2021)
4. Utkin, V.I.: Sliding Modes in Control and Optimization. Springer Science & Business Media (2013)
5. Nguyen, V.-C., Le, P.-N., Kang, H.-J.: An active fault-tolerant control for robotic manipulators using adaptive non-singular fast terminal sliding mode control and disturbance observer. Actuators 10(12), 332 (2021)
6. Sai, H., Xu, Z., He, S., Zhang, E., Zhu, L.: Adaptive nonsingular fixed-time sliding mode control for uncertain robotic manipulators under actuator saturation. ISA Trans. 123, 46–60 (2022)
7. Wang, Y., Zhu, K., Chen, B., Jin, M.: Model-free continuous nonsingular fast terminal sliding mode control for cable-driven manipulators. ISA Trans. 98, 483–495 (2020). https://doi.org/10.1016/j.isatra.2019.08.046
8. Nguyen, V.C., Vo, A.T., Kang, H.J.: A non-singular fast terminal sliding mode control based on third-order sliding mode observer for a class of second-order uncertain nonlinear systems and its application to robot manipulators. IEEE Access 8, 78109–78120 (2020). https://doi.org/10.1109/ACCESS.2020.2989613
9. Zaare, S., Soltanpour, M.R.: Adaptive fuzzy global coupled nonsingular fast terminal sliding mode control of n-rigid-link elastic-joint robot manipulators in presence of uncertainties. Mech. Syst. Signal Process. 163, 108165 (2022)
10. Van, M., Ge, S.S.: An adaptive backstepping nonsingular fast terminal sliding mode control for robust fault tolerant control of robot manipulators. IEEE Trans. Syst. Man Cybern. Syst. 49(7), 1448–1458 (2018)
11. Ferrando Chacon, J.L., Kappatos, V., Balachandran, W., Gan, T.H.: A novel approach for incipient defect detection in rolling bearings using acoustic emission technique. Appl. Acoust. 89, 88–100 (2015). https://doi.org/10.1016/j.apacoust.2014.09.002
12. Li, R., Yang, L., Chen, Y., Lai, G.: Adaptive sliding mode control of robot manipulators with system failures. Mathematics 10(3), 339 (2022)
13. Anjum, Z., Zhou, H., Guo, Y.: Self-tuning fuzzy nonsingular proportional-integral-derivative type fast terminal sliding mode control for robotic manipulator in the presence of backlash hysteresis. Trans. Inst. Meas. Control 44(4), 809–819 (2022)
14. Saleki, A., Fateh, M.M.: Model-free control of electrically driven robot manipulators using an extended state observer. Comput. Electr. Eng. 87, 106768 (2020)
15. Razmjooei, H., Shafiei, M.H.: A new approach to design a finite-time extended state observer: uncertain robotic manipulators application. Int. J. Robust Nonlinear Control 31(4), 1288–1302 (2021)
16. Nguyen, V.-C., Vo, A.-T., Kang, H.-J.: A finite-time fault-tolerant control using nonsingular fast terminal sliding mode control and third-order sliding mode observer for robotic manipulators. IEEE Access 9, 31225–31235 (2021)
17. Vo, A.T., Kang, H.-J.: A novel fault-tolerant control method for robot manipulators based on non-singular fast terminal sliding mode control and disturbance observer. IEEE Access 8, 109388–109400 (2020)
18. Razmjooei, H., Shafiei, M.H., Palli, G., Arefi, M.M.: Non-linear finite-time tracking control of uncertain robotic manipulators using time-varying disturbance observer-based sliding mode method. J. Intell. Robot. Syst. 104(2), 1–13 (2022)
19. Li, T.-H.S., Huang, Y.-C.: MIMO adaptive fuzzy terminal sliding-mode controller for robotic manipulators. Inf. Sci. 180(23), 4641–4660 (2010)
20. Slotine, J.J.E., Li, W.: Applied Nonlinear Control. Prentice Hall, Englewood Cliffs (1991)
21. Davila, J., Fridman, L., Poznyak, A.: Observation and identification of mechanical systems via second order sliding modes. In: International Workshop on Variable Structure Systems, VSS 2006, pp. 232–237 (2006)
22. Davila, J., Fridman, L., Levant, A.: Second-order sliding-mode observer for mechanical systems. IEEE Trans. Autom. Control 50(11), 1785–1789 (2005)
23. Moreno, J.A., Osorio, M.: A Lyapunov approach to second-order sliding mode controllers and observers. In: 2008 47th IEEE Conference on Decision and Control, pp. 2856–2861 (2008)
24. Bhat, S.P., Bernstein, D.S.: Finite-time stability of continuous autonomous systems. SIAM J. Control Optim. 38(3), 751–766 (2000)
25. Laghrouche, S., Liu, J., Ahmed, F.S., Harmouche, M., Wack, M.: Adaptive second-order sliding mode observer-based fault reconstruction for PEM fuel cell air-feed system. IEEE Trans. Control Syst. Technol. 23(3), 1098–1109 (2014)
26. Van, M., Kang, H.-J., Suh, Y.-S.: A novel fuzzy second-order sliding mode observer-controller for a T-S fuzzy system with an application for robot control. Int. J. Precis. Eng. Manuf. 14(10), 1703–1711 (2013)
Intelligent Data Analysis and Prediction
A Hybrid Daily Carbon Emission Prediction Model Combining CEEMD, WD and LSTM

Xing Zhang1(B) and Wensong Zhang2

1 Sinounited Investment Group Corporation Limited, Beijing 102611, China
[email protected]
2 School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Abstract. In order to improve the short-term prediction accuracy of carbon emissions, a new hybrid daily carbon emission prediction model is proposed in this paper, and secondary decomposition is introduced into carbon emission prediction for the first time. First, the data is decomposed into several IMFs by complementary ensemble empirical mode decomposition (CEEMD). Then, IMF1 is decomposed again by wavelet decomposition (WD), and the rest of the IMFs are reconstructed according to their sample entropy (SE). Finally, Long Short-Term Memory (LSTM) is used to predict daily carbon emissions. In order to verify the validity of the model, the daily carbon emission data of China, the United States (US) and the World are used for empirical analysis. In the performance comparison experiments, the CEEMD-WD-LSTM model proposed in this paper has the best performance among all comparison models, and the secondary decomposition of carbon emissions significantly improves the MAPE, R2 and RMSE. The results show that the model proposed in this paper is effective and robust, and can predict daily carbon emissions more accurately.

Keywords: Daily carbon emission prediction · CEEMD · WD · LSTM · SE
1 Introduction

Since the industrial revolution, coal, oil and gas have gradually become the main energy sources consumed in human economic activities; they release a large amount of carbon dioxide, leading to the greenhouse effect and global warming. According to the Copenhagen Diagnosis, by 2100, global temperatures are likely to rise by 7 °C, sea levels are likely to rise by more than 1 m, and arctic ice is shrinking by 2.7% per decade. Facing the reality of global climate deterioration, all countries in the world are working hard to deal with the climate challenge. At the 75th United Nations General Assembly in 2020, President Xi Jinping announced that China will achieve carbon peak by 2030 and carbon neutrality by 2060. The formulation of emission reduction targets and policies requires high-precision prediction results as a reference, so research on carbon emission prediction is particularly critical. Long-term prediction can provide a basis for the formulation and implementation of long-term policies. However, in order to understand the dynamic trend of carbon emissions and develop short-term mitigation targets, short-term carbon emission prediction plays a key role, as it gives us enough time to respond.
However, due to the volatility of daily carbon emission data, short-term carbon emission forecasting becomes very difficult. There is a large body of research on long-term carbon emission prediction at home and abroad, which can be divided into influencing-factor prediction and time series prediction. In the first category, the influencing factors highly correlated with carbon emissions are selected from different angles and then used as inputs to predict carbon emissions [1]. However, the more factors that are considered, the more data need to be collected. In the second category, time series forecasting is a method of predicting future development based on past trends. It highlights the role of time in forecasting and ignores the influence of external factors [2]. However, due to the limited sample size of annual carbon emissions, the prediction accuracy of such models is affected to some extent.

In the beginning, carbon emission prediction was mainly based on classical time series prediction methods, such as the grey prediction method [3]. As carbon emission series are highly nonlinear and non-stationary, traditional statistical and quantitative models are often unable to effectively deal with this nonlinear problem. Therefore, scholars have begun to introduce machine learning prediction techniques to deal with the nonlinear fluctuations of carbon emissions, such as BP [4], SVM [5], ELM [6], LSTM [7] and other prediction models.

Daily carbon emission monitoring data are volatile and cyclical, and contain a lot of noise. To solve this problem, many scholars began to use decomposition methods to filter out the unwanted noise effectively and to decompose the complex carbon emission sequence into simpler and more regular modes, which is beneficial for prediction. When the data is huge and highly unstable, it is necessary to process the data before prediction and decompose the unstable data into several stable sequences [8]. Empirical mode decomposition (EMD) [9] and wavelet decomposition (WD) [10] are commonly used to decompose multi-scale time series. However, the subsequences generated after the first decomposition may still have strong non-stationarity and complexity, and there are few studies on this issue.

In order to fill the gap in existing research, a new daily carbon emission prediction model is proposed in this paper, which combines CEEMD, WD and LSTM. The carbon emission time series is first preprocessed into several intrinsic mode functions (IMFs) by CEEMD, and then IMF1 is decomposed by WD. At last, LSTM is used to predict all sub-sequences, and the final prediction result is obtained under the idea of "decomposition first and then integration". Due to the nonlinear, non-stationary and complex characteristics of daily carbon emission data, direct prediction will lead to a large prediction error. Considering that CEEMD can effectively reduce the non-stationarity of original data and help improve the prediction accuracy for nonlinear data, CEEMD is used for the first decomposition. Since the IMF1 generated by CEEMD still has strong non-stationarity and complexity, this paper carries out a second decomposition with WD for its high adaptability. In order to verify the effectiveness of WD, this paper compares WD with variational mode decomposition (VMD) and fast ensemble empirical mode decomposition (FEEMD). The main innovations and contributions of this paper are as follows:
(1) In this paper, the secondary decomposition algorithm is used to preprocess the original data to reduce the difficulty of carbon emission prediction. The "CEEMD-WD" method is used to process the daily carbon emission time series successively, and a series of sub-series with low complexity and strong regularity are obtained, which helps to improve the prediction accuracy.
(2) LSTM is innovatively introduced into daily carbon emission prediction, and the superiority of LSTM is verified by comparing it with BP and ELM.
(3) To verify the robustness of the CEEMD-WD-LSTM model proposed in this paper, empirical analyses on three different daily carbon emission data sets are conducted, covering China, the United States (US) and the World.
2 Methods

2.1 Complementary Ensemble Empirical Mode Decomposition (CEEMD)

Empirical mode decomposition (EMD) can adaptively decompose the original time series into several IMFs with different scales [11, 12]. In view of the mode aliasing phenomenon existing in EMD, ensemble empirical mode decomposition (EEMD) adds Gaussian white noise into the whole time-frequency space many times, and then EMD is performed to obtain multiple mean IMF components as the final decomposition result [13, 14]. In order to reduce the residual auxiliary noise of EEMD, complementary ensemble empirical mode decomposition (CEEMD) first adds pairs of positive and negative random Gaussian white noise to the original sequence, and then EMD is used to decompose the sequences after adding the white noise. The white noise added by CEEMD is independent and identically distributed with opposite signs, so the noise terms cancel each other. Hence, CEEMD can reduce the residual auxiliary noise in the original signal and ensure a small reconstruction error after decomposition [15, 16]. The succinct calculation procedure of CEEMD is as follows.

(1) Add a pair of standard white noises of the same magnitude and 180° phase angle difference to the original signal $x_t$, so that two new signals $x_p^+(t)$ and $x_p^-(t)$ are obtained:

$$x_p^+(t) = x_t + \lambda_p(t), \qquad x_p^-(t) = x_t - \lambda_p(t), \qquad p = 1, 2, \cdots, P \tag{1}$$

(2) Execute EMD on $x_p^+(t)$ and $x_p^-(t)$ respectively, generating $C_{i,p}^+(t)$ and $C_{i,p}^-(t)$.

(3) Repeat steps (1) and (2) P times until a smooth decomposition signal is acquired. The ensemble of all corresponding IMFs is computed by Eq. (2):

$$C_i = \frac{1}{2P}\sum_{p=1}^{P}\left[C_{i,p}^+(t) + C_{i,p}^-(t)\right] \tag{2}$$

where P is the number of iterations and $C_i$ is the i-th final IMF component derived from CEEMD.
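The paired-noise averaging of steps (1)–(3) can be sketched with the EMD implementation from the PyEMD package (installed as EMD-signal); the noise amplitude, the number of noise pairs, and the truncation to the smallest IMF count across realizations are assumptions of this sketch rather than settings reported in the paper.

```python
import numpy as np
from PyEMD import EMD

def ceemd(x, trials=50, noise_scale=0.2, seed=0):
    rng = np.random.default_rng(seed)
    emd = EMD()
    x = np.asarray(x, dtype=float)
    runs = []
    for _ in range(trials):
        noise = noise_scale * np.std(x) * rng.standard_normal(len(x))
        runs.append(emd(x + noise))   # decompose x + w   (Eq. 1)
        runs.append(emd(x - noise))   # decompose x - w
    n_imf = min(r.shape[0] for r in runs)        # align differing IMF counts
    # average the corresponding IMFs over all +/- noise realizations (Eq. 2)
    return np.mean([r[:n_imf] for r in runs], axis=0)
```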
2.2 Wavelet Decomposition (WD)

Wavelet decomposition is a multi-scale analysis of time series signals in the time and frequency domains simultaneously. The original signal is filtered by a low-pass filter and a high-pass filter, producing the corresponding low-frequency and high-frequency components. The low-frequency component is the main part of the original signal and contains the overall variation trend, while the high-frequency component contains the details of the signal and often reflects short-term fluctuations [17, 18]. Carbon emission data are discrete, so the discrete wavelet transform is introduced, which is expressed as follows:

$$x_t = L_K(t) + \sum_{k=1}^{K} H_k(t) = \sum_{n=1}^{N} c_{K,n}\,\varphi_{K,n}(t) + \sum_{k=1}^{K}\sum_{n=1}^{N} d_{k,n}\,\psi_{k,n}(t) \tag{3}$$

where $x_t$ is the time series data in period t, N is the length of the time series, and K is the number of discrete wavelet transform levels; $L_K(t)$ and $H_k(t)$ represent the low-frequency and high-frequency components in period t, respectively; $c_{K,n}$ and $d_{k,n}$ represent the decomposition coefficients corresponding to the low-frequency and high-frequency components; $\varphi_{K,n}(t)$ and $\psi_{k,n}(t)$ represent the low-pass and high-pass filters, respectively.
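Equation (3) corresponds to a standard K-level discrete wavelet decomposition; a sketch using PyWavelets follows. The 'db4' mother wavelet and K = 3 are illustrative choices, since the paper does not state which wavelet was used; each component is recovered by reconstructing from a single coefficient band.

```python
import numpy as np
import pywt

def wavelet_components(x, wavelet='db4', K=3):
    coeffs = pywt.wavedec(x, wavelet, level=K)   # [cA_K, cD_K, ..., cD_1]
    comps = []
    for i in range(len(coeffs)):
        # zero all bands except one, then reconstruct that component
        keep = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
        comps.append(pywt.waverec(keep, wavelet)[:len(x)])
    return comps   # comps[0] is L_K(t); the rest are the H_k(t) detail parts
```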
2.3 Long Short-Term Memory (LSTM)

LSTM is derived from the recurrent neural network (RNN), improved by Hochreiter and Schmidhuber, and effectively solves the problem of long-distance dependency. LSTM introduces the concepts of the cell state and gates, which makes it more adaptable than the RNN. In the LSTM model, the information at time t is combined with the output at time t−1 and the memory unit at time t−1, and processed through three gated structures: the forget gate, the input gate and the output gate. The forget gate discards or retains the information of the last moment, the input gate stores the effective information of the current moment, and the output gate processes the information that can be output at the next moment [19, 20].

(1) Forget gate layer. The information $f_t$ is filtered according to the output $h_{t-1}$ of the previous moment and the input $x_t$ of the current moment:

$$f_t = \sigma\left(\omega_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{4}$$

where $\omega$ represents the weight, b represents the bias, and $\sigma$ represents the sigmoid function.

(2) Input gate layer.
The sigmoid function is used to determine the values to be updated:

$$i_t = \sigma\left(\omega_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{5}$$

The tanh function is used to generate the candidate values $\tilde{C}_t$:

$$\tilde{C}_t = \tanh\left(\omega_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{6}$$

(3) Cell state. The cell state is determined by the results of the forget gate and the input gate. It controls the transmission of information to the next moment, discarding unwanted information and updating new information:

$$C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t \tag{7}$$

(4) Output layer. The initial output $o_t$ is obtained by the sigmoid function, and the $C_t$ value is scaled by the tanh function. The final output is obtained by multiplying the two:

$$o_t = \sigma\left(\omega_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{8}$$

$$h_t = o_t \cdot \tanh(C_t) \tag{9}$$
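Equations (4)–(9) are exactly the gate computations performed inside standard LSTM implementations, so a one-step-ahead forecaster for a sub-sequence can be sketched with PyTorch as follows; the hidden size of 10 mirrors the L = 10 setting reported later in Table 3, while the single linear output head is an assumption of this sketch.

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, n_lags, hidden=10):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, n_lags, 1)
        out, _ = self.lstm(x)             # gates (4)-(9) run inside nn.LSTM
        return self.head(out[:, -1, :])   # predict the next value from the last state
```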
2.4 The Process of the Proposed Model

The overall processing flow of the CEEMD-WD-LSTM model proposed in this paper is shown in Fig. 1, with the specific steps as follows:

(1) CEEMD is used to decompose the original daily carbon emission time series into several sub-series IMF1, IMF2, …, IMFn.
(2) WD further decomposes the IMF1 sequence into several sub-sequences A1, A2, …, Am.
(3) The remaining IMFs are reconstructed according to their sample entropy, and V1, V2, …, Vk are obtained.
(4) The partial autocorrelation function (PACF) is used to select the historical data with the highest correlation to the target data as the input of the LSTM.
(5) All the sub-sequences A1, A2, …, Am and V1, V2, …, Vk are predicted by the LSTM model.
(6) The final prediction result is obtained by adding the predicted values of all sub-sequences.
3 Data Source and Evaluation Indicators This paper selects carbon emission daily monitoring data of China, US and World from January 1, 2019 to December 31, 2021 for empirical analysis. Figure 2 shows the basic
characteristics of the three datasets. All the data are from the CEADs website (www.ceads.net.cn/). Each data set was divided into a training set and a test set before prediction. The training set is used to train the model and the test set is used to verify the model performance. In this study, the samples from January 1, 2019 to March 12, 2021 are classified as the training set, accounting for about 80%, and the samples from March 13, 2021 to December 31, 2021 are classified as the test set, accounting for about 20%. Three common error evaluation indexes are used to comprehensively evaluate the prediction performance of the model: the mean absolute percentage error (MAPE), the root mean square error (RMSE) and the goodness of fit (R2). The smaller the MAPE and RMSE, and the closer R2 is to 1, the better the prediction performance of the model.
Fig. 1. Flow chart of the CEEMD-WD-LSTM model
Fig. 2. Daily carbon emission data in China, US and World
$$R^2 = 1 - \sum_{t=1}^{n}\left(\hat{y}_t - y_t\right)^2 \Big/ \sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2 \tag{10}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{\hat{y}_t - y_t}{y_t}\right| \tag{11}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{y}_t - y_t\right)^2} \tag{12}$$

where n represents the number of samples in the test set, and $y_t$ and $\hat{y}_t$ are the actual and predicted values, respectively.
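Written out with NumPy, the three evaluation indexes of Eqs. (10)–(12) read as follows.

```python
import numpy as np

def evaluate(y_true, y_pred):
    err = y_pred - y_true
    mape = np.mean(np.abs(err / y_true))                               # Eq. (11)
    rmse = np.sqrt(np.mean(err ** 2))                                  # Eq. (12)
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # Eq. (10)
    return {'MAPE': mape, 'RMSE': rmse, 'R2': r2}
```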
4 Empirical Analysis

4.1 Data Preprocessing

Here, China's daily carbon emission data are taken as an example to illustrate the process of data preprocessing. Due to space constraints, the data processing results for the US and World datasets are not shown in this paper.

(1) The First Decomposition. The original sequence is decomposed by CEEMD, and eight sub-sequences with their PACF analysis results are obtained, as shown in Fig. 3.
Fig. 3. PACF results of CEEMD sequences in China datasets
(2) The Second Decomposition. Considering the complexity of IMF1, the second decomposition of IMF1 is performed by WD, and two sub-sequences with their PACF results are obtained, as shown in Fig. 4. For comparison, VMD and FEEMD are also used for the second decomposition of IMF1; their decomposition results and PACF analysis results are shown in Figs. 5 and 6.

Fig. 4. PACF results of WD sequences in China datasets
Fig. 5. PACF results of VMD sequences in China datasets
Fig. 6. PACF results of FEEMD sequences in China datasets
(3) Sequence Reconstruction. The sample entropies of the IMFs obtained by CEEMD, except for IMF1, are calculated, and the SE results are shown in Fig. 7.
Fig. 7. The sample entropy of China dataset
According to the result of sample entropy (SE), the subsequences with high similarity are integrated into a new sequence. The new sequences are denoted by Vn, and the original 7 IMFs were reconstructed to obtain 4 new sequences. The reconstruction results are summarized in Table 1.

Table 1. Sequence reconstruction result of China dataset

New sequences | V1   | V2         | V3   | V4
Original IMFs | IMF2 | IMF3, IMF4 | IMF5 | IMF6, IMF7, IMF8
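Sample entropy itself is computed as the negative logarithm of the ratio of (m+1)-length to m-length template matches within a tolerance r. A compact sketch is given below; m = 2 and r = 0.2 times the standard deviation are the usual defaults and an assumption here, since the paper does not report these settings.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)

    def count_matches(dim):
        # all overlapping templates of length `dim`
        templates = np.array([x[i:i + dim] for i in range(len(x) - dim)])
        # Chebyshev distance between every pair of templates
        dists = np.max(np.abs(templates[:, None] - templates[None, :]), axis=2)
        # count pairs (i != j) within tolerance
        return np.sum(dists <= tol) - len(templates)

    B, A = count_matches(m), count_matches(m + 1)
    return -np.log(A / B)
```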
(4) Input Determination. According to the results of PACF, the lag order of each sequence is determined. The inputs of LSTM are summarized in Table 2.

Table 2. Model input of China dataset

Series   | Lag            | Series | Lag
Original | xt−1           | A1     | xt−2, xt−3
IMF1     | xt−2           | A2     | xt−2
IMF2     | xt−1, xt−2     | V1     | xt−2, xt−5
IMF3     | xt−1           | V2     | xt−2, xt−4
IMF4     | xt−1           | V3     | xt−, xt−4
IMF5     | xt−1           | V4     | xt−1
IMF6     | xt−1           | F1     | xt−2
IMF7     | xt−1           | F2     | xt−1, xt−2, xt−
IMF8     | xt−            | F3     | xt−1, xt−2, xt−3
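The lag choices in Table 2 follow from thresholding the PACF at its confidence band; a sketch of that selection rule using statsmodels is shown below. The maximum lag of 10 and the 1.96/√N band are assumptions of this sketch, not values stated in the paper.

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

def select_lags(series, max_lag=10):
    values = pacf(series, nlags=max_lag)
    band = 1.96 / np.sqrt(len(series))      # approximate 95% confidence bound
    return [lag for lag in range(1, max_lag + 1) if abs(values[lag]) > band]
```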
4.2 Comparative Experiments

In this paper, three datasets are used for empirical analysis and three error evaluation indexes are used to compare the performance of seven prediction models. As the parameter settings may influence the prediction accuracy, it is necessary to specify the models' parameters; the specifications are shown in Table 3.

Table 3. Parameters of the proposed model and its comparison models

Model | Parameters
LSTM  | L = 10; Batch-size = 20; Time-step = 20; Learning rate = 0.0006; Training epoch = 500
BPNN  | L = 10; Learning rate = 0.0004
ELM   | L = 10; g(x) = 'sig'

where L represents the number of hidden layer neurons and g(x) represents the hidden layer activation function. The value of each parameter in Table 3 was adjusted repeatedly through the simulation process to finally obtain a satisfactory value. The prediction results of the seven models on the three datasets are compared in Fig. 8. A partially zoomed-in curve and a curve with a one-step lag were added to show that the method is not lagged.
Fig. 8. Test set prediction results comparison in China dataset
Then, the error evaluation indexes MAPE, RMSE and R2 of each model on the three datasets are calculated and plotted as bar charts, as shown in Figs. 9, 10 and 11.
In order to verify the effectiveness and robustness of the CEEMD-WD-LSTM model, comparative experiments were constructed from three aspects, and seven prediction models are compared. The specific structure is shown in Fig. 12. Part 1 is used to determine which prediction method is best for carbon emission prediction, Part 2 is used to prove whether the first decomposition is necessary, and Part 3 is used to prove whether the second decomposition is necessary and which decomposition method is more advantageous.
Fig. 9. Model error evaluation results in China dataset
Fig. 10. Model error evaluation results in US dataset
Fig. 11. Model error evaluation results in World dataset
(1) Experiment 1: Comparison of single models. The performance of the BPNN, ELM and LSTM models without data pretreatment was compared. It can be seen that the LSTM model has the best performance on all three data sets. Compared with BPNN and ELM, the MAPE and RMSE of the LSTM model are the minimum, and its R2 is the maximum. Hence, the learning ability of the LSTM model is stronger than that of BP and ELM, and it can well capture the complex and highly nonlinear characteristics of daily carbon emission data.
(2) Experiment 2: Comparison between the single decomposition model and the single model. According to Experiment 1, LSTM is the optimal single prediction model. Therefore, in Experiment 2, only the single model (LSTM) was compared with the single decomposition model (CEEMD-LSTM). It can be seen that, compared with LSTM, the MAPE, RMSE and R2 of CEEMD-LSTM are improved by 57.36%, 59.25% and 15.94% respectively on the China dataset, by 58.38%, 58.18% and 39.75% respectively on the US dataset, and by 58.23%, 57.57% and 21.11% respectively on the World dataset. The experimental results show that data decomposition is necessary and effective for complex daily carbon emission prediction.
Fig. 12. Comparative experimental structure
(3) Experiment 3: Comparison between the secondary decomposition models and the single decomposition model. In this experiment, the performance of the secondary decomposition models (CEEMD-WD-LSTM, CEEMD-VMD-LSTM, CEEMD-FEEMD-LSTM) was compared with that of the single decomposition model (CEEMD-LSTM). It can be seen that, on all three datasets, the secondary decomposition models are superior to the single decomposition model, which proves that the second decomposition of carbon emission data contributes to the improvement of prediction accuracy. In addition, the effects of CEEMD-WD-LSTM, CEEMD-VMD-LSTM and CEEMD-FEEMD-LSTM were compared. The results showed that the prediction effect of
the CEEMD-WD-LSTM model was optimal under every evaluation criterion on the three datasets. At the same time, it is found that the prediction results obtained by the three secondary decomposition methods show a fixed order: WD > VMD > FEEMD. This indicates that, compared with VMD and FEEMD, WD is the most suitable method for the second decomposition of carbon emission data.
5 Conclusion

In this paper, the CEEMD-WD-LSTM model combining secondary decomposition and deep learning is constructed for short-term carbon emission prediction. Three data sets were constructed by collecting daily carbon emission monitoring data from China, the US and the World, and three comparative experiments were designed for empirical analysis. The CEEMD-WD-LSTM model was compared with six other prediction models using the error evaluation indexes MAPE, RMSE and R2, and the following conclusions can be drawn:

(1) Compared with BPNN and ELM, the prediction error of the LSTM model is smaller.
(2) Compared with the single model, the prediction performance of the single decomposition model is better.
(3) Compared with the single decomposition model, the prediction accuracy of the secondary decomposition model is higher.
(4) Compared with VMD and FEEMD, WD is the most suitable method for the secondary decomposition of carbon emission data.

To sum up, the CEEMD-WD-LSTM model proposed in this paper can achieve high-precision short-term carbon emission forecasting. This model can not only provide reference and guidance for emission reduction policies and targets, but also enable the government to formulate countermeasures more quickly according to the changing trend of carbon emissions, which has strong practical value.

Acknowledgement. This paper is supported by the Key Project of the National Social Science Foundation of China "Research on Platform Enterprise Governance" (Grant No. 21AZD118), the Fundamental Research Funds for the Central Universities (Project No. 2021MS105) and the Social Science Fund project of Hebei Province (HB20GL027).
References

1. Yi, T., Qiu, M.H., Liu, J.P.: Multi-perspective influence mechanism analysis and multi-scenario prediction of China's carbon emissions. Int. J. Glob. Warm. 20(1), 61–79 (2020)
2. Li, Y.: Forecasting Chinese carbon emissions based on a novel time series prediction method. Energy Sci. Eng. 8(7), 2274–2285 (2020)
3. Zhou, W.H., Zeng, B., Liu, X.Z.: Forecasting Chinese carbon emissions using a novel grey rolling prediction model. Chaos Solitons Fract. 147 (2021)
4. Wen, L., Yuan, X.: Forecasting CO2 emissions in China's commercial department, through BP neural network based on random forest and PSO. Sci. Total Environ. 718 (2020)
5. Wei, S.W., Wang, T., Li, Y.B.: Influencing factors and prediction of carbon dioxide emissions using factor analysis and optimized least squares support vector machine. Environ. Eng. Res. 22(2), 175–185 (2017)
6. Sun, W., Sun, J.Y.: Prediction of carbon dioxide emissions based on principal component analysis with regularized extreme learning machine: the case of China. Environ. Eng. Res. 22(3), 302–311 (2017)
7. Huang, Y., Shen, L., Liu, H.: Grey relational analysis, principal component analysis and forecasting of carbon emissions based on long short-term memory in China. J. Clean. Prod. 209, 415–423 (2018)
8. Chen, H., Qi, S., Tan, X.: Decomposition and prediction of China's carbon emission intensity towards carbon neutrality: from perspectives of national, regional and sectoral level. Sci. Total Environ. 825 (2022)
9. Bokde, N.D., Tranberg, B., Andresen, G.B.: Short-term CO2 emissions forecasting based on decomposition approaches and its impact on electricity market scheduling. Appl. Energy 281 (2020)
10. Kassouri, Y., Bilgili, F., Kuşkaya, S.: Wavelet-based model of world oil shocks interaction with CO2 emissions in the US. Environ. Sci. Policy 127, 280–292 (2021)
11. Boudraa, A.O., Cexus, J.C.: EMD-based signal filtering. IEEE Trans. Instrum. Meas. 56(6), 2196–2202 (2007)
12. Feldman, M.: Analytical basics of the EMD: two harmonics decomposition. Mech. Syst. Sig. Process. 23(7), 2059–2071 (2009)
13. Wang, W.-C., Chau, K.-W., Xu, D.-M., Chen, X.-Y.: Improving forecasting accuracy of annual runoff time series using ARIMA based on EEMD decomposition. Water Resour. Manag. 29(8), 2655–2675 (2015). https://doi.org/10.1007/s11269-015-0962-6
14. Sun, W., Ren, C.: Short-term prediction of carbon emissions based on the EEMD-PSOBP model. Environ. Sci. Pollut. Res. 28(40), 56580–56594 (2021). https://doi.org/10.1007/s11356-021-14591-1
15. Zhang, X.Q., Wu, X.L., He, S.Y., Zhao, D.: Precipitation forecast based on CEEMD-LSTM coupled model. Water Supply 21(8), 4641–4657 (2021)
16. Liu, D., Sun, K.: Short-term PM2.5 forecasting based on CEEMD-RF in five cities of China. Environ. Sci. Pollut. Res. 26, 1–14 (2019). https://doi.org/10.1007/s11356-019-06339-9
17. Udaiyakumar, S., Victoire, T.A.A.: Week ahead electricity price forecasting using artificial bee colony optimized extreme learning machine with wavelet decomposition. Tehnicki Vjesnik-Tech. Gazette 28(2), 556–567 (2021)
18. Seo, Y., Kim, S., Singh, V.P.: Daily water level forecasting using wavelet decomposition and artificial intelligence techniques. J. Hydrol. 520, 224–243 (2015)
19. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
20. Greff, K., Srivastava, R.K., Schmidhuber, J.: LSTM: a search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 28(10), 2222–2232 (2017)
A Hybrid Carbon Price Prediction Model Based on VMD and ELM Optimized by WOA

Xing Zhang1(B) and Wensong Zhang2

1 Sinounited Investment Group Corporation Limited, Beijing 102611, China
[email protected]
2 School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
Abstract. The carbon trading market controls and reduces greenhouse gas emissions through a market mechanism, and is an important policy tool for achieving carbon peak and carbon neutrality. The carbon price is a core element of the carbon market, and its accurate prediction is of great significance for carbon market risk management. A novel carbon price prediction model is proposed in this paper. First, the data is decomposed into several intrinsic mode functions (IMFs) by variational mode decomposition (VMD). Then, the IMFs are reconstructed according to their sample entropy (SE). The lag order of the historical carbon price is determined by the partial autocorrelation function (PACF), and the other pilot carbon prices are selected according to the Pearson correlation (PC); both the chosen historical and other pilot carbon prices are used as the input of the prediction model. Finally, an extreme learning machine (ELM) optimized by the whale optimization algorithm (WOA) is used to predict the carbon price. In order to verify the validity of the model, the Hubei carbon price data are used for empirical analysis. The results show that the PC-VMD-WOA-ELM model proposed in this paper is effective and robust, and can predict the carbon price more accurately.

Keywords: Carbon price prediction · Variational mode decomposition · Whale optimization algorithm · Extreme learning machine · Pearson correlation
1 Introduction

The carbon price is a direct reflection of the effectiveness of the carbon market's mechanism design and management system. Excessive carbon price volatility brings high risk, which may make the carbon market lose its appeal, while low volatility shows that market activity may be too low. Hence, a carbon price forecasting mechanism should be established and price regulation measures should be implemented in order to improve the risk prevention ability of the carbon trading market.

At present, carbon price prediction has become a hot research topic for scholars at home and abroad. At first, carbon price prediction was mainly based on classical time series prediction methods, such as the autoregressive integrated moving average (ARIMA) [1], generalized autoregressive conditional heteroskedasticity (GARCH) [2] and other methods. However, due to the highly non-linear and non-stationary nature of carbon
price time series, traditional econometric models are unable to effectively deal with the nonlinear fluctuations. Therefore, scholars began to introduce nonlinear methods, as machine learning and artificial intelligence have developed rapidly, such as the support vector machine (SVM) [3], back propagation neural network (BPNN) [4], extreme learning machine (ELM) [5] and so on.

Carbon price data contain a lot of noise, which may reduce the prediction accuracy. To solve this problem, scholars began to use multi-frequency decomposition methods to analyze the fluctuations of the carbon price, effectively filter out the unwanted noise, restore the most realistic market price changes, and decompose the complex price sequence into simpler and more regular modes, which is beneficial for improving the prediction accuracy. Wavelet decomposition [6], empirical mode decomposition (EMD) [7] and variational mode decomposition (VMD) [8] are commonly used to decompose multi-scale time series.

The above studies are all time series forecasting methods, considering the carbon price time series only. However, the carbon price is affected by many factors. Therefore, many scholars began to study the influencing factors of the carbon price, mainly including economic development, energy prices, policy design, information disclosure, extreme weather, etc. [9]. However, there is little literature that considers carbon prices in other carbon markets as an influencing factor and an input of the prediction model.

In order to fill the gap in existing research, a novel carbon price prediction model is proposed in this paper, which combines VMD, the whale optimization algorithm (WOA) and ELM. Due to the nonlinear, non-stationary and complex characteristics of carbon price data, direct prediction will lead to a large prediction error. The carbon price time series is pre-processed into several intrinsic mode functions (IMFs) by VMD, and then the IMFs are reconstructed according to their sample entropy (SE). The lag order of the historical carbon price is determined by the partial autocorrelation function (PACF), and the other pilot carbon prices are selected according to the Pearson correlation (PC); both the chosen historical and other pilot carbon prices are used as the input of the prediction model. At last, an ELM optimized by WOA is used to predict each sub-sequence, and the final prediction result is obtained by adding the sub-sequences' prediction results. The main innovations and contributions of this paper are as follows:

(1) In this paper, a "decomposition-reconstruction" pre-processing method is introduced to filter out the unwanted noise of the carbon price. VMD is used for data decomposition and SE is used for sequence reconstruction, and sub-series with low complexity and strong regularity are obtained.
(2) This paper innovatively chooses the carbon prices of Guangdong, Beijing, Shenzhen and Shanghai, selected by PC for each reconstructed sub-series of Hubei, as inputs of its prediction model, which helps to improve the prediction accuracy.
(3) An ELM optimized by WOA is innovatively introduced into carbon price prediction, and the superiority of WOA-ELM is verified by comparing it with PSO-ELM.
2 Preliminaries

2.1 Variational Mode Decomposition (VMD)

VMD is an adaptive decomposition method for non-linear and non-stationary signals proposed by Dragomiretskiy et al. in 2014 [10]. The core of this method is to transfer the signal decomposition process into a variational framework and to determine the center frequency and bandwidth of each IMF component by searching for the optimal solution of the variational problem. An IMF can be expressed as follows:

$$u_k(t) = A_k(t)\cos(\phi_k(t)) \tag{1}$$

where $A_k(t)$ is the instantaneous amplitude of $u_k(t)$, and $\omega_k(t)$ is the instantaneous frequency of $u_k(t)$:

$$\omega_k(t) = \phi'_k(t) = d\phi_k(t)/dt \tag{2}$$

VMD mainly includes two processes: constructing the variational problem and solving the variational problem [11].
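A sketch of the decomposition step using the vmdpy package is shown below; the penalty alpha, the tolerance, and K = 5 (matching the five modes reported in Sect. 5.1) are illustrative settings of this sketch, not necessarily those used by the authors.

```python
import numpy as np
from vmdpy import VMD

def vmd_decompose(x, K=5):
    alpha, tau, DC, init, tol = 2000, 0.0, 0, 1, 1e-7
    u, u_hat, omega = VMD(np.asarray(x, dtype=float),
                          alpha, tau, K, DC, init, tol)
    return u   # u[k] is the k-th mode u_k(t); omega holds the center frequencies
```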
2.2 Whale Optimization Algorithm (WOA)

WOA, proposed by Mirjalili and Lewis in 2016 [12], is a metaheuristic algorithm inspired by the movement of whales in the process of hunting. This algorithm has the advantages of strong searching ability, fast convergence and few adjustable parameters, and is often used to solve various optimization problems [13]. There are three phases in WOA: encircling the prey, the bubble-net attack, and the exploration for prey.
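A minimal implementation of those three phases is sketched below; the population size, iteration count and the vectorized |A| < 1 test are illustrative simplifications of the canonical update rules of Mirjalili and Lewis, not the authors' configuration.

```python
import numpy as np

def woa_minimize(f, dim, bounds, n_whales=20, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_whales, dim))
    best = min(X, key=f).copy()
    for t in range(iters):
        a = 2 - 2 * t / iters                     # a decreases linearly 2 -> 0
        for i in range(n_whales):
            r1, r2 = rng.random(dim), rng.random(dim)
            A, C = 2 * a * r1 - a, 2 * r2
            if rng.random() < 0.5:
                if np.all(np.abs(A) < 1):         # encircling the best whale
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                             # exploration around a random whale
                    rand = X[rng.integers(n_whales)]
                    X[i] = rand - A * np.abs(C * rand - X[i])
            else:                                 # bubble-net spiral attack
                l = rng.uniform(-1, 1)
                X[i] = np.abs(best - X[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lo, hi)
        best = min([best, *X], key=f).copy()      # keep the best-so-far solution
    return best
```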
2.3 Extreme Learning Machine (ELM)

ELM is a single-hidden-layer feed-forward neural network proposed by Huang et al. (2006) [14], which has the advantages of fast training speed and strong generalization ability. Compared with the traditional BPNN, the input weights and the thresholds of the hidden neurons are given randomly in the learning process of ELM; based on the predefined network structure, the optimal output weights can be quickly obtained [15]. The specific procedure of ELM is described as follows.

The connection weights between the input layer and the hidden layer and between the hidden layer and the output layer, as well as the hidden layer neuron thresholds, are as follows:

$$\omega = [\omega_{i1}, \omega_{i2}, \cdots, \omega_{in}]_{L\times n} \tag{3}$$

where $\omega$ is the connection weight matrix between the input layer and the hidden layer, n is the number of input layer neurons and L is the number of hidden layer neurons.

$$\beta = [\beta_{i1}, \beta_{i2}, \cdots, \beta_{im}]_{L\times m} \tag{4}$$

where $\beta$ is the connection weight matrix between the hidden layer and the output layer and m is the number of output layer neurons.

$$b = [b_1, b_2, \cdots, b_L]_{L\times 1} \tag{5}$$

$$X = \left[x_{i1}, x_{i2}, \cdots, x_{iQ}\right]_{n\times Q}, \qquad Y = \left[y_{i1}, y_{i2}, \cdots, y_{iQ}\right]_{m\times Q} \tag{6}$$

where X is the input vector and Y is the corresponding output vector.

$$H = \begin{bmatrix} g(\omega_1 x_1 + b_1) & g(\omega_2 x_1 + b_2) & \cdots & g(\omega_L x_1 + b_L) \\ g(\omega_1 x_2 + b_1) & g(\omega_2 x_2 + b_2) & \cdots & g(\omega_L x_2 + b_L) \\ \vdots & \vdots & & \vdots \\ g(\omega_1 x_Q + b_1) & g(\omega_2 x_Q + b_2) & \cdots & g(\omega_L x_Q + b_L) \end{bmatrix} \tag{7}$$

where H is the hidden layer output matrix, b is the bias generated randomly in the process of network initialization, and g(x) is the activation function of the ELM. The ELM model with L hidden nodes is expressed as:

$$\sum_{i=1}^{L}\beta_i\, g\!\left(\omega_i x_j + b_i\right) = y_j, \qquad j = 1, 2, \cdots, Q \tag{8}$$

Equation (8) can be simply written as:

$$H\beta = Y, \qquad \beta = H^{+}Y \tag{9}$$

where $H^{+}$ is the Moore-Penrose generalized inverse of matrix H:

$$\beta = H^T\left(HH^T\right)^{-1}Y \tag{10}$$

Thus, the output function of the ELM can be obtained:

$$f(x) = h(x)H^T\left(HH^T\right)^{-1}Y \tag{11}$$
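The whole training procedure of Eqs. (3)–(11) amounts to one random projection and one pseudo-inverse, as the sketch below shows; in the proposed model, WOA (Sect. 2.2) would search over the random input weights and biases rather than fixing a seed, which is an integration detail left out here.

```python
import numpy as np

def elm_train(X, Y, L=10, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1, 1, (X.shape[1], L))   # input-to-hidden weights (Eq. 3)
    b = rng.uniform(-1, 1, L)                 # hidden biases            (Eq. 5)
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid hidden layer     (Eq. 7)
    beta = np.linalg.pinv(H) @ Y              # beta = H^+ Y             (Eq. 9)
    return w, b, beta

def elm_predict(X, w, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return H @ beta                           # f(x) = h(x) beta         (Eq. 11)
```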
3 The Proposed Model

The carbon price prediction model combining PC, VMD, WOA and ELM is proposed in this paper. The basic frame diagram of the PC-VMD-WOA-ELM model is shown in Fig. 1. The specific content of the system is described below.

The first part: decomposition. The original carbon price data are input into the model, and several IMF components with lower complexity and nonlinearity are obtained through VMD, including IMF 1, IMF 2, …, IMF n.

The second part: reconstruction. Sample entropy (SE) is a method for measuring the complexity of time series, proposed by Richman, with higher accuracy than approximate entropy. Hence, SE is applied to quantify the similarity of the IMFs and to integrate the sub-sequences with a high degree of similarity, obtaining V 1, V 2, …, V k.

The third part: input selection. The partial autocorrelation function (PACF) is a commonly used method for describing the structural characteristics of a stochastic process, which gives the partial correlation of a time series with its own lagged values, controlling for the values of the time series at all shorter lags. Hence, PACF is used to determine the lag order of the historical carbon price, and the other pilot carbon prices are selected by Pearson correlation (PC) for each sequence V k.

The fourth part: prediction. The selected historical and other pilot carbon prices are input into the ELM optimized by WOA to obtain the predicted results of each sub-sequence V k. The final prediction result is obtained by adding the predicted results of all sub-sequences.
Fig. 1. Flow chart of the prediction model
4 Data Source and Evaluation Indicators

Since 2013, China has launched pilot carbon trading in eight provinces and cities; sorted by carbon trading volume, the top five are Hubei, Guangdong, Beijing, Shenzhen and Shanghai. Therefore, this paper selects the carbon prices of these five markets for empirical analysis, as shown in Fig. 2. The Hubei carbon price, from the largest carbon market in China, is chosen as the prediction object, and the Guangdong, Beijing, Shenzhen and Shanghai carbon prices serve as its influencing factors. All the data are from the China Carbon Emission Trading Network (http://www.tanpaifang.com/). The period of the data is from April 2, 2014 to March 23, 2021, and each data set contains 1894 samples, which were divided into a training set and a test set before prediction. The training set is used to train the model and the test set is used to verify the model performance. In this study, the sample number of the training set is 1394 and that of the test set is 500.

To quantify the prediction effect, the performance of the model is evaluated by the commonly used error evaluation indexes: the goodness of fit (R2), the mean absolute percentage error (MAPE) and the root mean square error (RMSE). The smaller the MAPE and RMSE, and the closer R2 is to 1, the better the prediction performance of the model.

$$R^2 = 1 - \sum_{t=1}^{n}\left(\hat{y}_t - y_t\right)^2 \Big/ \sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2 \tag{12}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{\hat{y}_t - y_t}{y_t}\right| \tag{13}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{y}_t - y_t\right)^2} \tag{14}$$

where n represents the number of samples in the test set, and $y_t$ and $\hat{y}_t$ are the actual and predicted values, respectively (Fig. 2).
100 90 80 70 60 50 40 30 20 10 0
2014/4/2 2014/5/22 2014/7/11 2014/8/30 2014/10/19 2014/12/8 2015/1/27 2015/3/18 2015/5/7 2015/6/26 2015/8/25 2015/11/12 2016/1/22 2016/4/11 2016/6/20 2016/8/29 2016/11/16 2017/1/26 2017/4/17 2017/6/29 2017/9/7 2017/11/23 2018/2/2 2018/4/24 2018/7/6 2018/9/14 2018/12/3 2019/2/20 2019/5/7 2019/7/17 2019/9/26 2019/12/12 2020/3/9 2020/5/22 2020/8/4 2020/10/21 2020/12/30 2021/3/18
Carbon price / Yuan
578
Hubei
Guangdong
Beijing
Shenzhen
Shanghai
Fig. 2. Daily carbon price data in China
5 Empirical Analysis
5.1 Data Preprocessing Based on VMD
(1) Decomposition. The original carbon price sequence is decomposed by VMD, and five sub-sequences and their corresponding spectra are obtained, as shown in Fig. 3. Compared with the original carbon price series, the five decomposed sub-sequences are more regular, and their complexity increases progressively: IMF 1, with the minimal complexity, presents the main fluctuation of the carbon price, while IMF 5, with the highest complexity, contains the spikes and stochastic volatilities.
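The paper does not name a particular VMD implementation; as a hedged sketch, the decomposition step could be reproduced with the third-party vmdpy package (an assumption, as are the file name and parameter values below):

```python
import numpy as np
from vmdpy import VMD  # third-party VMD implementation (an assumption)

price = np.loadtxt('hubei_carbon_price.txt')  # hypothetical data file

# alpha: bandwidth constraint; tau: noise tolerance; K: number of modes
u, u_hat, omega = VMD(price, alpha=2000, tau=0.0, K=5, DC=0, init=1, tol=1e-7)
# u has shape (K, len(price)): rows correspond to IMF1 ... IMF5 of Fig. 3
```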
(2) Reconstruction. The sample entropies of the IMFs obtained by VMD are calculated, and the SE results are shown in Fig. 4.
Fig. 3. VMD decomposition series and their corresponding spectra
Fig. 4. The sample entropy of VMD decomposition series
According to the SE results, the sub-sequences with high similarity are integrated into a new sequence; the original 5 IMFs are thus reconstructed into 3 new sequences. The reconstruction results are summarized in Table 1.
Table 1. VMD decomposition series reconstruction

New sequences | Original IMFs
V1 | IMF 1
V2 | IMF 2, IMF 4
V3 | IMF 3, IMF 5
(3) Input selection. PACF is used to determine the lag order of the historical carbon price; the PACF results are shown in Fig. 5.
Fig. 5. PACF results of VMD reconstruction series
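The lag-order selection can be sketched with the statsmodels PACF routine; keeping the lags whose partial autocorrelation exceeds the approximate 95% confidence bound is our reading of how the Table 3 lags follow from Fig. 5, not the authors' stated procedure:

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

def significant_lags(series, nlags=10):
    """Lags whose partial autocorrelation exceeds the ~95% bound 1.96/sqrt(n)."""
    values = pacf(series, nlags=nlags)
    bound = 1.96 / np.sqrt(len(series))
    return [lag for lag, v in enumerate(values) if lag > 0 and abs(v) > bound]
```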
PC is used to select the other pilot carbon prices for each reconstructed sequence; the correlation analysis results are shown in Table 2, where the asterisks denote the significance level (* for p < 0.05, ** for p < 0.01). According to the results of PACF and PC, both the chosen historical and other pilot carbon prices are used as the input of the prediction model, as shown in Table 3, where the symbol "-" means that no other pilot carbon price is used as input for that series.

Table 2. Correlation analysis between VMD reconstruction series and other pilot carbon price
Series | Influencing factor | Guangdong | Beijing | Shenzhen | Shanghai
V1 | Pearson correlation | 0.337** | 0.482** | −0.254** | 0.383**
V1 | Significance (bilateral) | 0 | 0 | 0 | 0
V2 | Pearson correlation | 0 | 0 | −0.002 | 0.007
V2 | Significance (bilateral) | 0.998 | 0.985 | 0.939 | 0.772
V3 | Pearson correlation | 0.001 | 0.002 | 0 | 0.001
V3 | Significance (bilateral) | 0.983 | 0.94 | 0.984 | 0.956
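A sketch of the PC-based selection rule implied by Table 2 (keep a pilot market only if its correlation with the sequence is significant at the 5% level; the function and threshold are our reading, not the authors' code):

```python
from scipy.stats import pearsonr

def select_pilots(target, pilots, alpha=0.05):
    """pilots: dict mapping market name -> price series aligned with target."""
    selected = []
    for name, series in pilots.items():
        r, p = pearsonr(target, series)
        if p < alpha:
            selected.append(name)
    return selected
```

Applied to V1 this keeps all four markets, while V2 and V3 keep none, matching Table 3.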
Table 3. Input of VMD reconstruction series prediction model

Series | Historical carbon price | Other pilot carbon price
V1 | x_{t-1} | Guangdong, Beijing, Shenzhen, Shanghai
V2 | x_{t-1}, x_{t-4} | -
V3 | x_{t-3}, x_{t-4} | -
5.2 Data Preprocessing Based on EMD
(1) Decomposition. For comparison, EMD is also used for data decomposition; the decomposition results are shown in Fig. 6. The original carbon price series is decomposed into eight sub-sequences of increasing complexity: IMF 1 has a smooth form, while IMF 8 captures the high-frequency components.
Fig. 6. EMD decomposition series
(2) Reconstruction. The sample entropies of the IMFs obtained by EMD are calculated, and the SE results are shown in Fig. 7.
Fig. 7. The sample entropy of EMD decomposition series
According to the result of SE, the original 8 IMFs were reconstructed to obtain 3 new sequences. The reconstruction results are summarized in Table 4.
Table 4. EMD decomposition series reconstruction

New sequences | Original IMFs
E1 | IMF 1, IMF 2, IMF 3
E2 | IMF 4, IMF 5
E3 | IMF 6, IMF 7, IMF 8
(3) Input selection. The PACF results of each reconstructed sequence are obtained, as shown in Fig. 8.
Fig. 8. PACF results of EMD reconstruction series
According to the Pearson correlation, the other pilot carbon prices are selected for each reconstructed sequence; the correlation analysis results are shown in Table 5, where the asterisks denote the significance level (* for p < 0.05, ** for p < 0.01). According to the results of PACF and PC, both the chosen historical and other pilot carbon prices are used as the input of the prediction model, as summarized in Table 6, where the symbol "-" means that no other pilot carbon price is used as input for that series.

Table 5. Correlation analysis between EMD reconstruction series and other pilot carbon price
Series | Influencing factor | Guangdong | Beijing | Shenzhen | Shanghai
E1 | Pearson correlation | 0.333** | 0.489** | −0.246** | 0.386**
E1 | Significance (bilateral) | 0.000 | 0.000 | 0.000 | 0.000
E2 | Pearson correlation | 0.004 | −0.046* | −0.043 | −0.029
E2 | Significance (bilateral) | 0.870 | 0.046 | 0.063 | 0.218
E3 | Pearson correlation | 0.007 | −0.014 | −0.003 | 0.000
E3 | Significance (bilateral) | 0.758 | 0.541 | 0.885 | 0.992
Table 6. Input of EMD reconstruction series prediction model

Series | Historical carbon price | Other pilot carbon price
E1 | x_{t-1} | Guangdong, Beijing, Shenzhen, Shanghai
E2 | x_{t-1}, x_{t-2} | Beijing
E3 | x_{t-1} | -
5.3 Comparative Experiments
In this paper, an ELM optimized by WOA is used as the prediction model; the convergence curve of the WOA is shown in Fig. 9. As the number of iterations increases, the fitness curve slopes downward and stabilizes at about the 50th generation. The steep best-fitness curve shows that WOA has strong search ability and performs well in finding the best parameters.
Fig. 9. The convergence curve of WOA.
In order to verify the effectiveness and robustness of the proposed model, comparative experiments were constructed from four aspects, and eight prediction models are compared; the specific structure is shown in Fig. 10. Part 1 determines which prediction method is best for carbon price prediction. Part 2 tests whether it is necessary to optimize the parameters of the prediction model, and which optimization algorithm is most effective. Part 3 tests whether data decomposition is necessary, and which decomposition method is more advantageous. Part 4 tests whether it is necessary to use carbon price data from other carbon markets for carbon price prediction. As the parameter settings may influence the prediction accuracy, it is necessary to specify the models' parameters; the specifications are shown in Table 7.
Table 7. Parameters of the proposed model and its comparison models

Model | Parameters
BP | L = 10; learning rate = 0.0004
SVM | L = 10; γ = 50; σ² = 2
ELM | L = 10; g(x) = 'sig'
PSO | N = 30; N_iter = 300; c1 = c2 = 2; w = 1.5; rand = 0.8
WOA | N = 30; N_iter = 300; lb = −10; ub = 10; wmax = 0.9; wmin = 0.4
where L is the number of hidden-layer neurons, γ the regularization parameter, σ² the kernel parameter, g(x) the hidden-layer activation function, N the initial population size, N_iter the maximum number of iterations, and c1 and c2 the acceleration factors; w is the inertia weight, and rand is generated uniformly in the interval [0, 1]; lb and ub are the lower and upper bounds of the variables, and wmax and wmin the maximal and minimal inertia weights, respectively. The values in Table 7 were repeatedly adjusted through the simulation process until satisfactory values were obtained.
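For reference, the base learner itself is compact. The following is a minimal ELM sketch with the Table 7 settings (L = 10 hidden neurons, sigmoid activation); it is our own illustration, and the WOA search is omitted:

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: random hidden weights,
    least-squares (Moore-Penrose) output weights."""
    def __init__(self, n_hidden=10, seed=0):
        self.L = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # 'sig' activation

    def fit(self, X, y):
        self.W = self.rng.uniform(-1, 1, (X.shape[1], self.L))
        self.b = self.rng.uniform(-1, 1, self.L)
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

In the full WOA-ELM model, WOA would search over the randomly drawn input weights and biases to minimize the training error, replacing the single random draw above (our understanding of the common WOA-ELM setup, not a statement of the authors' exact implementation).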
Fig. 10. Comparative experimental structure
The comparison of the prediction results of the eight models on the test set is shown in Fig. 11, and the three error evaluation indexes MAPE, RMSE and R² are used to compare their performance; the error bar chart is shown in Fig. 12.
(1) Experiment 1: Comparison of prediction models
In experiment 1, the performance of the SVM, BP and ELM models without data pre-treatment was compared. As shown in Figs. 11 and 12, the ELM model has the best performance: compared with SVM and BP, its MAPE and RMSE are the smallest and its R² is the largest. Hence, the learning ability of the ELM model is stronger than that of BP and SVM.
Fig. 11. Test set prediction results comparison
Fig. 12. Model error evaluation results
(2) Experiment 2: Is optimization necessary? In experiment 2, ELM was compared with PSO-ELM and WOA-ELM. It can be seen from Figs. 11 and 12 that, compared with ELM, the MAPE, RMSE and R² of PSO-ELM are improved by 5.13%, 3.52% and 7.27% respectively, and those of WOA-ELM by 9.7%, 9.51% and 14.25% respectively, which proves that optimization is necessary for improving the prediction performance of the ELM model. Further, WOA-ELM performs better than PSO-ELM.
(3) Experiment 3: Is decomposition necessary? In experiment 3, WOA-ELM was compared with EMD-WOA-ELM and VMD-WOA-ELM. It can be seen from Figs. 11 and 12 that, compared with WOA-ELM,
the MAPE, RMSE and R² of EMD-WOA-ELM are improved by 10%, 13% and 23% respectively, and those of VMD-WOA-ELM by 13.55%, 24.6% and 33% respectively, which proves that data decomposition is necessary and effective for complex carbon price prediction. Further, VMD-WOA-ELM performs better than EMD-WOA-ELM.
(4) Experiment 4: Is the other pilot carbon price helpful? In this experiment, VMD-WOA-ELM was compared with PC-VMD-WOA-ELM. It can be seen from Figs. 11 and 12 that, compared with VMD-WOA-ELM, the MAPE, RMSE and R² of PC-VMD-WOA-ELM are improved by 1.31%, 17.49% and 7.03% respectively. This indicates that the other pilot carbon prices are helpful for improving carbon price prediction accuracy.
6 Conclusion
In this paper, the PC-VMD-WOA-ELM model, which combines VMD with WOA-ELM and takes other pilot carbon prices into account, is constructed for short-term carbon price prediction. Four comparative experiments were designed for empirical analysis, and the PC-VMD-WOA-ELM model was compared with seven other prediction models using the error evaluation indexes MAPE, RMSE and R². The following conclusions can be drawn: (1) Compared with BP and SVM, the prediction error of the ELM model is smaller. (2) Optimization is necessary for improving the prediction performance of the ELM model, and WOA performs better than PSO. (3) Decomposition is necessary and effective for complex carbon price prediction, and VMD performs better than EMD. (4) Other pilot carbon prices are beneficial for improving carbon price prediction accuracy. However, the external influencing factors of the carbon price are not considered in this paper; future work will introduce factors such as the policy system, the macro economy, energy prices and weather into the carbon price prediction model.
Acknowledgement. This paper is supported by the Key Project of the National Social Science Foundation of China "Research on Platform Enterprise Governance" (Grant No. 21AZD118), the Fundamental Research Funds for the Central Universities (Project No. 2021MS105) and the Social Science Fund project of Hebei Province (HB20GL027).
A Comparative Study of Autoregressive and Neural Network Models: Forecasting the GARCH Process

Firuz Kamalov1(B), Ikhlaas Gurrib1, Sherif Moussa1, and Amril Nazir2

1 Canadian University Dubai, Dubai, United Arab Emirates
[email protected]
2 Zayed University, Abu Dhabi, United Arab Emirates
Abstract. The Covid-19 pandemic has highlighted the importance of forecasting in managing public health. Two of the most commonly used approaches to time series forecasting are autoregressive (AR) models and deep learning (DL) models. While there exist a number of studies comparing the performance of AR and DL models in specific domains, there is no work that analyzes the two approaches in the general context of theoretically simulated time series. To fill this gap in the literature, we conduct an empirical study using different configurations of generalized autoregressive conditionally heteroskedastic (GARCH) time series. The results show that DL models can achieve a significant degree of accuracy in fitting and forecasting AR-GARCH time series. In particular, DL models outperform the AR-based models over a range of parameter values. However, the results are not consistent and depend on a number of factors including the DL architecture, the AR-GARCH configuration, and the parameter values. The study demonstrates that DL models can be an effective alternative to AR-based models in time series forecasting. Keywords: Time series forecasting · Neural networks · Deep learning · ARIMA · GARCH
1 Introduction
Time series analysis is the bedrock of pandemic forecasting. There exist two major approaches to time series forecasting: autoregression and neural networks. Autoregression is the traditional approach, based on joint probability distribution models describing the underlying stochastic processes that govern the time series. It has been used in a wide range of applications including pandemic forecasting [11, 18]. One of the popular autoregressive models is the generalized autoregressive conditionally heteroskedastic (GARCH) model. Together with the basic autoregressive (AR) model, the AR-GARCH model forms the backbone of many time series models in a wide variety of applications. More recently, neural networks have emerged as a viable alternative to AR-GARCH models. In particular, the multi-layer perceptron (MLP) is a neural network design that is capable of fitting a variety of nonlinear patterns [1, 6]. A number of authors compared
the performance of AR-based models against MLP in different applications. However, there does not exist any study based on theoretically simulated data. In this paper, we attempt to fill this gap in the literature by comparing the performance of AR-GARCH and MLP models on simulated AR-GARCH time series.
The AR-GARCH model is a widely used method in time series analysis. It is employed in a range of applications including finance, meteorology, epidemiology, energy, and others. The model has a well-understood mathematical structure and is implemented in a number of statistical packages, making it easy to use. However, the AR-GARCH model possesses two significant drawbacks: relatively narrow usage and intricate order selection. The theoretical assumptions underlying the AR-GARCH model apply to a relatively restricted class of time series: to utilize the model, the current value of the time series must be linearly dependent on its past values together with a conditionally heteroskedastic noise term. The model may be applied even if the theoretical assumptions are not strictly satisfied, but with diminished efficacy. Selecting the correct order of the AR-GARCH model can also pose a challenge: although there exist various heuristics for selecting the order, none of them can guarantee success. The narrow application and involved order selection process have prompted researchers to look for alternative forecasting approaches such as neural networks.
The availability of large amounts of training data together with increased computing power over the last decade have drastically improved the performance of neural networks. The success of neural networks in other fields motivated their application in time series analysis [9, 10]. In particular, MLPs provide a simple yet powerful approach to modeling a variety of linear and nonlinear phenomena. Unlike the AR-GARCH model, MLP does not require any assumptions about the data such as linearly lagged dependencies or conditional variance; it has the capacity to learn from data without any predetermined theoretical restrictions. Popular deep learning libraries such as Keras and PyTorch have made MLP-based time series analysis as easy as traditional autoregression. As a result, MLP offers a viable alternative to traditional AR-based time series modeling. Although MLP has the ability to learn complex nonlinear patterns, the resulting model remains opaque: due to the large number of parameters and activation functions in a neural network, it is nearly impossible for a human expert to interpret.
Given the advantages and disadvantages of MLP and AR-GARCH models, the choice of the right model for time series analysis may not always be clear. In our paper, we compare the performance of the two approaches based on simulated data. Our experiments consist of two main parts: 1. generate sample data based on the AR-GARCH stochastic process; 2. fit AR-GARCH and MLP models to the sample time series. Note that the AR-GARCH statistical model is designed to fit time series generated via the AR-GARCH stochastic process. Therefore, the comparison between MLP and AR-GARCH models based on time series generated via the AR-GARCH stochastic process is not entirely fair with respect to the MLP model. Nevertheless, the results of the numerical experiments show that MLP is capable of achieving a high degree of accuracy. In the
experiments, we consider three different MLP models and three different autoregressive models. The models are tested under different configurations of the AR-GARCH time series and a range of parameter values (Table 2). We find that MLP models are capable of fitting and forecasting AR-GARCH processes with significant accuracy. However, the results are not consistent and depend on a number of factors including the MLP architecture, the AR-GARCH process, and the parameter values. Our study contributes to the current state of the art by demonstrating the efficacy of neural networks, and MLP in particular, in modeling AR-GARCH stochastic processes. In the future, the scope of the experiments can be extended to include other configurations of the AR-GARCH stochastic process, and the selection of the optimal MLP architecture for a given autoregressive stochastic process can be further investigated. Our paper is structured as follows. In Sect. 2, we provide a brief overview of the existing literature. Section 3 contains the descriptions of the autoregressive and neural network models. In Sect. 4, we present the details and results of the numerical experiments. Section 5 concludes the paper with final observations.
2 Literature Review
There exist a number of studies in the literature that compare the performance of autoregression and neural networks. However, our work differs from the existing studies in two important ways. First, while the majority of the studies in the literature evaluate the autoregressive integrated moving average (ARIMA) model, our study focuses specifically on the AR model. Second, while the majority of studies are related to a particular field of application, we study the general case of linearly lagged time series. One of the earlier comparisons between ARIMA and neural network (NN) models was done in [8], where the authors studied repairable system failure forecasting and showed that the ARIMA model outperformed the NN model. The authors in [12] studied the performance of AR and NN models on linearly lagged time series. In [5], the authors compared the performance of ARIMA and NN in forecasting wind-based energy production and found that ARIMA outperformed the NN. The authors in [24] studied traffic flow prediction; since traffic flow can be volatile and irregular, they compared the ARIMA-GARCH model with a wavelet neural network. A number of studies were done in price forecasting. A comparison of ARIMA and NN models in copper price forecasting showed that the NN models provide better performance [14]. In [21], the authors consider forecasting the price of crude oil and find that the ARIMA model yields more accurate results than the NN. In [15], the authors compare the ARIMA and NN models in forecasting the price of gold and discover that the NN model provides better performance. The difference between the models was evaluated even in the case of horticultural products [22]: the authors learned that the ARIMA model is good for average monthly forecasting, while the NN model is more suitable for predicting daily, weekly, and monthly trends of price fluctuations. A number of authors have also compared the performance of ARIMA and the long short-term memory (LSTM) network. The performance of ARIMA and LSTM was compared in [2] to predict cellular network traffic; although the LSTM model showed better performance, ARIMA was close behind with a less complex model. The authors in [23]
predicted cash flow using ARIMA and a number of machine learning models including LSTM. More recently, the authors in [17] compared the performance of ARIMA and LSTM models in Covid-19 forecasting, with mixed results depending on the country. Another Covid-19 comparative study between ARIMA and LSTM models was performed in [13], where the authors forecasted the spread of the Covid-19 infection in India, Italy, Japan and the UK with mixed results. Similarly, in [4] the authors compared ARIMA, LSTM, S-LSTM and the Prophet model and found that S-LSTM achieved the lowest error. On the other hand, the authors in [19] found that LSTM and AR provide similar forecasting performance.
3 Model Descriptions
3.1 AR-GARCH Models
An AR-GARCH model consists of two parts: AR and GARCH. An AR process x_t is a linear stochastic process, where the current value of x_t depends on a linear combination of its previous values together with random noise. In particular, an AR process of order p is given by the equation

x_t = \mu + \sum_{i=1}^{p} \phi_i x_{t-i} + w_t,   (1)
where \mu is the mean, \phi_i are constant coefficients, and w_t \sim N(0, \sigma_w^2) is white noise. Note that an AR process has a constant conditional variance, i.e.

Var(x_t \mid x_{t-1}, x_{t-2}, \dots) = Var\Big(\sum_{i=1}^{p} \phi_i x_{t-i} + w_t \,\Big|\, x_{t-1}, x_{t-2}, \dots\Big) = \sigma_w^2   (2)
However, in many applications such as finance the time series variance is not constant and depends on the prior values of the series. To accommodate non-constant variance in time series models, Eq. 1 is modified by replacing the white noise term w_t with a heteroskedastic noise term r_t given by the equation

r_t = \sigma_t \varepsilon_t, \quad \varepsilon_t \sim N(0, 1), \quad \sigma_t^2 = \alpha_0 + \sum_{i=1}^{s} \alpha_i r_{t-i}^2 + \sum_{i=1}^{q} \beta_i \sigma_{t-i}^2   (3)
Thus, the complete AR(p)-GARCH(s, q) model is given by the equation

x_t = \mu + \sum_{i=1}^{p} \phi_i x_{t-i} + r_t,   (4)
where r_t is described in Eq. 3. To determine the values of the model coefficients in Eq. 4, the principle of maximum likelihood estimation is applied. Note that

r_t \mid r_{t-1}, \dots, r_{t-s}, \sigma_{t-1}, \dots, \sigma_{t-q} \sim N\Big(0,\; \alpha_0 + \sum_{i=1}^{s} \alpha_i r_{t-i}^2 + \sum_{i=1}^{q} \beta_i \sigma_{t-i}^2\Big)   (5)
Using the Bayes theorem, the likelihood function can be expressed as a product of the conditional probabilities given by Eq. 5. Since the conditional probabilities are normal, the likelihood function is a product of exponentials, and standard calculus techniques can be applied to find the optimal value of the logarithmic transform of the likelihood function. Further information about AR and GARCH models can be found in [7].
3.2 MLP Models
The multi-layer perceptron, also known as a feedforward neural network, is one of the most popular algorithms in machine learning. Its design is loosely based on the structure of neurons inside a biological brain. An MLP consists of several layers: the input layer, hidden layers, and the output layer. An example of an MLP architecture with two hidden layers is presented in Fig. 1. In this example, the input layer consists of 3 nodes which correspond to the input features, and the first and second hidden layers contain 3 and 2 nodes, respectively. Although there is no rule regarding the choice of MLP architecture, it is often constructed with decreasing width: the hidden layers are designed to process information at different levels of abstraction, from general to specific. The final layer represents the predicted value of the target variable. An MLP is a versatile learning model that is capable of approximating any continuous function to a desired accuracy; consequently, it is capable of learning the complex nonlinear patterns that exist within data. The optimal parameters of an MLP model are obtained by minimizing the difference between the true and predicted values of the time series. The optimization procedure is carried out via the backpropagation algorithm, whereby the partial derivatives are calculated from the output layer back to the input layer. Further details about MLP can be found in [16].
Fig. 1. An example of MLP design with two hidden layers.
In our experiments, we employ three different MLP designs, whose architectures are presented in Table 1. The three MLP models contain 1, 2, and 3 hidden layers, respectively. In MLP 2 and MLP 3, each hidden layer is followed by a Dropout layer which is used as a regularizer to prevent overfitting. We use adaptive moment estimation (Adam) to perform gradient descent with respect to the loss function. In theory, MLPs can approximate any continuous function, including AR-GARCH models. Although AR-GARCH is a relatively complex model, its best predictor is a linear function. In general, given X, the best predictor that minimizes E[(Y - \hat{Y})^2] is E[Y \mid X]. In the case of an AR-GARCH process, the best predictor that minimizes the squared error is

\hat{x}_t = E[x_t \mid x_{t-1}, x_{t-2}, \dots] = \sum_{i=1}^{p} \phi_i x_{t-i}   (6)
Table 1. Details of the MLP models: number of nodes in each hidden layer (HL), dropout regularization, and optimization algorithm.

Model | HL 1 | HL 2 | HL 3 | Dropout rate | Optimizer
MLP 1 | 4 | - | - | 0 | Adam
MLP 2 | 4 | 2 | - | 0.2 | Adam
MLP 3 | 8 | 4 | 2 | 0.2 | Adam
So an MLP only needs to learn a linear function in order to optimally forecast the values of an AR-GARCH series. However, in practice, the stochastic component of the AR-GARCH model complicates the learning process. Nevertheless, as demonstrated by the numerical experiments, MLP models are capable of fitting and forecasting AR-GARCH time series with great accuracy. Indeed, our study shows that even a simple MLP with 1 or 2 hidden layers can effectively model an AR(2)-GARCH process, which illustrates the power of neural networks.
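For concreteness, the following is a hedged Keras sketch of the three Table 1 architectures (the paper does not state the hidden activation; ReLU is our assumption, while the optimizer and loss follow the text):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_mlp(hidden, dropout, n_lags):
    """MLP mapping a window of n_lags past values to a 1-step forecast."""
    model = keras.Sequential([layers.Input(shape=(n_lags,))])
    for width in hidden:
        model.add(layers.Dense(width, activation='relu'))
        if dropout > 0:
            model.add(layers.Dropout(dropout))
    model.add(layers.Dense(1))
    model.compile(optimizer='adam', loss='mse')
    return model

mlp1 = build_mlp([4], 0.0, n_lags=2)         # MLP 1
mlp2 = build_mlp([4, 2], 0.2, n_lags=2)      # MLP 2
mlp3 = build_mlp([8, 4, 2], 0.2, n_lags=2)   # MLP 3
```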
4 Numerical Experiments
In this section, we present the results of numerical experiments designed to test the ability of MLP to model AR-GARCH time series. To this end, we consider various configurations of the AR-GARCH process, as shown in Table 2. The parameter values of the AR-GARCH series are bounded in the interval (−1, 1) due to the theoretical properties of the random process; in particular, the expected value of the AR-GARCH series diverges if the parameter values are outside the interval. For our experiments, we choose a range of representative parameter values within the theoretical bounds. The selected values should illustrate the performance of MLP for time series modeling in a more general setting. For each configuration, we generate a sample time series and measure the accuracy of MLP in fitting and forecasting the series. To benchmark the performance of MLP, we employ the autoregressive models AR and AR-GARCH. Although the autoregressive models are designed specifically to deal with AR-GARCH time series, the MLP models produce competitive results. The basic step in each experiment is as follows. Given fixed values of the AR-GARCH parameters φi, αj, and βk, we simulate a sample time series of length n = 800. The initial 80% of the series is used to fit the model, i.e. estimate the model coefficients. Then, the trained model is used to make 1-step ahead predictions on the test set. Finally,
we calculate the mean squared error (MSE) between the actual and forecasted values on the test set. The numerical experiments are implemented in Python using the statsmodels package [20] for the autoregressive models and Keras [3] for the MLP models. Due to the large number of experiments, all the hyperparameters were kept at their default settings for both statsmodels and Keras.
4.1 Modeling AR(1)-GARCH Process
In this subsection, we investigate the performance of MLP models in fitting and forecasting AR(1)-GARCH time series. To illustrate our experiment, consider a sample time series of length n = 800 generated according to the AR(1)-GARCH(1,1) process with the values given in Eq. (7).
Table 2. Details of the experimental datasets

Model | φ | α | β
AR(1)-GARCH(1,0) | 0.1–0.7 | 0.5 | 0
AR(1)-GARCH(2,0) | 0.1–0.7 | (0.5, 0.3) | 0
AR(1)-GARCH(1,1) | 0.1–0.7 | 0.5 | 0.3
AR(2)-GARCH(1,0) | (0.1–0.7, −0.5) | 0.5 | 0
AR(2)-GARCH(2,0) | (0.1–0.7, −0.5) | (0.5, 0.3) | 0
AR(2)-GARCH(1,1) | (0.1–0.7, −0.5) | 0.5 | 0.3
GARCH(1,0) | 0 | 0.1–0.7 | 0
GARCH(2,0) | 0 | (0.1–0.7, −0.5) | 0
GARCH(1,1) | 0 | 0.5 | 0.1–0.7

\phi_1 = 0.5, \quad \alpha_0 = 1, \quad \alpha_1 = 0.5, \quad \beta_1 = 0.3   (7)
We use the first 640 timesteps of the series to train the models and the remaining 160 timesteps to make 1-step ahead forecasts. The test MSE for MLP 1 is 3.46, while the test MSE for the autoregressive AR(1)-GARCH(1,1) model is 3.43. It is important to stress that the sample series is simulated according to the AR(1)-GARCH(1,1) process, so the autoregressive model has an inherent advantage over the MLP model in fitting the sample series. Nevertheless, the test MSE for MLP is nearly identical to that of the autoregressive model, which illustrates the efficacy of neural networks. The original and fitted values of the sample time series are presented in Fig. 2. As shown in the figure, the two models produce very similar forecasts.
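The data-generation step of Eqs. (3)-(4) with the Eq. (7) values can be sketched as follows (our own NumPy illustration; the initialization at the unconditional variance is an assumption):

```python
import numpy as np

def simulate_ar1_garch11(n=800, mu=0.0, phi=0.5,
                         alpha0=1.0, alpha1=0.5, beta1=0.3, seed=0):
    """Simulate an AR(1)-GARCH(1,1) series per Eqs. (3)-(4)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    r = np.zeros(n)  # heteroskedastic noise r_t
    sigma2 = np.full(n, alpha0 / (1 - alpha1 - beta1))  # unconditional variance
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * r[t - 1] ** 2 + beta1 * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
        x[t] = mu + phi * x[t - 1] + r[t]
    return x

series = simulate_ar1_garch11()           # parameter values of Eq. (7)
train, test = series[:640], series[640:]  # 80/20 split used in the experiments
```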
Fig. 2. The original and fitted values of AR(1)-GARCH(1,1) sample series generated via Eq. 7.
In the main experiment of this subsection, we compare the performances of several MLP and autoregressive models on three different configurations of the AR(1)-GARCH process. The details of the parameter values are provided in Table 2, and the results of the experiments are presented in Fig. 3. Figure 3a shows the accuracy of the models on sample time series generated with the values

\alpha_0 = 1, \quad \alpha_1 = 0.5, \quad \beta_1 = 0   (8)
and a range of values φ1. The primary reference in Fig. 3a is the graph of AR(1)-GARCH(1,0). It can be seen that the performance of the MLP models depends on their architecture. MLP 1 and MLP 2 produce solid results: the MSE of MLP 1 steadily decreases and dips below the benchmark at φ = 0.3, while the MSE of MLP 2 steadily increases, outperforming the benchmark over the interval [0.1, 0.6] before losing to it at φ = 0.7. MLP 3 shows poor performance for all values of φ. The performance of the models on the AR(1)-GARCH(2,0) process is similar to the performance on AR(1)-GARCH(1,0). Figure 3b shows the accuracy of the models on sample time series generated with the values

\alpha_0 = 1, \quad \alpha_1 = 0.5, \quad \alpha_2 = 0.3, \quad \beta_1 = 0   (9)
and a range of values φ1. The primary reference in this case is the graph of AR(1)-GARCH(2,0). It can be seen that MLP 1 and MLP 2 produce solid results while MLP 3 produces poor results: MLP 1 outperforms the benchmark over the interval [0.2, 0.7], while MLP 2 outperforms the benchmark over the interval [0.1, 0.5]. Finally, the performance of the models on the AR(1)-GARCH(1,1) process is similar to the previous two cases. Figure 3c shows the accuracy of the models on sample time series generated with the values

\alpha_0 = 1, \quad \alpha_1 = 0.5, \quad \beta_1 = 0.3   (10)

and a range of values φ1.
As before, MLP 1 and MLP 2 perform well, while MLP 3 produces poor results.
The numerical experiments on AR(1)-GARCH data show that simple MLP models are capable of producing accurate time series forecasts. In particular, the MLP 1 and MLP 2 models outperform the benchmark autoregressive models over a range of values φ1. We also observe that more complex neural networks such as MLP 3 struggle to correctly fit the AR(1)-GARCH series. One possible explanation is overfitting: since MLP 3 has a large number of parameters, it may overfit the relatively simple structure of the AR(1) model. It is interesting to note that MLP 1 and MLP 2 produce opposite MSE graphs with respect to φ1; we hypothesize that a combination of MLP 1 and MLP 2 may produce a better overall model.
(a) Results based on AR(1)-GARCH(1,0) process.
(b) Results based on AR(1)-GARCH(2,0) process.
(c) Results based on AR(1)-GARCH(1,1) process
Fig. 3. Model error rates in forecasting different configurations of AR(1)-GARCH time series.
4.2 Modeling AR(2)-GARCH Process
In this subsection, we consider AR(2)-GARCH time series. The AR(2) model is more involved than AR(1), so it is informative to study the accuracy of MLP in this case. To illustrate our experiment, consider a sample time series of length n = 800 generated according to the AR(2)-GARCH(1,1) process with the values

\phi_1 = 0.5, \quad \phi_2 = -0.5, \quad \alpha_0 = 1, \quad \alpha_1 = 0.5, \quad \beta_1 = 0.3   (11)
The original and fitted values of the sample time series are presented in Fig. 4. As shown in the figure, the two models produce very similar forecasts. The test MSE for MLP 1 is 3.41, while the test MSE for the AR(2)-GARCH(1,1) model is 3.33. We emphasize again that the autoregressive model has an inherent advantage over the MLP model in this case. Nevertheless, the test MSE for MLP is close to that of the autoregressive model, which demonstrates the efficacy of neural networks. In our experiments, we compare the performances of several MLP and autoregressive models on three different configurations of the AR(2)-GARCH process. The details of the parameter values are shown in Table 2, and the results of the experiments are presented in Fig. 5. The main differences between the results based on the AR(2)-GARCH and AR(1)-GARCH processes are the poor performance of MLP 1 and the improved performance of MLP 3.
Fig. 4. The original and fitted values of the AR(2)-GARCH(1,1) sample series generated via Eq. 11.
On the other hand, MLP 2 produces good performance, particularly for larger values of φ. Figure 5a shows the accuracy of the models on sample time series generated with the values

\alpha_0 = 1, \quad \alpha_1 = 0.5, \quad \beta_1 = 0, \quad \phi_2 = -0.5   (12)
and a range of values φ1. The primary reference in Fig. 5a is the graph of AR(2)-GARCH(1,0). It can be seen that the accuracy of the MLP models improves as the value of φ1 increases. MLP 2 yields the smallest MSE among the three neural networks; in particular, it outperforms the benchmarks over the interval [0.4, 0.8]. Unlike the
experiments based on the AR(1)-GARCH process, MLP 1 does not do well in fitting the AR(2)-GARCH process. The performance of the models on the AR(2)-GARCH(2,0) process is relatively better than the performance on AR(2)-GARCH(1,0). Figure 5b shows the accuracy of the models on sample time series generated with the values

\alpha_0 = 1, \quad \alpha_1 = 0.5, \quad \alpha_2 = 0.3, \quad \beta_1 = 0, \quad \phi_2 = -0.5   (13)
and a range of values φ1. The primary reference in this case is the graph of AR(2)-GARCH(2,0). The performance of all three MLP models is improved: MLP 2 provides better accuracy than the benchmarks over the interval [0.2, 0.8], while MLP 1 and MLP 3 show better accuracy for higher values of φ1. Finally, the performance of the models on the AR(2)-GARCH(1,1) process is similar to the previous two cases. Figure 5c shows the accuracy of the models on sample time series generated with the values

\alpha_0 = 1, \quad \alpha_1 = 0.5, \quad \beta_1 = 0.3, \quad \phi_2 = -0.5   (14)

and a range of values φ1. As before, MLP 2 performs well, while MLP 1 produces poor results.
The results of the experiments on AR(2)-GARCH series are somewhat different from those on AR(1)-GARCH. The biggest difference is the poor performance of MLP 1, which can be attributed to underfitting. On the other hand, MLP 2 performed well, producing overall better accuracy than the benchmark autoregressive models; it appears that the 2-hidden-layer architecture is well suited to fitting AR(2)-GARCH series. We also observe that the accuracy of all three MLP models was better for higher values of φ1.
4.3 Modeling GARCH Process
In our last case study, we consider pure GARCH models without the AR component. The details of the GARCH processes used in the experiment are provided in Table 2. As above, we compare the performance of the MLP models against the benchmark autoregressive models. The results of the experiments show that the performances of the models are relatively similar in each case. The most significant observation is the performance of MLP 1, which beats the autoregressive benchmark models across all values of α and β. The performance of MLP 1 on GARCH series is in line with its performance on AR(1)-GARCH series: in both cases the underlying stochastic process has a simple structure which is better suited to a simple MLP architecture. It also appears that MLP 2 produces solid results in the case of GARCH(1,0) and GARCH(2,0). We observe that the accuracy of the models deteriorates with an increase in the values of α and β, which can be explained by the increased volatility resulting from larger values of α and β.
(a) Results based on AR(2)-GARCH(1,0) process
(b) Results based on AR(2)-GARCH(2,0) process
(c) Results based on AR(2)-GARCH(1,1) process
Fig. 5. Model error rates in forecasting different configurations of AR(2)-GARCH time series.
4.4 Analysis
The results of the experiments reveal that MLP models are capable of achieving significant accuracy in fitting and forecasting AR-GARCH time series. However, the results are not consistent across the different AR-GARCH processes. In the case of the AR(1)-GARCH process, the MLP 1 and MLP 2 models produce accurate forecasts, beating the benchmark autoregressive models over a range of parameter values. Similarly, in the case of pure GARCH processes, MLP 1 produced the most accurate forecasts compared to all benchmark models. This can be explained by the fact that the simple structure of MLP 1, with one hidden layer, is better suited to the simple structure of AR(1)-GARCH and pure GARCH series. On the other hand, in the case of AR(2)-GARCH, MLP 1 showed poor results, which could be due to an insufficient number of parameters to model the series. MLP 2 provided more consistent results across all the AR-GARCH processes. With 2
hidden layers, the MLP 2 structure is close enough to both AR(1)- and AR(2)-based series, and it often produced an MSE close to, if not better than, the benchmarks. We summarize our findings below:
1. MLP models are capable of fitting and forecasting AR-GARCH processes with significant accuracy. However, the results are not consistent and depend on a number of factors including the MLP architecture, the AR-GARCH process, and the parameter values.
2. In general, MLP 3 performed worse than MLP 1 and MLP 2, suggesting that overfitting may be an issue.
3. It must be stressed that the benchmark autoregressive models are designed specifically to fit AR-GARCH time series. Therefore, outperforming the benchmark methods is a nontrivial achievement for the MLP models.
4. The experiments show that it is necessary to test several different MLP architectures to find the best model.
5 Conclusion
Although there exist a number of studies comparing the efficacy of neural network and autoregressive models in different practical applications, there is no study that compares the two approaches in the context of theoretically simulated AR-GARCH sample time series; our manuscript thus fills a gap in the literature. To carry out the comparison, we performed a number of numerical experiments using different configurations of AR-GARCH time series. The results show that MLP models can achieve a significant degree of accuracy in fitting and forecasting AR-GARCH time series. However, the results are not consistent and depend on a number of factors including the MLP architecture, the AR-GARCH process, and the parameter values. Even though AR-GARCH models are tailor-made to fit linearly lagged heteroskedastic processes, choosing the correct model order can be very challenging in practice. As an alternative, we can employ MLP models to fit the time series: the above experiments show that MLP models produce great results compared even to the native AR-GARCH models, and the MLP model provides a flexible, nonlinear framework that can fit a variety of situations. The experiments also show that MLP models may be susceptible to overfitting. Further numerical experiments can help shed more light on the best choice of MLP architecture for a particular AR-GARCH time series. In addition, future work should include other neural network varieties such as recurrent neural networks and long short-term memory models in the context of AR-GARCH time series. Despite some limitations of the present study, we believe that it provides valuable insight into the performance of MLP models in time series forecasting.
References
1. Absar, N., Uddin, N., Khandaker, M.U., Ullah, H.: The efficacy of deep learning based LSTM model in forecasting the outbreak of contagious diseases. Infect. Disease Model. 7(1), 170–183 (2022)
2. Azari, A., Papapetrou, P., Denic, S., Peters, G.: Cellular traffic prediction and classification: a comparative evaluation of LSTM and ARIMA. In: Kralj Novak, P., Šmuc, T., Džeroski, S. (eds.) DS 2019. LNCS (LNAI), vol. 11828, pp. 129–144. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-33778-0_11
3. Chollet, F., et al.: Keras (2015). https://keras.io
4. Devraj, J., et al.: Forecasting of COVID-19 cases using deep learning models: is it reliable and practically significant? Results Phys. 21, 103817 (2021)
5. Dmitru, C.D., Gligor, A.: Wind energy forecasting: a comparative study between a stochastic model (ARIMA) and a model based on neural network (FFANN). Procedia Manuf. 32, 410–417 (2019)
6. Elsheikh, A.H., et al.: Deep learning-based forecasting model for COVID-19 outbreak in Saudi Arabia. Process Saf. Environ. Protect. (2021)
7. Hamilton, J.D.: Time Series Analysis. Princeton University Press, Princeton (2020)
8. Ho, S.L., Xie, M., Goh, T.N.: A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Comput. Ind. Eng. 42(2–4), 371–375 (2002)
9. Kamalov, F.: Forecasting significant stock price changes using neural networks. Neural Comput. Appl. 32(23), 17655–17667 (2020). https://doi.org/10.1007/s00521-020-04942-3
10. Kamalov, F., Smail, L., Gurrib, I.: Forecasting with deep learning: S&P 500 index. In: 2020 13th International Symposium on Computational Intelligence and Design (ISCID), pp. 422–425. IEEE, December 2020
11. Kamalov, F., Thabtah, F.: Forecasting Covid-19: SARMA-ARCH approach. Health Technol. 11(5), 1139–1148 (2021). https://doi.org/10.1007/s12553-021-00587-x
12. Kamalov, F., Gurrib, I., Thabtah, F.: Autoregressive and neural network models: a comparative study with linearly lagged series. In: 2021 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT), pp. 175–180. IEEE, September 2021
13. Kumar, M., Gupta, S., Kumar, K., Sachdeva, M.: Spreading of COVID-19 in India, Italy, Japan, Spain, UK, US: a prediction using ARIMA and LSTM model. Digit. Gov. Res. Pract. 1(4), 1–9 (2020)
14. Lasheras, F.S., de Cos Juez, F.J., Sánchez, A.S., Krzemień, A., Fernández, P.R.: Forecasting the COMEX copper spot price by means of neural networks and ARIMA models. Resour. Policy 45, 37–43 (2015)
15. Mombeini, H., Yazdani-Chamzini, A.: Modeling gold price via artificial neural network. J. Econ. Bus. Manag. 3(7), 699–703 (2015)
16. Nielsen, M.A.: Neural Networks and Deep Learning, vol. 25. Determination Press, San Francisco (2015)
17. Papastefanopoulos, V., Linardatos, P., Kotsiantis, S.: Covid-19: a comparison of time series methods to forecast percentage of active cases per population. Appl. Sci. 10(11), 3880 (2020)
18. Rajab, K., Kamalov, F., Cherukuri, A.K.: Forecasting COVID-19: vector autoregression-based model. Arab. J. Sci. Eng. 47, 1–10 (2022). https://doi.org/10.1007/s13369-021-06526-2
19. Rguibi, M.A., Moussa, N., Madani, A., Aaroud, A., Zine-dine, K.: Forecasting Covid-19 transmission with ARIMA and LSTM techniques in Morocco. SN Comput. Sci. 3(2), 1–14 (2022). https://doi.org/10.1007/s42979-022-01019-x
20. Seabold, S., Perktold, J.: Statsmodels: econometric and statistical modeling with Python. In: Proceedings of the 9th Python in Science Conference, vol. 57, p. 61, June 2010
21. Sharma, S., Yadav, M.: Analyzing the robustness of ARIMA and neural networks as a predictive model of crude oil prices. Theor. Appl. Econ. 27(2), 289–300 (2020)
22. Weng, Y., Wang, X., Hua, J., Wang, H., Kang, M., Wang, F.Y.: Forecasting horticultural products price using ARIMA model and neural network based on a large-scale data set collected by web crawler. IEEE Trans. Comput. Soc. Syst. 6(3), 547–553 (2019)
23. Weytjens, H., Lohmann, E., Kleinsteuber, M.: Cash flow prediction: MLP and LSTM compared to ARIMA and Prophet. Electron. Commer. Res. 21(2), 371–391 (2019). https://doi.org/10.1007/s10660-019-09362-7
24. Yao, R., Zhang, W., Zhang, L.: Hybrid methods for short-term traffic flow prediction based on ARIMA-GARCH model and wavelet neural network. J. Transp. Eng. Part A Syst. 146(8), 04020086 (2020)
A Novel DCT-Based Video Steganography Algorithm for HEVC

Si Liu, Yunxia Liu(B), Cong Feng, and Hongguo Zhao

College of Information Science and Technology, Zhengzhou Normal University, Zhengzhou, China
[email protected]
Abstract. With the increasing preference for exchanging information in the form of video, video steganography will see huge demand in military communications, the industrial Internet and other fields. High Efficiency Video Coding (HEVC) has received extensive attention for its high compression efficiency since its release in 2013 and, after recent years of development, has been applied in many products, so it is of great significance to research video steganography algorithms based on the HEVC coding standard. This paper presents an HEVC video steganography method based on intra-frame QDCT coefficients. The secret information is embedded into the coupling coefficients of selected 8 × 8 luminance QDCT blocks to avert the intra-frame distortion drift, and the (7, 4) Hamming code is used to reduce the modification of these embedded blocks. The experimental results show that the proposed algorithm achieves good visual quality and embedding capacity. Keywords: HEVC · Video steganography · QDCT · Hamming code · Coupling coefficient
1 Introduction
Steganography is the technique and science of information hiding [1]. Information hiding refers to preventing anyone other than the intended recipient from knowing the transmission events or the content of the secret information. Text, audio, image and video are common carriers for steganography, and the steganography algorithms corresponding to different types of carriers have different characteristics and application scenarios. Among them, the notable features of video are its large data volume and diverse content; forms such as video calls, video conferences and live broadcasts have been favored by people. As of June 2021, the number of online video (including short video) users in China reached 944 million, accounting for 93.4% of all netizens. It can be seen that information exchange through video has become increasingly popular, and the spread of video is getting wider and wider. Therefore, researching steganography technology with video as the carrier has important application value. The huge data volume of raw video brings great inconvenience to storage and transmission, so video usually needs to be compressed [2]. HEVC/H.265 (High
Efficiency Video Coding) [3] was released in 2013. Compared with the previous-generation encoding standard H.264/AVC (Advanced Video Coding), HEVC can save about half of the bit stream and can encode ultra-HD video up to 8K resolution, so it has received more and more attention from researchers and developers in recent years and has been applied in many products. With people's pursuit of faster transmission and higher-definition video, HEVC will be even more widely used. Therefore, it is of great practical significance to research compressed-domain video steganography algorithms based on HEVC.
Many modules in the HEVC video compression coding process can be used as the carrier of video steganography, such as intra prediction, motion estimation, and the DCT/DST transform [4, 5]. For the intra-frame QDCT/QDST (quantized DCT/DST) transform of the HEVC standard, several papers have proposed corresponding video steganography algorithms [6–15]. Chang et al. [6] proposed a DCT/DST-based error-propagation-free data hiding algorithm for HEVC intra-coded frames. Zhao et al. [13] proposed a novel and efficient video steganography method based on the transform block decision for HEVC: to improve the visual quality of the carrier video, the embedding error of data hiding is analyzed when modifying the partitioning parameters of CB, PB and TB, and the transform block decision is modified to embed the secret message and update the corresponding residuals synchronously; to limit the embedding error, an efficient embedding mapping rule is utilized which can embed an N-bit (N > 1) message while modifying at most one transform partitioning flag bit. Liu et al. [15] proposed an HEVC video steganography method based on QDST coefficients, in which the secret message is embedded into the multi-coefficients of selected 4 × 4 luminance QDST blocks to avert the intra-frame distortion drift, and matrix encoding is used to reduce the modification of the embedded blocks.
In this paper, two sets of coupling coefficients are used to avert the intra-frame distortion drift in 8 × 8 luminance QDCT blocks. To further improve the visual quality of the proposed algorithm, the (7, 4) Hamming code is used to reduce the modification of the embedded blocks. Experimental results show that the proposed algorithm has both good visual quality and high embedding capacity.
The rest of the paper is organized as follows. Section 2 describes the theoretical framework of the proposed algorithm, and Section 3 describes the proposed scheme. Experimental results are presented in Sect. 4 and conclusions are given in Sect. 5.
2 Theoretical Framework
2.1 Intra-frame Prediction
HEVC uses transform coding of the prediction error residual in a similar manner to H.264. The residual block is partitioned into multiple square transform blocks, with supported sizes of 4 × 4, 8 × 8, 16 × 16, and 32 × 32. A prediction block of the HEVC intra prediction method is formed from previously encoded adjacent blocks: the 64 pixels in an 8 × 8 block are predicted from the previously obtained boundary pixels of the upper and left blocks, using the prediction formula corresponding to the selected optimal prediction mode, as shown in Fig. 1. Each 8 × 8 block has 33 angular prediction modes (modes 2–34).
606
S. Liu et al.
Fig. 1. Labeling of prediction samples
2.2 Intra-frame Distortion Drift
Distortion drift refers to the phenomenon that embedding in the current block causes distortion not only in the current block but also in its adjacent blocks. As illustrated in Fig. 2, assume that the intra-frame prediction block is B_{i,j}; the value of each pixel in B_{i,j} is calculated by referring to the boundary pixels (gray part) of its adjacent blocks. The embedding-induced errors in blocks B_{i−1,j−1}, B_{i,j−1}, B_{i−1,j}, and B_{i−1,j+1} would propagate to B_{i,j} through the intra-frame prediction process.
Fig. 2. The intra-frame prediction block Bi,j and its adjacent encoded blocks
For convenience, we give several definitions: the 8 × 8 block to the right of the current block is defined as the right-block; the 8 × 8 block under the current block is defined as the
under-block; the 8 × 8 block to the left of the under-block is defined as the under-left-block; the 8 × 8 block to the right of the under-block is defined as the under-right-block; and the 8 × 8 block on top of the right-block is defined as the top-right-block, as shown in Fig. 3. The embedding-induced errors in the current block would transfer through the boundary pixels to these five adjacent blocks.
Fig. 3. Definition of adjacent blocks
2.3 (7, 4) Hamming Code
The Hamming code is a linear block code that can correct a single-bit error. The (7, 4) Hamming code is composed of 4 data bits (d1, d2, d3, d4) and 3 check bits (p1, p2, p3), and the codeword has the form [p1, p2, d1, p3, d2, d3, d4]. The relationship between the data bits and the check bits is shown in (1), where ⊕ stands for the XOR operation:

p1 = d1 ⊕ d2 ⊕ d4
p2 = d1 ⊕ d3 ⊕ d4   (1)
p3 = d2 ⊕ d3 ⊕ d4
The (7, 4) Hamming code can correct a single-bit error because the receiver uses the check matrix H and the received codeword to calculate the check vector s, from which it can determine whether the received codeword is correct. If s equals (0, 0, 0), the received codeword is correct; otherwise, the codeword is incorrect. When the received codeword is wrong, the location vector G (in which 1 marks the bit-error position) can be found according to Table 1, and single-bit error correction can be performed.
Table 1. Corresponding relationship between check vector and bit-error position

s | G
000 | 0000000
001 | 1000000
010 | 0100000
011 | 0010000
100 | 0001000
101 | 0000100
110 | 0000010
111 | 0000001
    ⎡ 0 0 0 1 1 1 1 ⎤
H = ⎢ 0 1 1 0 0 1 1 ⎥   (2)
    ⎣ 1 0 1 0 1 0 1 ⎦

s^T = H × R_CW^T mod 2   (3)

C_CW = R_CW ⊕ G   (4)
The specific calculation process is shown in (2) to (4), where R_CW represents the received codeword, T represents the matrix transpose operation, and C_CW represents the corrected codeword. We can use the (7, 4) Hamming code to take the first row or column of an 8 × 8 QDCT block as the embedding positions, and modify 1 bit of data to embed 3 bits of secret information.
a00
a01
a02
a03
a04
a05
a10 a20 a30 a40 a50 a60
a
b
Fig. 4. (a) Examples of the first column (b) Examples of the first row
a06
A Novel DCT-Based Video Steganography Algorithm for HEVC
609
information. In this paper, we choose to take the first 7 coefficients of the first row (a00 to a06 ) or column (a00 to a60 ) of the 8 × 8 QDCT block as the embedding positions, as shown in Fig. 4.
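As a minimal sketch of the error-correction side (our own illustration; H and the lookup are those of Eqs. (2)–(4) and Table 1), the syndrome computation and single-bit correction can be written as:

```python
import numpy as np

# Check matrix H from Eq. (2); its i-th column equals the syndrome
# produced by a single-bit error at position i.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def correct(received):
    """Compute the syndrome s per Eq. (3) and correct a single-bit
    error per Table 1 and Eq. (4)."""
    r = np.asarray(received)
    s = H.dot(r) % 2                      # syndrome vector
    if not s.any():                       # s == (0, 0, 0): codeword is correct
        return r
    # The column of H equal to s gives the error position (Table 1)
    pos = next(i for i in range(7) if np.array_equal(H[:, i], s))
    r = r.copy()
    r[pos] ^= 1                           # flip the erroneous bit
    return r

# Example: flip bit 4 of a valid codeword and recover it
cw = np.array([0, 1, 1, 0, 0, 1, 1])
bad = cw.copy(); bad[3] ^= 1
assert np.array_equal(correct(bad), cw)
```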
3 Proposed Scheme

3.1 Embedding

Because the boundary pixels of the current block will be used as intra-frame prediction references by its adjacent blocks, intra-frame distortion drift can be avoided if the embedding error changes only pixels of the current block other than the boundary pixels used for prediction reference. Based on this idea, two conditions are proposed to prevent intra-frame distortion drift.

Condition 1: Right-mode ∈ {2–25}, Under-right-mode ∈ {11–25}, Top-right-mode ∈ {2–9}.

Condition 2: Under-left-mode ∈ {27–34}, Under-mode ∈ {11–34}.

Here "mode" refers to the prediction mode used by each of the five adjacent blocks of the current block. If the current block meets Condition 1, the pixel values of its last column must not be changed by the following intra-frame prediction. If the current block meets Condition 2, the pixel values of its last row must not be changed. If the current block meets Conditions 1 and 2 at the same time, it should not be embedded. If neither condition is satisfied, the current block could in principle be embedded arbitrarily, since the induced errors would not transfer through the boundary pixels to the five adjacent blocks and distortion drift would not occur; however, this paper does not discuss that situation, and such blocks are also not embedded. A sketch of this decision is given after Fig. 5.

Two sets of coupling coefficients are proposed to meet the above conditions during embedding. A coupling pair is a two-coefficient combination (C1, C2), where C1 is used for bit embedding and C2 for distortion compensation. The specific definitions are as follows; the VS coupling coefficients apply to Condition 1 and the HS coupling coefficients to Condition 2:

VS (Vertical Set) = {(Ci0 = 1, Ci4 = −1), (Ci0 = −1, Ci4 = 1)} (i = 0, 1, 2, 3).
HS (Horizontal Set) = {(C0j = 1, C4j = −1), (C0j = −1, C4j = 1)} (j = 0, 1, 2, 3).

For example, add 1 to a00 in an 8 × 8 QDCT block and then subtract 1 from a04: this modification does not change the pixel values in the last column of the block, as shown in Fig. 5(a). Similarly, subtract 1 from a00 and then add 1 to a40: this modification does not change the pixel values in the last row of the block, as shown in Fig. 5(b).
Fig. 5. (a) Examples of VS (b) Examples of HS
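The decision between the two embedding directions can be sketched as follows (a minimal illustration; the function and argument names are ours, and the mode ranges follow Conditions 1 and 2 above):

```python
def embedding_direction(right, under_right, top_right, under_left, under):
    """Decide how an 8x8 block may be embedded, given the intra prediction
    modes of its five adjacent blocks (Conditions 1 and 2)."""
    cond1 = (2 <= right <= 25) and (11 <= under_right <= 25) and (2 <= top_right <= 9)
    cond2 = (27 <= under_left <= 34) and (11 <= under <= 34)
    if cond1 and cond2:
        return None          # both conditions hold: do not embed in this block
    if cond1:
        return "column"      # embed in the first column, compensate with VS
    if cond2:
        return "row"         # embed in the first row, compensate with HS
    return None              # neither condition holds: also skipped in this paper
```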
Assume that the 3-bit secret information to be embedded is m = (m1, m2, m3) and that the carrier is the first 7 coefficients of the first row (a00 to a06) or of the first column (a00 to a60) of the 8 × 8 QDCT block. If the block meets Condition 1, the first 7 coefficients of the first column (a00 to a60) are used to embed the secret information; if it meets Condition 2, the first 7 coefficients of the first row (a00 to a06) are used. Take the coefficients (a00, a10, a20, a30, a40, a50, a60) applicable to Condition 1 as an example; for Condition 2 the embedding process is similar.

s = (H × (a00, a10, a20, a30, a40, a50, a60)^T mod 2) ⊕ m   (5)
Based on the principle of the (7, 4) Hamming code, after calculating s according to (5) we can look up Table 1 to get the corresponding G, and the coupling-coefficient modification is then applied to the coefficient at position G. Specifically, for Condition 1, if the coefficient ai0 ≥ 0, then ai0 = ai0 + 1 and ai4 = ai4 − 1; if ai0 < 0, then ai0 = ai0 − 1 and ai4 = ai4 + 1 (i = 0, 1, 2, 3). For Condition 2, if the coefficient a0j ≥ 0, then a0j = a0j + 1 and a4j = a4j − 1; if a0j < 0, then a0j = a0j − 1 and a4j = a4j + 1 (j = 0, 1, 2, 3). As mentioned above, ai4 and a4j are used to prevent distortion drift. For ease of understanding, a specific example is given here. Assume that the 3-bit secret information is m = (101) and the coefficients are (a00, a10, a20, a30, a40, a50, a60) = (4, 3, 0, 2, 1, 1, 0), as shown in Fig. 6. According to (5), s = (100); according to Table 1, G = (0001000). This means that the fourth coefficient a30 needs to be modified; because a30 ≥ 0, a30 = a30 + 1 = 3. According to the coupling-coefficient modification rule, a34 is used to prevent distortion drift, and a34 = a34 − 1.
Fig. 6. Examples of embedding
By using the aforementioned (7, 4) Hamming code, embedding 3 bits of secret information requires modifying at most 2 coefficients, whereas the conventional line-by-line embedding method would require modifying up to 6 coefficients. After the original video is entropy decoded, we obtain the intra-frame prediction modes and the 8 × 8 QDCT block coefficients. Using the (7, 4) Hamming code, we embed the secret information via the coupling coefficients into the selected 8 × 8 QDCT blocks that meet the conditions. Finally, all the 8 × 8 QDCT block coefficients are entropy encoded to obtain the carrier video.

3.2 Data Extraction

After entropy decoding of the HEVC carrier video, the extraction operation is performed on the coefficients of the first row or first column of the selected 8 × 8 luminance QDCT blocks that meet the corresponding condition. The extraction operation is repeated until the number of extracted bits equals the number of secret information bits agreed in advance, thereby recovering the embedded secret information. Assume that the 3-bit secret information to be extracted is m = (m1, m2, m3), and take the received coefficients (a00, a10, a20, a30, a40, a50, a60) applicable to Condition 1 as an example; for Condition 2 the extraction process is similar.

m = H × (a00, a10, a20, a30, a40, a50, a60)^T mod 2   (6)
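Putting the pieces together, a sketch of the embedding and extraction steps for a first-column carrier (Condition 1) could look as follows. This is our own illustration, not the authors' code: H is the check matrix of Eq. (2), and the VS coupling pairs are only defined for i = 0..3 in the paper, which is the case handled here.

```python
import numpy as np

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def embed_3bits(block, m):
    """Embed 3 secret bits m into the first column (a00..a60) of an 8x8 QDCT
    block satisfying Condition 1, per Eq. (5) and the VS coupling rule."""
    a = block[0:7, 0]
    s = (H.dot(a) % 2).astype(int) ^ np.asarray(m)       # Eq. (5)
    if s.any():
        g = next(i for i in range(7) if np.array_equal(H[:, i], s))
        if g > 3:
            return False   # no VS pair defined for this position in the paper
        step = 1 if block[g, 0] >= 0 else -1
        block[g, 0] += step                              # bit embedding (C1)
        block[g, 4] -= step                              # distortion compensation (C2)
    return True

def extract_3bits(block):
    """Recover the 3 secret bits from the first column, per Eq. (6)."""
    return H.dot(block[0:7, 0]) % 2

# The worked example above: m = (1, 0, 1), first column (4, 3, 0, 2, 1, 1, 0)
blk = np.zeros((8, 8), dtype=int)
blk[0:7, 0] = [4, 3, 0, 2, 1, 1, 0]
embed_3bits(blk, [1, 0, 1])
assert blk[3, 0] == 3 and blk[3, 4] == -1                # a30 + 1, a34 - 1
assert list(extract_3bits(blk)) == [1, 0, 1]
```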
4 Experiments and Results Analyses

The proposed method has been implemented in the HEVC reference software HM16.0. In this paper we take "BQSquare" (416 × 240), "BlowingBubbles" (416 × 240), "BQMall" (832 × 480), and "Johnny" (1280 × 720) as test videos. The GOP size is set to 1 and the QP (Quantization Parameter) values are set to 16, 24, and 32. The method in [6] is used for performance comparison. As shown in Table 2, the PSNR (Peak Signal to Noise Ratio) of our method is better than that of the method proposed in [6] for every video sequence, because, compared with [6], our method uses the (7, 4) Hamming code to further reduce the modifications to the embedded QDCT blocks.

Table 2. PSNR (dB) of embedded frames for each video sequence

Sequences        Method         QP = 16   QP = 24   QP = 32
BQSquare         In this paper  47.82     42.78     37.33
                 In [6]         46.43     41.31     36.01
BlowingBubbles   In this paper  48.10     42.63     37.45
                 In [6]         46.76     41.15     36.25
BQMall           In this paper  47.88     43.17     36.73
                 In [6]         46.75     41.85     35.22
Johnny           In this paper  47.53     42.88     37.52
                 In [6]         46.33     41.29     36.21
In terms of embedding capacity, as shown in Table 3, the average per-frame embedding capacity of our method is lower than that of the method in [6], because the method in [6] can embed 8 bits of secret information in most eligible 8 × 8 QDCT blocks, whereas our method embeds 3 bits in each eligible 8 × 8 QDCT block. Although some embedding capacity is traded for visual quality, the capacity is still acceptable.

Table 3. Embedding capacity (bits) of embedded frames for each video sequence

Sequences        Method         QP = 16   QP = 24   QP = 32
BQSquare         In this paper  2943      3036      2712
                 In [6]         6165      6324      5537
BlowingBubbles   In this paper  2880      2946      2682
                 In [6]         5788      6227      5469
BQMall           In this paper  8709      8913      8541
                 In [6]         18234     19660     18366
Johnny           In this paper  16884     17652     17121
                 In [6]         35225     38427     36423
Fig. 7. (a) Method in this paper (b) Method in [6].
Fig. 8. (a) Method in this paper (b) Method in [6].
Figures 7 and 8 show video-frame screenshots for the two steganography methods. Since our method reduces the modifications to the 8 × 8 QDCT blocks, it achieves higher visual quality.
5 Conclusion

This paper proposed a novel HEVC video steganography method based on intra-frame QDCT coefficients. Two sets of coupling coefficients are used to avert intra-frame distortion drift in 8 × 8 luminance QDCT blocks, and the (7, 4) Hamming code is used to further reduce the modification of the embedded blocks and thus improve the visual quality of the proposed algorithm. Experimental results demonstrate the feasibility and superiority of the proposed method.

Acknowledgment. This paper is sponsored by the National Natural Science Foundation of China (NSFC, Grant 61572447), the Henan International Joint Laboratory of Blockchain and Audio/Video Security, and the Zhengzhou Key Laboratory of Blockchain and CyberSecurity.
References

1. Liu, Y., Liu, S., Wang, Y., et al.: Video steganography: a review. Neurocomputing 335, 238–250 (2019)
2. Liu, Y., Liu, S., Wang, Y., et al.: Video coding and processing: a survey. Neurocomputing (2020)
3. Sze, V., Budagavi, M., Sullivan, G.J.: High Efficiency Video Coding (HEVC): Algorithms and Architectures. Integrated Circuits and Systems. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-06895-4
4. Liu, Y., Li, Z., Ma, X., et al.: A robust without intra-frame distortion drift data hiding algorithm based on H.264/AVC. Multimedia Tools Appl. 72(1), 613–636 (2014)
5. Liu, Y., Hu, M., Ma, X., et al.: A new robust data hiding method for H.264/AVC without intra-frame distortion drift. Neurocomputing 151, 1076–1085 (2015)
6. Chang, P.-C., Chung, K.-L., Chen, J.-J., Lin, C.-H., et al.: A DCT/DST-based error propagation-free data hiding algorithm for HEVC intra-coded frames. J. Vis. Commun. Image Represent. 25(2), 239–253 (2013)
7. Liu, S., Liu, Y., Lv, G., Feng, C., Zhao, H.: Hiding bitcoin transaction information based on HEVC. In: Qiu, M. (ed.) SmartBlock 2018. LNCS, vol. 11373, pp. 1–11. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05764-0_1
8. Gaj, S., Kanetkar, A., Sur, A., Bora, P.K.: Drift-compensated robust watermarking algorithm for H.265/HEVC video stream. ACM Trans. Multimedia Comput. Commun. Appl. (TOMM) 13(1), 11 (2017)
9. Liu, Y., et al.: A robust and improved visual quality data hiding method for HEVC. IEEE Access 6, 53984–53997 (2018)
10. Liu, S., Liu, Y., Feng, C., Zhao, H.: A reversible data hiding method based on HEVC without distortion drift. In: Huang, D.-S., Hussain, A., Han, K., Gromiha, M.M. (eds.) ICIC 2017. LNCS (LNAI), vol. 10363, pp. 613–624. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63315-2_53
11. Zhao, H., Pang, M., Liu, Y.: An efficient video steganography scheme for data protection in H.265/HEVC. In: Huang, D.-S., Jo, K.-H., Li, J., Gribova, V., Bevilacqua, V. (eds.) ICIC 2021. LNCS, vol. 12836, pp. 358–368. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-84522-3_29
12. Liu, S., Liu, Y., Feng, C., Zhao, H., Huang, Y.: A HEVC steganography method based on QDCT coefficient. In: Huang, D.-S., Premaratne, P. (eds.) ICIC 2020. LNCS (LNAI), vol. 12465, pp. 624–632. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60796-8_54
13. Zhao, H., Liu, Y., Wang, Y., et al.: A video steganography method based on transform block decision for H.265/HEVC. IEEE Access 9, 55506–55521 (2021)
14. Liu, Y., Liu, S., Zhao, H., Liu, S.: A new data hiding method for H.265/HEVC video streams without intra-frame distortion drift. Multimedia Tools Appl. 78(6), 6459–6486 (2019)
15. Liu, S., Liu, Y., Feng, C., Zhao, H.: An efficient video steganography method based on HEVC. In: Huang, D.-S., Jo, K.-H., Li, J., Gribova, V., Bevilacqua, V. (eds.) ICIC 2021. LNCS, vol. 12836, pp. 327–336. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-84522-3_26
Dynamic Recurrent Embedding for Temporal Interaction Networks

Qilin Liu1(B), Xiaobo Zhu1, Changgan Yuan2, Hongje Wu3, and Xinming Zhao4

1 Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
[email protected]
2 Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning 530007, China
3 School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
4 Institute of Science and Technology for Brain Inspired Intelligence, Fudan University, Shanghai 200433, China
Abstract. Temporal network data in the form of sequential interactions are constantly generated in various domains. Modelling and applying such data require effective approaches to capture temporal and topological information. Graph representation learning transforms complex data into low-dimensional embeddings, which can be utilized in multiple subsequent tasks; however, few methods have managed to merge temporal information into the graph embedding. In this work, we propose a model that generates temporal graph embeddings, employing a gated recurrent neural network as an updater and capturing the time feature with a time encoding. We apply the embeddings to future interaction prediction and achieve performance comparable to state-of-the-art networks on two real-world datasets in both transductive and inductive settings. Ablation studies validate the effectiveness of the node embedding and the time encoding.

Keywords: Temporal graph · Graph embedding · Node embedding · Time feature · Future interaction prediction
1 Introduction

A massive number of interactions are generated continuously in various domains every day [4, 11, 24, 37, 38]. They are stored as sequences of interactive events in databases, composing graph-structured data with time features [8, 20, 22, 23]. Given its dynamic nature and multi-dimensional structure [5, 31], dealing with such data requires methods that represent its entities with low-dimensional dynamic embeddings [10, 12, 26, 32, 35]. Graph representation learning has recently become a powerful approach to generating low-dimensional embeddings of entities [3, 7, 22]. The classic static approach computes snapshots of the whole graph [29, 30, 34, 36] with graph neural networks, which requires expensive hardware and computing resources. The dynamic approach instead generates the embedding with a recurrent neural network,
which has achieved state-of-the-art results in sequential data processing across multiple domains and has recently been adopted to generate embeddings for temporal graphs [9, 12]. Meanwhile, modelling and representing interactions that change and accumulate over time faces multiple challenges. If the embedding of an entity is generated or updated only when interactive events happen [13, 20, 22, 23, 26, 32], the embedding remains the same until the next event, which may be inaccurate in domains such as e-commerce: a customer may not purchase anything for a long time, yet his intent could have changed during that period.

Contribution. First, we propose a model that operates on continuous-time dynamic graphs, composed of sequences of time-stamped interactions, to generate a temporal graph embedding that maintains the topological and temporal information of the graph. Second, we adapt our model to the future interaction prediction task, a classic application in recommendation systems, and achieve performance comparable to state-of-the-art networks on two datasets. Third, we perform a detailed ablation study of the different parts of our embedding to validate the effectiveness of our model.

The Related Works section briefly reviews graph representation learning methods and the shortcomings of previous research. The Model section introduces the embedding and training methods of our model. The Experiments section analyses the performance of our work and presents the ablation study.
2 Related Works

In early research, temporal graphs were processed in the form of snapshots, whose information can be aggregated by static methods. Recent research on graph neural networks has provided a variety of high-precision methods. DeepWalk [36] uses a random-walk algorithm to extract vertex sequences from the graph, treats them as sentences composed of words, and then generates the embedding with a natural language processing tool such as word2vec [35]. Node2Vec [33] is an extension of DeepWalk that combines depth-first search (DFS) and breadth-first search (BFS) into the random-walk algorithm. The graph autoencoder (GAE) and variational GAE (VGAE) [34] introduced an encoder-decoder structure that works on the adjacency matrix and feature matrix of the graph. GraphSAGE [28] optimizes the sampling method of GNNs, reducing the run time on large-scale data. Graph attention (GAT) [30] introduced an attention mechanism into graph learning. The main disadvantage of static methods is that they are not designed for temporal tasks: the embedding is calculated only when the snapshots are taken, which increases complexity while the temporal information between snapshots is lost.

Dynamic methods process the temporal graph as a time-stamped sequence, using mature recurrent neural network techniques. Time long short-term memory (Time-LSTM) [31] introduced the time interval (the time difference in our work) into LSTM approaches and applied the model to recommendation systems. LatentCross [23] also noted that the time feature is crucial to improving model performance.
Fig. 1. An overview of the computation of temporal graph embedding with one interaction.
The Deep Coevolutionary Network [32] proposed a framework that updates node representations iteratively according to the interaction process. JODIE [12] uses time features in the projection of the embedding with a coevolutionary method, but its time features lack non-linearity. Temporal graph attention (TGAT) [9] uses a time encoding technique to capture the time features. Temporal graph networks (TGN) [5] are composed of an encoder and a decoder, where the encoder encodes each node of the dynamic graph into a vector and the decoder calculates the prediction attributes from the encoded vector; however, its temporal embedding is separated from the node embedding as two sub-networks, and the two were not properly combined.
3 Model

In this section, we first introduce the problem definition for the sake of discussion, and then present our model in three parts. In part one, we discuss how the states of nodes are generated and updated. In part two, we discuss how the time feature is introduced into our embedding. In the last part, we discuss our training method.

3.1 Problem Definition

Temporal Graph. A temporal graph G can be modelled as an edge set in the form of an n-sized sequence of events s with timestamps tk, G = {s(t1), s(t2), …}. Considering that more than one edge may exist between two nodes (e.g., a customer purchases the same merchandise again), it is necessary to distinguish different events by their timestamps. Each event s(tk) consists of two or more nodes of the graph, which represent the entities that generate or receive the interactive event; the interaction is represented as a temporal edge that can be directed or undirected, so the event can be represented as a quadruple s(tk) = (vi, vj, tk, wk), where the nodes in the event are denoted vi and vj, and the weight of the edge is denoted wk, which is optional depending on the dataset. Since most tasks consider one entity interacting with another, we mainly discuss the scenario of two nodes in this work. Considering that there might be more than one
edge between two nodes, it is necessary to introduce another representation for the events in a period. For convenience, we denote the temporal edge as eij(t); hence the events that happened before timestamp tk can be represented as M(tk) = {(i, j) | eij(t) ∈ G, t < tk}, which covers all the timestamps of the temporal graph G.

Temporal Embedding. Given the event-sequence representation discussed above, for a given temporal graph G = {s(t1), s(t2), …}, a time point tk, and a node vi in G, we can extract the events involving vi before tk from M(tk), denoted Mi(tk) = {(vi, vj) | eij(t) ∈ M(tk)} ∪ {(vj, vi) | eji(t) ∈ M(tk)}. Define Gtk = {s(t1), s(t2), …, s(tk)}. The purpose of temporal graph embedding is to seek a function Embedding(G, tk) = f(Gtk, M(tk)) that maps the topological and temporal information of the graph into an m-dimensional embedding.
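For concreteness, the event representation above can be sketched as a simple record type (a minimal illustration; the field names are ours):

```python
from typing import NamedTuple, Optional

class Event(NamedTuple):
    """One interaction s(t_k) = (v_i, v_j, t_k, w_k) of the temporal graph."""
    src: int                   # node v_i that generates the interaction
    dst: int                   # node v_j that receives it
    t: float                   # timestamp t_k
    w: Optional[float] = None  # optional edge weight w_k

# A temporal graph is then a time-ordered list of events
G = [Event(0, 1, 10.0), Event(0, 1, 12.5), Event(2, 1, 13.0, w=0.7)]
M_t = {(e.src, e.dst) for e in G if e.t < 13.0}  # events before t_k = 13.0
```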
3.2 Update the Node Embedding

As discussed in the problem definition, the temporal embedding of a node at a time point t is computed from its previous interactions M(t). Since it is not practical to recalculate the embedding of the entire graph every time a data batch goes into the model, our model updates the embedding batch by batch. Several recent models employ recurrent neural networks (RNNs) and variants (i.e., LSTMs and gated recurrent units (GRUs)) to implement their temporal node embedding [6, 11, 21, 27, 31]. Figure 1 illustrates the overall computation of the temporal graph embedding: the computation on the top describes the generation and updating of the node embedding, which captures the graph structural information, and the computation below describes the time encoding. The time encoding and node embedding are merged with the initial node embedding to form the temporal graph embedding. One interaction contains two nodes, and the embeddings of both nodes are updated.

A GRU contains two gates, which mitigate the gradient problems of plain RNNs while having a simpler structure than an LSTM. Since the variants behave similarly in experiments, and to reduce the number of trainable parameters, our model employs a GRU and we only discuss this scenario. For one interaction, we update the embeddings of its source node and destination node; the relevant interactions for the two nodes can be denoted as the following quadruples:

Ri(t) = (vi, vj, t, eij(t)),  Rj(t) = (vj, vi, t, eij(t))   (1)

Here, t− is the last timestamp at which node vi joined the embedding computation before the interaction at timestamp t, i.e., the time of the previous interactive event involving vi, and Δt = t − t−. To interactively update the embeddings of both vi and vj, the update computations are:

N(vi, t) = GRU([N(vi, t−) || f(t) || N(vj, t−) || Δti])
N(vj, t) = GRU([N(vj, t−) || f(t) || N(vi, t−) || Δtj])   (2)
where N(vi, t) is the node embedding of node vi at time t, N(vi, t−) contains the information of the interactions before t, f(t) is the feature of the event at t, and || is the concatenation operator. For each data batch, we update the source and destination nodes at the same time; the detailed update computation is presented in Fig. 2.

Fig. 2. The updating function. For each node at time t, the input consists of the embeddings of the source node and the destination node at their former time points (N(vi, ti−) and N(vj, tj−) for the source node; for the destination node the order is reversed), the feature of the event f(t), and the time difference (omitted in the illustration). The GRU appears twice only for convenience of illustration; a single GRU cell is deployed in the model and shared by all nodes.

3.3 Generate the Temporal Embedding

If the embedding of a node is only updated when interactions involving it happen, a problem known as memory staleness may occur [12]. A majority of research generates or updates embeddings in a way that suffers from this problem [20, 22, 23, 25, 26, 32], whereas our embedding module is designed to avoid it. Node vi may not be involved in any interactive event for a relatively long term, yet its embedding would remain the same no matter how long the node has paused its interactions (e.g., if a Wikipedia editor stops editing for some time, his work might be finished by others, so he will not edit the same page when he returns). We therefore introduce the time difference into the embedding. Previous research has spent much effort on time encoding for temporal graphs [5, 12, 27, 31]; we adopt the time encoding technique of Xu et al. [9] due to its simplicity and performance. It takes a single Δt as input and computes an n-dimensional output as the encoding of time, denoted Φ(Δt). With the time encoding joining the embedding, we modify our embedding method as follows:

[N(vi, t) →] || [Φ(Δt) →] || [N(vi, 0) →] ⇒ Emb(vi, t)   (3)

where → indicates a linear transformation and ⇒ indicates a linear transformation with a non-linear activation function. N(vi, 0) is the initial embedding of the nodes in the graph, functioning as a static embedding as advised in previous networks [27, 31]. In methods that directly multiply the time encoding with the interactive information embedding (source embedding) [12], the projection operation lacks non-linearity. We instead first concatenate the time encoding with the node embedding, which preserves more information, and then use a non-linear function to merge the node embedding and time encoding, adding non-linearity to the temporal embedding. Further discussion of the modules of the temporal embedding is given in Sect. 4.

3.4 Training

Our task is to predict future links of the temporal graph, which is a self-supervised task. The original data provide a list of interactions with timestamps, and we predict how nodes will interact with others in the future by training the model on past interactions. The merge function used in the model consists of two MLPs and one sigmoid function to add non-linearity:

Emb(t) = MLP2(Sigmoid(MLP1([N(t) || Φ(Δt) || N(0)])))   (4)

MLP1 converts the concatenation of the node embedding, time encoding, and initial node embedding (the static node feature) into a lower-dimensional embedding, and the sigmoid adds non-linearity, with MLP1: R^(di+dt+di) → R^d and MLP2: R^d → R^d. N(t) generalizes N(vi, t) in Eq. (2) to all nodes in the training batch.

To calculate the loss, we use the negative sampling technique, which is widely used in networks associated with recurrent neural networks [6, 11, 32]. Every batch the model consumes is a peek into the future interactions: before learning from the batch, our model predicts, in the form of an edge probability, whether there will be an interactive edge between the given source nodes and the positive or negative destination nodes, and the loss is then computed. The prediction operation is similar to the merge operation above, except that the output is a 1-dimensional score, calculated for both positive and negative samples:

Score(Vi, Vj) = MLP2(Sigmoid(MLP1([Emb(Vi) || Emb(Vj)])))   (5)
Here the input of the prediction operation is the concatenation of the source node embeddings Emb(Vi) and the destination node embeddings Emb(Vj), where Vi is the source node set and Vj the destination node set, with MLP1: R^(d+d) → R^d and MLP2: R^d → R^1. The score is calculated for both positive and negative samples as the input X of the loss function, which is the binary cross entropy with mean reduction:

lN(X, Y) = −(1/n) Σ(i=0..n) wi [yi · log(xi) + (1 − yi) · log(1 − xi)]   (6)
where the entries of Y are ones when X contains positive sampling scores and zeros otherwise, and n is the number of nodes in the sample. The losses of the positive and negative samplings are added together as the loss of the model.
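As an illustration only, the following PyTorch sketch (our own minimal re-implementation; class and variable names, dimensions, and the detach-based memory update are assumptions, and the time encoding follows the functional form of Xu et al. [9]) wires together the GRU update of Eq. (2), the merge of Eqs. (3)–(4), and the scoring and loss of Eqs. (5)–(6):

```python
import torch
import torch.nn as nn

class TimeEncoding(nn.Module):
    """Functional time encoding: Phi(dt) = cos(dt * w + phi), cf. Xu et al. [9]."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(1, dim)

    def forward(self, dt):                      # dt: [batch, 1]
        return torch.cos(self.lin(dt))          # [batch, dim]

class DynamicRecurrentEmbedding(nn.Module):
    def __init__(self, n_nodes, node_dim=172, time_dim=100, emb_dim=100, feat_dim=172):
        super().__init__()
        self.register_buffer("state", torch.zeros(n_nodes, node_dim))  # N(v, t)
        self.init_emb = nn.Embedding(n_nodes, node_dim)                # N(v, 0)
        self.time_enc = TimeEncoding(time_dim)
        # Eq. (2): input is [other node's state || event feature || time difference]
        self.gru = nn.GRUCell(node_dim + feat_dim + 1, node_dim)
        # Eq. (4): MLP2(Sigmoid(MLP1([N(t) || Phi(dt) || N(0)])))
        self.merge = nn.Sequential(
            nn.Linear(2 * node_dim + time_dim, emb_dim), nn.Sigmoid(),
            nn.Linear(emb_dim, emb_dim))
        # Eq. (5): 1-dimensional edge score
        self.score = nn.Sequential(
            nn.Linear(2 * emb_dim, emb_dim), nn.Sigmoid(), nn.Linear(emb_dim, 1))

    def update(self, src, dst, feat, dt):
        """Eq. (2): update both endpoints of a batch of interactions."""
        x_src = torch.cat([self.state[dst], feat, dt], dim=-1)
        x_dst = torch.cat([self.state[src], feat, dt], dim=-1)
        h_src = self.gru(x_src, self.state[src])
        h_dst = self.gru(x_dst, self.state[dst])
        self.state[src] = h_src.detach()        # keep the memory out of autograd
        self.state[dst] = h_dst.detach()

    def embed(self, nodes, dt):
        """Eqs. (3)-(4): temporal embedding of the given nodes."""
        return self.merge(torch.cat(
            [self.state[nodes], self.time_enc(dt), self.init_emb(nodes)], dim=-1))

    def loss(self, src_emb, pos_emb, neg_emb):
        """Eqs. (5)-(6): BCE over positive and negative sampled edges."""
        bce = nn.BCEWithLogitsLoss()
        pos = self.score(torch.cat([src_emb, pos_emb], dim=-1))
        neg = self.score(torch.cat([src_emb, neg_emb], dim=-1))
        return bce(pos, torch.ones_like(pos)) + bce(neg, torch.zeros_like(neg))
```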
4 Experiments

Datasets and Experiment Settings. We use two datasets in our experiments: Wikipedia and Reddit (Kumar et al., 2019). Wikipedia is a public dataset consisting of one month of edits made to Wikipedia pages by editors [3], containing 157,474 interactions; Reddit consists of one month of posts made by users on subreddits, containing 672,447 interactions.

Table 1. Model hyper-parameters

Hyper-parameter                        Value
Interactive embedding dimension        172
Time encoding dimension                100
Temporal graph embedding dimension     100
Learning rate                          0.0001
Loss function                          BCE
Optimizer                              ADAM
Table 2. Average precision (AP, %) of the future edge prediction task in transductive and inductive settings. * denotes a static graph method; - denotes methods that do not support the inductive setting.

                 Wikipedia                      Reddit
Method           Transductive   Inductive       Transductive   Inductive
GAE*             91.44 ± 0.1    -               93.23 ± 0.3    -
VGAE*            91.34 ± 0.3    -               92.92 ± 0.2    -
DeepWalk*        90.71 ± 0.6    -               83.10 ± 0.5    -
Node2Vec*        91.48 ± 0.3    -               84.58 ± 0.5    -
GAT*             94.73 ± 0.2    91.27 ± 0.4     97.33 ± 0.2    95.37 ± 0.3
GraphSAGE*       93.56 ± 0.3    91.09 ± 0.3     97.65 ± 0.2    96.27 ± 0.2
CTDNE            92.17 ± 0.5    -               91.41 ± 0.3    -
Jodie            94.62 ± 0.5    93.11 ± 0.4     97.11 ± 0.3    94.36 ± 1.1
TGAT             95.34 ± 0.1    93.99 ± 0.3     98.12 ± 0.2    96.62 ± 0.3
DyRep            94.59 ± 0.2    92.05 ± 0.3     97.98 ± 0.1    95.68 ± 0.2
Our model        96.90 ± 0.14   93.90 ± 0.16    98.61 ± 0.02   96.61 ± 0.12
We study both the transductive and the inductive setting. In the transductive task, we predict future links of nodes observed during training, whereas in the inductive task we predict future links of nodes never observed before; our experimental setup follows TGAT. In the transductive setting, the model can observe all nodes before validation, while in the inductive setting the validation set is composed entirely of new nodes that the model never observed during training.

Performance. Table 2 presents the average precision of future edge prediction, with several state-of-the-art previous models as baselines. All results of our model were averaged over 10 runs, and the hyper-parameters used in our experiments are listed in Table 1, where BCE denotes the binary cross entropy loss function and ADAM the adaptive moment estimation optimizer. Results for GAE (Kipf & Welling, 2016), VGAE (Kipf & Welling, 2016), DeepWalk (Perozzi et al., 2014), Node2Vec (Grover & Leskovec, 2016), GAT (Velickovic et al., 2018), GraphSAGE (Hamilton et al., 2017), CTDNE (Nguyen et al., 2018), and TGAT (Xu et al., 2020) are taken directly from the TGAT paper (Xu et al., 2020); Jodie (Kumar et al., 2019) and DyRep (Trivedi et al., 2019) are from the TGN paper (Rossi et al., 2020). To make the comparison convincing, the hyper-parameters of the dynamic comparative methods are kept the same. As demonstrated in Table 2, our model outperforms all the static methods in both settings and all the dynamic methods in the transductive setting. The AP performance in the inductive setting is second only to TGAT, with differences of less than 0.001, and our model has higher stability. Compared with JODIE, which uses a linear time projection, our model with non-linearity performs better.

Ablation Study. In Sect. 3 we discussed the modules of the temporal graph embedding, which consists of three parts: the current node embedding, the time encoding, and the initial node embedding. In the ablation study, we test on the Wikipedia dataset to validate the effectiveness of these three parts. We test the node embedding and the time encoding individually as the input to the merge layer (since the initial node embedding does not change, it cannot be tested alone), and finally test the embedding without the initial node embedding. The hyper-parameters remain the same as in Table 1, and we compare both the average precision (AP) and the area under the ROC curve (AUC) in the transductive setting. The results, presented in Table 3, validate that all three parts contribute to the effectiveness of our model. The ablation studies show that the topological information captured by the node embedding is the most essential to the model, and that the time features rather significantly improve the performance. Noticeably, the initial embedding brings a minor improvement in AP and AUC but also increases instability.
Table 3. AP (%) and AUC (%) of the ablation studies in the transductive setting

Node embedding   Time encoding   Initial node embedding   AP             AUC
-                ✓               -                        83.08 ± 1.23   85.12 ± 1.25
-                ✓               ✓                        83.63 ± 0.76   85.07 ± 0.63
✓                -               -                        95.27 ± 0.31   94.87 ± 0.27
✓                -               ✓                        95.40 ± 0.35   95.06 ± 0.36
✓                ✓               -                        96.78 ± 0.08   96.54 ± 0.07
✓                ✓               ✓                        96.90 ± 0.14   96.71 ± 0.15
5 Conclusions

In this work, we proposed a model that uses a gated recurrent neural network to generate a temporal graph embedding that maintains the topological and temporal information of the graph. Our model learns to predict future interactions of the temporal graph and achieves performance comparable to state-of-the-art networks on two real-world datasets. Detailed ablation studies validated the effectiveness of our model, showing the importance of the node embedding that captures the temporal and topological information, as well as of the time features that increase the effectiveness. We envision applying our model to other tasks such as node classification and graph classification, and to more domains, such as recommendation systems, traffic prediction, and the social sciences.

Acknowledgements. This work was supported by the grant of the National Key R&D Program of China (Nos. 2018YFA0902600 & 2018AAA0100100), partly supported by the National Natural Science Foundation of China (Grant nos. 61732012, 62002266, 61932008, and 62073231) and the Introduction Plan of High-end Foreign Experts (Grant no. G2021033002L), and, respectively, by the Key Project of Science and Technology of Guangxi (Grant no. 2021AB20147), the Guangxi Natural Science Foundation (Grant nos. 2021JJA170204 & 2021JJA170199), and the Guangxi Science and Technology Base and Talents Special Project (Grant nos. 2021AC19354 & 2021AC19394).
References

1. Huang, J., et al.: Deep reinforcement learning-based trajectory pricing on ride-hailing platforms. ACM Trans. Intell. Syst. Technol. (TIST) 13(3), 1–19 (2022)
2. Wu, D., et al.: Attention deep model with multi-scale deep supervision for person re-identification. IEEE Trans. Emerg. Top. Comput. Intell. 5(1), 70–78 (2021)
3. Kazemi, S.M., Goel, R.: Representation learning for dynamic graphs: a survey. J. Mach. Learn. Res. 21(70), 1–73 (2020)
4. Li, Z., et al.: License plate detection and recognition technology for complex real scenarios. In: Huang, D.-S., Bevilacqua, V., Hussain, A. (eds.) ICIC 2020. LNCS, vol. 12463, pp. 241–256. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60799-9_21
5. Rossi, E., et al.: Temporal graph networks for deep learning on dynamic graphs. arXiv preprint arXiv:2006.10637 (2020)
6. Wang, X., et al.: Reinforced negative sampling over knowledge graph for recommendation. In: Proceedings of The Web Conference 2020, pp. 99–109 (2020)
7. Wu, Y., et al.: Person reidentification by multiscale feature representation learning with random batch feature mask. IEEE Trans. Cogn. Dev. Syst. 13(4), 865–874 (2020)
8. Wu, Y., et al.: Position attention-guided learning for infrared-visible person re-identification. In: Huang, D.-S., Bevilacqua, V., Hussain, A. (eds.) ICIC 2020. LNCS, vol. 12463, pp. 387–397. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60799-9_34
9. Xu, D., et al.: Inductive representation learning on temporal graphs. arXiv preprint arXiv:2002.07962 (2020)
10. Zhang, Y., Zhang, Q., Yuan, C., Qin, X., Wu, H., Zhao, X.: Predicting in-vitro transcription factor binding sites with deep embedding convolution network. In: Huang, D.-S., Jo, K.-H. (eds.) ICIC 2020. LNCS, vol. 12464, pp. 90–100. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60802-6_9
11. Ding, J., et al.: Reinforced negative sampling for recommendation with exposure data. In: IJCAI, pp. 2230–2236 (2019)
12. Kumar, S., Zhang, X., Leskovec, J.: Predicting dynamic embedding trajectory in temporal interaction networks. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1269–1278 (2019)
13. Li, B., et al.: Robust dimensionality reduction via feature space to feature space distance metric learning. Neural Netw. 112, 1–14 (2019)
14. Liang, X., Wu, D., Huang, D.-S.: Image co-segmentation via locally biased discriminative clustering. IEEE Trans. Knowl. Data Eng. 31(11), 2228–2233 (2019)
15. Wu, D., et al.: A deep model with combined losses for person re-identification. Cogn. Syst. Res. 54, 74–82 (2019)
16. Wu, D., et al.: A novel deep model with multi-loss and efficient training for person re-identification. Neurocomputing 324, 69–75 (2019)
17. Wu, D., et al.: Deep learning-based methods for person re-identification: a comprehensive review. Neurocomputing 337, 354–371 (2019)
18. Wu, D., et al.: Omnidirectional feature learning for person re-identification. IEEE Access 7, 28402–28411 (2019)
19. Wu, D., et al.: Random occlusion recovery for person re-identification. J. Imaging Sci. Technol. 63(3), 30405-1 (2019)
20. You, J., et al.: Hierarchical temporal convolutional networks for dynamic recommender systems. In: The World Wide Web Conference, pp. 2236–2246 (2019)
21. Yuan, C., et al.: An effective image classification method for shallow densely connected convolution networks through squeezing and splitting techniques. Appl. Intell. 49(10), 3570–3586 (2019)
22. Zhang, S., et al.: Deep learning based recommender system: a survey and new perspectives. ACM Comput. Surv. (CSUR) 52(1), 1–38 (2019)
23. Beutel, A., et al.: Latent cross: making use of context in recurrent recommender systems. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46–54 (2018)
24. Kumar, S., et al.: Community interaction and conflict on the web. In: Proceedings of the 2018 World Wide Web Conference, pp. 933–943 (2018)
25. Wu, Y., et al.: Convolution neural network based transfer learning for classification of flowers. In: 2018 IEEE 3rd International Conference on Signal and Image Processing (ICSIP), pp. 562–566. IEEE (2018)
26. Zhou, L., et al.: Dynamic network embedding by modeling triadic closure process. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1 (2018)
27. Baytas, I.M., et al.: Patient subtyping via time-aware LSTM networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 65–74 (2017)
28. Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
29. Hamilton, W.L., Ying, R., Leskovec, J.: Representation learning on graphs: methods and applications. arXiv preprint arXiv:1709.05584 (2017)
30. Veličković, P., et al.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)
31. Zhu, Y., et al.: What to do next: modeling user behaviors by Time-LSTM. In: IJCAI, vol. 17, pp. 3602–3608 (2017)
32. Dai, H., et al.: Deep coevolutionary network: embedding user and item features for recommendation. arXiv preprint arXiv:1609.03675 (2016)
33. Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)
34. Kipf, T.N., Welling, M.: Variational graph auto-encoders. arXiv preprint arXiv:1611.07308 (2016)
35. Goldberg, Y., Levy, O.: word2vec explained: deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722 (2014)
36. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)
37. Liyanagunawardena, T.R., Adams, A.A., Williams, S.A.: MOOCs: a systematic study of the published literature 2008–2012. Int. Rev. Res. Open Distrib. Learn. 14(3), 202–227 (2013)
38. Iba, T., et al.: Analyzing the creative editing behavior of Wikipedia editors: through dynamic social network analysis. Procedia Soc. Behav. Sci. 2(4), 6441–6456 (2010)
Deep Spatio-Temporal Attention Network for Click-Through Rate Prediction

Xin-Lu Li1,2(B), Peng Gao1,2, Yuan-Yuan Lei1,2, Le-Xuan Zhang1,2, and Liang-Kuan Fang1,2

1 School of Artificial Intelligence and Big Data, Hefei University, Hefei 230601, China
[email protected]
2 Institute of Applied Optimization, Hefei University, Hefei 230601, China
Abstract. In online advertising systems, predicting the click-through rate (CTR) is an important task. Many studies consider the targeted advertisement in isolation and do not examine its relationships with other ads that may affect its CTR. We look at a variety of additional elements that can help CTR prediction for targeted advertisements, considering auxiliary ads from two angles: 1) the spatial domain, where we consider the contextual ads shown on the same page as the target ad, and 2) the temporal domain, where we consider the ads the user has previously clicked or not clicked. The intuition is that contextual ads shown alongside targeted ads may influence each other, clicked ads reflect user preferences, and unclicked ads may indicate, to some extent, what the user is not interested in. We propose a deep spatio-temporal attention network (DSTAN) for CTR prediction to use these auxiliary data effectively. Our model can decrease the noise in the auxiliary data, learn the interactions between the varied auxiliary data and targeted advertisements, fuse heterogeneous data in a coherent framework, and highlight important hidden information. Offline experiments on three public datasets show that DSTAN outperforms several of the most common CTR prediction methods.

Keywords: Click-through rate · Attention · Auxiliary information
1 Introduction

Click-through rate (CTR) prediction estimates the probability that a user will interact with a candidate item. In the commercial setting, the CTR indicates whether a user clicks a given advertisement or not, and thus whether interest is being placed in the advertisement. Since online advertising is one of the primary sources of income for internet companies, the ability to forecast the CTR is critical in commercial decision-making, and improved CTR prediction is crucial in determining the system's revenue. As a result, both academia and industry have given CTR modeling and CTR prediction a lot of attention.

Extensive models have been proposed to provide accurate CTR prediction and have achieved successful application. Modeling important low-order and high-order feature
interactions effectively is at the heart of many CTR techniques. In the early years, Logistic Regression (LR) [1] and Factorization Machines (FM) [2] were widely explored. Inspired by the success of deep learning in computer vision and natural language processing, many researchers applied deep neural networks (DNNs) to CTR prediction: they automatically learn feature representations and capture high-order feature interactions, which enhances model capability. Many works extend second-order feature interaction to higher-order interaction, such as DeepFM [3], xDeepFM [4], Wide&Deep [5], FNN [6], DeepCross [7], and PNN [8, 9]. These models are designed to better exploit the multi-categorical characteristics of the data by emphasizing feature interactions. Nonetheless, such models overlook user behavior trends over time. Moreover, they focus on the relationship between the user and a single ad, neglecting the potential influence between related advertisements.

The temporal dynamics of user behavior play a key role in predicting the CTR [10–12]. A practical model should capture real interests from the historical behavior sequence. In this paper, we consider the historical sequence containing the user's behaviors (e.g., clicked or unclicked ads): clicked ads may represent a user's tastes, whereas unclicked ads may reveal what the user dislikes. Furthermore, the contextual ads presented above or below the target ad on the same page may have a considerable impact on the estimated CTR.

Motivated by the above observations, we propose a Deep Spatio-Temporal Attention Network (DSTAN) for the CTR prediction task. In DSTAN, we adopt temporal and spatial relationships as auxiliary information to extract high-order feature interactions among feature fields and enhance the efficiency of CTR prediction. First, the amount of each type of auxiliary data may vary, causing the lengths of the corresponding lists of embedding vectors to be variable; to convert a list of embedding vectors into a fixed-length vector, we employ a sum pooling layer [5, 11, 13]. Second, in many cases the auxiliary information is not necessarily relevant to the target ad, for instance due to the cold-start problem when new users or new advertisements are involved, so extracting meaningful information from the auxiliary data while suppressing noise is a critical challenge; we use self-attention to capture the inner interactions and correlations of the auxiliary information. Third, the spatio-temporal information contains much heterogeneous data, and each kind of information has a different influence on the CTR task; we adopt interaction attention to fuse the auxiliary data and differentiate their contributions.

The significant contributions of our work can be summarized as follows.

1. We introduce contextual advertisements as spatial information, and user-clicked/unclicked advertisements as temporal information.
2. We adopt self-attention to reduce the noise of the auxiliary information and use interaction attention to emphasize informative details.
3. We integrate heterogeneous auxiliary data in a unified framework.

In the remaining parts of this paper, we first briefly review related work on CTR prediction in Sect. 2. Then, the detailed DSTAN model is presented in Sect. 3. Next, Sect. 4 presents our experimental results and analyses. Finally, we conclude the paper in Sect. 5.
2 Related Work

It is worth noting that only a few works [10–12, 14–16] discuss spatial or temporal deep learning methods for the CTR task. DIN [14] introduced an attention mechanism that weights the user's historical behaviors on top of a traditional deep learning recommendation system. DIEN [11] uses an auxiliary loss to obtain the current interest from the user's historical behavior, and then uses a sequence model to capture the user's interest over time in order to improve prediction accuracy. DSIN [16] is a session-based recommendation model that divides the user's historical behavior into multiple sessions as auxiliary information according to time; self-attention is used to extract different interests and behaviors within the same session. Sequential click prediction for sponsored search using recurrent neural networks [10] evaluates how users have clicked on advertisements in the past, including which ads they clicked, which they ignored, which they queried, and how much time they spent on the ads [17]; based on these observations, [10] develops an RNN framework [11] to improve the click-through rate. The Deep Position-wise Interaction Network for CTR prediction [17] assumes that the position of advertisements affects the click-through rate; for example, users are often more inclined to click on ads near the top, so the position of the ads is modeled.

There are two drawbacks to the above deep learning models for CTR prediction. First, they either enumerate all feature interactions or require human intervention to identify the critical ones; the additional interactions may introduce unnecessary noise and complicate the training process. Second, they learn all feature interactions using the same operation or network structure.
3 Deep Spatio-Temporal Attention Network

As illustrated in Fig. 1, our model consists of several consecutive stages: feature representation, self-attention and interaction attention over the features, fully connected layers, and prediction.

3.1 Feature Representation

DSTAN divides the features into three categories: univalent features (e.g., user ID), multivalent features (e.g., title), and numerical features (e.g., age). We use one-hot encoding [18] to represent univalent features. We split multivalent features into one-tuples or two-tuples and obtain a fixed-length aggregation vector after sum pooling. We also adopt discretization [19] to map numerical features to embedding vectors. Here we introduce the symbols used in Fig. 1: the target ad is xt ∈ R^Dt, the contextual (location-related) ads are {xci ∈ R^Dc} (i = 1, …, nc), the historical clicked records are {xlj ∈ R^Dl} (j = 1, …, nl), and the unclicked ads are {xuq ∈ R^Du} (q = 1, …, nu), where nc, nl, nu are the numbers of contextual ads, historical records, and unclicked ads, respectively, and D* (* ∈ {t, c, l, u}) is the dimension of the corresponding vectors. The representation x of each advertisement instance is obtained by concatenating all its features.
Fig. 1. Model architecture diagram
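As an illustration of the feature representation (a minimal sketch with assumed vocabulary sizes and field names, not the authors' code), the three feature types can be embedded and a multivalent field reduced by sum pooling as follows:

```python
import torch
import torch.nn as nn

emb_dim = 10                                  # embedding size per feature (Sect. 4.2)
user_emb = nn.Embedding(10000, emb_dim)       # univalent feature, e.g. user ID
word_emb = nn.Embedding(50000, emb_dim)       # multivalent feature, e.g. title words
age_emb = nn.Embedding(10, emb_dim)           # numerical feature after discretization

user = user_emb(torch.tensor([42]))                    # [1, emb_dim]
title = word_emb(torch.tensor([[3, 17, 256]])).sum(1)  # sum pooling -> [1, emb_dim]
age = age_emb(torch.tensor([3]))                       # bucket index, e.g. age 30-39
x = torch.cat([user, title, age], dim=-1)              # representation of one ad
```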
3.2 Self-attention Model

In the CTR task, some of the auxiliary information is useful for predicting the target advertisement, while some of it is noise; for example, a user may click some advertisements accidentally, and these advertisements may harm prediction accuracy. One problem we need to solve is therefore how to suppress the noise in the auxiliary information. Inspired by its great success in machine translation, we leverage self-attention [20] in the model to highlight the more significant data (Figs. 2 and 3).
Fig. 2. Calculating the weights of the self-attention

Fig. 3. The aggregated vector obtained after mathematical modelling
Taking contextual ads as an example, the aggregated vector of the contextual ads is modelled as

x̄c = Σ(i=1..nc) αci · xci   (1)

where

αci = exp(βci) / Σ(i=1..nc) exp(βci),  βci = f(xci)   (2)
Here we transform xci through f(·) into a scalar βci, and then calculate the weight αci. From Eq. (1) we can see that the weight αci reduces the noise in the auxiliary advertisements, so the model can reduce the influence of noise on the target advertisement. The f(·) in Eq. (2) can be a multilayer perceptron.

3.3 Interaction Attention

Inspired by Attentional Factorization Machines [21] and aspect-level sentiment classification [22], we consider that different pieces of auxiliary information may contribute differently to the target advertisement. For example, suppose the contextual ad is a mobile phone or a computer while the user's history contains a fruit; when the target advertisement is an apple, the contribution of the history to the target advertisement is relatively large, while the contribution of the contextual ad is relatively small. To highlight these contributions, we let the various auxiliary data interact with the target advertisement (Figs. 4 and 5). We model the aggregated vector of the contextual (location-related) ads as follows:

x̃c = Σ(i=1..nc) αci(xt, xci) · xci,  αci = exp(h^T ReLU(Wtc[xt, xci] + btc1) + btc2)   (3)
From Eq. (3) we can see that αci is a function of the target advertisement and the contextual advertisement; we learn the weight αci through a multilayer perceptron with a hidden layer and the ReLU activation function. The weight αci adjusts the impact of the contextual advertisements on the target advertisement. In (3), h, Wtc, btc1, and btc2 are parameters of the model.

Fig. 4. Calculating the weight of the interaction attention part

Fig. 5. The aggregated vector x̃c is obtained after aggregation

3.4 Fully Connected Layer

We use fully connected layers to integrate the information from the self-attention module and the interaction attention module. Mathematically, the integration is expressed as

v1 = Wt xt + Wc x̄c + Wl x̄l + Wu x̄u + b1   (4)

v2 = Wt xt + Wc x̃c + Wl x̃l + Wu x̃u + b2   (5)
Here, Wt ∈ R^(Dv×Dt), Wc ∈ R^(Dv×Dc), Wl ∈ R^(Dv×Dl), Wu ∈ R^(Dv×Du) are the weight matrices, and b1 ∈ R^Dv, b2 ∈ R^Dv are the offset parameters. From the above equations we can see that weights are used to fuse inputs carrying different information. We aggregate the auxiliary information in the self-attention module to get m1 = [xt, x̄c, x̄l, x̄u], and in the interaction attention module to get m2 = [xt, x̃c, x̃l, x̃u]. We then integrate m1 and m2 to obtain m = [m1, m2]. Finally, we get

b = b1 + b2,  v = v1 + v2,  v = Wm + b   (6)

where W ∈ R^(Dv×(Dt+Dc+Dl+Du)). The purpose is to simplify the model and connect all the representations as much as possible, using m as the combined representation. Next, m is passed through several fully connected layers, defined as

z1 = ReLU(W1 m + b1)
z2 = ReLU(W2 z1 + b2)
···
zL = ReLU(WL zL−1 + bL)

where Wl and bl are the weight matrix and bias vector of the l-th layer, respectively, and L is the number of hidden layers. Finally, we get the predicted click-through rate of the target advertisement through the sigmoid function:
ŷ = 1 / (1 + exp[−(W^T zL + b)])   (7)
Here W and b are learned weight and bias parameters. All the parameters of our model are learned by minimizing the average logistic loss on the training set:

loss = −(1/|Y|) Σ(y∈Y) [y log ŷ + (1 − y) log(1 − ŷ)]   (8)

where y ∈ {0, 1} is the actual label of the target ad, ŷ is the corresponding estimated click-through rate, and Y is the set of labels.
4 Experiments

In this section, we present our results on three public datasets. We conduct comprehensive experiments to evaluate the proposed model and compare it with several of the most common CTR prediction methods.

4.1 Datasets and Metrics

We conduct experiments on three public datasets, namely the Avito1, Taobao2, and Amazon3 datasets.

1 https://www.kaggle.com/competitions/avito-context-ad-clicks
2 https://tianchi.aliyun.com/dataset/dataDetail?dataId=56
3 http://jmcauley.ucsd.edu/data/amazon/index_2014.html
Avito comes from a sample of advertising information from the largest comprehensive classified-ads website in Russia. We select the data from 2015-04-28 to 2015-05-18 as the training set, the data from 2015-05-19 as the validation set, and the data from 2015-05-20 as the test set. Taobao is a display-advertising click-through-rate dataset provided by Alibaba. We also select one of the Amazon product reviews and metadata datasets: the Grocery dataset.

To measure the quality of the models, we use AUC and Logloss as evaluation metrics.

AUC: the area under the ROC curve. It is widely employed in CTR prediction; intuitively, the AUC is the probability that, for any pair of one positive and one negative sample, the model ranks the positive sample higher than the negative one. Moreover, improvements in AUC have been shown to benefit the online CTR.

Logloss: the log loss is computed based on Eq. (8); the smaller the loss on the test set, the better.
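Both metrics can be computed directly, for instance with scikit-learn (a usage sketch; the y_true and y_pred values are placeholders):

```python
from sklearn.metrics import roc_auc_score, log_loss

y_true = [1, 0, 0, 1, 0]            # actual click labels
y_pred = [0.8, 0.3, 0.4, 0.7, 0.1]  # predicted CTRs

auc = roc_auc_score(y_true, y_pred)  # ranking quality of the predictions
ll = log_loss(y_true, y_pred)        # average logistic loss, Eq. (8)
print(f"AUC={auc:.4f}, Logloss={ll:.4f}")
```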
4.2 Parameter Settings

During the experiments, the dimension of the embedding vector of each feature is set to 10. In addition, we consider the influence of the dimension and number of fully connected layers on the experimental results: we keep the fully connected layers consistent with the comparison models, setting the number of layers to 3 and their dimensions to 512, 256, and 1, respectively. We set the dropout ratio to 0.5 [23]. Since both GRU and our model contain hidden layers, the hidden layer dimension is set to 128. All methods are implemented with TensorFlow and optimized using the Adagrad algorithm [24]. We choose a batch size of 128.

4.3 Effectiveness

All results are summarized in Table 1. It can be seen that the AUC of the proposed DSTAN shows a significant improvement over the other baselines, which indicates that the spatio-temporal information used in this paper and the simultaneous use of self-attention and interaction attention both play a role. The model captures the user's preference through the temporal auxiliary details, i.e., the advertisements the user has clicked and those that were not clicked; in addition, the spatial details, i.e., the advertisements appearing on the same page as the target advertisement, affect the user's click on the target advertisement.

From Table 1, we can see that on the Grocery dataset the performance of DSTAN is better than that of DUMN. This suggests that the self-attention we use can reduce the noise in the data, and the interaction attention can better discover helpful information. In DUMN, the similarity between users is found through a user matching layer, and the potential interests are then inferred. However, the DUMN model does not consider that what users need is personalized recommendations.
Table 1. Test AUC and Logloss on three public datasets

| Algorithm | Avito AUC | Avito Logloss | Taobao AUC | Taobao RI | Grocery AUC | Grocery Logloss |
| LR [1] | 0.7556 | 0.05918 | 0.6394 | 0.00% | - | - |
| FM [2] | 0.7802 | 0.06094 | - | - | - | - |
| SVD++ [25] | - | - | - | - | 0.6385 | 0.8306 |
| DNN [13] | 0.7816 | 0.05655 | - | - | - | - |
| Wide&Deep [5] | 0.7817 | 0.05595 | 0.6408 | 0.22% | 0.6823 | 0.6634 |
| DeepFM [3] | 0.7819 | 0.05611 | - | - | - | - |
| CRF [26] | 0.7722 | 0.05989 | - | - | - | - |
| GRU [27] | 0.7835 | 0.05554 | - | - | - | - |
| DeepMCP [28] | 0.7927 | 0.05518 | - | - | - | - |
| PNN [8] | - | - | 0.6415 | 0.33% | 0.7033 | 0.6324 |
| DIEN [11] | - | - | 0.6420 | 0.41% | 0.7141 | 0.6249 |
| DMR [29] | - | - | 0.6447 | 0.83% | - | - |
| GRU4Rec [30] | - | - | - | - | 0.7949 | 0.5360 |
| DUMN [31] | - | - | - | - | 0.8107 | 0.5159 |
| DSTAN | 0.8117 | 0.0141 | 0.6483 | 1.39% | 0.8134 | 0.5013 |
| DSTAN-pooling | 0.8083 | 0.0186 | 0.6432 | 0.594% | 0.8053 | 0.5176 |
For example, suppose the products clicked by user A include beer, diapers, and milk, while the products clicked by user B include beer, milk, and potato chips. The similarity between users A and B is very high, so DUMN would recommend to user A the potato chips that user B has clicked on, whereas what user A may actually want is children's toys. The DSTAN proposed in this paper instead starts from the products the user has clicked, uses self-attention to reduce noise, and uses interaction attention to compute the influence of the clicked and unclicked products on the target product; by combining these two attention mechanisms, it better fits the user's behavior and can recommend the products the user actually needs.

4.4 Analysis of Results
Self-attention removes noise in the auxiliary information and emphasizes its more critical parts. Interaction attention enables better interaction between the target advertisement and the auxiliary advertisements, so that more information can be extracted. Finally, we splice together the information processed by self-attention and interaction attention to retain more information and make the model perform better.
In Table 1, we report the AUC and Logloss of the different models on the Avito and Taobao advertising datasets. We can observe that the AUC of Wide&Deep is slightly higher than that of DNN and LR, and that DeepFM has a higher AUC than FM and DNN. These results show that including both a wide part and a deep part in the model improves its prediction ability. We can also observe that on the Avito dataset CRF performs much better than LR, because CRF corrects the LR predictions using the similarity to surrounding advertisements. GRU is significantly better than LR, FM, DNN, Wide&Deep, and DeepFM: GRU is a recurrent neural network that can exploit the clicked ads. The DSTAN model proposed in this paper is in turn significantly better than GRU, because a user behavior sequence may contain consecutive behaviors that are unrelated. For example, a user may have clicked on an ad for fruit two weeks ago but more recently on advertisements for clothes and shoes; the next click might then be an ad related to clothes and shoes. Some users click on advertisements according to their current needs, unrelated to the previously clicked advertisements. Therefore, the ordering of user actions does not necessarily improve prediction performance. In addition, the proposed model uses information from contextual ads and unclicked ads to improve prediction performance.

After removing the self-attention and interaction-attention modules, we call the resulting model DSTAN-Pooling to expose the role of the two attention mechanisms. Comparing DSTAN with this variant, we find that DSTAN outperforms it. This shows that the self-attention module reduces information noise and better emphasizes helpful information, and that interaction attention lets the target and auxiliary advertisements interact, thereby obtaining more information (Fig. 6).
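As a rough, framework-agnostic sketch of the interaction-attention idea described above (our own illustration, not the authors' implementation; all names and dimensions are assumptions), each auxiliary ad is scored against the target ad with additive attention, and the auxiliary set is pooled by the resulting weights:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def interaction_attention(target, aux, W, v):
    """Pool auxiliary ad embeddings by their relevance to the target ad.

    target: (d,) target ad embedding
    aux:    (n, d) auxiliary ads (clicked, unclicked, or contextual)
    W:      (h, 2d) projection of each (target, auxiliary) pair
    v:      (h,) scoring vector
    """
    pairs = np.concatenate([np.tile(target, (len(aux), 1)), aux], axis=1)  # (n, 2d)
    weights = softmax(np.tanh(pairs @ W.T) @ v)                            # (n,)
    return weights @ aux                                                   # (d,)

rng = np.random.default_rng(0)
d, n, h = 10, 5, 16
pooled = interaction_attention(rng.normal(size=d), rng.normal(size=(n, d)),
                               rng.normal(size=(h, 2 * d)), rng.normal(size=h))
print(pooled.shape)  # (10,)
```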
Fig. 6. Testing the accuracy of different numbers of hidden layers on the Avito dataset and Taobao dataset
4.5 Effect of the Number of Hidden Layers
Considering the influence of the number and dimensions of the fully connected layers on DSTAN, we test three configurations: one layer of size 256; two layers of sizes 512 and 256; and three layers of sizes 1024, 512, and 256. From Fig. 6, we find that increasing the number of fully connected layers can achieve better performance, but adding too many fully connected layers may cause a slight decline in model performance. This decline may be because a larger number of layers increases the difficulty of training the model.
Fig. 7. Average loss at different epochs during testing on the Avito dataset

Fig. 8. Model accuracy at different epochs in the testing phase on the Avito dataset
From Figs. 7 and 8, we can see that as the number of epochs increases, the number of weight updates of the neural network also increases, and the model moves from fitting to overfitting. To obtain the best model performance without overfitting, we set the number of epochs to 256, where the loss on the test set reaches its minimum and overfitting does not occur.
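In practice, this choice can also be automated with early stopping on a validation set rather than fixing the epoch count by hand. The sketch below is our own illustration; `train_one_epoch` and `evaluate` are hypothetical stand-ins for the training pipeline:

```python
import copy

def fit_with_early_stopping(model, train_one_epoch, evaluate,
                            max_epochs=1024, patience=2):
    """Train until validation loss stops improving for `patience` epochs."""
    best_loss, best_state, bad_rounds = float("inf"), None, 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        val_loss = evaluate(model)
        if val_loss < best_loss:
            best_loss, best_state, bad_rounds = val_loss, copy.deepcopy(model), 0
        else:
            bad_rounds += 1
            if bad_rounds >= patience:
                break  # validation loss is rising: stop before overfitting
    return best_state, best_loss
```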
5 Conclusion
In this paper, we propose DSTAN, a new model for CTR prediction. Unlike commonly used CTR prediction models, we use auxiliary advertisements carrying spatio-temporal information: contextual ads highlight the targeted ad, while clicked and unclicked ads capture the user's historical behavior. Self-attention reduces the noise in the auxiliary data, and interaction attention highlights which additional information is more beneficial for the targeted advertisement. To retain as much information as possible, we splice the data processed by these two attention mechanisms and finally obtain the prediction results. We conduct offline tests on the Avito and Taobao datasets and find that DSTAN is more effective than state-of-the-art methods.

Acknowledgment. This work was supported by the University Natural Science Research Project of Anhui Province (KJ2019A0835) and by the Hefei College Postgraduate Innovation and Entrepreneurship Project (21YCXL20).
References
1. Richardson, M., Dominowska, E., Ragno, R.: Predicting clicks: estimating the click-through rate for new ads. In: Proceedings of the 16th International Conference on World Wide Web (2007)
2. Rendle, S.: Factorization machines. In: 2010 IEEE International Conference on Data Mining. IEEE (2010)
3. Guo, H., et al.: DeepFM: a factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247 (2017)
4. Lian, J., et al.: xDeepFM: combining explicit and implicit feature interactions for recommender systems. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2018)
5. Cheng, H.T., et al.: Wide & deep learning for recommender systems. In: Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, pp. 7–10 (2016)
6. Zhang, W., Du, T., Wang, J.: Deep learning over multi-field categorical data. In: Ferro, N., et al. (eds.) ECIR 2016. LNCS, vol. 9626, pp. 45–57. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30671-1_4
7. Wang, R., et al.: Deep & cross network for ad click predictions. In: Proceedings of ADKDD 2017, pp. 1–7 (2017)
8. Qu, Y., et al.: Product-based neural networks for user response prediction. In: 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE (2016)
9. Qu, Y., et al.: Product-based neural networks for user response prediction over multi-field categorical data. ACM Trans. Inf. Syst. 37(1), 1–35 (2018)
10. Ouyang, W., et al.: Deep spatio-temporal neural networks for click-through rate prediction. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2019)
11. Zhou, G., et al.: Deep interest evolution network for click-through rate prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence (2019)
12. Xiao, Z., et al.: Deep multi-interest network for click-through rate prediction. In: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (2020)
13. Covington, P., Adams, J., Sargin, E.: Deep neural networks for YouTube recommendations. In: Proceedings of the 10th ACM Conference on Recommender Systems (2016)
14. Zhou, G., et al.: Deep interest network for click-through rate prediction. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2018)
15. Xu, W., et al.: Deep interest with hierarchical attention network for click-through rate prediction. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2020)
16. Feng, Y., et al.: Deep session interest network for click-through rate prediction. arXiv preprint arXiv:1905.06482 (2019)
17. Huang, J., et al.: Deep position-wise interaction network for CTR prediction. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1885–1889 (2021)
18. Mikolov, T., et al.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems (2013)
19. Guo, H., et al.: An embedding learning framework for numerical features in CTR prediction. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2021)
20. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
21. Xiao, J., et al.: Attentional factorization machines: learning the weight of feature interactions via attention networks. arXiv preprint arXiv:1708.04617 (2017)
22. Ma, D., et al.: Interactive attention networks for aspect-level sentiment classification. arXiv preprint arXiv:1709.00893 (2017)
23. Srivastava, N., et al.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
24. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
25. Koren, Y.: Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2008)
26. Xiong, C., et al.: Relational click prediction for sponsored search. In: Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (2012)
27. Chung, J., et al.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)
28. Ouyang, W., et al.: Representation learning-assisted click-through rate prediction. arXiv preprint arXiv:1906.04365 (2019)
29. Lyu, Z., et al.: Deep match to rank model for personalized click-through rate prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence (2020)
30. Hidasi, B., et al.: Session-based recommendations with recurrent neural networks. arXiv preprint arXiv:1511.06939 (2015)
31. Huang, Z., Tao, M., Zhang, B.: Deep user match network for click-through rate prediction. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (2021)
A Unified Graph Attention Network Based Framework for Inferring circRNA-Disease Associations Cun-Mei Ji1(B) , Zhi-Hao Liu1 , Li-Juan Qiao1 , Yu-Tian Wang1 , and Chun-Hou Zheng2(B) 1 School of Cyber Science and Engineering, Qufu Normal University, Qufu, China
[email protected]
2 School of Artificial Intelligence, Anhui University, Hefei, China
[email protected]
Abstract. Researchers have identified a large number of circular RNAs (circRNAs). More and more studies have shown that circRNAs play crucial roles in regulating gene expression and function in distinct biological processes. CircRNAs are highly stable and conserved, which makes them suitable as diagnostic biomarkers for many human diseases. However, experimental verification of relationships between circRNAs and diseases is time-consuming and laborious, so few associations are known. Computational methods have therefore been introduced to predict potential disease-related circRNAs, but existing methods fall short in feature extraction and prediction performance. In this paper, we design a unified Graph Attention Network (GAT) framework to infer unknown circRNA-disease links. Our method unifies the feature-extraction encoder and the predictive decoder. To be specific, a GAT based encoder with additional information is applied to learn representations of circRNAs and diseases. Then, several decoders, such as a dot decoder, a bilinear decoder, and a neural network (NN) based decoder, are implemented as predictors. Furthermore, we conduct a detailed experimental analysis on the benchmark datasets based on 5-fold cross-validation. The evaluation metrics, including accuracy, precision, F1 score, AUC and AUPR values, together with case studies of the experimental results, indicate that our method is efficient and robust for predicting circRNA-disease associations. Keywords: circRNA · Disease · circRNA-disease association prediction · Graph attention networks
1 Introduction
In recent years, an increasing number of studies have shown that circular RNAs (circRNAs), a subgroup of noncoding RNAs, play a crucial role in regulating gene expression and function in distinct biological processes [8, 31]. With the development of sequencing technology, researchers have found that circRNAs have stable closed-loop structures, with neither 5′–3′ polarity nor polyadenylated tails [13]. Due to these properties, they can participate
in the regulation of various signaling pathways and gene expression, and are closely related to various diseases such as diabetes [28], cancers [14], and neurological disorders [6]. Therefore, circRNAs are potential targets for disease treatment and drug discovery. However, biological experiments on circRNA-disease associations are expensive and time-consuming. As biological data have become increasingly abundant in recent years, effective computational models can replace small-scale biological experiments. The establishment of circRNA-disease association databases such as circRNADisease [30] and CircR2Disease [3] provides a prerequisite for simulating biological experiments computationally. Current intelligent computational methods can be divided into two categories: complex-network-based methods and machine-learning-based methods [16].

The first category constructs heterogeneous networks from a large number of known relationships to predict circRNA-disease associations. For example, Lei et al. [10] constructed a heterogeneous network consisting of three subnetworks, and determined whether each circRNA-disease pair was related based on the association scores of the paths connecting them in the heterogeneous network. Fan et al. [4] constructed a heterogeneous network and used the KATZ model to predict relationships between human circRNAs and diseases. Qiu et al. [25] developed a model named MRLDC to predict disease-associated circRNAs; they constructed a heterogeneous circRNA-disease bilayer network and proposed a weighted low-rank approximation optimization framework. Ge et al. [7] used Locality-Constrained Linear Coding (LLC) on the known association matrix to reconstruct similarity networks, and then applied label propagation to obtain four relevance score matrices.

The second category uses machine learning to learn the associations between circRNAs and diseases. Machine learning can be classified into four types: supervised, unsupervised, semi-supervised, and reinforcement learning. Supervised or unsupervised methods are often used to mine deep features of circRNA and disease data. For example, Yan et al. [27] adopted the DWNN (decreasing weight k-nearest neighbor) method to calculate initial relational scores for new circRNAs and diseases, and then used a regularized least squares method based on the Kronecker product to predict novel circRNA-disease associations. Wang et al. [21] proposed a model based on a deep Generative Adversarial Network (GAN) combined with multi-source similarity information to predict circRNA-disease associations; the GAN extracts hidden features of the fused information objectively and effectively through adversarial learning. Qiu et al. [26] proposed a model named iCDA-CMG, which employs a graph learning model based on collective matrix completion to prioritize the most promising disease-associated circRNAs. Wang et al. [18] used a convolutional neural network (CNN) to extract deep features from the circRNA similarity network and the disease semantic similarity network. In addition, machine-learning-based prediction models have been used to search for circRNA-disease associations. Zeng et al. [12] used multi-layer neural networks to extract non-linear associations and grasp the complex structure of the data.
It uses a projection layer to automatically learn latent representations for circRNAs and diseases, taking into account both explicit and implicit feedback. Wei et al. [22] proposed
a model named iCircDA-MF to extract circRNA and disease similarity features from disease semantic information, circRNA information and gene-disease information; the circRNA-disease association scores are then calculated through matrix factorization. However, these methods have certain limitations. On the one hand, most prediction models use only part of the RNA and disease information and can be affected by scattered biological data, so they cannot fully extract the complex features between circRNAs and diseases. On the other hand, traditional matrix factorization methods learn only linear features between RNAs and diseases, while the true associations are too complex for matrix factorization to capture.

Recently, Graph Neural Networks (GNNs) [32] have been proposed to extract node features from neighborhoods; they are efficient and powerful and are widely applied to biological networks, e.g., predicting protein interfaces, analyzing molecular mechanisms, and drug design [24]. GNN based methods have been used to predict potential circRNA-disease associations. Li et al. introduced a DeepWalk based method to learn hidden node representations, followed by Network Consistency Projection (NCP) for prediction [11]. GCNCDA uses FastGCN to capture graph information and makes predictions with a Forest by Penalizing Attributes classifier, achieving an AUC of 0.9090 on the CircR2Disease dataset [19]. Bian et al. proposed a graph attention (GAT) based method for feature extraction with a dot decoder to calculate scores, obtaining an AUC of 0.9011 [2].

In this paper, we propose an efficient GAT based method to predict unknown links between circRNAs and diseases. We design a unified GAT based framework that makes it easy and convenient to use different additional information and predictors. Our framework consists of three parts: data processing, graph feature learning, and prediction. Specifically, in the data-processing step, additional information such as disease ontology, circRNA sequence, and known circRNA-disease associations is collected and used to compute disease and circRNA similarities. After that, we apply GAT based feature extraction to learn the representations of circRNAs and diseases. Furthermore, we introduce several decoders for predicting potential links between circRNAs and diseases. Extensive experimental results show that our approach obtains better inference performance than other methods on the benchmark datasets circRNADisease and CircR2Disease. Our contributions are as follows.

• We propose a unified GAT based computational framework to infer potential circRNA-disease associations, which can easily use different additional information and decoders.
• We perform essential experiments and case studies to evaluate our proposed method, quantitatively analyzing the effect of different additional information and decoders on predictive performance. The results show that the prediction performance of our method is superior to that of other state-of-the-art methods.
2 Materials and Methods
2.1 Overview
In this paper, we propose a unified GAT based framework that extracts node features rich in graph structure information from different kinds of additional information and can be followed
by different decoders. We quantitatively analyze the impact of the different components. The flowchart of our proposed method is shown in Fig. 1. We first construct the circRNA-disease heterogeneous graph according to the known circRNA-disease associations. Then, different types of similarities of circRNAs and diseases are calculated from additional information, such as disease ontology and known circRNA-disease pairs. These similarities can easily be taken into our framework as initial node features. Furthermore, we design a universal GAT based node feature extraction from the heterogeneous graph. Finally, we propose several decoders to predict potential disease-related circRNAs.
Fig. 1. The flowchart of the proposed GAT based framework for predicting circRNA-disease associations. Our computational framework consists of three steps: (1) data collection and preprocessing; (2) GAT based feature extraction of circRNAs and diseases; (3) several decoders introduced to infer potential links between circRNAs and diseases.
2.2 Graph Construction
According to the known circRNA-disease relationships, we can construct a heterogeneous graph. Let $A \in \mathbb{R}^{n_c \times n_d}$ be the known association matrix, where $n_c$ and $n_d$ are the numbers of circRNA and disease nodes, respectively. Suppose $G = (V, E, X)$ denotes the circRNA-disease graph, where $V = \{c_1, c_2, \ldots, c_{n_c}, d_1, d_2, \ldots, d_{n_d}\}$ is the node set of circRNAs and diseases. $E$ is the set of edges, and each edge denotes a known circRNA-disease pair. $X \in \mathbb{R}^{N \times d}$ holds the initial node features, where $N = n_c + n_d$ is the total number of circRNAs and diseases, and $d$ is the dimension of the node features. Additional information, such as disease ontology or circRNA sequences, can be directly or indirectly used as initial node features.
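As a minimal sketch (our own illustration; function and variable names are assumptions), the association matrix $A$ can be converted into the edge index of this heterogeneous graph, with disease node IDs offset by the number of circRNAs:

```python
import numpy as np
import torch

def build_edge_index(A: np.ndarray) -> torch.Tensor:
    """A[i, j] = 1 iff circRNA i is associated with disease j.

    circRNA nodes get IDs 0..n_c-1 and disease nodes n_c..n_c+n_d-1;
    both directions of every association edge are included.
    """
    n_c, _ = A.shape
    rows, cols = np.nonzero(A)
    src = np.concatenate([rows, cols + n_c])
    dst = np.concatenate([cols + n_c, rows])
    return torch.tensor(np.stack([src, dst]), dtype=torch.long)

A = np.array([[1, 0, 1],
              [0, 1, 0]])   # toy matrix: 2 circRNAs, 3 diseases
print(build_edge_index(A))  # shape (2, 6)
```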
2.3 Feature Preprocessing

Disease Semantic Similarity
Disease Ontology (DO) (available at https://disease-ontology.org) represents human diseases with semantic descriptions. A triplet denotes that two diseases have a child-parent relationship, which can be used to construct directed acyclic graphs (DAGs). Following previous work [17], semantic scores between diseases are calculated from the similarity of their DAGs. Given a disease pair $(d_i, d_j)$, we define the semantic similarity $SD(d_i, d_j)$ by the following equation:

$$SD(d_i, d_j) = \frac{\sum_{d \in T_{d_i} \cap T_{d_j}} \left( D_{d_i}(d) + D_{d_j}(d) \right)}{DV(d_i) + DV(d_j)} \quad (1)$$

where $T_{d_i}$ denotes the set of nodes in the DAG of $d_i$, and $D_{d_i}(d)$ represents the contribution value of disease $d$ to $d_i$, defined as follows:

$$D_{d_i}(d) = \begin{cases} 1, & \text{if } d = d_i \\ \max\left\{ D_{d_i}(d') \mid d' \in \text{children of } d \right\}, & \text{if } d \neq d_i \end{cases} \quad (2)$$

$DV(d_i) = \sum_{d \in T_{d_i}} D_{d_i}(d)$ is the semantic value of disease $d_i$.
CircRNA Functional Similarity
Wang et al. proposed a computational method to measure the functional similarity between two RNAs through their co-related diseases [17]. We adopt this method to calculate the functional similarity between circRNAs. Let $c_i$ and $c_j$ be two circRNAs; we define the functional similarity $FS(c_i, c_j)$ as follows:

$$FS(c_i, c_j) = \frac{\sum_{d \in D_j} S(d, D_i) + \sum_{d \in D_i} S(d, D_j)}{|D_i| + |D_j|} \quad (3)$$

where $D_i$ and $D_j$ are the disease sets related to circRNAs $c_i$ and $c_j$, and $|\cdot|$ denotes the size of a disease set. $S(d, D_i) = \max_{d_k \in D_i} SD(d, d_k)$ is the maximum semantic similarity between disease $d$ and the disease set $D_i$.

Gaussian Interaction Profile Similarity
We observe that FS and SD are sparse matrices; if only these similarities are used as circRNA and disease features, the final predictive performance may suffer. We therefore also apply the Gaussian interaction profile (GIP) kernel to compute similarities. The GIP circRNA and disease similarities are defined as follows:

$$GC(c_i, c_j) = \exp\left( -\nabla_c \left\| A_{i\cdot} - A_{j\cdot} \right\|^2 \right) \quad (4)$$

$$GD(d_i, d_j) = \exp\left( -\nabla_d \left\| A_{\cdot i} - A_{\cdot j} \right\|^2 \right) \quad (5)$$

where $A$ denotes the known circRNA-disease matrix, $A_{i\cdot}$ and $A_{\cdot j}$ denote the $i$-th row and $j$-th column of $A$, and $\nabla_c$ and $\nabla_d$ are the normalization factors.
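A compact numpy sketch of Eqs. (4)-(5) (our own illustration; the bandwidth convention, normalizing by the mean squared profile norm as is common in GIP kernel work, is an assumption):

```python
import numpy as np

def gip_kernel(profiles: np.ndarray) -> np.ndarray:
    """GIP similarity between the rows of `profiles` (interaction profiles)."""
    sq_norms = (profiles ** 2).sum(axis=1)
    gamma = 1.0 / sq_norms.mean()  # normalization factor (nabla_c or nabla_d)
    # Pairwise squared Euclidean distances between all profiles.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

A = np.random.default_rng(0).integers(0, 2, size=(5, 4)).astype(float)
GC = gip_kernel(A)    # circRNA similarity from rows, Eq. (4)
GD = gip_kernel(A.T)  # disease similarity from columns, Eq. (5)
```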
2.4 GAT Based Node Feature Learning
In this section, we introduce a Graph Attention Network (GAT) [15] based method to learn graph structure information and update node features. Given the $i$-th node in the graph $G$, the $l$-th graph convolution layer can be defined as follows:

$$x_i^{(l+1)} = \sigma\left( \alpha_{i,i} W^{(l)} x_i^{(l)} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j} W^{(l)} x_j^{(l)} \right), \quad i = 1, 2, \ldots, N \quad (6)$$

where $x_i^{(l)}$ and $x_i^{(l+1)}$ are the input and output of the $l$-th layer, and $W^{(l)}$ is the learnable weight of the $l$-th layer. $\sigma(\cdot)$ denotes a non-linear activation function, such as ReLU:

$$\mathrm{ReLU}(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases} \quad (7)$$

$\mathcal{N}(i)$ represents the neighborhood of the $i$-th node, and $|\mathcal{N}(i)|$, the degree of the $i$-th node, serves as the normalization constant. $\alpha_{i,i}$ and $\alpha_{i,j}$ are the attention coefficients and can be computed as follows:

$$\alpha_{i,j} = \frac{\exp\left( \sigma\left( a^{\top} \left[ W x_i \,\|\, W x_j \right] \right) \right)}{\sum_{k \in \mathcal{N}(i) \cup \{i\}} \exp\left( \sigma\left( a^{\top} \left[ W x_i \,\|\, W x_k \right] \right) \right)} \quad (8)$$

where $\sigma(\cdot)$ is an activation function, such as LeakyReLU, $a$ denotes a one-layer neural network, and $\|$ represents the concatenation operation. GAT can effectively aggregate information from neighbors, and the multi-head mechanism further enhances its learning ability; the $K$-head operator can be defined by the following equation:

$$x_i^{(l+1)} = \sigma\left( \bigoplus_{k=1}^{K} \left( \alpha_{i,i}^{k} W_k^{(l)} x_i^{(l)} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}^{k} W_k^{(l)} x_j^{(l)} \right) \right), \quad i = 1, 2, \ldots, N \quad (9)$$

where $\bigoplus$ denotes a multi-head feature fusion operator, such as averaging or concatenation.
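Under the setup stated in Sect. 3.2 (PyTorch Geometric, 3 GAT layers with 4 heads each, 128-dimensional features), a minimal encoder could look as follows; everything beyond those stated settings is our assumption:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GATEncoder(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int = 128, heads: int = 4, layers: int = 3):
        super().__init__()
        self.convs = torch.nn.ModuleList(
            GATConv(in_dim if i == 0 else hidden, hidden // heads, heads=heads)
            for i in range(layers)  # concatenated heads give `hidden` channels
        )

    def forward(self, x, edge_index):
        for conv in self.convs:
            x = F.relu(conv(x, edge_index))
        return x  # final node representations X^(L), shape (N, hidden)
```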
After performing $L$ graph convolution layers, we obtain the final node features, denoted $X^{(L)} \in \mathbb{R}^{N \times d}$. For convenience, we use $X$ to represent $X^{(L)}$ in the following sections.

2.5 circRNA-Disease Association Predictors
After the feature learning step, hidden node representations are extracted from the heterogeneous network. We then design a predicting framework in which several score decoders, such as a dot decoder, a bilinear decoder, and a neural network (NN) based decoder, are used to calculate how strongly a circRNA-disease pair is related.
Dot Decoder
Let $x_i$ and $x_j$ represent a circRNA $c_i$ and a disease $d_j$. We define the dot score function as follows:

$$\hat{y}_{ij} = \mathrm{sigmoid}\left( x_i \cdot x_j \right) \quad (10)$$

where $\hat{y}_{ij}$ denotes the predicted score. The sigmoid function is an activation that limits the score to the range 0 to 1 and can be written as $\mathrm{sigmoid}(x) = 1 / (1 + e^{-x})$.

Bilinear Decoder
We also introduce a bilinear decoder to calculate the predictive score, defined as follows:

$$\hat{y}_{ij} = \mathrm{sigmoid}\left( x_i^{\top} W x_j \right) \quad (11)$$

where $W \in \mathbb{R}^{d \times d}$ is a learnable parameter.

NN Based Decoder
Neural networks (NN) have achieved great success in many fields [1]. We implement an NN based decoder to output the predicted score:

$$\hat{y}_{ij} = \mathrm{NN}\left( \mathrm{concat}(x_i, x_j) \right) \quad (12)$$

Let $z^0 = \mathrm{concat}(x_i, x_j)$ be the input of the NN model; the $l$-th layer output can then be represented by the following equation:

$$z^{l} = W_{nn}^{l} z^{l-1} + b_{nn}^{l} \quad (13)$$

where $W_{nn}^{l}$ and $b_{nn}^{l}$ are the weight and bias of the $l$-th layer of the NN model.

In the training step, known associations between circRNAs and diseases are labeled as positive samples, while the same number of unknown links are randomly selected as negative samples. We use $T^+$ and $T^-$ to denote the positive and negative samples, and train our model with the cross-entropy loss function:

$$LOSS = -\sum_{(i,j) \in T^{+} \cup T^{-}} \left[ y_{ij} \log \hat{y}_{ij} + \left( 1 - y_{ij} \right) \log\left( 1 - \hat{y}_{ij} \right) \right] + \gamma \left\| \theta \right\|^2 \quad (14)$$

where $y_{ij}$ and $\hat{y}_{ij}$ are the ground-truth label and the predicted score, respectively, $\gamma$ is a hyperparameter, $\theta$ represents the parameters of our model, and $\|\cdot\|$ is the Frobenius norm of the parameters. We use the Adam algorithm [9] as the optimizer of the loss.
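A compact PyTorch sketch of the three decoders in Eqs. (10)-(12) (our own illustration; the hidden size of the NN decoder is an assumption):

```python
import torch
import torch.nn as nn

class Decoders(nn.Module):
    def __init__(self, d: int, hidden: int = 64):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d, d) * 0.01)  # bilinear weight, Eq. (11)
        self.mlp = nn.Sequential(nn.Linear(2 * d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))   # NN decoder, Eq. (12)

    def dot(self, xc, xd):                               # Eq. (10)
        return torch.sigmoid((xc * xd).sum(-1))

    def bilinear(self, xc, xd):                          # Eq. (11)
        return torch.sigmoid((xc @ self.W * xd).sum(-1))

    def nn_score(self, xc, xd):                          # Eq. (12)
        return torch.sigmoid(self.mlp(torch.cat([xc, xd], -1))).squeeze(-1)
```

Any of the three scores can then be trained against the 0/1 labels with binary cross-entropy plus weight decay, corresponding to Eq. (14).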
3 Results
3.1 Benchmark Datasets
To evaluate the prediction performance of our method, we introduce two widely used circRNA-disease verification datasets, CircR2Disease [3] and circRNADisease [30]. CircR2Disease can be downloaded from http://bioinfo.snnu.edu.cn/CircR2Disease, while circRNADisease is available at http://cgga.org.cn:9091/circRNADisease/. After filtering invalid, redundant, and non-human data, CircR2Disease contains 585 circRNAs, 88 diseases and 650 interactions, and circRNADisease contains 332 known associations between 313 circRNAs and 40 diseases. The benchmark datasets used for evaluation are summarized in Table 1.
Table 1. Details of two benchmark datasets.

| Datasets | #circRNAs | #diseases | #associations |
| circR2Disease [3] | 585 | 88 | 650 |
| circRNADisease [30] | 313 | 40 | 332 |
3.2 Evaluation Setup and Metrics
In this study, extensive experiments are conducted to evaluate our proposed method. We implement 5-fold cross-validation (5-CV) on the above two benchmark datasets and choose several evaluation metrics for model comparison and analysis. Specifically, 5-CV randomly splits all known circRNA-disease pairs into five parts. In each round, four parts are treated as positive samples, together with an equal number of randomly selected unknown pairs as negative samples, and are used for training, while the remaining part and the other unknown pairs are used as the test set. Our method is developed with PyTorch and PyTorch Geometric v1.6 [5], and the experiments are run on a Tesla V100 GPU. By default, we set the node feature dimension to 128 and use 3 GAT layers with 4 heads each. At each test step, standard evaluation metrics such as accuracy (Acc), precision, recall, and F1-score are used to evaluate the performance of our method. In addition, the receiver operating characteristic (ROC) curve, the precision-recall (PR) curve, and the areas under them, denoted AUC and AUPR respectively, measure the overall performance.

3.3 Quantitative Analysis of Our Proposed Method
To quantitatively evaluate the impact of additional information and different decoders on the performance of our method, we validate combinations of different feature types and decoders, conducting 5-fold cross-validation (5-CV) 10 times on the two benchmark datasets. To be specific, we design three types of combinations as initial node features: (1) disease semantic similarity and circRNA functional similarity; (2) GIP similarities of circRNAs and diseases; (3) the integration of the above two similarities, written as follows:

$$XD(d_i, d_j) = \begin{cases} SD(d_i, d_j) & \text{if } SD(d_i, d_j) \neq 0 \\ GD(d_i, d_j) & \text{otherwise} \end{cases} \quad (15)$$

$$XC(c_i, c_j) = \begin{cases} FS(c_i, c_j) & \text{if } FS(c_i, c_j) \neq 0 \\ GC(c_i, c_j) & \text{otherwise} \end{cases} \quad (16)$$

where $XD$ and $XC$ are the integrated similarities for diseases and circRNAs. We then run 5-CV with these feature types and different decoders on the two benchmark datasets.
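Eqs. (15)-(16) amount to an element-wise fallback, e.g. with numpy (a sketch with toy matrices; `FS`/`GC` are handled exactly like `SD`/`GD`):

```python
import numpy as np

rng = np.random.default_rng(0)
SD = rng.random((4, 4)) * (rng.random((4, 4)) > 0.7)  # sparse toy semantic similarity
GD = rng.random((4, 4))                               # dense toy GIP similarity

# Keep the semantic similarity where it is non-zero,
# otherwise fall back to the GIP similarity, Eq. (15).
XD = np.where(SD != 0, SD, GD)
# XC is built analogously from FS and GC, Eq. (16).
```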
Table 2. Quantitative analysis of the effect of different types of node features and decoders in our proposed method on the CircR2Disease dataset. FS, SD, GC, GD denote circRNA functional similarity, disease semantic similarity, and the respective GIP similarities.

| Decoder | Features | Acc (%) | Precision (%) | Recall (%) | F1-Score (%) |
| Dot | FS & SD | 70.07 ± 2.65 | 56.03 ± 3.46 | 90.83 ± 4.18 | 75.20 ± 2.28 |
| Dot | GC & GD | 69.33 ± 2.56 | 67.37 ± 6.42 | 94.98 ± 2.35 | 75.61 ± 1.75 |
| Dot | FS + GC & SD + GD | 71.34 ± 2.65 | 66.36 ± 4.75 | 98.40 ± 1.50 | 77.47 ± 1.70 |
| Bilinear | FS & SD | 62.52 ± 5.40 | 54.46 ± 3.80 | 73.88 ± 10.24 | 66.09 ± 6.51 |
| Bilinear | GC & GD | 70.92 ± 2.94 | 74.17 ± 8.38 | 98.26 ± 1.23 | 77.20 ± 1.80 |
| Bilinear | FS + GC & SD + GD | 77.05 ± 7.29 | 82.68 ± 9.29 | 87.95 ± 20.05 | 77.68 ± 15.15 |
| NN | FS & SD | 73.02 ± 5.51 | 77.76 ± 4.47 | 68.52 ± 11.19 | 71.35 ± 7.22 |
| NN | GC & GD | 83.78 ± 6.30 | 84.61 ± 4.09 | 79.18 ± 12.54 | 82.49 ± 8.19 |
| NN | FS + GC & SD + GD | 84.81 ± 3.46 | 85.77 ± 4.19 | 84.37 ± 6.15 | 84.66 ± 3.82 |
Table 3. Quantitative analysis of the effect of different types of node features and decoders in our proposed method on the circRNADisease dataset.

| Decoder | Features | Acc (%) | Precision (%) | Recall (%) | F1-Score (%) |
| Dot | FS & SD | 74.05 ± 5.36 | 58.15 ± 5.32 | 94.13 ± 5.51 | 78.43 ± 4.27 |
| Dot | GC & GD | 72.81 ± 3.86 | 76.79 ± 8.66 | 96.63 ± 2.36 | 78.09 ± 2.74 |
| Dot | FS + GC & SD + GD | 72.76 ± 4.11 | 70.65 ± 10.50 | 98.80 ± 1.39 | 78.46 ± 2.55 |
| Bilinear | FS & SD | 71.87 ± 7.79 | 62.17 ± 6.77 | 81.16 ± 13.13 | 73.88 ± 8.76 |
| Bilinear | GC & GD | 73.85 ± 3.93 | 79.15 ± 9.42 | 98.52 ± 1.72 | 79.10 ± 2.44 |
| Bilinear | FS + GC & SD + GD | 77.80 ± 6.38 | 73.90 ± 16.09 | 97.66 ± 3.16 | 81.76 ± 4.19 |
| NN | FS & SD | 75.77 ± 7.53 | 83.64 ± 6.83 | 73.19 ± 14.56 | 74.47 ± 9.77 |
| NN | GC & GD | 89.93 ± 2.91 | 88.65 ± 4.81 | 92.72 ± 3.90 | 90.21 ± 2.80 |
| NN | FS + GC & SD + GD | 86.30 ± 4.15 | 89.96 ± 4.76 | 93.66 ± 4.21 | 87.29 ± 3.63 |
For each combination, we performed 5-CV 10 times; the results are shown in Tables 2 and 3. We can see that the integrated features give the best predictive performance with all three decoders, while the first feature type yields the worst predictor. This illustrates that sparse matrices such as FS and SD, when used alone as node features, limit the final node representations, while the dense but not accurate enough
GIP similarities can bring in noisy information. The integrated similarities avoid being either too sparse or too noisy and obtain the best final performance.

We conduct 5-CV on the benchmark datasets with the different decoders, including the dot predictor, the bilinear predictor, and the NN predictor. According to Tables 2 and 3, the NN based predictor achieves the best values on three of the four metrics on both datasets. The comparison of the decoders indicates that neural networks have a more powerful ability to predict potential associations between circRNAs and diseases.

To further illustrate the predictive performance of our method, we first define our baseline approach as using the integrated similarities and an NN based decoder. We then perform 5-CV on the CircR2Disease and circRNADisease datasets, plot the ROC curves, and calculate the AUC and AUPR values. The results are shown in Figs. 2 and 3. Our baseline method achieves an AUC of 0.9178 and an AUPR of 0.8757 on the CircR2Disease dataset, and an AUC of 0.9343 and an AUPR of 0.9058 on the circRNADisease dataset.
Fig. 2. AUC and AUPR curves for our method based on 5-CV on the CircR2Disease dataset.

Fig. 3. AUC and AUPR curves for our method based on 5-CV on the circRNADisease dataset.
3.4 Impact of Data Leak
As discussed in Sect. 3.3, the choice of initial features has a big impact on the final performance. Reviewing the 5-CV process, we divide all known circRNA-disease
pairs into five parts and train the model with four parts. Note that the initial features, such as the circRNA functional similarity and the GIP similarities, should be re-calculated in each round; otherwise, the training data will contain features derived from the test samples, resulting in data leak. Overestimated metrics then fail to reveal the real ability of the methods. We implement a comparative experiment with and without data leak on the CircR2Disease dataset, defining two models as follows.

With-Data-Leak. We use the baseline model; in each round of 5-CV, the initial features are the integrated similarities of circRNAs and diseases, calculated from all known associations. Note that these similarities are calculated only once.

Without-Data-Leak. The same baseline model is used, but unlike the with-data-leak setting, we re-calculate FS, GC, and GD in each round of 5-CV.

As shown in Fig. 4, the AUC and AUPR values of the with-data-leak model are higher than those of the without-data-leak model. To be specific, AUC is almost 0.06 higher and AUPR 0.07 higher with data leak. The reason could be that data leak causes the model to memorize the test samples and achieve spurious scores on these metrics.
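A sketch of the leak-free protocol (our own illustration; `compute_similarities`, `train`, and `evaluate` are hypothetical helpers). The essential point is that the similarities are derived only from the training folds:

```python
import numpy as np

def five_cv_without_leak(pairs, A_shape, compute_similarities, train, evaluate, seed=0):
    """pairs: (m, 2) array of known (circRNA index, disease index) pairs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(pairs)), 5)
    scores = []
    for k in range(5):
        train_idx = np.concatenate([folds[i] for i in range(5) if i != k])
        A_train = np.zeros(A_shape)
        A_train[tuple(pairs[train_idx].T)] = 1  # the test fold stays masked out
        feats = compute_similarities(A_train)   # FS/GC/GD from training data only
        model = train(feats, A_train)
        scores.append(evaluate(model, pairs[folds[k]]))
    return float(np.mean(scores))
```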
Fig. 4. AUC and AUPR values of the data leak experiment based on 5-CV on the CircR2Disease dataset
3.5 Performance Comparison with Other Methods
We perform 5-CV with our proposed method and compare it with existing computational methods. As analyzed in Sect. 3.3, the integrated similarities with the NN based decoder obtain the best predictive performance, so we apply this configuration as our baseline predictor. Because different metrics are used in previous studies, we choose the commonly used AUC as the quantitative comparison indicator, comparing against KATZHCDA [4], PWCDA [10], GCNCDA [20], and GATCDA [2].
As we can see from Table 4, our method achieves better performance than the other state-of-the-art methods; notably, GCNCDA and GATCDA are themselves GNN based methods.

Table 4. Performance comparison with other methods.

| Methods | AUC |
| KATZHCDA [4] | 0.7936 |
| PWCDA [10] | 0.8900 |
| GCNCDA [20] | 0.9090 |
| GATCDA [2] | 0.9011 |
| Our method | 0.9178 |
3.6 Case Study
To further demonstrate the predictive performance of our proposed method, we carry out a case study for gastric cancer. The case study is implemented on the CircR2Disease dataset: all known associations are used as positive samples, and all unknown associations are then used as candidates for calculating association scores.

Table 5. Top-20 predicted circRNAs for gastric cancer based on the CircR2Disease dataset. E1, E2, E3 denote CircR2Disease, circRNADisease and circAtlas, respectively.

| Rank | circRNAs | Evidence |
| 1 | hsa_circ_0014717 | E1; E2; E3 |
| 2 | circpvt1/hsa_circ_0001821 | E1; E3 |
| 3 | hsa_circ_0001313/circccdc66 | PMID: 31173287 |
| 4 | hsa_circ_0001649 | E1 |
| 5 | circrna_000167/hsa_circrna_000167/hsa_circ_0000518 | unknown |
| 6 | hsa_circ_0000096/circhiat1/hsa_circ_001013 | E1; E3 |
| 7 | hsa_circ_0003159 | E1; E2; E3 |
| 8 | hsa_circ_0074362 | E1; E2; E3 |
| 9 | hsa_circ_0001017 | E1; E2 |
| 10 | hsa_circ_0061276 | E1; E2; E3 |
| 11 | circ-zfr | E1 |
| 12 | circrna0047905/hsa_circ_0047905 | E1; E3 |
| 13 | circrna0138960/hsa_circ_0138960 | E1 |
| 14 | hsa_circ_0000181 | E1; E2; E3 |
| 15 | hsa_circ_0085616 | E1; E2; E3 |
| 16 | hsa_circ_0006127 | E1; E2; E3 |
| 17 | hsa_circ_0000026 | E1; E2 |
| 18 | hsa_circ_0000144 | E1; E2 |
| 19 | hsa_circ_0032821 | E1; E2; E3 |
| 20 | hsa_circ_0005529 | E1; E2; E3 |
We select the top 20 candidate circRNAs for manual validation against the evaluation datasets CircR2Disease, circRNADisease and circAtlas [23], and against the literature in PubMed. The results are shown in Table 5: 18 of the 20 candidates are found in the three evaluation datasets. Of the remaining candidates, circ-ccdc66 has been reported to accelerate gastric cancer by binding to a miRNA target [29].
4 Discussion and Conclusion
CircRNAs have been widely identified and shown to be involved in various human diseases. Furthermore, circRNAs are highly tissue-specific and stable, which makes them suitable as biomarkers for diagnosis and treatment. In this paper, we present a unified GAT based framework to extract representations of circRNAs and diseases with rich graph structure information. Additional information, such as disease semantic similarity, circRNA functional similarity and the respective GIP similarities, is conveniently used as initial features. Meanwhile, we design several decoders for predicting disease-related circRNAs. The quantitative analysis of our method shows that the additional information and the different decoders have different effects on the final predictive performance. Experimental results show that our baseline method achieves superior prediction performance compared with other methods.

Although various similarities are calculated and integrated as initial node features, additional information such as circRNA sequences, circRNA-miRNA associations and disease knowledge graphs could still be introduced to improve prediction performance. In the future, we will attempt to integrate more additional information to enhance the representations of circRNAs and diseases. Furthermore, combining edge-based graph convolution with node-based graph convolution may improve the prediction performance of our model.

Funding. This work was supported by the National Natural Science Foundation of China (grant numbers 61873001, U19A2064), the Natural Science Foundation of Shandong Province (grant number ZR2020KC022), and the Open Project of Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University (grant number MMC202006).
References
1. Bengio, Y., et al.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013)
2. Bian, C., et al.: GATCDA: predicting circRNA-disease associations based on graph attention network. Cancers (Basel) 13(11), 2595 (2021)
3. Fan, C., et al.: CircR2Disease: a manually curated database for experimentally supported circular RNAs associated with various diseases. Database 2018, 1–6 (2018)
4. Fan, C., et al.: Prediction of circRNA-disease associations using KATZ model based on heterogeneous networks. Int. J. Biol. Sci. 14(14), 1950–1959 (2018)
5. Fey, M., Lenssen, J.E.: Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428 (2019)
6. Floris, G., et al.: Regulatory role of circular RNAs and neurological disorders. Mol. Neurobiol. 54(7), 5156–5165 (2017)
7. Ge, E., et al.: Predicting human disease-associated circRNAs based on locality-constrained linear coding. Genomics 112(2), 1335–1342 (2020)
8. Holdt, L.M., Kohlmaier, A., Teupser, D.: Molecular roles and function of circular RNAs in eukaryotic cells. Cell. Mol. Life Sci. 75(6), 1071–1098 (2017). https://doi.org/10.1007/s00018-017-2688-5
9. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings (2015)
10. Lei, X., et al.: PWCDA: path weighted method for predicting circRNA-disease associations. Int. J. Mol. Sci. 19(11), 1–13 (2018)
11. Li, G., et al.: NCPCDA: network consistency projection for circRNA-disease association prediction. RSC Adv. 9(57), 33222–33228 (2019)
12. Lu, C., et al.: Deep matrix factorization improves prediction of human circRNA-disease associations. IEEE J. Biomed. Health Inform. 25(3), 891–899 (2020)
13. Memczak, S., et al.: Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495(7441), 333–338 (2013)
14. Patop, I.L., Kadener, S.: circRNAs in cancer. Curr. Opin. Genet. Dev. 48, 121–127 (2018)
15. Veličković, P., et al.: Graph attention networks. In: 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings, pp. 1–12 (2018)
16. Wang, C.C., et al.: Circular RNAs and complex diseases: from experimental results to computational models. Brief. Bioinform. 22(6), 1–27 (2021)
17. Wang, D., et al.: Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 26(13), 1644–1650 (2010)
18. Wang, L., et al.: An efficient approach based on multi-sources information to predict circRNA-disease associations using deep convolutional neural network. Bioinformatics 36(13), 4038–4046 (2020)
19. Wang, L., et al.: GCNCDA: a new method for predicting circRNA-disease associations based on graph convolutional network algorithm. PLoS Comput. Biol. 16(5) (2020)
20. Wang, L., et al.: GCNCDA: a new method for predicting circRNA-disease associations based on graph convolutional network algorithm. PLoS Comput. Biol. 16(5), 1–19 (2020)
21. Wang, L., et al.: Predicting circRNA-disease associations using deep generative adversarial network based on multi-source fusion information. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 145–152. IEEE (2019)
22. Wei, H., Liu, B.: iCircDA-MF: identification of circRNA-disease associations based on matrix factorization. Brief. Bioinform. 21(4), 1356–1367 (2019)
23. Wu, W., et al.: CircAtlas: an integrated resource of one million highly accurate circular RNAs from 1070 vertebrate transcriptomes. Genome Biol. 21(1), 1–14 (2020)
24. Xia, Y., et al.: GraphBind: protein structural context embedded rules learned by hierarchical graph neural networks for recognizing nucleic-acid-binding residues. Nucleic Acids Res. 49(9) (2021)
25. Xiao, Q., et al.: Computational prediction of human disease-associated circRNAs based on manifold regularization learning framework. IEEE J. Biomed. Health Inform. 23(6), 2661–2669 (2019)
26. Xiao, Q., Zhong, J., Tang, X., Luo, J.: iCDA-CMG: identifying circRNA-disease associations by federating multi-similarity fusion and collective matrix completion. Mol. Genet. Genom. 296(1), 223–233 (2020). https://doi.org/10.1007/s00438-020-01741-2
27. Yan, C., et al.: DWNN-RLS: regularized least squares method for predicting circRNA-disease associations. BMC Bioinform. 19(19), 73–81 (2018)
28. Yan, Q., He, X., Kuang, G., Ou, C.: CircRNA cPWWP2A: an emerging player in diabetes mellitus. J. Cell Commun. Signal. 14(3), 351–353 (2020). https://doi.org/10.1007/s12079-020-00570-7
29. Yang, M., et al.: Circ-CCDC66 accelerates proliferation and invasion of gastric cancer via binding to miRNA-1238-3p. Eur. Rev. Med. Pharmacol. Sci. 23(10), 4164–4172 (2019)
30. Zhao, Z., et al.: circRNADisease: a manually curated database of experimentally supported circRNA-disease associations. Cell Death Dis. 9(5), 4–5 (2018)
31. Zheng, Q., et al.: Circular RNA profiling reveals an abundant circHIPK3 that regulates cell growth by sponging multiple miRNAs. Nat. Commun. 7, 1–13 (2016)
32. Zhou, J., et al.: Graph neural networks: a review of methods and applications. AI Open 1, 57–81 (2020)
Research on the Application of Blockchain Technology in the Evaluation of the “Five Simultaneous Development” Education System Xian-hong Xu, Feng-yang Sun(B) , and Yu-qing Zheng Zhengzhou Normal University, Zhengzhou 450055, China [email protected]
Abstract. Guided by the principle of “five simultaneous development”, this paper analyzes, from the perspective of education evaluation, the social environment and policy factors involved in the comprehensive quality evaluation of primary and secondary schools; examines the integrated development mechanism of the “five educations” and a comprehensive quality evaluation of primary and secondary schools based on achievement evaluation, occupational aptitude tests and assessments of students’ development potential; proposes reform ideas to address the problems of personal choice and educational equity in basic education; and builds a whole-process, whole-course, all-staff collaborative education system for the compulsory education stage that cultivates qualified builders of and successors to the socialist cause. Keywords: Internet acceleration · Blockchain technology · Information security · Evaluation of the “five simultaneous development” education system
1 Research Background
“Internet + education” is the general trend of global education development and reform. With the continuous deepening of education informatization, big data in the field of education has gradually taken shape and begun to be widely used, and informatized teaching will become the dominant mode of future education. As a populous province and a major province for education, Henan has accumulated a large amount of educational information, and how to tap the value of these educational information resources better and more rationally is a problem that urgently needs to be solved. In addition, the outbreak of the COVID-19 epidemic at the end of 2019 accelerated the process of education informatization and prompted the rapid development of online education, which has officially become an important part of teaching. The “double reduction” policy likewise puts forward higher requirements for students’ own learning ability and teachers’ teaching ability.

Blockchain technology can address the pain points of the traditional education industry by taking advantage of its distributed storage and the immutability of the information it records. The rapid rise and development of blockchain technology has brought new opportunities for educational evaluation, the analysis and application of educational
data, and the improvement of teaching technology. Blockchain is considered one of the technologies with the most potential to reconstruct education in the 21st century. Its essence is a distributed database: the sending, receiving, transmission, auditing and saving of data are all based on distributed data structures, which makes it a powerful tool for educational evaluation. The subjects of education evaluation can make full use of the tamper-proof nature of blockchain technology and its advantage of leaving traces throughout the whole process to record the entire course of each student’s learning and growth, establish growth and learning files, conduct an objective and comprehensive evaluation of each student’s development, and draw a “data portrait” of each student. Relying on blockchain technology, educational evaluation can better achieve continuity in time, globality in space, diversity of values, complexity of content and diversity of subjects, which is conducive to accurate, scientific evaluation. In addition, the decentralization, security and data immutability of blockchain can effectively mitigate inequity in education. Blockchain technology is therefore expected to become a powerful tool for promoting the all-round development of students and putting the “five simultaneous development” education system into practice. This paper explores the use of blockchain to design and implement a course evaluation system and a learning incentive system, optimize the allocation of teachers’ educational resources under the “double reduction” policy, encourage students to learn independently, motivate teachers to improve teaching quality, and build an integrated “five educations” education system.
2 Blockchain Technology Application Trends
The concept of “blockchain” was first proposed by Satoshi Nakamoto in 2008. In the following years, blockchain, characterized as “unforgeable”, “fully traceable”, “open and transparent” and “collectively maintained”, successfully built a foundation for trustworthy cooperation and quickly became one of the hottest information technologies. In recent years, under the leadership and promotion of international organizations such as the United Nations and the European Union, many countries have released a series of policy documents on blockchain and explored blockchain technology and its applications in depth, making blockchain one of the world’s biggest technology trends. In the “13th Five-Year Plan” National Informatization Plan issued by the State Council in 2016, blockchain technology is listed as a strategic frontier technology, and the front-page article of the People’s Daily on February 18, 2019 clearly stated that blockchain is “a new generation of information technology”. With its decentralization, security and data immutability, blockchain is widely used in finance, a field that requires an extremely high degree of trust. With the rapid development of information technology and the explosion of educational information, using blockchain to break down educational islands and build a comprehensive education platform is a foreseeable future.

A search of the CNKI full-text database with “blockchain” as the subject term returned 37,034 related academic documents (retrieved 2021-11-29). Academic attention to “blockchain” research has been increasing since 2016, and 13,905 related papers were predicted to be published in 2021 (see Fig. 1).
Fig. 1. China’s “blockchain” research trend
The topics most closely related to “blockchain” research in China include blockchain technology, big data, smart contracts, etc. (see Fig. 2).
Fig. 2. Distribution of research topics related to “blockchain” in China
Topics that are highly relevant to “blockchain” research abroad include: blockchain technology research, smart contracts, bitcoin, digital currency, etc. (see Fig. 3).
Fig. 3. Distribution map of foreign research topics related to “blockchain”
A review of the literature shows that, both domestically and internationally, research on “blockchain” focuses mainly on specific blockchain techniques, the financial industry, and big data, and there is still relatively little research on the use of blockchain technology in education. However, the trend of education informatization is irreversible, and the traditional education environment is constantly shifting toward an open one, in Henan as elsewhere. Blockchain technology has natural advantages in integrating educational information, conducting educational evaluation, supporting cross-platform online education, and enabling the exchange of class-hour credits, so research on the integration of blockchain and education needs to be strengthened. To sum up, drawing on quality-education models and their effects in different countries and regions, this paper systematically analyzes the difficulties of quality-education reform in primary and secondary schools in Henan Province and their causes, against the background of constructing the “five simultaneous development” education system; it integrates blockchain technology, builds an evaluation index system for the construction of the “five simultaneous development” education system in Henan Province, evaluates the effect of educational innovation in the province, and discusses a blockchain-based construction and evaluation system for a “five simultaneous development” education system with Henan’s regional characteristics.
3 The Application of Blockchain Technology in the Evaluation and Application of the “Five Simultaneous Education” Education System

3.1 Main Content
The main theoretical basis of this paper is the theory of education economics and management together with the theory of the education ecosystem, combined with the results of constructing the “five simultaneous development” education system, under two assumptions. First, there is still
a scarcity of educational resources in China, and the development of education between the east and the west is unbalanced, so the allocation of educational resources needs to be optimized; this assumption lays the theoretical foundation for the blockchain-based construction and reform of the “five simultaneous development” education system. Second, the subjects, objects and environment of compulsory education do not exist in isolation but interact with and depend on one another; this assumption provides the theoretical premise for constructing the “five simultaneous development” education system. On the basis of these assumptions, this project organically combines the subjects of compulsory education, the objects of compulsory education, and the humanistic and social environment, and derives the construction of the “five simultaneous development” education system together with an evaluation mechanism for its integrated development with blockchain technology.

To sum up, the main reform contents of this paper are: 1) evaluating the impact of the construction of the “five simultaneous development” education system on the reform of China’s primary and secondary education system; 2) analyzing the contribution of blockchain technology to the construction of the “five simultaneous development” education system; and 3) exploring the evaluation mechanism and implementation path for the integrated development of the “five simultaneous development” education system and blockchain technology.

3.2 The Reform Goal of This Paper
First, based on the theories of educational development and educational economic management, focus on the development achievements of China’s compulsory education, the construction of the “five simultaneous development” education system and the trend of quality-education reform, and define the connotation of “five simultaneous development” under the new development concept.

Second, from a comprehensive and systematic perspective, and drawing on education ecosystem theory, organically combine the subjects and objects of compulsory education with the humanistic, social and economic environment in the development of compulsory education, and build an evaluation index system for the integration of “five simultaneous development” education construction with blockchain technology.

Third, use neural networks and related statistical analysis methods to build a dynamic evaluation mechanism for the integration of “five simultaneous development” education construction and blockchain technology.

Fourth, from the four dimensions of society, government, family and school, taking Henan’s nine-year compulsory education reform as the research background, empirically study and evaluate the operation of the evaluation mechanism for integrating the “five educations” with blockchain technology in Henan Province, and explore a path for the integration and sustainable development of quality education and information technology.
3.3 Key Issues to Be Addressed

First, scientifically define the connotation of "simultaneous development of five education" within the development theory of nine-year compulsory education. Although our country began to study the teaching concept of "simultaneous development of five education" as early as 1912, it received little attention in the early stage, leaving its connotation unclear. In addition, our country's educational policy, objectives, and teaching methods have kept changing with the times, from the traditional ideas of "literacy" and "exam-oriented education" in the early years to "comprehensive education" and "education modernization"; these new ideas have left research on the "simultaneous development of five educations" lagging behind the development of the times. Therefore, in the new era, its connotation needs to be scientifically redefined.

Second, design the questionnaires reasonably and select the corresponding statistical methods. The questionnaire is the main data collection instrument of this project and an important bridge from qualitative to quantitative analysis. A scientific, reasonable, and rigorous questionnaire design is an essential prerequisite for collecting and mining information; an unreasonable questionnaire degrades data quality and reduces the reliability of the analysis. Statistical methods, in turn, are powerful tools for seeing the essence through data: choosing a method appropriate to the data and the purpose of the analysis reflects the internal relationships in the data more effectively, and it also saves effort and avoids detours. How to design the questionnaire and how to choose the statistical method are therefore key issues affecting the research results. A minimal statistical reliability check of this kind is sketched below.

Third, systematically screen the evaluation indicators and evaluation methods for the construction of the "five educations" education system and its integration with blockchain technology. Educational evaluation indicators are the direct basis for educational evaluation, which constrains and guides the development of education. Appropriate evaluation indicators should be scientific and targeted so as to better evaluate the effectiveness and development of the education system. The selection or formulation of classroom teaching evaluation criteria should match the value orientation and basic form of classroom teaching. The traditional evaluation standard is highly consistent with the traditional value orientation, which mainly pursues students' mastery of knowledge in a teacher-centered classroom. Now that our country emphasizes the policy of "adhering to the simultaneous development of five educations", classroom teaching forms, modes, and concepts have all changed, and the old evaluation standards can no longer meet the needs of the times. It is therefore necessary to develop teaching evaluation standards based on the educational thought of "five simultaneous development", which is of great significance for building a "five simultaneous development" education system and evaluation mechanism based on blockchain technology.
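As one illustration of the statistical-method selection discussed above, the following sketch computes Cronbach's alpha, a common reliability coefficient for Likert-scale questionnaires. It is a minimal example under assumed inputs: the response matrix is hypothetical and not drawn from this project.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = scores.shape[1]                          # number of questionnaire items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses: 6 respondents, 4 items.
responses = np.array([
    [4, 5, 4, 4],
    [3, 4, 3, 3],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
    [3, 3, 3, 4],
])

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")  # values above ~0.7 suggest acceptable reliability
```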
Fourth, design the corresponding software system based on Python data processing technology, which involves computer-related technical support and the subsequent generation and adjustment of the database. In the current era of educational informatization, it is necessary to establish and improve a dynamic educational information data platform. This project uses blockchain technology to build a dynamic education evaluation mechanism, and uses Python-based processing to establish a dynamic data platform for the "five education" education system.

3.4 Implementation Plan

On the basis of thorough topic-selection design and literature review, this project takes the nine-year compulsory education in Henan Province as the research background, and takes the integration of "five simultaneous development" education system construction with blockchain technology as the research object. Following the idea of "discovering problems - analyzing problems - solving problems" and focusing on the scarcity of educational resources and the structural contradiction between supply and demand, it analyzes the problems existing in the subject and object of the compulsory education mechanism and its operation, summarizes, evaluates, and empirically studies the construction process of Henan Province's "five educations" education system, and analyzes the bottlenecks in, and solutions for, the integrated reform of the "five educations" education system with blockchain technology. On this basis, it builds an evaluation mechanism for the integration of "five simultaneous education" system construction and blockchain technology (Fig. 4).

Fig. 4. Technology roadmap
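To make the Python-based dynamic data platform mentioned in Sect. 3.3 concrete, the sketch below shows one possible way of appending evaluation records to a tamper-evident, hash-chained log, in the spirit of the blockchain-backed evaluation mechanism described above. It is only an illustrative sketch under stated assumptions: the record fields (school, indicator, score) are hypothetical, and a production system would use a real blockchain platform rather than this in-memory chain.

```python
import hashlib
import json
import time

def make_block(record: dict, prev_hash: str) -> dict:
    """Wrap an evaluation record in a block linked to the previous block's hash."""
    block = {"timestamp": time.time(), "record": record, "prev_hash": prev_hash}
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

def verify(chain: list) -> bool:
    """Recompute each hash and check the links; any edit breaks the chain."""
    for i, block in enumerate(chain):
        body = {k: v for k, v in block.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(payload).hexdigest() != block["hash"]:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

# Hypothetical evaluation records for the "five education" indicators.
chain = [make_block({"school": "School A", "indicator": "moral", "score": 87}, "0" * 64)]
chain.append(make_block({"school": "School A", "indicator": "physical", "score": 91},
                        chain[-1]["hash"]))

print(verify(chain))  # True until any stored record is altered
```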
4 Conclusion

First, by applying blockchain technology to the evaluation process of the "five simultaneous development" education system, the decentralization and point-to-point advantages of blockchain technology are integrated, and the subjectivity, randomness, and possibility of tampering in the evaluation process are minimized, thereby improving the objectivity and reliability of the evaluation. Second, blockchain technology is used to dynamically evaluate the construction and implementation environment of the "Five Simultaneous Education" education system, and the paper explores an implementation path for integrating the construction of this education system with information technology in the context of the digital economy.
Blockchain Adoption in University Archives Data Management Cong Feng1,2(B) and Si Liu2 1 Faculty of Social Sciences and Liberal Arts, UCSI University, Kuala Lumpur, Malaysia
[email protected] 2 College of Information Science and Technology,
Zhengzhou Normal University, Zhengzhou, China
Abstract. Archives data management is an important link in the process of talent training in universities, and the informatization of archives data management has greatly promoted education. Existing university archives management generally adopts a centralized mode, which leads to low efficiency of archives services and insufficient protection of archives data. At the same time, the emergence of advanced technologies such as quantum computers has increased the probability of password cracking and information leakage, bringing new challenges to archives information security. This paper analyzes the feasibility and application advantages of blockchain technology in university archives data management by summarizing the practical difficulties it faces, and then proposes a new model of university archives data management based on blockchain technology. While ensuring that archives data management is completed efficiently, the model ensures the safety and reliability of data throughout archives data management and comprehensive sharing. Keywords: Blockchain · Archives data management · University · Security · Decentralization
1 Introduction

As the cradle of talent cultivation, universities store archives involving a great deal of content, including student records, scientific research files, talent files, and so on, much of which overlaps. At the same time, the fluidity of file information is strong: as students enroll and graduate each year, their related files need to be moved in and out; the scientific research archives in universities need to be updated constantly as research progresses; and the mobility of university teachers and researchers also requires personnel archives to be transferred at any time. With the development of computer and network information technology, the preservation and use of archives information in universities are developing in the direction of electronization and digitization [1]. Compared with traditional technologies, blockchain technology has many innovations. Only private information is encrypted, and other stored information is highly
transparent. Users on the blockchain system can access the information stored on the chain through the access interface. The blockchain usually secures data using technologies such as timestamps and hash functions to ensure that the data cannot be tampered with. As typical cases of the practical application of blockchain technology, virtual currencies such as Bitcoin and Ethereum have achieved certain success and have also led to applications of blockchain technology in finance, securities, logistics, and other fields [2]. At present, many countries have started the research, development, and application of blockchain technology. Because blockchain technology can fully guarantee the security and integrity of electronic archives, the archives academic community has also carried out theoretical and practical explorations of applying blockchain technology to electronic archives management.
2 Blockchain

2.1 Blockchain and Blockchain Technology

At present, no unified definition of blockchain has formed in the academic world. Generally speaking, a blockchain is a public database (or public ledger) formed on the basis of blockchain technology. Blockchain technology refers to the combination of technologies for data exchange, processing, and storage between multiple parties based on modern cryptography, distributed consistency protocols, point-to-point network communication technology, and smart contract programming languages. Meanwhile, blockchain technology itself continues to evolve and advance. Depending on the combination of technologies adopted by each blockchain, the characteristics of the resulting blockchain also vary dramatically. It should be pointed out, however, that blockchain technology is a package of technologies that can be tailored and innovated according to the needs of the business [3].

2.2 Features of Blockchain

According to the data structure and operation principle of blockchain, blockchain technology applied in the field of education has the following features.

Decentralization: The peer-to-peer, decentralized, and distributed blockchain architecture involves the removal of the central controlling authority. Decentralized structures are conducive to individual freedom, privacy, and autonomy, and they diminish the risk of losing stored information by eliminating single points of failure.

Immutability: The data stored on the blockchain are unchangeable records whose states cannot be modified after they are created. The immutability of blockchain offers enhanced transparency, data consistency, and integrity while preventing data forgery and theft.

Traceability: Each transaction on the ledger is recorded with a timestamp and validated through a verified hash reference. Therefore, the data information of any block can be traced through the correlation between arbitrary blocks.
Security: Blockchain provides high security of information through the following three methods. First, storage with high redundancy is used to ensure consensus on the data: only when most of the nodes agree can an update be committed. Second, encryption algorithms are used to verify the data to ensure that the information is not modified freely. Finally, multiple private keys are utilized to control access rights, so the privacy of users can be protected while the transparency and traceability of the blockchain are preserved.

Trust: Blockchain technology converts trust from the need to trust a centralized authority into the need to trust the technology itself.

The features of blockchain overlap with each other to a certain extent; each characteristic is in some way related or complementary to the others. A minimal illustration of how hashing and digital signatures support these features is given below.
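As a small illustration of how hashing and digital signatures underpin the immutability and security features above, the following sketch signs an archive record and detects any later modification. It is a schematic example using the third-party cryptography package; the record content is hypothetical, and real blockchain platforms implement these checks internally.

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric import ed25519

# A hypothetical archive record, reduced to bytes for hashing and signing.
record = b"student_id=2022001;file=transcript;action=append"
digest = hashlib.sha256(record).hexdigest()  # content fingerprint stored on-chain

# The archiving department signs the record with its private key.
private_key = ed25519.Ed25519PrivateKey.generate()
public_key = private_key.public_key()
signature = private_key.sign(record)

# Any node can verify authenticity with the public key.
public_key.verify(signature, record)  # raises InvalidSignature if forged
print("record verified, sha256 =", digest)

# A tampered record fails verification, exposing the modification.
try:
    public_key.verify(signature, record + b";grade=A")
except Exception:
    print("tampering detected")
```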
3 The Application of Blockchain Technology in University Archives Data Management

3.1 Current Situation of Archival Data Management in Universities

The rapid development of information technology has promoted the modernization of archives management in universities, but it has also increased the probability of password cracking and information leakage, which threatens the security of archives information. Currently, the factors affecting the information security of university archives include the external environment, technical factors, human factors, facilities and equipment, institutional factors, personnel quality, and so on, which are mainly reflected in the following aspects [4, 5].

1) Management system issues

Because the management of electronic archives in universities covers the whole life cycle of electronic documents, it involves a wide range, abundant content, and complex objects, so the original archives management system and institutional arrangements are in many respects no longer suitable. For example, the creation, circulation, inspection, and utilization of electronic archives lack specific rules and regulations, resulting in management loopholes and inadequacies.

2) Archive information security issues

The digitization of archives has significantly improved the efficiency of archives work, and the results of the publicity, compilation, research, and development of archives have begun to show. However, while electronic archives bring convenience to archives management in universities, they also suffer from low security and are easy to tamper with. Although digital watermarking technology is now adopted for the security protection of electronic archives, limited by the centralized management method, the problems of originality, authenticity, and security of archives have not been properly solved at the root, and malicious incidents such as forging, stealing, and tampering with archives occur frequently [6].
3) Centralized management issues

At present, archives management in universities mostly adopts the centralized management mode, and the archives management process in some universities has not been standardized. Therefore, in the process of archives collection, archiving, and file checking, it is easy for files to be tampered with and for information to go missing during borrowing. In the centralized management mode, the authenticity and originality of archive data depend mainly on trust in the university archives or third-party entities, such as system servers, central databases, system administrators, and database administrators. Once the archives or third-party entities are no longer credible, for example when system databases are invaded or administrators are coerced or bought, the authenticity of archival data will be at risk [7].

4) Archival data sharing issues

The collection and storage of archival information involve various aspects of information and require the cooperation of multiple departments. For example, the grade files in student files are collected and preserved by the educational affairs department, student identity information is collected by the Student Office, and information about students' participation in scientific research is managed by the Academic Research Office. Each participating department has its own platform for storing information, and it is difficult for these platforms to communicate with each other, thus forming information islands. In view of these situations, universities should take reasonable and appropriate measures according to local conditions to strengthen the management of electronic archives and ensure these measures are put in place.

3.2 Application Feasibility

Unlike ordinary data management technologies, blockchain is an integrated package of several existing mature technologies that can provide comprehensive solutions to a number of combined problems. As regards archival data management in universities, the feasibility of applying blockchain technology is reflected in the following aspects [8, 9].

The concept is feasible. There is no top-down hierarchical structure or centralized organization in the blockchain system; instead, a distributed network of equal-status nodes realizes the mutual coordination and collaboration of the macro system [10], reflecting the decentralized concept of multi-agent participation. Various groups are likewise involved in archival data management in universities, including the full-time archival management department, grassroots teaching departments, and functional departments. Applying the P2P technology of blockchain to archives management in universities changes the existing archives management mode, as shown in Fig. 1. As can be seen from the figure, the university archive, as the collection center of archival materials, currently needs to collect archives from the various departments on a regular basis for archiving, and such a centralized archiving mode greatly increases the workload of archivists. Blockchain is decentralized, trust-free, and tamper-proof. With the introduction of blockchain technology, departments become peers in the blockchain network, and any node can upload archival material to the blockchain [11]. However, uploaded
files need to be verified by node voting (i.e., by the relevant departments) before they can be shared to every node on the blockchain through smart contracts and consistency algorithms, so that each department node holds the same archive information. This distributed storage method can effectively avoid the archive information becoming untrusted because the central database is attacked or the archive files are modified artificially. It can be seen that blockchain and archival data management in universities are consistent in the concept of multi-subject participation.
Fig. 1. Changes of archival management mode after the adoption of blockchain technology (left: the current mode with a central management center; right: the blockchain mode with consensus algorithm, shared data, and smart contracts)
The function is feasible. Blockchain adopts an asymmetric encryption algorithm to ensure the security of data transmission and the authenticity of the data source, uses automatically executable program code to perform impartial data storage [12], and applies a consensus mechanism to describe the process by which multiple nodes reach consensus [13]. The function of archival data management in universities is to form an objective and fair historical record, which can only be read and appended, not modified or deleted. Thus, the application of blockchain technology to archival data management in universities is functionally feasible.

The technology is feasible. The blockchain adopts a chained data structure with blocks as the unit [14]. The basic label information is encapsulated in the block header, and all the transaction information of the current block is stored in the block body [15], continuously superimposing to form an ordered linked data chain. This special data structure can not only save the data generated by the university archives business, but also record the events and behaviors that produce the data [16].

3.3 Application Superiority

1) Improve the accuracy and efficiency of information collection

The collection and storage of archival information involve various aspects of information and require the cooperation of multiple departments. Each participating department has its own platform for storing information, and it is difficult for these platforms to communicate with each other. Querying archival information therefore requires visiting the relevant departments
separately. At the same time, the collection, input, and preservation of archives information are handled respectively by the archives management personnel of each department, which is prone to producing inconsistent information. Applying the distributed, decentralized organizational model of blockchain technology to manage university archives data can expand the archives data management team: this work is no longer the exclusive business of the archives management department, and more stakeholders have the opportunity to participate [17].

2) Reduce the cost of information collection and entry

Traditional university archives collection requires archives managers to review, confirm, verify, and input information, which takes a long time. For example, collecting and entering the grades of college students requires confirmation by the students themselves, the teachers of the course, and the Academic Affairs Office, and a problem in any link will affect the accuracy and integrity of the archive information, so processing efficiency is low and labor costs are high. Blockchain technology can greatly simplify this process and reduce costs. At the same time, each node is supervised by the other nodes, which avoids manual verification and saves resources and costs.

3) Enhanced traceability and security

With the help of the encryption algorithm of blockchain technology, archives can be protected from being stolen and tampered with during storage, transmission, and review [38]. Once the relevant data is generated, it is permanently retained, ensuring the security and credibility of archive data. Blockchain technology makes the archive data stored on the blockchain immutable by stamping a "time stamp" mark and using the user's electronic signature and asymmetric data encryption during the collection and preservation of archive data [3]. Users need the consent of other nodes on the chain when accessing or modifying data. Meanwhile, any modification of archive data can be traced back, which helps constrain and guide users to consciously ensure the authenticity of archive information.

4) Advantages of openness and sharing

The management of archives information in universities involves departments such as the Student Office, the Academic Affairs Office, the Scientific Research Management Department, and the Secondary Colleges, and the archives information is also accessed and used by multiple departments at the same time, since student files involve students' personal information, test scores, scientific research participation, work-study information, and so on. The student work office, the educational administration department, the scientific research department, and others all need to input and file the relevant information to facilitate later processing. Therefore, if there is no sharing platform for archival information resources, or if data entry, modification, and supervision are not in place, archival information will lose its authenticity and uniqueness, and the archives will lose their reason for existence. Blockchain technology provides convenient conditions for building an archive data sharing platform. Each department, as a node on the blockchain, can input, obtain, and access the archival information on the chain, realizing the openness and sharing of archival information resources. At the same time, blockchain technology can ensure the
security, integrity, uniqueness, and traceability of archive information on the chain, which is conducive to the sharing and intercommunication of archive information.
4 System Construction

Blockchain is a data chain that records and stores digital data information in sequence according to time. It aims at ensuring data safety and has the potential to protect the credibility of the whole life cycle of electronic documents. Therefore, integrating blockchain technology with electronic archives can improve the efficiency of archives management in universities.

4.1 Mode Selection

Blockchain can be divided into Public Blockchain, Consortium Blockchain, and Private Blockchain according to the parties involved [3]. Public Blockchain is open to the public, so users can participate anonymously without registration and can access the network and blockchain without authorization. The blocks on a Public Blockchain can be viewed by anyone, all users can send transactions on it, and everyone can participate in the process of forming consensus on the network at any time, that is, in deciding which block can be added to the blockchain and recording the current state of the network. Public Blockchain is completely decentralized, but it is generally not used directly due to its high operation and maintenance costs.

Consortium Blockchain is limited to the participation of consortium members. The read and write permissions on the blockchain and the permissions to participate in bookkeeping are formulated according to the rules of the alliance. The network of a Consortium Blockchain is jointly maintained by the member institutions, and network access is generally established through the gateway nodes of the member institutions. The network connection is more secure and stable, and the efficiency of verification and confirmation is higher. Consortium Blockchain is a multi-center or partially decentralized network structure composed of several organizations or institutions with related interests participating in management, and it is suitable for internal use by groups or organizations.

Private Blockchain is similar to a local area network (LAN). It is a completely closed and centralized blockchain. The scope of its nodes, the read and write permission settings, and participation in accounting are determined according to the rules of the private organization. The management of a Private Blockchain is relatively strict: each node's participation in the consensus process is determined by the organization, the read permission for data information is controlled by the organization, and the organization also decides the degree of information openness and who may enter the private chain. Therefore, it is suitable for internal or industrial applications.

Combining the needs of university work, the particularity of electronic archives work, and the sensitivity of the work object, this paper chooses a combined mode of Consortium Blockchain and Private Blockchain.
4.2 Frame Model

The university archives management system based on blockchain technology constructed in this paper is composed of the underlying blockchain system and the archives management application layer. The proposed architecture of the university archives data management system is shown in Fig. 2.
Fig. 2. The architecture of university archives data management system (department nodes such as the Personnel Department, Academic Affairs Office, Students Affairs Department, Scientific Research Office, Admissions Department, and Secondary Colleges access the blockchain service platform, whose operation management and supporting services cover blocks data, network management, consensus mechanism, chains management, and smart contracts)
The underlying system of the blockchain includes modules such as Smart Contracts, Consensus Algorithm modules, Public Ledgers, and Authority Management, with functions such as data creation, reception, association, maintenance and verification. Smart Contracts guarantee immutability and traceability of data information, Consensus Algorithms guarantee consistency of data information, and Public Ledger records all the data information in the blockchain. Since the archives data of universities are generally circulated among the Personnel Department, the Academic Affairs Office, the Scientific Research Office, the Students Affairs Department, the Admissions Department and the Secondary Colleges, it not only breaks through the geographical restrictions of the
institution but also prevents the free participation of arbitrary groups, which fits the application scenarios of Consortium Blockchain and Private Blockchain. This system uses Hyperledger Fabric as the bottom layer of the blockchain, which mainly includes five functional modules; a small sketch of the Merkle-root computation used in the block data module appears at the end of this subsection.

1) Chains management. A Consortium Blockchain is composed of many interconnected chains, so chains management is the basis for the formation and orderly operation of the Consortium Blockchain; it mainly includes functions such as channel management, performance management, and policy management.

2) Network management. It mainly provides communication support for information exchange between nodes on the blockchain network, mainly including functional modules such as nodes, orderers, clients, CA certification, and the Gossip protocol [18].

3) Consensus mechanism. This is the consensus algorithm for blockchain nodes, which enables all nodes on the chain to reach consensus on the validity of block data; it mainly includes two types of consensus mechanisms: alliance chain member management and transaction management.

4) Block data. It includes data such as the timestamp, random number, Merkle root, and chain structure in the block header, as well as the transaction information encrypted and stored in the block body.

5) Smart contracts. A contract is written in a programming language and enforced automatically, including data opening contracts, query contracts, contract scripts, and the related institutional mechanisms that can be converted into contract execution.

In addition, the blockchain service platform layer also provides the necessary platform operation management and supporting services, such as configuration management, privacy protection, log management, key security, and visual monitoring.
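To illustrate the Merkle root mentioned in the block data module, the following sketch computes a Merkle root over a block's transactions by pairwise hashing; changing any transaction changes the root stored in the block header. This is a simplified sketch under stated assumptions (SHA-256, duplicate-last-node padding), not the exact scheme used by Hyperledger Fabric, and the transaction strings are hypothetical.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions: list[bytes]) -> bytes:
    """Pairwise-hash leaf hashes upward until a single root remains."""
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:          # pad odd levels by repeating the last node
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical archive transactions recorded in one block.
txs = [b"deposit:student_record_001", b"update:research_file_09", b"access:personnel_file_17"]
root = merkle_root(txs)
print(root.hex())  # stored in the block header; any transaction change alters this value
```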
5 Conclusion

Archives are required to be authentic, complete, usable, and safe. However, the current management of archives in universities faces a series of problems, such as looseness, redundancy, loss, leakage, and tampering. Each of the existing data management technologies, including distributed computing, hash structures, data encryption, smart contracts, and consensus algorithms, can only solve a certain type of problem. Therefore, there is an urgent need for a comprehensive technology that can resolve the dilemma of archives data management in universities. This paper first points out the problems existing in current university archives management, then expounds the specific characteristics of blockchain technology, and, after analyzing the feasibility and advantages of introducing blockchain technology into the archives management system, constructs a university archives management service platform based on blockchain technology. As an emerging and booming technology, blockchain provides a possible solution for archival data management in universities, but it is bound to face various obstacles restricting its development, for example ideological and cognitive barriers, barriers to technology use, and barriers to operational supervision. Overcoming these obstacles depends on popularizing knowledge of blockchain technology, cultivating talent, and introducing relevant governance norms; only through joint efforts can blockchain technology be applied effectively to archival data management in universities.
Acknowledgment. This paper is sponsored by the National Natural Science Foundation of China (NSFC, Grant 61572447), Henan International Joint Laboratory of Blockchain and Audio/Video Security, Zhengzhou Key Laboratory of Blockchain and CyberSecurity.
References

1. Liu, Y., Zhang, Y., et al.: Blockchain technology and file management: two-way thinking of technology and management. Arch. Sci. Bull. (1) (2020)
2. Zhou, Z.: Research on file management model based on blockchain. Entrepreneurship Manag. (4), 67–70 (2019)
3. Zou, Y., Zhang, H., et al.: Blockchain Technology Guide, p. 12. China Machine Press, Beijing (2016)
4. Song, H.: Application of blockchain technology in university electronic archives management. J. Jinan Vocat. Coll. (1) (2022)
5. Wen, M., Li, Y., et al.: Blockchain technology in archives data management in universities: application analysis. J. Lit. Data (4), 1 (2022)
6. Benil, T., Jasper, J.: Cloud based security on outsourcing using blockchain in E-health systems. Comput. Netw. (178), 107344 (2020)
7. Tan, H., Zhou, T., et al.: Archival data protection and sharing method based on blockchain. J. Softw. (9), 2620–2635 (2019)
8. Zhou, W.: Application of blockchain technology in archives management in universities. Heilongjiang Sci. 11(17), 122–123 (2020)
9. Hu, N., Zhao, Q., et al.: Research on university student archives management based on blockchain technology. J. Xinzhou Normal Univ. 37(2), 41–44 (2021)
10. Yuan, Y., Wang, F.: Parallel blockchain: conceptual approach and connotation analysis. Proc. CSU-EPSA 43(10), 1703–1712 (2017)
11. He, X., Hang, X.: Brief analysis of the application of blockchain technology in electronic document management. Arch. Constr. (2), 4–8 (2018)
12. Gatteschi, V., Lamberti, F., Demartinic, G., et al.: Blockchain and smart contracts for insurance: is the technology mature enough? Future Internet 10(2), 20 (2018)
13. Yang, Y., Ni, C., et al.: Development status and prospect of blockchain consensus algorithm. Proc. CSU-EPSA 44(11), 2011–2022 (2018)
14. Liu, A., Du, X., et al.: Blockchain technology and its research progress in information security. J. Softw. 29(7), 2092–2115 (2018)
15. Zheng, Z., Xie, S., Dai, H., et al.: An overview of blockchain technology: architecture, consensus, and future trends. In: 2017 IEEE International Congress on Big Data (BigData Congress), pp. 557–564. IEEE, Piscataway (2017)
16. Yang, Q.: Trust management model of electronic archives based on blockchain technology: inspiration from the ARCHANGEL project in the UK. Arch. Sci. Study 33(3), 135–140 (2019)
17. Qin, F., Zhao, X.: Application of blockchain technology in archives management of scientific research in universities. Off. Autom. 26(5), 47–49 (2021)
18. Zheng, X., Yang, H., et al.: Blockchain technology architecture and operation mechanism design. China Educ. Technol. (3), 71–78 (2021)
A Novel Two-Dimensional Histogram Shifting Video Steganography Algorithm for Video Protection in H.265/HEVC Hongguo Zhao1 , Yunxia Liu1(B) , and Yonghao Wang2 1 College of Information Science and Technology,
Zhengzhou Normal University, Zhengzhou, China [email protected] 2 Computing and Digital Technology, Birmingham City University, Birmingham, UK
Abstract. Video security is becoming a significant issue in the multimedia industry, considering the increasing prosperity of TikTok and video conferencing. Video steganography, which plays a key role in protecting video security, has been a hot research area in recent years. In this paper, we provide a novel two-dimensional histogram shifting video steganography algorithm for H.265/HEVC-based video applications. In this two-dimensional histogram shifting scheme, we can embed up to 4 bits of metadata (secret data) while modifying at most one QDST coefficient. To limit the embedding distortion of carrier videos, the embedding carrier is restricted to the QDST coefficients of 4 × 4 blocks. First, the high-frequency coefficients in 4 × 4 blocks are selected to form an embedded coefficient pair, and the pairs are divided into 19 non-intersecting sets according to their values. Second, according to the set of the embedding coefficient pair, the metadata is embedded following the provided two-dimensional histogram shifting rules. Finally, the embedded carrier block is encoded into the bitstream by entropy encoding, and the metadata is extracted by the inverse of the embedding process. The experimental results prove the efficiency and performance of the proposed algorithm. Keywords: Video steganography · Histogram shifting · DST coefficients · 4 × 4 intra/inter blocks
1 Introduction

In recent years, the video industry has developed rapidly, and mobile video applications are increasingly favored, especially TikTok, video conferencing, and live broadcasting. However, alongside this increasingly prosperous app industry, video protection issues are also emerging and becoming more significant. These issues include, but are not limited to, illegal distribution, malicious recording, and unauthorized playback for self-interest. Video steganography, a powerful tool for protecting the video owner's legitimate rights, is becoming more and more prominent and has been a hot research area for numerous scholars. Video steganography usually embeds metadata
(signals of video protection) into video content and extracts these signals later when needed, thus achieving a strong level of protection [1]. Besides, the continuous development of video coding technology also brings challenges to video protection. Benefiting from higher compression efficiency, videos evolve toward larger resolutions and lower bitrates. However, higher compression efficiency further reduces the embedding space available to video steganography, and the new technologies adopted in recent codec standards make it hard to apply previous video steganography algorithms directly to protect video privacy and legitimate rights. In view of the above, there is still a strong need to explore a novel, high-embedding-efficiency video steganography algorithm to protect the security of high- and beyond-high-resolution video products, typical examples being video products based on H.265/HEVC, the latest video codec technology [2, 3].

Histogram shifting based video steganography is a typical reversible data hiding (RDH) technology, which aims to ensure that the carrier video can be completely recovered after the metadata has been extracted [5]. There are two main classes of RDH techniques: difference-expansion-based and histogram-shifting-based algorithms. Among the difference expansion algorithms, Tian [6] proposed to utilize the difference between two neighboring pixels to embed one bit of metadata and to use a location map to record all expanded locations, which has become a common ingredient of RDH methods. In addition, Alattar [7] expanded the difference among 3–4 pixels, and Kamstra and Heijmans [8] compressed the location map by using the low-pass image to predict expandable locations. In the histogram modification (HM) based algorithms, metadata is embedded into the peak points of the histogram, and the embedding capacity depends on the number of peak points. When considering the scenario of transferring video through networks, transform-domain-based video steganography methods have more practical application value. In transform-domain research, the discrete cosine transform (DCT) coefficient is a basic and renowned carrier due to its dominant share of the bitstream and its reversibility [4, 9–14]. Indeed, several RDH methods combine HM and DCT coefficients. In [12], two random coefficients are selected from 4 × 4 blocks, and the metadata is embedded according to the pre-defined coefficient set they belong to, achieving a large embedding capacity. Based on the above considerations, the combination of DCT coefficients and histogram shifting is a promising tool for designing an efficient video steganography scheme for digital video protection, especially for HD or beyond-HD videos compressed by H.265/HEVC.

In this paper, we focus on video security protection by designing an efficient, large-embedding-capacity video steganography algorithm to raise the security level of digital video transmitted over networks. We propose to combine DST coefficients with the provided two-dimensional histogram shifting rules in 4 × 4 DST blocks to embed secret data. To minimize the visual distortion, we design shifting rules that can embed 4 bits of metadata while modifying at most one coefficient.
Experimental evaluation has proven that the proposed method achieves high visual quality and security performance for carrier videos, and a higher embedding capacity than the state of the art.
The remainder of this paper is organized as follows: Sect. 2 reviews the related technical background of the traditional two-dimensional histogram shifting algorithm. Section 3 presents our video steganography scheme based on the provided 19 non-intersecting sets and two-dimensional histogram shifting rules. In Sect. 4, the experimental results are presented to evaluate our scheme. Finally, the conclusion is given in Sect. 5.
2 Related Technical Backgrounds

2.1 QDST Coefficients in H.265/HEVC

Since the proposed algorithm utilizes 4 × 4 QDST coefficients as the embedding carrier, the main transformation and quantization process in H.265/HEVC is elaborated in this section. In H.265/HEVC, 4 × 4 transform blocks use the DST transformation matrix, while blocks of other dimensions use the DCT transformation matrix. The main advantage of this transformation mechanism is higher precision and a lower dynamic range [2]. The DST transformation matrix core of the 4 × 4 transform block can be formulated as follows:

$$C = \frac{2}{3}\begin{bmatrix}
\sin\frac{\pi}{9} & \sin\frac{2\pi}{9} & \sin\frac{3\pi}{9} & \sin\frac{4\pi}{9}\\
\sin\frac{3\pi}{9} & \sin\frac{3\pi}{9} & 0 & -\sin\frac{3\pi}{9}\\
\sin\frac{4\pi}{9} & -\sin\frac{\pi}{9} & -\sin\frac{3\pi}{9} & \sin\frac{2\pi}{9}\\
\sin\frac{2\pi}{9} & -\sin\frac{4\pi}{9} & \sin\frac{3\pi}{9} & -\sin\frac{\pi}{9}
\end{bmatrix} \tag{1}$$
where $C$ presents the transformation matrix core. By rounding and scaling $C$, we can acquire the integer transformation matrix $H$ and obtain the two-dimensional DST transformation of 4 × 4 blocks as follows:

$$H = \begin{bmatrix}
29 & 55 & 74 & 84\\
74 & 74 & 0 & -74\\
84 & -29 & -74 & 55\\
55 & -84 & 74 & -29
\end{bmatrix} \tag{2}$$

$$Y = H X H^{T} \tag{3}$$
where $Y$ presents the transformed coefficients and $X$ depicts the original sample residuals after prediction. After transformation, the coefficients $Y$ go through the post-scaling and quantization process as follows:

$$\tilde{Y} = (Y \times MF)\,/\,2^{\,qbits + T\_Shift} \tag{4}$$

where $qbits = 14 + \mathrm{floor}(QP/6)$ and $MF = 2^{qbits}/Q_{step}$; $QP$ is the quantization parameter and $Q_{step}$ presents the quantization step, which is determined by the coding configuration and the rate-distortion optimization (RDO) process in bit-rate-restricted scenarios.
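To make Eqs. (2)–(4) concrete, the short sketch below applies the integer DST matrix H to a 4 × 4 residual block and then performs the scalar quantization of Eq. (4) in NumPy. The residual values, QP, Qstep approximation, and the T_Shift constant chosen here are illustrative assumptions for a self-contained demo, not values mandated by the standard.

```python
import numpy as np

# Integer 4x4 DST matrix from Eq. (2).
H = np.array([
    [29,  55,  74,  84],
    [74,  74,   0, -74],
    [84, -29, -74,  55],
    [55, -84,  74, -29],
])

# Hypothetical prediction residuals X for one 4x4 block.
X = np.array([
    [ 3, -1,  0,  2],
    [ 1,  0, -2,  1],
    [ 0,  1,  0,  0],
    [-1,  0,  1,  0],
])

Y = H @ X @ H.T                      # forward transform, Eq. (3)

QP = 32                              # quantization parameter (example value)
qbits = 14 + QP // 6                 # per Eq. (4)
Qstep = 0.625 * 2 ** (QP / 6)        # approximate HEVC quantization step (assumption)
MF = 2 ** qbits / Qstep              # multiplication factor
T_shift = 7                          # transform scaling shift (assumed for this demo)

Y_quant = np.round(Y * MF / 2 ** (qbits + T_shift)).astype(int)
print(Y_quant)                       # QDST coefficients used as the embedding carrier
```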
As shown in Fig. 1, the QDST coefficients in 4 × 4 blocks can be classified into two categories. Group 1 includes the direct coefficient (DC) and other low-frequency coefficients, which are located at the top-left and preserve the fundamental information of the predicted residuals. Group 2 includes the coefficients located toward the bottom-right, whose high-frequency AC coefficients preserve detail information (marked as gray in Fig. 1). As mentioned above, embedding metadata into AC coefficients brings less distortion to the carrier video, so the AC coefficients in 4 × 4 blocks are an ideal embedding carrier in H.265/HEVC video.
Fig. 1. The 4 × 4 QDST AC coefficients (Group 1: DC and low-frequency coefficients at the top-left; Group 2: high-frequency AC coefficients toward the bottom-right)
2.2 Histogram Modification Mechanism

The traditional histogram modification mechanism can be classified into 1-dimensional and 2-dimensional variants. In the traditional 1-dimensional method, metadata is embedded into a QDCT coefficient according to the shifting rule of the 1-dimensional histogram. The detailed principle of the embedding procedure is:

$$Y' = \begin{cases}
Y, & \text{if } (Y = 0 \text{ or } Y = -1) \text{ and } m = 0\\
1, & \text{if } Y = 0 \text{ and } m = 1\\
-2, & \text{if } Y = -1 \text{ and } m = 1\\
Y + 1, & \text{if } Y > 0\\
Y - 1, & \text{if } Y < -1
\end{cases} \tag{5}$$

where $Y$ and $Y'$ represent the original and embedded QDCT coefficients, and $m$ depicts the binary character of the metadata to be embedded in the current QDCT coefficient. In the traditional 2-dimensional method, two QDCT coefficients are selected for
embedding, and each QDCT coefficient can embed 1 bit of metadata in the same manner as in Eq. (5). For instance, if the two QDCT coefficients are (0, −1) and the metadata bits are (0, 0), the embedded pair remains (0, −1); if the metadata bits are (1, 1), the pair becomes (1, −2) after the embedding procedure.
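The following sketch implements the 1-dimensional shifting rule of Eq. (5) together with its inverse, as one way to check that embedding is reversible. It is a minimal illustration operating on a plain Python list of coefficients, not on an actual HEVC bitstream.

```python
def embed_1d(coeffs, bits):
    """Embed bits into coefficients per Eq. (5); non-carrier values are shifted."""
    out, it = [], iter(bits)
    for y in coeffs:
        if y in (0, -1):                     # carrier positions (histogram peaks)
            m = next(it, None)
            if m is None or m == 0:
                out.append(y)
            else:
                out.append(1 if y == 0 else -2)
        elif y > 0:
            out.append(y + 1)                # shift to make room for embedded values
        else:
            out.append(y - 1)
    return out

def extract_1d(coeffs):
    """Inverse of embed_1d: recover bits and the original coefficients."""
    bits, orig = [], []
    for y in coeffs:
        if y in (0, -1):
            bits.append(0); orig.append(y)
        elif y in (1, -2):
            bits.append(1); orig.append(0 if y == 1 else -1)
        elif y > 1:
            orig.append(y - 1)
        else:
            orig.append(y + 1)
    return bits, orig

stego = embed_1d([0, 3, -1, 0, -4], [1, 0, 1])
print(stego)                 # [1, 4, -1, 1, -5]
print(extract_1d(stego))     # ([1, 0, 1], [0, 3, -1, 0, -4])
```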
3 Proposed Two-Dimensional Histogram Shifting Video Steganography Algorithm

The proposed two-dimensional histogram shifting video steganography algorithm is illustrated in Fig. 2. The scheme can be divided into two components: the embedding and extraction processes. In the embedding section, appropriate 4 × 4 embedded blocks are selected based on the random seed and the embedded-block threshold. Then the specific coefficients are selected for embedding according to the pre-defined 2-dimensional histogram shifting rules, whereby the selected QDST coefficient pair is modified by embedding or shifting. After entropy encoding (CABAC or CAVLC), the carrier video is encoded into a bitstream and transmitted through the external network. The extraction section is the inverse loop of embedding. To guarantee the security of the metadata, the necessary encryption and decryption are also performed before and after the video steganography process.
Fig. 2. Proposed two-dimensional histogram shifting video steganography algorithm: (a) embedding procedure; (b) extraction procedure
To enhance the security of the proposed video steganography algorithm, the embedded 4 × 4 block is selected randomly to resist steganalysis. In addition, a block threshold determines whether a given random number qualifies: if the random number is less than the threshold, the current 4 × 4 block is used as an embedded block; otherwise, the current 4 × 4 block is skipped. Thus, the block threshold serves as an embedding
strength to adjust the embedding capacity for different demands. In the embedded 4 × 4 block, a coefficient threshold is used for the selection of the coefficient pair: if the absolute value of the current high-frequency AC coefficient is less than the coefficient threshold, the coefficient is selected as a component of the embedded coefficient pair. Once both coefficients of the embedded pair have been selected, the current 4 × 4 block is no longer scanned. Denoting the embedded coefficient pair as {Y1, Y2}, the embedding procedure is carried out according to the 2-dimensional histogram shifting rules and the coefficient set the pair belongs to. The 2-dimensional histogram shifting rules are depicted in Fig. 3, and a small sketch of the block-selection step follows below.
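As a small illustration of the seeded random block selection described above, the sketch below decides, per 4 × 4 block, whether it carries metadata. The seed and threshold values are arbitrary placeholders; in the actual scheme they would be shared secretly between embedder and extractor so both sides select the same blocks.

```python
import random

def select_embedded_blocks(num_blocks: int, seed: int, threshold: float):
    """Return the indices of 4x4 blocks chosen as embedding carriers."""
    rng = random.Random(seed)          # same seed => same selection at the extractor
    return [i for i in range(num_blocks) if rng.random() < threshold]

# The threshold acts as embedding strength: larger => more carrier blocks.
print(select_embedded_blocks(num_blocks=10, seed=42, threshold=0.5))
```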
Fig. 3. Proposed 2-dimensional histogram shifting rules
As shown in Fig. 3, each coefficient set has different shifting rules according to the pre-defined 19 non-intersecting sets, which are determined by the values of the coefficient pair. Denoting the current coefficient pair by the symbol A, the 19 non-intersecting sets can be defined as follows:

A1 = {(0, 0)}
A2 = {(Y1, 0) | Y1 > 0}
A3 = {(Y1, 0) | Y1 < 0}
A4 = {(0, 1)}
A5 = {(0, −1)}
A6 = {(0, Y2) | Y2 > 1}
A7 = {(0, Y2) | Y2 < −1}
A8 = {(−1, Y2) | Y2 > 1}
A9 = {(1, Y2) | Y2 > 1}
A10 = {(1, Y2) | Y2 < −1}
A11 = {(−1, Y2) | Y2 < −1}
A12 = {(Y1, 1) | Y1 < 0}
A13 = {(Y1, Y2) | Y1 < −1, Y2 > 1}
A14 = {(Y1, 1) | Y1 > 0}
A15 = {(Y1, Y2) | Y1 > 1, Y2 > 1}
A16 = {(Y1, −1) | Y1 < 0}
A17 = {(Y1, Y2) | Y1 < −1, Y2 < −1}
A18 = {(Y1, −1) | Y1 > 0}
A19 = {(Y1, Y2) | Y1 > 1, Y2 < −1}

where the sets A1, A2, A3, A4, A5, A8, A9, A10, A11 are used for embedding and the other coefficient sets are used only for shifting. The detailed embedding procedure is determined by the coefficient set and the histogram shifting rules depicted in Fig. 3. To describe the embedding process more clearly, we give the detailed histogram shifting principle in the following. Let the current coefficient pair be denoted $\{Y_1, Y_2\}$, the embedded coefficient pair $\{Y_1', Y_2'\}$, and the binary characters of the metadata $m_i m_{i+1} m_{i+2} m_{i+3}$; the embedding procedure is then as follows:

1). If the current coefficient pair $\{Y_1, Y_2\} \in A_1$, then $\{Y_1', Y_2'\}$ can be obtained as follows:

$$\{Y_1', Y_2'\} = \begin{cases}
\{Y_1, Y_2\} & \text{if } m_i m_{i+1} m_{i+2} = 000\\
\{Y_1+1, Y_2\} & \text{if } m_i m_{i+1} m_{i+2} = 001\\
\{Y_1, Y_2+1\} & \text{if } m_i m_{i+1} m_{i+2} = 010\\
\{Y_1-1, Y_2\} & \text{if } m_i m_{i+1} m_{i+2} = 011\\
\{Y_1, Y_2-1\} & \text{if } m_i m_{i+1} m_{i+2} = 100\\
\{Y_1+1, Y_2+1\} & \text{if } m_i m_{i+1} m_{i+2} = 101\\
\{Y_1-1, Y_2+1\} & \text{if } m_i m_{i+1} m_{i+2} = 110\\
\{Y_1-1, Y_2-1\} & \text{if } m_i m_{i+1} m_{i+2} m_{i+3} = 1110\\
\{Y_1+1, Y_2-1\} & \text{if } m_i m_{i+1} m_{i+2} m_{i+3} = 1111
\end{cases}$$

2). If the current coefficient pair $\{Y_1, Y_2\} \in A_2$, then $\{Y_1', Y_2'\}$ can be obtained as follows:

$$\{Y_1', Y_2'\} = \begin{cases}
\{Y_1+1, Y_2+1\} & \text{if } m_i m_{i+1} = 10\\
\{Y_1+1, Y_2-1\} & \text{if } m_i m_{i+1} = 11\\
\{Y_1+1, Y_2\} & \text{if } m_i = 0
\end{cases}$$
3). If the current coefficient pair $\{Y_1, Y_2\} \in A_3$, then $\{Y_1', Y_2'\}$ can be obtained as follows:

$$\{Y_1', Y_2'\} = \begin{cases}
\{Y_1-1, Y_2-1\} & \text{if } m_i m_{i+1} = 10\\
\{Y_1-1, Y_2+1\} & \text{if } m_i m_{i+1} = 11\\
\{Y_1-1, Y_2\} & \text{if } m_i = 0
\end{cases}$$
4). If the current coefficient pair $\{Y_1, Y_2\} \in A_4$, then:

$$\{Y_1', Y_2'\} = \begin{cases}
\{Y_1-1, Y_2+1\} & \text{if } m_i m_{i+1} = 10\\
\{Y_1+1, Y_2+1\} & \text{if } m_i m_{i+1} = 11\\
\{Y_1, Y_2+1\} & \text{if } m_i = 0
\end{cases}$$

5). If the current coefficient pair $\{Y_1, Y_2\} \in A_5$, then:

$$\{Y_1', Y_2'\} = \begin{cases}
\{Y_1+1, Y_2-1\} & \text{if } m_i m_{i+1} = 10\\
\{Y_1-1, Y_2-1\} & \text{if } m_i m_{i+1} = 11\\
\{Y_1, Y_2-1\} & \text{if } m_i = 0
\end{cases}$$

6). If the current coefficient pair $\{Y_1, Y_2\} \in A_8$, then:

$$\{Y_1', Y_2'\} = \begin{cases}
\{Y_1-1, Y_2+1\} & \text{if } m_i = 1\\
\{Y_1, Y_2+1\} & \text{if } m_i = 0
\end{cases}$$

7). If the current coefficient pair $\{Y_1, Y_2\} \in A_9$, then:

$$\{Y_1', Y_2'\} = \begin{cases}
\{Y_1+1, Y_2+1\} & \text{if } m_i = 1\\
\{Y_1, Y_2+1\} & \text{if } m_i = 0
\end{cases}$$

8). If the current coefficient pair $\{Y_1, Y_2\} \in A_{10}$, then:

$$\{Y_1', Y_2'\} = \begin{cases}
\{Y_1+1, Y_2-1\} & \text{if } m_i = 1\\
\{Y_1, Y_2-1\} & \text{if } m_i = 0
\end{cases}$$

9). If the current coefficient pair $\{Y_1, Y_2\} \in A_{11}$, then:

$$\{Y_1', Y_2'\} = \begin{cases}
\{Y_1-1, Y_2-1\} & \text{if } m_i = 1\\
\{Y_1, Y_2-1\} & \text{if } m_i = 0
\end{cases}$$

10). If the current coefficient pair $\{Y_1, Y_2\} \in A_6, A_7, A_{12}, A_{13}, A_{14}, A_{15}, A_{16}, A_{17}, A_{18}$, or $A_{19}$, then the current coefficient pair is not used for embedding and is only subject to
coefficient shifting. $\{Y_1', Y_2'\}$ can be obtained as follows:

$$\{Y_1', Y_2'\} = \begin{cases}
\{Y_1, Y_2+1\} & \text{if } \{Y_1, Y_2\} \in A_6\\
\{Y_1, Y_2-1\} & \text{if } \{Y_1, Y_2\} \in A_7\\
\{Y_1-1, Y_2+1\} & \text{if } \{Y_1, Y_2\} \in A_{12} \cup A_{13}\\
\{Y_1+1, Y_2+1\} & \text{if } \{Y_1, Y_2\} \in A_{14} \cup A_{15}\\
\{Y_1-1, Y_2-1\} & \text{if } \{Y_1, Y_2\} \in A_{16} \cup A_{17}\\
\{Y_1+1, Y_2-1\} & \text{if } \{Y_1, Y_2\} \in A_{18} \cup A_{19}
\end{cases}$$

We provide a simple example to describe the above embedding procedure. If the current coefficient pair $\{Y_1, Y_2\}$ is $\{0, 0\}$, the pair can be modified to $\{0, 0\}$, $\{1, 0\}$, $\{0, 1\}$, $\{-1, 0\}$, $\{0, -1\}$, $\{1, 1\}$, $\{-1, 1\}$, $\{-1, -1\}$, or $\{1, -1\}$, and the corresponding embedded metadata will be 000, 001, 010, 011, 100, 101, 110, 1110, and 1111, respectively. As seen from the above embedding procedure, in the optimistic case we can embed up to 4 bits of metadata while modifying a coefficient only once. The extraction procedure is the inverse of the embedding procedure. During extraction, the random seed, the 4 × 4 block threshold, and the coefficient threshold must keep the same values as in the embedding procedure. Then, according to the values of the coefficient pair $\{Y_1', Y_2'\}$, we can extract the embedded metadata by inverting the 2-dimensional histogram shifting rules depicted in Fig. 3.
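To check the reversibility of the A1 rule above, the following sketch maps metadata bit-strings onto the nine modifications of a coefficient pair in A1 and decodes them back. It is a toy illustration of the shifting table only; the set classification and the other eighteen sets are omitted for brevity.

```python
# Modification applied to a pair in A1 = {(0, 0)}, keyed by the embedded bits.
A1_EMBED = {
    "000": (0, 0),   "001": (1, 0),    "010": (0, 1),
    "011": (-1, 0),  "100": (0, -1),   "101": (1, 1),
    "110": (-1, 1),  "1110": (-1, -1), "1111": (1, -1),
}
A1_EXTRACT = {pair: bits for bits, pair in A1_EMBED.items()}

def embed_a1(pair, bitstream):
    """Consume 3 or 4 bits from the bitstream and modify an A1 pair accordingly."""
    assert pair == (0, 0), "this rule applies to set A1 only"
    for n in (4, 3):                       # try the 4-bit codes 1110/1111 first
        head = bitstream[:n]
        if head in A1_EMBED:
            d1, d2 = A1_EMBED[head]
            return (pair[0] + d1, pair[1] + d2), bitstream[n:]
    raise ValueError("bitstream does not match any code")

stego, rest = embed_a1((0, 0), "1110" + "01")
print(stego, rest)                 # (-1, -1) 01
print(A1_EXTRACT[stego])           # 1110 recovered
```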
Fig. 4. Subjective visual quality of the proposed video steganography algorithm: original samples (left) and proposed-algorithm samples (right) for (a) BasketballPass and (b) KristenAndSara
4 Experimental Evaluation

The proposed video steganography algorithm is implemented and evaluated on the reference software HM16.0. The coding parameters are set as follows: the frame rate is 30 frames/s, the quantization parameter is 32, and the test video sequences are coded with all intra frames, and with B and P frames at an interval of 4. The main evaluation covers PSNR, embedding capacity, and bit-rate increase. The subjective visual quality of the proposed algorithm is depicted in Fig. 4, where the first column shows the original video samples, with resolutions ranging from 416 × 240 to 1280 × 720 (BasketballPass: 416 × 240; KristenAndSara: 1280 × 720), and the second column shows the output of the proposed video steganography algorithm. It can be seen that the proposed algorithm achieves good visual quality and imperceptibility on carrier videos. Figure 5(a) compares PSNR as the POC varies from 0 to 9 while embedding metadata in the test video sample BasketballPass. For the visual quality evaluation, PSNR is averaged over all tested frames: the average PSNR of our proposed scheme is 35.10 dB, while the original PSNR is 35.41 dB for the tested video BasketballPass. The experimental results demonstrate good embedding performance of our proposed scheme in terms of PSNR and bit-rate increase.
Fig. 5. Embedding performance of our proposed algorithm: (a) PSNR comparison; (b) bit-rate increase; (c) embedding capacity
Figure 5(b) and (c) also show the bit-rate increase and embedding capacity of our proposed video steganography algorithm: 20 frames are used to test the embedding capacity; the bit-rate increase is measured while the quantity of embedded metadata varies from 100 to 2200 bits, and the embedding capacity is measured while the QP parameter varies from 28 to 36. For bit-rate increase, our proposed scheme achieves an average of 0.64%. For embedding capacity, our proposed scheme achieves an average of 2225.6 bits. It can be seen that our proposed scheme achieves good performance in visual quality, embedding capacity, and bit-rate increase.
5 Conclusion

In this paper, an effective 2-dimensional histogram shifting based video steganography algorithm is proposed for H.265/HEVC video security protection. The proposed algorithm can embed at most 4 bits of metadata in each 4 × 4 QDST block while modifying only one AC coefficient. The experimental results show that the proposed algorithm achieves good embedding performance in terms of visual quality, embedding capacity, and bit-rate increase.

Acknowledgment. This work is sponsored by the National Natural Science Foundation of China (NSFC, Grant No. 61572447), the Henan International Joint Laboratory of Blockchain and Audio/Video Security, and the Zhengzhou Key Laboratory of Blockchain and CyberSecurity.
References

1. Asikuzzaman, M., Alam, M.J., Lambert, A.J., Pickering, M.R.: Imperceptible and robust blind video watermarking using chrominance embedding: a set of approaches in the DT CWT domain. IEEE Trans. Inf. Forensics Secur. 9(9), 1502–1517 (2014)
2. Sullivan, G.J., Ohm, J.R., Han, W.J., Wiegand, T.: Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012)
3. Ohm, J.R., Sullivan, G.J., Schwarz, H., Tan, T.K., et al.: Comparison of the coding efficiency of video coding standards, including high efficiency video coding (HEVC). IEEE Trans. Circuits Syst. Video Technol. 22(12), 1669–1684 (2012)
4. Liu, Y.X., Zhao, H.G., Liu, S.Y., et al.: A robust and improved visual quality data hiding method for HEVC. IEEE Access 6, 53984–53987 (2018)
5. Zhang, X.P.: Reversible data hiding with optimal value transfer. IEEE Trans. Multimedia 15(2), 316–325 (2012)
6. Tian, J.: Reversible data embedding using a difference expansion. IEEE Trans. Circuits Syst. Video Technol. 13(8), 890–896 (2003)
7. Kamstra, L., Heijmans, H.: Reversible data embedding into images using wavelet techniques and sorting. IEEE Trans. Image Process. 14(12), 2082–2090 (2005)
8. Li, X.L., Li, B., Yang, B., Zeng, T.Y.: General framework to histogram-shifting-based reversible data hiding. IEEE Trans. Image Process. 22(6), 2181–2191 (2013)
9. Liu, Y., Jia, S., Hu, M., et al.: A reversible data hiding method for H.264 with Shamir's (t, n)-threshold secret sharing. Neurocomputing 188, 63–70 (2016)
10. Yang, J., Li, S.: An efficient information hiding method based on motion vector space encoding for HEVC. Multimedia Tools Appl. 77(10), 11979–12001 (2017). https://doi.org/10.1007/s11042-017-4844-1
11. Zhao, H.G., Liu, Y.X., Wang, Y.H., et al.: A video steganography method based on transform block decision for H.265/HEVC. IEEE Access 9, 55506–55521 (2021). https://doi.org/10.1109/ACCESS.2021.3059654
12. Zhao, J., Li, Z.-T., Feng, B.: A novel two-dimensional histogram modification for reversible data embedding into stereo H.264 video. Multimedia Tools Appl. 75(10), 5959–5980 (2015). https://doi.org/10.1007/s11042-015-2558-9
13. Zhao, H., Pang, M., Liu, Y.: Intra-frame adaptive transform size for video steganography in H.265/HEVC bitstreams. In: Huang, D.-S., Premaratne, P. (eds.) ICIC 2020. LNCS (LNAI), vol. 12465, pp. 601–610. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60796-8_52
14. Zhao, H., Liu, Y., Wang, Y., Wang, X., Li, J.: A blockchain-based data hiding method for data protection in digital video. In: Qiu, M. (ed.) SmartBlock 2018. LNCS, vol. 11373, pp. 99–110. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05764-0_11
Problems and Countermeasures in the Construction of Intelligent Government Under the Background of Big Data ZhaoBin Pei and Ying Wang(B) College of Marine Law and Humanities, Dalian Ocean University, Dalian 116023, China [email protected]
Abstract. With the rapid development of a new generation of information technologies such as big data, cloud computing, and the Internet of Things, the traditional government governance model has been unable to cope effectively with a changing social environment and new challenges. The development of big data has brought significant progress and breakthroughs to the government's modern governance capability and has become a new path for government data governance. The construction of smart government and the realization of a smart government governance model take the rule of law as the basic guarantee, big data and other science and technology as the basic conditions, and smart decision-making as the core, in order to build a dynamic, networked, collaborative governance mechanism among government, market, and society, thereby realizing an efficient smart government governance mode and promoting the modernization of government governance capability. Starting from the construction of smart government, this paper analyzes the model concept of, challenges to, and countermeasures for smart government construction in the era of big data.

Keywords: Data sharing · Smart cities · Rule of law government
1 Introduction

Since the beginning of the 21st century, with the rapid development of science and technology and of information technology, the economic and social environment of every country has become increasingly complex and changeable. The traditional governance mode of each government faces many problems, and traditional governance means alone cannot effectively deal with a complex and changeable social development environment [1]. At present, big data, with its unique nature and methods, is gradually influencing and changing human life, working modes, and thinking modes, affecting the development of the economy and society as well as the modern governance mode and modern governance ability of government. It has ushered in another new era of information technology after the computer and Internet eras, namely the era of big data [2]. In today's big data era, the development of science and technology leads great changes in human society, and global informatization and digitization affect people's lives in all respects. The new generation of information science and technology, such as big data, cloud computing, and the Internet,
has accelerated the mutual integration and penetration of the virtual network world and the real human world, and has realized 'interconnection between people' and 'interconnection between people and things'. This has greatly increased the volume of data; catalyzed ubiquitous data, networked, and intelligent services; promoted social change; and had a significant impact on the modernization of the governance system and governance capacity of national governments. Big data has brought new challenges and opportunities to governments, society, and enterprises in various industries, especially in the transformation of thinking modes and innovative decision-making modes, and it has inspired new models and new ideas of government governance based on big data. The development of big data has brought great progress and breakthroughs to the government's modern governance ability and has become a new path of government data governance, with a significant impact on improving the government governance model, improving government supervision services, realizing intelligent government decision-making, and building smart government and smart cities. As early as 2015, an executive meeting of the State Council chaired by Premier Li Keqiang considered and adopted the Outline of Action for Promoting the Development of Big Data (hereinafter referred to as the Outline). The Outline, officially released on September 5, clearly pointed out that establishing a management mechanism of 'speaking with data, making decisions with data, managing with data and innovating with data' and realizing scientific decision-making based on data will promote the progress of government management concepts and social governance models [3]. Through the innovation of big data services, government management and public governance will be improved, and the precise governance of the government will in turn be promoted [4]. Building a smart government is the key to building a smart city. The construction of smart government establishes a communication platform between citizens and the government through information science and technology. The smart city model established by G. Perboli et al. [5] shows that the construction of smart government is an important part of such a model. Therefore, with the gradual and in-depth development of smart city construction, the construction of smart government has received more and more attention from the state. Smart government is often constructed in the environment of a smart city, aiming to solve how a city can maintain stable economic growth and sustainable development through the construction of smart government [6]. Information science and technology can effectively help the government face challenges and better manage the city. New information science and technology can help the government make more reasonable and intelligent decisions and better serve the people, for example by making it more convenient for citizens to handle related affairs [7]. Building smart government has become one of the important elements of smart city construction.
2 General Theory of Smart Government

2.1 The Concept of Smart Government

Smart government is a concept derived from e-government rather than a native one. The theoretical and practical circles attach to a higher level of e-government the
word 'wisdom', meaning that smart government is chosen as the new development direction; the purpose is to emphasize that 'wise' e-government is better than the e-government that came before. The construction of smart government should have a clear value goal, establish the technical path and technical model of construction, and determine the basic way of its development and construction. At present, scholars hold different views on the concept of smart government. According to Gil-García et al. [8], smart government describes the government's innovative decision-making and creative strategies in the context of the development of a new generation of information science and technology, thus making government management activities more flexible. Z. Lv et al. [9] point out that smart government is a key part of the smart city construction model and that intelligence is essentially a link to intelligent networks; the government therefore realizes data sharing among departments through information and communication technology and uses information science and technology such as big data to achieve more open, transparent, and sustainable government services. H. Alenezi et al. [10] define smart government as an advanced stage of e-government and a more open and innovative government. Howard and Di Maio [11] believe that smart government adopts integrated governance for cross-regional or cross-state and local governments through the use of modern information and communication technology, and then continuously innovates and generates social value through a vertical perspective on all levels of government (cities, federations, or states) or a horizontal perspective on cross-regional or cross-state and local governments. The above are the views of some foreign scholars on smart government. In China, scholars in theoretical circles generally believe that smart government has developed from e-government, is a higher stage of e-government development, and is the only way to develop e-government. For example, Shang Shanshan and Du Juan [12] define smart government as making full use of advanced information science and technology, such as RFID, sensors, and other new hardware technologies, for real-time data collection, realizing data communication between government departments, and breaking the phenomenon of information islands between government departments; at the same time, big data and artificial intelligence are used to process and analyze relevant data, so as to provide public-service-oriented, more accurate, more convenient, more efficient, and higher-quality public government services, and ultimately promote the harmonious and sustainable development of cities. Fei Jun and Jia Huizhen [13] believe that smart government comprises the government organs at all levels of the country.
It mainly relies on modern information science and technology and intelligent electronic sensing equipment; adheres to the concept of people-oriented administration according to law; constructs a service-oriented smart government; effectively safeguards the relevant systems; makes full use of big data, cloud computing, the Internet, and other information science and technology; and carries out reasonable, efficient, open, and fair social collaborative governance, thereby providing intelligent and high-quality public services to citizens, realizing 'smart' decision-making, and building a smart government. Song CongLin and Lu Min [14] believe that smart government is the advanced stage of the development of e-government: compared with e-government, smart government can provide more optimized management and services and better decision-making.
Through information science and technology such as big data and cloud computing, smart government can redesign the overall process of government work, integrate government data resources, optimize structure and organization, rationalize the arrangement of government work, and meet the many needs of the public from multiple perspectives and levels, thus forming an efficient, convenient, and scientific management mode. In summary, scholars at home and abroad understand smart government mainly from two aspects: the perspective of information science and technology, and the perspective of public management services [15]. The research of the above scholars reflects the integration of information science and technology with e-government, from which the concept of 'smart government' has developed, providing a useful reference for the construction and governance of smart government.

2.2 Value Orientation of Smart Government

Open and Transparent Governance Environment. Smart government is constructed through network technology together with other organizations and citizens, which means that those organizations and citizens also communicate and cooperate with the smart government through network technology, in alternative ways and with increased opportunities. To ensure good governance, smart government needs to fully balance the interests of all parties while requiring fair competition and cooperation in this environment. Because of the complexity of the various factors involved, smart government must take the concept of openness as a principle; and because of the competition and cooperation among the various organizations, smart government must supervise them and create a good, transparent management environment. Since new information science and technology have greatly changed the social environment and modes of thinking, the way the government manages the various organizations and citizens in society will also change and must keep pace with the times: it should not only unite them through common goals and the pursuit of value, but also maximize the public interest, encourage their fair and effective competition and cooperation, and protect their legitimate interests. This inevitably requires the smart government's network technology structure to be diversified and its management system flexible, so as to better safeguard the rights, interests, and common interests of the government, social organizations, and citizens, and so that all subjects in today's social environment work toward common goals. Thus, the construction of an open and transparent governance environment is a basic principle and value goal of smart government construction. In the practice of other countries, the United States proposes that building an open and transparent government should rest on the basic principles of transparency, participation, and collaboration [16]; the UK, Australia, New Zealand, Canada, and other countries have successively built open big data platforms (data.gov.uk, data.gov.au, data.gov.nz, open.canada.ca). In 2012, China launched its first open data portal, the Shanghai Municipal Government Data Service Network; since then, Beijing, Foshan, Wuhan, and other cities have built open data portals [17].
Cooperative Atmosphere of Consensus. Consensus here refers to the practice, for major disputes or projects involving the public interest, of actively communicating with the public before making decisions and reaching decisions through joint negotiation. This can expand the scope of public participation: by thinking through the relevant channels, taking into account the rights and interests of all parties, and combining the creativity and wisdom of all parties, the method and path for solving a problem can be chosen. If the government wants such deliberation, it must have a solid and broad public foundation and a positive sense of participation. With these as prerequisites, it can better achieve equal communication, listen to opinions, widely absorb the words and wisdom of the masses, and then form a basic consensus. From the perspective of management, the cooperation referred to in smart government should take 'co-construction and sharing' as the basic principle and emphasize a win-win, symbiotic atmosphere of cooperation. Consensus and the construction of smart government are interrelated and affect each other. Building a cooperative atmosphere of consensus provides good conditions and foundations for the construction of smart government; it helps improve the development model of smart government, improve the governance system of smart government, enhance the modernization of smart government's governance capacity, and promote the healthy and efficient development of smart government.

Basic Resources of Co-construction and Sharing. The basic resources of smart government mainly include infrastructure such as networks, data, platforms, systems, and related technologies, and of course also include thinking, decision-making, theoretical foundations, and laws and regulations. The governance model of the Chinese government has changed from a 'single center' to 'multiple centers', from 'organization-centered' to 'public-centered', from organizational governance to universal governance, and from partial governance to comprehensive governance. Such change means that more citizens and organizations should cooperate with each other and actively participate in governance through network technology. Therefore, taking 'good governance' as the goal and increasing public value inevitably puts forward requirements for the co-construction and sharing of basic resources. Co-construction and sharing of basic resources is the basic requirement for realizing 'good governance', a core requirement and basic principle for building and developing the government affairs system, and a prerequisite for the orderly and efficient construction of smart government, the avoidance of related risks, and the maintenance of benign social development. Insisting on the goal and principle of co-construction and sharing requires the government to share the right to use basic resources, and the right to know, with the public, relying on laws, policies, or the actual needs of the public.

Efficient and Accurate Government Service. The rapid development of information science and technology such as big data, cloud computing, and the Internet provides an endless stream of new tools for human society, the continuous development and accumulation of electronic intelligent products, and a large number of basic resources such as data, information technology, and scientific knowledge.
At the same time, it also brings a variety of information systems that can handle these resources, including government service systems, and thus provides
a realistic and feasible scientific and technological basis for smart government to possess and use wisdom. The existing practical experience of e-government and smart cities has created a practical foundation and rich experience for the construction of smart government: perfect infrastructure, databases, big data platforms, and government affairs systems have been established, and they provide the relevant resources, information technology, and management and service foundations for the construction of smart government.

Equal and Inclusive Partnership. President Xi Jinping pointed out that 'the realization of the Chinese dream must follow the Chinese path, carry forward the Chinese spirit and unite the Chinese power', which is mainly reflected in the practice of smart government. In the construction and development of smart government and its government affairs system, we should make full use of wisdom based on a cooperative consciousness and a governance concept, rely on information science and technology to build a platform for communication with the public, and establish the most extensive and equal partnership with the public, so as to better use the public's wisdom and rely on the public's power to achieve the purpose of building a smart government. In November 2012, the UK government launched the 'UK Government Digital Strategy', which plans to unify the services of various government departments on the GOV.UK website, transfer a large number of government services to the network platform after restructuring, and develop the 'digital by default' service standard. The standard requires that online services provided by all platforms be simple and convenient, so that any citizen who can use these online services will actively choose them, while ensuring that people who cannot use them are not excluded from these services [18].

2.3 Governance Framework of Smart Government

The Governance Concept of Smart Government. Dwight Waldo once said that our well-being and the real lives of all of us depend to a large extent on the performance of the executive authorities that influence and sustain our lives [19]. Therefore, in the era of big data, the governance concept of smart government should not only reflect the external characteristics of wisdom and efficiency, but also embody the internal concept of serving the people's livelihood. First, follow the concept of intelligence and efficiency. Through the use of information technology, smart government handles public administrative affairs online and integrates management functions offline, which ensures the interconnection between government decision-making and implementation and makes the organizational structure networked and intelligent. At the same time, smart government uses advanced information science and technology to analyze large amounts of data, to understand and master the relevant situation, and to manage internal and external affairs accurately and efficiently with scientific theoretical knowledge. Second, follow the concept of collaborative governance. With the help of modern science and technology, smart government integrates and optimizes the information it has mastered, moving from the original traditional governance
mode to collaborative openness, so that various regions, departments, and levels communicate with each other and coordinate governance. Third, promote the people-oriented concept. Smart government should take the people as its standard, adhere to a people-centered approach, be good at building modern technology platforms, widely solicit citizens' opinions, and listen to voices from different levels; it should provide personalized modern services for citizens, make decisions more in line with the vital interests of most citizens, and ultimately build an open, fair, just, transparent, efficient, and intelligent service-oriented government.

Governance System of Smart Government. The governance system of smart government mainly includes the organization system, procedure system, management system, and operation system. First, the networked organizational system forms the basic framework of smart government governance. The traditional organizational structure is narrow: government organizations and decision makers resolve matters within a closed framework. With the development of information science and technology, however, the traditional organizational structure has changed, so that there is good division of labor and cooperation between decision makers and executives, making the organizational structure flexible and cooperative. The procedural system of smart government refers to the process system for solving problems and handling government affairs: smart government provides personalized services for citizens online, while offline integration between departments optimizes the administrative examination and approval workflow to achieve efficiency and convenience. The management system of smart government maintains the whole dynamic mode in the process of smart government management: in daily management, smart government pays attention to the antecedents and consequences of public events, uses big data for collection and analysis, perceives social dynamic changes at any time, predicts the development trend of events, and integrates pre-warning, in-process monitoring, and post-management to realize dynamic management of the whole process. The operation system of smart government refers to the government's compliance with the principle of service priority: taking itself as a service provider, adhering to the concept of 'citizens above all' and the criterion of 'serving the people', providing relevant services for the public, and meeting citizens' needs with efficient wisdom.

Governance Mechanism of Smart Government. The governance mechanism of smart government mainly includes two aspects. On the one hand, in the development mechanism, smart government builds a diversified collaborative development mechanism through the integration of the information and technical resources of various departments and through internal-external and upper-lower coordination, which is the only way to adapt to the trend of social co-governance by convening social forces to jointly maintain network security and order. As Que Tianshu once said, 'Only through multiple participation, multi-directional interaction and multi-system state governance intervention can we eliminate the risks and problems of cyberspace to the greatest extent' [20]. On the other hand, in the regulatory mechanism, smart government governance combines information science and technology with the platforms of various departments and is committed to building a smart
government regulatory mechanism by constructing a technical regulatory system, a citizen satisfaction system, and a result-oriented social evaluation mechanism. For example, Guizhou, as China's comprehensive big data experimental zone, took the lead in local legislation on big data, formulating the 'Regulations for the Promotion of the Development and Application of Big Data in Guizhou Province' and the 'Regulations for the Sharing and Opening of Government Data in Guiyang'. Through legal supervision of the development and application of big data, it is committed to protecting the rights and interests of the public and has comprehensively constructed the 'Guizhou Model' in line with the development of government affairs in the western region [21].

Governance of Smart Government. Building a smart government is a revolution in management innovation, advocating the concept of 'speaking with data, making decisions with data, managing with data and innovating with data'. First, in terms of 'speaking with data', smart government pays attention to the governance value of research data and treats data as the result of recording and quantifying the reality of the objective world: data is a silent record of society's multiple subjects, a hotbed for political issues and major policies, and a representation of the direction and trend of future social development. Second, in terms of 'making decisions with data', smart government uses big data, the Internet, and other emerging information science and technology to help decision makers think more scientifically and efficiently through intelligent data analysis and the optimization of the decision-making process, so that the government moves from 'decision by experience' to 'scientific decision-making'. Third, in terms of 'managing with data', smart government monitors data in real time, finds common points in the data, optimizes their integration, and turns static data into dynamic data, so as to realize the analysis and application of data and achieve precision management. Finally, in terms of 'innovating with data', smart government provides better innovative solutions for future social governance by integrating and configuring data resources among functional departments to achieve online and offline communication and data interconnection and sharing.
3 Problems in the Construction of Intelligent Government Under the Background of Big Data

3.1 Inadequate System Construction of Smart Government Governance

To build a smart government, we should first establish a smart government governance system, and its establishment should clarify the boundaries between government and society and give full play to the initiative, enthusiasm, and innovation of relevant social forces. However, the boundaries between China's government and society are still unclear. The government's governance focus remains on social stability; the government issues administrative orders according to laws and regulations, which interferes too much with social administration, and its controls are inflexible, easily leading to low vitality among social organizations. Moreover, the governance standards of smart government are imperfect, and newly developed information science
and technology constantly interact with government governance, yet there is no unified evaluation standard for internal and external governance. In addition, legislation on smart government governance is relatively backward. In recent years, as the fields involved in government governance have grown increasingly wide, new problems have continued to emerge, and requirements for laws and regulations on how to govern are bound to increase rapidly. Governments at all levels lag slightly in the legal construction of smart government governance [22]. For example, in the management of and services for the floating population, China currently lacks a strong, high-level legal system; in smart government governance especially, there remain blanks in laws, regulations, and policies [23]. Furthermore, regarding the information disclosure system, China's government information disclosure work is still based on the 'Information Disclosure Regulations' (hereinafter 'the Regulations') and lacks legal norms and constraints. The Regulations lack the precision and prudence of formal law, and the necessary external oversight of the information disclosure system is imperfect, easily resulting in government inaction. In the era of big data, smart government governance is more dependent on data: the continuous development of information science and technology makes the decision-making of government departments more dependent on the analysis and use of data. The government collects data in various fields and then analyzes and stores it, and each step is a new challenge for the government. Faced with important data resources, governments need to make information public while ensuring its security [24].

3.2 Big Data Core Technology is Still Lacking

Observing development trends abroad, it can be seen that the government, as the producer and processor of large amounts of information and the largest owner of public information and even personal privacy information, needs more information science and technology to improve its governance level in all aspects of data collection, extraction, analysis, and storage [25]. Using big data technology and new tools to improve the processing speed and efficiency of information is an important part of improving the government's social governance ability and creating a harmonious social atmosphere. However, in fields of information science and technology such as big data, China's government still lacks relevant technical personnel, and the technical level is not high. In the face of emerging challenges, the technical requirements for China's mastery of core big data technology are getting higher and higher.

3.3 Lack of Necessary Technical Supervision Mechanism

Big data information resources are not only one of the three major production factors in today's society but also core strategic resources; therefore, it is necessary to supervise them [26]. However, at present China lacks hard laws and regulations backed by national coercive force to protect them, and also lacks soft-law constraints on security and confidentiality. Thus, private data faces serious security risks and is prone to leakage or infringement of citizens' privacy. In the data collection stage, smart government lacks a legal and effective authorization
mechanism and effective constraint system; in the data storage stage, there is no security encryption system maintenance or related institutional constraint for the preservation and retrieval of data; in the data transportation stage, there is no reasonable personal accountability mechanism or archiving system; and in the data analysis and processing stage, there is a lack of specified original support. Because smart government lacks the necessary regulatory mechanisms at every stage of the data life cycle, there is a conflict between smart government and the protection of personal information: the legitimate rights and interests of citizens cannot be protected, ultimately slowing both the protection of those rights and the construction of the public service system. Moreover, although smart government builds a legal system to ensure the effectiveness of data, in practice the Personal Information Protection Law covers only a small part of data protection and cannot prevent data violations at the root, so the legitimate rights and interests of citizens are harmed [27].

3.4 Failure to Construct a Dynamic Network Collaborative Governance System

The concept of 'collaborative governance' was first proposed at the Fourth Plenary Session of the Sixteenth Central Committee of the CPC in 2004. After more than ten years of exploration and development, some achievements have been made, but challenges remain. For example, with the continuous development of information science and technology, collaborative governance is not closely linked with science and technology such as big data, and the dynamic network collaborative governance system is not fully developed. Dynamic network governance is a social governance mode that builds a cloud platform for intelligent governance through new information science and technology such as big data, cloud computing, and the Internet, and promotes the networked governance of basic information, so as to realize governance that is networked, intelligent, precise, dynamic, and collaborative. But at present our country has not yet established an effective dynamic network governance system: the degree of coordination among some government departments is still low, and the phenomenon of 'information islands' still exists among administrative organs at all levels. Smart government governance is a complete and efficient governance system in which all departments need close collaboration to maximize the effectiveness of resources [28]. In practice, however, government departments at all levels build their own smart government governance platforms, which makes it difficult to achieve cross-regional information sharing between governments at all levels and between departments, and difficult to achieve high coordination between governments, between governments and departments, and between departments.
4 Countermeasures and Suggestions

4.1 Perfecting the Legislation of Open Government Big Data

To better construct smart government in China, attention must be paid to the institutional construction of smart government governance. System construction first requires establishing and improving the corresponding laws and regulations, and also requires formulating
unified norms and standards for smart government governance. China can draw on the experience of other countries: for example, big data disclosure should be accompanied by a strictly improved security system, the core of which is to improve the relevant laws and regulations as soon as possible [29]. Law is the embodiment of the state's coercive force and the rigorous expression of the system, and the disclosure of information by smart government through network platforms, which lets the public genuinely enjoy the right to know and the right to supervise, is an effective means of supervising government behavior. The construction of a legal system for government information disclosure is an effective means of administrative control and also the guarantee of, and path for, public supervision of the government [30]. Information disclosure and sharing should have complete corresponding legal requirements or policy management methods and should be constrained by legal and policy provisions to ensure data security and privacy protection. To build a smart government, it is necessary to continuously improve a series of laws and regulations covering big data openness, technology, standards, security, application, and supervision; accelerate the interoperability, co-construction, and sharing of government public data resources; create a good new environment of openness, cooperation, and sharing for big data governance; and realize scientific, humanistic, and accurate management, decision-making, and service in government public administration.

4.2 Shared Development of Big Data and Related Technologies

Big data is characterized by large data size, a variety of data types, and rapid data flow. To build a smart government, it is necessary to ensure that the data the government receives from large databases, and the data relied on for making decisions, are accurate, and that the decisions made are reasonable, timely, and effective. This requires the government to classify and organize data on the basis of data sharing and to construct different data systems, which places strict requirements and standards on science and technology. For example, cloud computing technology under big data can process data quickly and accurately. In addition, core blockchain technologies such as the irreversible distributed ledger, asymmetric encryption, and complex mathematical algorithms make it possible to solve otherwise difficult problems concerning the openness, security, and authenticity of data [31]. Combining big data technology with blockchain technology has obvious advantages for the data management of smart government. Therefore, it is necessary to upgrade big data and related technologies so as to continuously promote the development of smart government construction.

4.3 Perfecting the Supervision System of Smart Government Governance

In the era of big data, improving the supervision system of smart government governance helps strengthen internal and external supervision, thereby protecting the legitimate rights and interests of citizens and preventing the abuse of government power. To strengthen the internal supervision system of the smart government
governance system, it is necessary to formulate a power list system and actively develop a network supervision platform. Supervision should combine online and offline means, integrating the functions and powers of government departments into a comprehensive network evaluation and supervision platform across the whole chain of granting power, exercising subjects, operating processes, and corresponding responsibilities, so as to effectively link the various execution stages and achieve supervision at each stage. In addition, to strengthen the external supervision system of smart government governance, we should fully account for national conditions and establish a sound, three-dimensional social network supervision system. On the one hand, social enterprises should actively cultivate new regulatory technical talents for the era of big data, improve their hard power for dealing with technical loopholes, and effectively maintain network security; China should also, in line with national conditions, implement legislation down to the details so as to improve the three-dimensional social network supervision system. On the other hand, social organizations and citizens should give full play to collective power, make full use of the network supervision of public opinion, and supervise unreasonable government behavior, so as to combine internal and external coordination and conduct comprehensive supervision of government behavior.

4.4 Innovating the Dynamic Network Collaborative Governance Mode

With the continuous development of information science and technology, the various elements of the social economy flow through space ever faster, and social development becomes more and more challenging and uncertain. Therefore, to construct smart government and promote smart government governance, it is necessary to innovate the dynamic network collaborative governance mode so as to suit an increasingly complex social environment. First, coordinate multiple governance bodies. Based on big data technology, the power of various organizations, strata, and teams in society is integrated into government governance, forming a pluralistic collaborative governance model that is government-led, department-joined, enterprise-supported, and socially participatory. Second, promote online and offline collaborative governance. On the one hand, an online information interaction platform is constructed through Internet technology to exchange information and data, build a social governance database, capture dynamic focal points in real time, and accurately understand and effectively respond to society's problems and challenges; on the other hand, connecting online and offline through the existing grid brings the active participation of all kinds of subjects into play and promotes pluralistic social cooperation to achieve social coordination. Finally, coordinate governance modes dynamically. Using law, the market, administration, and other governance methods, and with the help of big data, cloud computing, the Internet, and other science and technology, the government can dynamically perceive social focal points in an all-round way and adjust its governance in a timely manner, so as to respond more efficiently to complex changes in the social environment.
5 Conclusion

Under the background of big data, in order to modernize the government governance system and governance capability, it is necessary to reshape the government, build smart government, and realize smart government governance. Information science and technology such as big data have not only triggered technological and industrial revolutions but also led to changes in government governance methods. At present, promoting government reengineering and governance reform through the development of information science and technology has become a global trend. Therefore, promoting the construction of smart government governance with emerging information science and technology, in order to deal effectively with an increasingly complex social environment, is an urgent problem to which contemporary government management researchers and practitioners must attend. Given the important role of big data technology in government governance, governments at all levels in China should pay attention to the R&D and deep integration of information science and technology applications in government governance, so as to promote the development of smart government governance and the modernization of government governance capacity.
References

1. Hu, S., Wang, H., Mo, J.: Research on smart government governance innovation based on big data. Exploration (01), 72–78+2 (2017)
2. Smart government governance: the way of government governance reform in the era of big data. E-government (05), 85–92 (2018)
3. Chinese Government Website: Notice of the State Council on the Issuance of the Action Plan for Promoting Big Data Development [EB/OL] (2015). http://www.gov.cn/zhengce/content/2015-09/05/content_10137.htm. Accessed 10 Mar 2015
4. Wu, T.: Research on 'precise' decision-making of smart government from the perspective of big data governance. J. Yunnan Univ. Adm. 19(06), 110–115 (2017)
5. Perboli, G., De Marco, A., Perfetti, F., et al.: A new taxonomy of smart city projects. Transp. Res. Procedia 3, 470–478 (2014)
6. Huo, L.: Research on smart government service mode under cloud architecture. Mod. Intell. 36(7), 3–6 (2016)
7. Wu, J.: Countermeasure research on promoting the development of smart city by smart government construction. China Inf. Ind. 5, 24–26 (2011)
8. Gil-Garcia, J.R., Helbig, N., Ojo, A.: Being smart: emerging technologies and innovation in the public sector. Gov. Inf. Q. 31, 1–18 (2014)
9. Lv, Z., Li, X., Wang, W., et al.: Government affairs service platform for smart city. Futur. Gener. Comput. Syst. 3, 21–33 (2017)
10. Alenezi, H., Tarhini, A., Sharma, S.K.: Development of quantitative model to investigate the strategic relationship between information quality and e-government benefits. Transform. Gov. People Process Policy 9(3), 324–351 (2015)
11. Howard, R., Di Maio, A.: Hype cycle for smart government, 2013 [EB/OL] (2013-07-22). https://www.gartner.com/doc/2555215/hype-cycle-smartgovernment. Accessed 10 Mar 2015
12. Shang, S., Du, J.: Analysis and path design of smart government function construction under the background of big data. Intell. Theory Pract. 42(04), 45–51 (2019)
13. Fei, J., Jia, H.: Path selection of government APP providing public service platform from the perspective of smart government. E-government 09, 31–38 (2015)
14. Song, L., Lu, M.: Review of domestic research on smart government (2005–2015). Mod. Econ. Inf. (11), 122–123+125 (2016)
15. Zhang, J., Zhu, J., Shang, J.: Summary of research status and development trend of smart government at home and abroad. E-government 08, 72–79 (2015)
16. Xia, Y.: International comparison of open government data strategy and China's countermeasures. E-government 7, 45–56 (2017)
17. Geng, Q., Sun, Y., Liu, X.: China Government Information Development Report - Smart Government, Government Data Governance and Data Opening, p. 107. Beijing University of Posts and Telecommunications Press, Beijing (2017)
18. Yun, Q., Suo, Z., Huang, S.: Report on internet development and governance (2017). J. Shantou Univ. Human. Soc. Sci. 33(11), 20–23 (2017)
19. Waldo, D.: The Study of Public Administration, p. 55. Doubleday, New York (1955)
20. Que, T.: National governance in China's cyberspace: structure, resources and effective intervention. Contemp. World Social. (02), 158–163 (2015)
21. Shen, F., Zhu, J.: Smart government governance in the era of big data: advantage value, governance limits and optimization path. E-government (10), 46–55 (2019)
22. Zhang, L.: Smart City Governance Research. CPC Central Party School, Beijing (2015)
23. Hu, S., Wang, H., Mo, J.: Research on smart government governance innovation based on big data. Exploration (01), 72–78+2 (2017)
24. Cao, L.: Big data innovation: EU open data strategy research. Inf. Theory Pract. (4), 118–122 (2013)
25. European Commission: Open Data: An Engine for Innovation, Growth, and Transparent Governance [R/OL]. http://eurlex.europa.eu/LexUriServ/LexUriServ.do?uri=COM:2011:0882:FIN:EN:PDF. Accessed 05 Dec 2015
26. Li, Z.: Internet + Government Services, Open a New Era of Intelligent Government, p. 101. China Railway Press, Beijing (2017)
27. Shen, F., Zhu, J.: Smart government governance in the big data era: advantage value, governance limits and optimization path. E-government 10, 46–55 (2019)
28. Zhang, L.: Research on Smart City Governance. CPC Central Party School, Beijing (2015)
29. Shen, G.: Data sovereignty and national data strategy in the era of big data. Nanjing Soc. Sci. 6, 113–119 (2014)
30. Su, Y., Ren, Y.: Construction and improvement of government information disclosure system under the background of big data - and on the enlightenment of the frontier development of foreign transparent government practice to China. Libr. Inf. 02, 113–122 (2016)
31. Han, Y.: Research on the problems and countermeasures in the construction of rule of law government under the background of big data. Legal Syst. Soc. 35, 102–103 (2019)
Application of Auto-encoder and Attention Mechanism in Raman Spectroscopy Yunyi Bai, Mang Xu, and Pengjiang Qian(B) School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, Jiangsu, China [email protected]
Abstract. Conventional Raman spectroscopy, used for the qualitative or quantitative determination of substances, has been widely utilized in industrial manufacturing and academic research. However, in traditional Raman spectroscopy, human experience plays a prominent role. Because of the massive amount of similar information contained in the spectrograms of media of varying concentration, the extraction of feature peaks is especially crucial. Although manual feature peak extraction from spectrograms can reduce signal dimensionality to a certain extent, it can also result in the loss of spectral information and in the misclassification and underclassification of feature peaks. This research addresses the problem with a feature dimensionality reduction method based on an auto-encoder and an attention mechanism, applying a deep learning approach to spectrogram feature extraction and feeding the features into a neural network for concentration prediction. After rigorous testing, the model's prediction accuracy reaches a unit concentration of 0.01 with a 13% error, providing a reliable aid to manual and timely culture medium replenishment. Extensive comparison experiments show that the auto-encoder-based dimensionality reduction method is more accurate than the machine learning methods compared. The research demonstrates that applying deep learning to Raman spectroscopy can produce positive outcomes and has great potential.

Keywords: Raman spectroscopy · Deep learning · Auto-encoder · Attentional mechanisms
1 Introduction

Raman spectroscopy is currently widely utilized in industry, food, and biotechnology as an accurate material detection tool offering easy data capture, speed, and high accuracy for the qualitative or quantitative study of material composition. The processing of spectrograms and the analysis of spectral data are critical steps in the quantitative measurement of substances, and the regression algorithms used have a direct impact on the accuracy obtained from the spectral data. Raman spectroscopy has been used to analyze substance concentrations both qualitatively and quantitatively. The typical Raman spectroscopy processing pipeline includes numerous steps, and many of them, such as baseline correction [1] and smoothing-denoising, rely on human
expertise to complete the experiments. To discover the feature peaks in a spectrum, one typically starts by downscaling the high-dimensional data to obtain the key features, and then uses principal component analysis, random forests, or other approaches to finish particular tasks. The purpose of dimensionality reduction is to extract the features that are useful for the regression task and discard those that are of no use. In addition, there is serious covariance between wavelength points [2]: using all dimensions of the spectrogram not only increases the computational complexity of the model but also introduces unnecessary noise that affects the prediction results. As a result, the selection of feature peaks is crucial. Traditionally, there are three types of feature peak selection. The first explores the feature peak intervals one by one, using statistical information related to the model, and then decides which intervals are needed; examples include interval partial least squares (IPLS) [3] and moving window partial least squares (MWPLS) [4]. The second selects peaks according to their covariance index, regression index, and other indicators, as in competitive adaptive reweighted sampling (CARS) [5] and partial least squares uninformative elimination (UE-PLS) [6]. The third category treats selection as an optimization problem, using, for instance, genetic algorithms (GA) or simulated annealing (SA) to select feature peaks. In the past few years, with the evolution of hardware, deep learning has continued to develop and has been widely applied with rich results in image processing, autonomous driving, speech recognition, and other fields. The CNN, a kind of feed-forward neural network, is an important part of deep learning and has shown powerful classification and regression ability in many fields; classical convolutional neural networks include AlexNet [7], VGG [8], and ResNet [9]. With the advancement of deep learning, the auto-encoder [10] has also emerged. Compared with the traditional and classical PCA [11] algorithm, the auto-encoder is an unsupervised deep learning algorithm for dimensionality reduction that can learn nonlinear feature representations PCA cannot, and can better learn high-level semantic features from high-dimensional data. This work utilizes the self-learning property of neural networks to obtain the weight of each feature peak and achieves feature peak selection by reconstructing the input so as to retain heavily weighted peaks and eliminate lightly weighted ones; in addition, an attention mechanism further improves the results when predicting concentration. Our proposed auto-encoder-attention mechanism method does not require manual feature peak selection, its prediction results are more stable than those of some traditional machine learning methods, and it achieves better experimental results.
2 Related Work

2.1 Raman Spectra of Single-Component Glucose Medium

A total of 17 glucose medium concentrations were selected, with Raman spectra ranging from 0 to 3400 cm−1. Because the concentrations differ, the corresponding characteristic peaks differ, which leads to a different Raman spectrogram for each concentration. With a total of 170 samples, the key to effective identification of the spectrograms of different concentrations lies in the selection of the
characteristic peaks. Figure 1 shows the Raman spectra of the single-component medium samples. It is easy to see that in some regions the curves of different concentrations nearly overlap; the features in these overlapping regions are not useful for the experiment, i.e., they are to be discarded during feature extraction. In other regions the curves do not overlap, and the features there are useful for the experiments; these are also called feature peaks, and it is critical to make the most of them during feature extraction.
Fig. 1. Raman spectra of single component glucose medium samples.
2.2 Spectrograms of Multi-component Mixtures

In the previous section, we introduced the Raman spectrogram of the single-component glucose medium. On this basis, we extended the original data and used a Raman spectroscopy detector to obtain multi-component Raman spectroscopy data. Besides glucose, this batch of data also contains bacterial substances. As the experiment proceeded, the glucose concentration in the medium gradually decreased while the bacterial content increased. The Raman spectrometer detects component concentrations in the medium in real time, generating seven spectra per minute; we selected the spectral data every 30 min and recorded the concentration of each component. As shown in Fig. 1(b), the spectra of the mixtures overlap less, indicating that the diversity of the components has a greater impact on the spectra and that multi-component data are more challenging for the model; at the same time, there may be interactions between the components, resulting in some noisy characteristic peaks.

2.3 Auto-encoder

The encoder's job is to convert the high-dimensional input x into a low-dimensional implied variable, which allows the network to learn the most valuable characteristics out of the many available. The role of the decoder is to restore the implied
variable a to the initial dimension, i.e., to obtain the reconstruction xR. A good auto-encoder is one in which the output of the decoder is almost a complete approximation of the original input. A 3-layer stacked auto-encoder is utilized in this study, as illustrated in Fig. 2, with 128, 64, and 128 neurons in the hidden layers, respectively.
Fig. 2. The auto-encoder model with a three-layer hidden structure.
The original data x is encoded from the input layer to the hidden layer during the encoding process:

$$ a = \sigma(W_1 x + b_1) \quad (1) $$

The decoding process maps from the intermediate (hidden) layer to the output layer:

$$ x_R = \sigma(W_2 a + b_2) \quad (2) $$

where W_1, W_2 are the weight parameters, b_1, b_2 are the bias terms, and σ is the activation function; here ReLU is chosen. The optimization objective is to minimize the reconstruction loss:

$$ Loss = \frac{1}{N} \sum_{n=1}^{N} \| x - x_R \|^2 \quad (3) $$
Adding a nonlinear activation function to the encoded linear combination and reconstructing the input data from the new features obtained after encoding is a very effective and practical means of feature extraction.

2.4 Attention Mechanism

An attention mechanism is commonly known as a resource weighting mechanism: it reallocates resource weights based on the importance of data in different dimensions, and its core is recalculating weights to highlight certain important features based on the correlation between the original data. Many kinds of attention mechanisms have emerged,
such as the recent attention survey [19] and self-attention from natural language processing (NLP) [15] applied to computer vision tasks, where the feature map with attention is obtained by computing the similarity between the Query and Key vectors and using it to weight and sum the Value vector. There are also spatial attention [16] and channel attention [17]: the former retains the key spatial information of the original image while transforming it into another space to focus on the important regions, while the latter assigns weights along the image channel dimension. Building on these, CBAM [18] obtains the attentional feature map by chaining the channel and spatial attention mechanisms together. We provide an attention mechanism in this paper that is comparable to SENet [17], with the differences outlined in Sect. 3.
3 Algorithm Design

Algorithm 1 illustrates the flow of our algorithm design. The original data are the spectra acquired every minute by the Raman spectroscopy detection probe; while saving each spectrogram, the concentration of the single-component substance and the concentrations of the individual multi-component substances are recorded. Since our algorithm model cannot read the .spc file format, each .spc spectral file is processed by obtaining and storing the coordinates of each data point of the spectrogram in a two-dimensional array and adding the previously recorded substance concentration values as labels to the two-dimensional array where the spectral data are stored.
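As a rough sketch of this preprocessing step, the snippet below stacks each spectrum's intensity values into one row of a matrix and appends the recorded concentration as a label column. The helper `read_spc_points` is a hypothetical placeholder, since the paper does not name the .spc conversion tool it uses.

```python
import numpy as np

def read_spc_points(path):
    """Hypothetical stand-in for the .spc conversion step: should return the
    (wavenumber, intensity) coordinates of one spectrum as an (n_points, 2) array."""
    raise NotImplementedError("replace with the actual .spc reader")

def build_dataset(spc_paths, concentrations):
    """Stack the spectra into a two-dimensional array, one sample per row,
    with the recorded concentration appended as the label (last column)."""
    rows = []
    for path, conc in zip(spc_paths, concentrations):
        points = read_spc_points(path)               # (n_points, 2)
        rows.append(np.append(points[:, 1], conc))   # keep intensities + label
    return np.asarray(rows)                          # (n_samples, n_points + 1)
```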
Similarly, inspired by SENet, this paper adds the attention mechanism of deep learning to the model. SENet assigns weights to different features along the channel dimension, making the neural network pay more attention to features with higher weights. The original SENet is used for image processing, where an image is usually composed of length, width, and channels; after several convolution layers, the number of channels increases and the spatial size becomes smaller, at which point SENet adaptively pools the length and width to 1 and only weights the channel dimension. Our data do not have length and width attributes, so the pooling step is omitted: the features are weighted directly, and the obtained weighted features are spliced with the original data to obtain the weighted data, which benefits the subsequent prediction or classification tasks.

The next step is the construction of the algorithm model. For the auto-encoder, we choose an encoder and decoder based on Dense layers; both have 3 layers, with 128, 64, and 64 neurons per layer, respectively. We also choose a Dense-layer-based attention mechanism, with the number of layers set to 2. The model is then back-propagated to update the parameters, a fully connected layer is used for prediction, and the mean square error between the predicted and true values serves as the loss.
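A minimal Keras sketch of this design is shown below. The exact wiring is not fully specified in the paper, so the arrangement here — reconstruction followed by a two-layer Dense attention block whose sigmoid weights are multiplied into the features and spliced with the original input — is one plausible reading rather than the authors' definitive implementation; `n_features` stands for the length of one spectrum.

```python
from tensorflow.keras import layers, Model

def build_ae_attention(n_features):
    inp = layers.Input(shape=(n_features,))
    # Dense-based encoder/decoder (layer sizes follow the 128/64/64 description)
    x = layers.Dense(128, activation="relu")(inp)
    x = layers.Dense(64, activation="relu")(x)
    code = layers.Dense(64, activation="relu")(x)
    x = layers.Dense(64, activation="relu")(code)
    x = layers.Dense(128, activation="relu")(x)
    recon = layers.Dense(n_features, name="reconstruction")(x)
    # Two Dense layers produce per-feature weights (SENet-style, pooling omitted)
    a = layers.Dense(64, activation="relu")(recon)
    w = layers.Dense(n_features, activation="sigmoid")(a)
    weighted = layers.Multiply()([recon, w])
    # Splice the weighted features with the original data before regression
    merged = layers.Concatenate()([inp, weighted])
    out = layers.Dense(1, name="concentration")(merged)
    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mse")  # MSE between predicted and true
    return model
```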
4 Experiment

The experimental part of this paper uses multiple algorithms and deep learning methods to compare the algorithmic models: we select single- and multi-component Raman spectral data of glucose culture media, calculate the root mean square error of each algorithm, analyze the fitted curves of the true and predicted values, and compare the convergence speed of the fully connected neural network model with that of the convolutional neural network model.

4.1 Traditional Methods and Neural Networks

To reflect the advantages of the method proposed in this paper, some classical machine learning algorithms were first selected for comparison on the Raman spectral data obtained from the single-component glucose medium. As shown in Fig. 3, the results were obtained by CARS, PCA, and NCA [12] by reducing the dimensionality of the Raman spectral data to 117, 68, and 49 dimensions, respectively. The overall trend of the fitted curves is roughly the same despite the different dimensionality of feature extraction, but they lack accuracy, so the fitting effect of these algorithms needs further improvement. Figure 4(a) shows the same single-component data as Fig. 3, and Fig. 5 shows the fitting results obtained by the auto-encoder-attention mechanism; the results are a marked improvement over the conventional algorithms.

Building on the results obtained by the above machine learning methods, we introduced deep learning for improvement. Figure 4 shows the fitting curves of multi-component mixtures obtained by machine learning and deep learning algorithms, and it can be
Fig. 3. (a), (b), and (c) are the fitted curves of predicted and true values for CARS, PCA, and NCA, respectively (single component).
Fig. 4. Fitting curves (multi-component mixtures) for the conventional PCA method (a) and the auto-encoder (b).
seen that the fitting curves obtained by the deep learning auto-encoder method on the multi-component glucose medium are better and more accurate. In the experiments, we also found that the conventional PCA method is less stable, its fitting quality fluctuates, and it is not suitable for the real-time Raman spectroscopy detection task, while the deep learning method is not only better but also more stable.

4.2 Self-encoder and Attention Mechanism

In Sect. 4.1, we predicted the concentration from culture-medium Raman spectral data, and the results proved that the deep learning auto-encoder method is better than machine learning and traditional neural network methods; at the same time, it is easy to see that the results obtained by using the auto-encoder alone still lack fitting accuracy. In order to further improve the fit, we add the attention mechanism to the auto-encoder model; the selection of the attention mechanism is described in Sect. 3. To demonstrate the reliability and robustness of the model, we also compared the fitting curves of single- and multi-component glucose
Fig. 5. Self-encoder-attention mechanism fitting curves: single-component (a), multi-component (b).
medium concentrations. Figure 5 shows that the results of feature extraction based on the auto-encoder with the attention mechanism are better than those of the traditional methods mentioned above; after several experiments, we observed that the fitting curves of our method are more stable and no model collapse occurs. The predicted and true values lie basically on the same curve. Figure 5(b) shows a significant improvement in the fitted curve using the auto-encoder-attention mechanism compared with the experimental results in Fig. 4. In addition, it can be observed that although the fitted curve is intuitively better for the single component, the single-component concentration values span 0 to 50 concentration units, while the multi-component values span only 0 to 4. For further illustration, we compared the results with the real sample concentrations by random sampling. As can be seen from Table 1, among the 10 different concentrations selected for comparison, the results of AE-Attention are the closest to the real values, with errors in the range of 0.3% to 1.7%; similarly, for the six different concentrations in Table 2, the error can be as small as 0.01 concentration units, which satisfies practical error requirements. Therefore, the experiment has reliable practicality.

Table 1. Comparison of sample and predicted values for different concentrations (single component)

| Label | 0 | 0.1 | 5 | 15 | 50 |
|---|---|---|---|---|---|
| CARS [5] | 0.35 ± 0.13 | 0.33 ± 0.18 | 4.25 ± 0.64 | 14.65 ± 1.28 | 49.24 ± 1.83 |
| PCA [11] | 0.7 ± 0.22 | 0.16 ± 0.09 | 4.45 ± 1.66 | 14.55 ± 1.87 | 48.47 ± 2.21 |
| NCA [12] | 1.02 ± 0.37 | 0.80 ± 0.23 | 5.48 ± 1.89 | 15.97 ± 1.65 | 49.26 ± 0.91 |
| GBR [13] | 0.30 ± 0.10 | 0.37 ± 0.18 | 5.82 ± 2.37 | 15.63 ± 2.67 | 48.93 ± 3.43 |
| RFR [14] | 0.18 ± 0.11 | 0.56 ± 0.35 | 5.30 ± 2.28 | 16.50 ± 3.11 | 49.63 ± 3.03 |
| AE-Attention | 0.11 ± 0.05 | 0.08 ± 0.04 | 5.06 ± 0.53 | 15.34 ± 0.66 | 50.12 ± 0.38 |
Table 2. Comparison of sample and predicted values for different concentrations (multiple components)

| Label | 0.27 | 0.57 | 1.158 | 1.708 | 3.79 |
|---|---|---|---|---|---|
| CARS [5] | 0.22 ± 0.11 | 0.38 ± 0.14 | 1.30 ± 0.23 | 1.55 ± 0.61 | 3.67 ± 0.27 |
| PCA [11] | 0.35 ± 0.20 | 0.07 ± 0.58 | 0.87 ± 0.31 | 1.48 ± 0.27 | 3.57 ± 0.38 |
| AE-Attention | 0.24 ± 0.08 | 0.62 ± 0.04 | 1.26 ± 0.13 | 1.71 ± 0.04 | 3.78 ± 0.02 |
We also compared the convergence speed, number of parameters, and accuracy of the fully connected neural network (NN) and the convolutional neural network (CNN) in the prediction task. In the prediction stage after feature extraction, we ran comparison experiments with CNN and NN. Figure 6 illustrates that the accuracy obtained with the NN is higher than that obtained with the CNN, and its convergence is correspondingly faster. Since the NN has more parameters than the CNN, its accuracy is also slightly higher; in the interest of accuracy, we prefer to use the NN to complete the task.
Fig. 6. Comparison of convergence speed of NN (left panel) and CNN (right panel).
The mean square error (MSE) reflects the degree of difference between the predicted and true values, and the actual effect of each model can be assessed by calculating this error for the different algorithms. As shown in Table 3, comparing the five feature-extraction algorithms with both a fully connected neural network (NN) and a convolutional neural network (CNN) as the predictor, the method based on the auto-encoder + attention mechanism is the best under both the NN and CNN conditions. Table 4 reports the multi-component MSE and yields the same conclusion as Table 3. At the same time, Table 5 shows that the results obtained without dimensionality-reduction feature extraction are poor, because the many noisy and useless features degrade the results.
Table 3. Root mean square error values of different algorithms for a single component

| MSE | CARS [5] | PCA [11] | NCA [12] | Select-Percentile | AE-Attention |
|---|---|---|---|---|---|
| NN | 0.272 ± 0.12 | 0.67 ± 0.14 | 0.539 ± 0.22 | 4.464 ± 2.43 | 0.19 ± 0.08 |
| CNN | 0.4 ± 0.18 | 0.76 ± 0.26 | 1.609 ± 0.89 | 3.55 ± 2.02 | 0.18 ± 0.072 |
Table 4. Root mean square error values of different algorithms for multi-component

| MSE error | PCA [11] | NCA [12] | AE-Attention |
|---|---|---|---|
| NN | 0.060 ± 0.021 | 0.053 ± 0.014 | 0.019 ± 0.008 |
| CNN | 0.065 ± 0.014 | 0.058 ± 0.011 | 0.026 ± 0.010 |
Table 5. MSE metrics for GBR and RFR

| | GBR [13] | RFR [14] |
|---|---|---|
| MSE error | 0.642 ± 0.38 | 0.433 ± 0.27 |
5 Results and Discussion

The deep learning approach in this paper is implemented in the Python-based TensorFlow framework, using an Intel(R) Core(TM) i5-8500 CPU.

In this work, we propose a Raman spectral processing algorithm based on an auto-encoder-attention mechanism, which transfers methods from deep learning image processing to regression modeling over several different concentrations of culture media and compares them with several commonly used traditional algorithms. With the deep learning method, the consumption of substances in the glucose medium can be observed in time during biological fermentation experiments, facilitating timely replenishment and recording of data and yielding practically meaningful results. The method can next be used to predict or classify Raman spectrograms of more complex multi-component mixtures or other substances in the medium, which has good research value.
References

1. Zheng, Y., Zhang, T., Zhang, J., et al.: Study on the effects of smoothing, derivative, and baseline correction on the quantitative analysis of PLS in near-infrared spectroscopy. Spectrosc. Spectral Anal. 2004(12), 1546–1548 (2004)
2. Martens, H., Naes, T.: Multivariate Calibration. Wiley, Hoboken (1992)
3. Norgaard, L., Saudland, A., Wagner, J., et al.: Interval partial least-squares regression (iPLS): a comparative chemometric study with an example from near-infrared spectroscopy. Appl. Spectrosc. 54(3), 413–419 (2000)
4. Chen, H., Tao, P., Chen, J., et al.: Waveband selection for NIR spectroscopy analysis of soil organic matter based on SG smoothing and MWPLS methods. Chemometr. Intell. Lab. Syst. 107(1), 139–146 (2011)
5. Li, H., Liang, Y., Xu, Q., et al.: Key wavelengths screening using competitive adaptive reweighted sampling method for multivariate calibration. Anal. Chim. Acta 648(1), 77–84 (2009)
6. Polanski, J., Gieleciak, R.: The comparative molecular surface analysis (CoMSA) with modified uninformative variable elimination-PLS (UVE-PLS) method: application to the steroids binding the aromatase enzyme. ChemInform 34(22), 656–666 (2003)
7. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
8. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Computer Science (2014)
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
10. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
11. Wold, S., Esbensen, K., Geladi, P.: Principal component analysis. Chemometr. Intell. Lab. Syst. 2(1–3), 37–52 (1987)
12. Goldberger, J., Hinton, G.E., Roweis, S., Salakhutdinov, R.: Neighbourhood components analysis. In: Advances in Neural Information Processing Systems (2005)
13. Wang, J., Peng, L., Ran, R., et al.: A short-term photovoltaic power prediction model based on the gradient boost decision tree. Appl. Sci. 8(5), 689 (2018)
14. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
15. Vaswani, A., Shazeer, N., Parmar, N., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
16. Jaderberg, M., Simonyan, K., Zisserman, A.: Spatial transformer networks. Adv. Neural Inf. Process. Syst. 28, 2017–2025 (2015)
17. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
18. Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1
19. Guo, M.H., Xu, T.X., Liu, J.J., et al.: Attention mechanisms in computer vision: a survey. arXiv preprint arXiv:2111.07624 (2021)
Remaining Useful Life Prediction Based on Improved LSTM Hybrid Attention Neural Network

Mang Xu, Yunyi Bai, and Pengjiang Qian(B)

School of Artificial Intelligence and Computer Science, Jiangnan University, 1800 Lihu Avenue, Wuxi 214122, Jiangsu, People's Republic of China
[email protected]
Abstract. In recent years, data-driven fault prediction and health management (PHM) methods based on sensor data have developed rapidly. Predicting the remaining useful life (RUL) of mechanical equipment is efficient not only for averting abrupt breakdowns but also for optimizing the equipment's operating capacity and lowering maintenance expenses. This study proposes a prediction model based on an improved LSTM hybrid attention neural network to better forecast the RUL of mechanical equipment under multi-sensor conditions. The temporal pattern attention (TPA) module uses the features extracted by the LSTM module to weight their relevant variables and increase the model's capacity to generalize to complex data sets. In experimental tests on the turbofan engine simulation dataset (C-MAPSS), the improved LSTM hybrid attention neural network shows better prediction performance and generalization capability than the current mainstream RUL prediction methods.

Keywords: RUL · LSTM · Attention mechanism
1 Introduction

The twenty-first century has witnessed significant improvements in telecommunications and computer science, and the machinery and equipment used in industry have grown increasingly sophisticated, placing greater demands on prognosis and health management (PHM) [1–5] systems. An accurate RUL forecasting technique for machinery in an industrial environment can ensure smooth operation within the safety range to the greatest degree possible, as well as considerably improve the efficiency of equipment repair and maintenance.

Currently, the two most common RUL forecasting approaches are model-based and data-driven approaches. Traditional inspection methods rely significantly on personnel with substantial knowledge of operating circumstances, material qualities, mechanical structures, and failure mechanisms, whereas model-based methods do not: they employ empirical data to develop mathematical models that can represent the deterioration process of machinery throughout its operational cycle, such as autoregressive moving average (ARMA)
models [6], Wiener process models [7, 8], Markov models [9], and so on. However, in today's industrial setting, the state of a single piece of machinery may be monitored by a network of cooperating sensors. A large number of sensors can create massive amounts of monitoring data, which traditional model-based techniques are frequently incapable of handling. As a result, determining how to infer the degradation information of machinery over its operation cycle from massive amounts of monitoring data, and how to establish an RUL prediction model from it, has become a critical challenge that must be resolved as soon as possible. Data-driven strategies were created in this scenario. Deep learning techniques, in particular, have demonstrated impressive performance in learning from massive amounts of data, thereby developing the mapping capability between data and RUL. Among them, CNN [10]- and RNN [11]-based techniques are extensively utilized and produce accurate results. Li [12] presented a DCNN that employed normalized raw sensor data as input and accomplished RUL estimation directly from it. The CNN-based technique, however, ignores the temporal correlation of time series data. RNNs, particularly LSTM [13] neural networks, capture long-term dependencies by remembering past information in sensor data and therefore excel at time series prediction. Hinchi [14] employed sequential CNN and LSTM networks to predict RUL: local features are first extracted by the CNN and then fed into the LSTM module for RUL prediction.

These existing deep learning-based prediction methods assume during network construction that input data from different sensors contribute equally to the output; in practice, however, the data monitored by sensors of various types and locations often contain varying degrees of the degradation information exhibited by the mechanical device at a given moment. This uncertainty affects the network's prediction performance, leading to less accurate and less generalizable RUL predictions. The proposed improved LSTM hybrid attention neural network uses the attention mechanism to weight the multivariate sensor information derived by the LSTM neural network, which may overcome the shortcomings of previous approaches and more precisely and effectively estimate the remaining life of mechanical devices. Compared with the current mainstream RUL prediction approaches, the improved LSTM hybrid attention neural network shows greater prediction performance and generalization ability on the turbofan engine simulation dataset (C-MAPSS) [15].
2 Related Work

2.1 LSTM Network

Because of their network topology, RNNs have an edge over other networks when dealing with time series problems. However, RNNs cannot address the gradient vanishing that occurs during backpropagation in model training, and so LSTMs were developed. The cell structure of the LSTM is shown in Fig. 1, and Fig. 2 shows how LSTM layers are constructed by replacing the regular hidden neurons of an RNN with LSTM cells.
Fig. 1. LSTM cell unit.
Fig. 2. LSTM layer.
2.2 Temporal Pattern Attention Mechanism

The traditional attention mechanism [16] picks important time steps for weighting, an approach appropriate for tasks where each time step carries individual information. However, when information monitored in real time by numerous sensors produces multidimensional variables within a single time step, variables that are noisy in terms of predictive usefulness cannot be identified. As a result, the temporal
pattern attention (TPA) mechanism [17] was developed to pick important variables for weighting within the same time step. In this mechanism, the hidden layer state h_i of each time step is obtained through the LSTM module; each h_i has dimension m, yielding the hidden layer state matrix H = {h_{t−w}, h_{t−w+1}, ..., h_{t−1}}, as shown in Fig. 3. First, the hidden layer state matrix H is obtained after LSTM feature extraction, keeping the serialized output; H^C_{i,j} denotes the result of the j-th convolution kernel acting on the i-th row vector. Second, attention weighting is applied: the scores are normalized with the sigmoid function to obtain the variable attention weights, which facilitate the selection among multiple variables. Using the resulting attention weight vector, the rows of H^C are weighted and summed to obtain the context vector v_t; the context vector and the hidden layer output h_t are then additively combined and mapped to finally obtain the prediction result.
Fig. 3. The time pattern attention module.
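The following NumPy sketch traces the computation just described, following the TPA formulation in [17]; the weight-matrix shapes are illustrative assumptions rather than values taken from this paper.

```python
import numpy as np

def temporal_pattern_attention(H, h_t, conv_filters, W_a, W_h, W_v):
    """H: (m, w) hidden states of the last w steps, one row per hidden dim;
    h_t: (m,) current hidden state; conv_filters: (k, w) 1-D filters applied
    along the time axis; W_a: (k, m); W_h: (d, m); W_v: (d, k)."""
    # CNN over each row of H: HC[i, j] = <H[i, :], filter j>
    HC = H @ conv_filters.T                   # (m, k)
    # Relevance of each row (variable) to the current hidden state
    scores = HC @ W_a @ h_t                   # (m,)
    alpha = 1.0 / (1.0 + np.exp(-scores))     # sigmoid, not softmax
    # Context vector: weighted sum of the rows of HC
    v_t = (alpha[:, None] * HC).sum(axis=0)   # (k,)
    # Additively combine context and hidden state for the final representation
    return W_h @ h_t + W_v @ v_t
```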
2.3 Improved LSTM Hybrid Attention Neural Network

The overall model diagram of the improved LSTM hybrid attention neural network is shown in Fig. 4. The model extracts features from time series data using two LSTM layers with 64 hidden neurons each (the first does not retain the sequence output, the second does). We take the final hidden-layer output of the first LSTM and, through a RepeatVector operation between the two LSTM layers, use it as the input of the second LSTM layer. We feed the features retrieved by the LSTM module into
Fig. 4. Improved LSTM hybrid attention neural network.
the TPA module to weight their associated variables. The TPA module's output is one-dimensional data with 128 features, which is then fed into a fully connected layer with 64 neurons. We employed the Dropout regularization technique, widely used in deep learning, to prevent overfitting: at each iteration some neuron units are turned off at random, and because any neuron may be turned off at any time, the neurons grow less sensitive to the activity of other specific neurons as training proceeds. After cross-validation, the best results were obtained with a hidden-node dropout rate of 0.5, since this generates the largest number of random network configurations. ReLU is the activation function for all of the layers above. After the Dropout layer, the final feature is fed into the RUL layer, where the prediction uses a sigmoid activation function. We used the Adam optimizer throughout model training, with the learning rate set to 0.001 and the batch size to 512. A total of 50 epochs were run, with model training and testing alternating.
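Putting the pieces together, a sketch of this architecture in Keras might look like the following. The TPA block is abbreviated here with a simple Dense-based attention stand-in (see Sect. 2.2 for the full mechanism), and the input shape — a window of 30 steps over 14 retained sensors — is an assumption based on the preprocessing described in Sect. 3.2.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

WINDOW, N_SENSORS = 30, 14   # assumed shape after dropping constant channels

inp = layers.Input(shape=(WINDOW, N_SENSORS))
x = layers.LSTM(64)(inp)                       # first LSTM: no sequence output
x = layers.RepeatVector(WINDOW)(x)             # feed final state to each step
H = layers.LSTM(64, return_sequences=True)(x)  # second LSTM keeps the sequence

# Simplified stand-in for the TPA module: per-variable sigmoid weights
w = layers.Dense(64, activation="sigmoid")(H)
v = layers.GlobalAveragePooling1D()(layers.Multiply()([H, w]))
x = layers.Dense(128, activation="relu")(v)    # 128 features out of attention

x = layers.Dense(64, activation="relu")(x)
x = layers.Dropout(0.5)(x)                     # dropout rate from cross-validation
out = layers.Dense(1, activation="sigmoid")(x) # normalized RUL

model = Model(inp, out)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")
# model.fit(X_train, y_train, batch_size=512, epochs=50)
```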
3 Experiments

3.1 Dataset Introduction

The turbofan engine dataset utilized in this work was obtained from NASA's C-MAPSS platform and is used to verify the RUL prediction accuracy and stability of the proposed improved LSTM hybrid attention neural network. The dataset records how an aero-engine's various sensor characteristics vary as the engine operates from its initial condition; as the working duration increases, some components degrade in performance until the engine fails. The details of the dataset are documented in Table 1. The training set records information for the whole engine cycle, from start to failure, whereas the test set contains information from the engine's initial condition to a specific instant before failure. The four datasets exhibit one failure mode in FD001 and FD002 and two failure modes in FD003 and FD004; meanwhile, FD001 and FD003 were collected while the engine was driven under one operating condition, whereas FD002 and FD004 were obtained under six.

Table 1. C-MAPSS dataset.

| Dataset | FD001 | FD002 | FD003 | FD004 |
|---|---|---|---|---|
| Train set | 100 | 260 | 100 | 249 |
| Test set | 100 | 260 | 100 | 248 |
| Operating conditions | 1 | 6 | 1 | 6 |
| Fault modes | 1 | 1 | 2 | 2 |
3.2 Data Preprocessing

In each subset of the C-MAPSS dataset, 21 sensors separately captured distinct data. A thorough examination of the four sub-datasets revealed several constant sensor channels in each, so the first step is to remove the constant sensor data. At the same time, the value ranges vary substantially across the different sensors, and the network model cannot be trained directly on such raw data. We therefore standardized the remaining data with the Z-score approach:

$$ y_j^N = \frac{y_j - y_j^{mean}}{y_j^{std}} \quad (1) $$

where y_j is the j-th sensor's data, y_j^{mean} is its average value, and y_j^{std} is its standard deviation. After evaluating the data in the test set, we set the time step of the time series to 30, ensuring that all of the data can be used.

Furthermore, the RUL value revealed by the data labels is critical for the prediction network. Because the training set does not state the remaining life explicitly, but rather the length of operation from the initial condition until failure, RUL labels must be added to the aero-engines in the training set. The procedure is to take the largest cycle count of each aero-engine as the initial remaining-life label and decrease it by one per cycle until it reaches zero at the end of the run. Moreover, degradation features are not visible in the early stages of real mechanical operation, so early mechanical degradation can be neglected: for better network performance, the initial RUL is clipped to a constant threshold, and once this threshold is reached the RUL follows a linearly declining schedule. After evaluating the data, the threshold was set to 125, and the labels are normalized before entering the network.

3.3 Performance Evaluation Metrics

In this article, the performance measures Score [18] and RMSE are utilized to assess the performance of the improved LSTM hybrid attention network model. The smaller the prediction error, the lower both the Score and the RMSE. However, the two metrics penalize late predictions differently: RMSE is symmetric, whereas Score is asymmetric and penalizes late predictions more heavily. The Score function and RMSE formulae are as follows:

$$ Score = \begin{cases} \sum_{i=1}^{Q} \left( e^{-d_i/13} - 1 \right) & \text{for } d_i < 0 \\ \sum_{i=1}^{Q} \left( e^{d_i/10} - 1 \right) & \text{for } d_i \ge 0 \end{cases} \quad (2) $$

$$ RMSE = \sqrt{\frac{1}{Q} \sum_{i=1}^{Q} d_i^2} \quad (3) $$

$$ d_i = pRUL_i - aRUL_i \quad (4) $$
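A compact sketch of this preprocessing and the two metrics, written directly from Eqs. (1)–(4); the per-engine total cycle count is assumed to be known from the training set.

```python
import numpy as np

def zscore(data):
    """Per-sensor Z-score standardization, Eq. (1)."""
    return (data - data.mean(axis=0)) / data.std(axis=0)

def piecewise_rul(total_cycles, ceiling=125):
    """RUL labels for one engine: count down to 0 at failure, clipped at a
    constant ceiling early in life where degradation is negligible."""
    rul = np.arange(total_cycles - 1, -1, -1)
    return np.minimum(rul, ceiling)

def score(pred, true):
    """Asymmetric Score of Eq. (2): late predictions (d >= 0) cost more."""
    d = np.asarray(pred) - np.asarray(true)
    return float(np.where(d < 0, np.exp(-d / 13) - 1, np.exp(d / 10) - 1).sum())

def rmse(pred, true):
    """Symmetric RMSE of Eqs. (3)-(4)."""
    d = np.asarray(pred) - np.asarray(true)
    return float(np.sqrt(np.mean(d ** 2)))
```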
3.4 Experimental Analysis

We perform RUL prediction on each of the four test sets of the C-MAPSS dataset to validate the prediction performance of the improved LSTM hybrid attention network model. In Fig. 5, the blue line represents the real label and the red line the predicted RUL; it is clear that our model's RUL prediction is extremely close to the actual value.
Fig. 5. Forecast results. (Color figure online)
Further analysis and observation verify the accuracy and stability of the improved LSTM hybrid attention network model in RUL prediction. Table 2 and Fig. 6 compare the Score and RMSE results of the improved LSTM hybrid attention network model with those of five traditional network models: MLP [19], relevance vector machine (RVM) [20], CNN, DCNN, and LSTM.

Table 2. Score comparison.

| Method | FD001 | FD002 | FD003 | FD004 |
|---|---|---|---|---|
| MLP | 1.79 × 10⁴ | 7.78 × 10⁶ | 1.66 × 10⁴ | 5.56 × 10⁶ |
| RVM | 1.50 × 10³ | 1.74 × 10⁴ | 1.43 × 10³ | 2.65 × 10⁴ |
| CNN | 4.42 × 10² | 1.75 × 10³ | 4.07 × 10² | 2.38 × 10³ |
| DCNN | 2.62 × 10² | 4.58 × 10³ | 2.37 × 10² | 6.74 × 10³ |
| LSTM | 2.84 × 10² | 1.11 × 10³ | 3.52 × 10² | 1.91 × 10³ |
| Proposed | 2.39 × 10² | 8.65 × 10² | 1.91 × 10² | 1.28 × 10³ |
Fig. 6. RMSE comparison.
On the four datasets, our method outperforms the better-performing LSTM model by 3.22%, 4.87%, 7.53%, and 9.81% in RMSE, while its Score metrics are 8.78%, 22.07%, 19.41%, and 32.98% better. The method used in this study thus has significant advantages over the traditional MLP and RVM methods, and compared with deep learning methods such as CNN, DCNN, and LSTM, its error is smaller and it is more widely applicable. Furthermore, the results show that the DCNN has superior prediction performance on the FD001 and FD003 datasets, whereas the LSTM network predicts better on FD002 and FD004. The improved LSTM hybrid attention network model, however, achieves very good prediction results on all four datasets, demonstrating its strong predictive capacity and stability.
Engine units were chosen at random from the test set to visually demonstrate the prediction effect of the improved LSTM hybrid attention network model. In Fig. 7, black indicates the true values while the other colors, in sequence, represent the other models. The improved LSTM hybrid attention neural network model employed in this study predicts values closer to the true values than the other techniques in the linear degradation stage.
Fig. 7. Random test unit prediction comparison. (Color figure online)
4 Conclusions

By merging the temporal pattern attention approach with the LSTM neural network structure, this research provides an improved LSTM hybrid attention network model. After the LSTM neural network extracts data features, the temporal pattern attention mechanism effectively increases the model's capacity to evaluate and forecast multivariate sensor data. Testing the prediction model shows that it achieves superior RMSE and Score, as well as improved prediction performance and generalization capacity, on diverse data sets. In the future, we will enhance the model, optimize its structure, and improve training efficiency, making it as lightweight as feasible while maintaining prediction accuracy, in order to better match actual operating scenarios.
References

1. Huang, X., Li, Y., Zhang, Y., et al.: A new direct second-order reliability analysis method. Appl. Math. Model. 55, 68–80 (2018)
2. Zhang, W., Li, X., Ding, Q.: Deep residual learning-based fault diagnosis method for rotating machinery. ISA Trans. 95, 295–305 (2019)
3. Bird, J., Wu, X., Patnaik, P., et al.: A framework of prognosis and health management: a multidisciplinary approach. In: Turbo Expo: Power for Land, Sea, and Air, vol. 4790, pp. 177–186 (2007)
4. Schacht-Rodríguez, R., Ponsart, J.C., Garcia-Beltran, C.D., et al.: Prognosis & health management for the prediction of UAV flight endurance. IFAC-PapersOnLine 51(24), 983–990 (2018)
5. Li, Y., Liu, K., Foley, A.M., et al.: Data-driven health estimation and lifetime prediction of lithium-ion batteries: a review. Renew. Sustain. Energy Rev. 113, 109254 (2019)
6. Sun, S.L., Liu, L.F.: Optimal linear estimation for ARMA signals with stochastic multiple packet dropouts. Control Decis. 28(2), 223–228 (2013)
7. Tang, S., Yu, C., Wang, X., et al.: Remaining useful life prediction of lithium-ion batteries based on the Wiener process with measurement error. Energies 7(2), 520–547 (2014)
8. Zhang, Z., Si, X., Hu, C., et al.: Degradation data analysis and remaining useful life estimation: a review on Wiener-process-based methods. Eur. J. Oper. Res. 271(3), 775–796 (2018)
9. Fine, S., Singer, Y., Tishby, N.: The hierarchical hidden Markov model: analysis and applications. Mach. Learn. 32(1), 41–62 (1998)
10. Lin, Z., Gao, H., Zhang, E., et al.: Diamond-coated mechanical seal remaining useful life prediction based on convolution neural network. Int. J. Pattern Recogn. Artif. Intell. 34(05), 2051007 (2020)
11. Guo, L., Li, N., Jia, F., et al.: A recurrent neural network based health indicator for remaining useful life prediction of bearings. Neurocomputing 240, 98–109 (2017)
12. Li, X., Ding, Q., Sun, J.Q.: Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab. Eng. Syst. Saf. 172, 1–11 (2018)
13. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
14. Hinchi, A.Z., Tkiouat, M.: Rolling element bearing remaining useful life estimation based on a convolutional long-short-term memory network. Proc. Comput. Sci. 127, 123–132 (2018)
15. Saxena, A., Goebel, K., Simon, D., et al.: Damage propagation modeling for aircraft engine run-to-failure simulation. In: 2008 International Conference on Prognostics and Health Management, pp. 1–9. IEEE (2008)
16. Li, X., Yao, C.L., Fan, F.L., et al.: Recurrent neural networks based paraphrase identification model combined with attention mechanism. Control Decis. 36(1), 152–158 (2021)
17. Shih, S.Y., Sun, F.K., Lee, H.: Temporal pattern attention for multivariate time series forecasting. Mach. Learn. 108(8), 1421–1441 (2019)
18. Saxena, A., Goebel, K., Simon, D., et al.: Damage propagation modeling for aircraft engine run-to-failure simulation. In: 2008 International Conference on Prognostics and Health Management, pp. 1–9. IEEE (2008)
19. Devadoss, A.V., Ligori, T.A.A.: Forecasting of stock prices using multi layer perceptron. Int. J. Comput. Algorithm 2(1), 440–449 (2013)
20. Gu, Y., Wylie, B.K., Boyte, S.P., et al.: An optimal sample data usage strategy to minimize overfitting and underfitting effects in regression tree models based on remotely-sensed data. Remote Sens. 8(11), 943 (2016)
Medical Image Registration Method Based on Simulated CT

Xuqing Wang(B), Yanan Su, Ruoyu Liu, Qianhui Qu, Hao Liu, and Yi Gu

School of Artificial Intelligence and Computer Science, Jiangnan University, 1800 Lihu Avenue, Wuxi 214122, Jiangsu, People's Republic of China
[email protected]
Abstract. In recent years, artificial intelligence (AI) has been widely used in medicine, for example in tumor screening, qualitative diagnosis, radiotherapy organ delineation, curative effect evaluation, and prognosis. Medical image registration is a typical problem and technical difficulty in the field of image processing. Specifically, to achieve information fusion, a certain spatial transformation is used to map two images of an image data set onto each other, so that points corresponding to the same location in the two images are matched one by one. Most existing algorithms perform rigid registration based on simple image features or non-automatic registration based on artificial markers; because of the differing imaging mechanisms of different modalities, the registration accuracy is not ideal, and the computational cost is high. To overcome these problems, this paper proposes a new image registration method based on a differential homeomorphic deformation field to realize same-modality registration between simulated CT and given CT, together with image fusion using the pixel weighted average method. Good registration and fusion performance is still achieved in the abdomen, an extremely challenging body part for medical image registration. The use of intelligent medical technology can reduce the workload of doctors and improve the accuracy of registration, making diagnosis more scientific and reliable.

Keywords: Intelligent medical technology · Medical image registration · Differential homeomorphism algorithm · Pixel weighted average method
1 Introduction

In recent years, with the continuous growth of the digital economy, artificial intelligence (AI) has been developing rapidly and is now deeply integrated with many application scenarios; the Wise Information Technology of med (WITMED) has become one of the research hotspots. Image registration technology has been widely used in computer vision, medical image processing, material mechanics, and many other fields. Combining various images and displaying their information on the same image, so as to provide multi-data and multi-information images for clinical medical diagnosis, has become a technology with great application value; an accurate and efficient image matching criterion is both the key and the difficulty. For different medical applications, the research and development
of special registration algorithms is an essential direction of development. At present, the two research directions of medical image registration are full automation and high precision. To avoid over-reliance on imaging specialists in clinical practice and reduce their work intensity, image registration technology has evolved from complex and laborious pre-imaging registration based on positioning devices, to a semi-automated human-computer interaction approach, and finally to registration completed automatically by the computer; this development has been rapid, the accuracy has gradually improved, and current registration accuracy has reached the "sub-pixel" level [1].

To improve the registration accuracy of Computed Tomography (CT) and Magnetic Resonance Imaging (MR), the two most common medical imaging modalities, focusing on the abdomen, this paper proposes an indirect multimodal image registration method that converts one multimodal registration problem into multiple single-modal registration problems. The method brings the great similarity and high precision of single-modal registration into multimodal registration. Simulated CT is introduced as the intermediate medium: the MR image to be registered generates a simulated CT image to be registered, while the reference image is still a CT image. The method is thus a quasi-single-modal image registration, in which the simulated CT image replaces the original MR image of the traditional registration setting as the image to be registered, and the CT image is taken as the reference image. At the same time, a semi-supervised classification model with maximum knowledge utilization (SSC-EKE) is introduced to generate the simulated CT [2, 3]. The voxels generated by this method correspond to those in the MRI, which helps improve the registration accuracy.

In the clinical diagnosis and treatment of diseases, image fusion can produce an image that shows lesions more clearly and carries more of the images' medical information [4]. Medical image fusion technology plays an important role in the localization of lesions, the diagnosis and analysis of disease, the formulation of treatment plans, and the recording and study of later pathology [5]. Therefore, since image fusion was proposed, it has become a hotspot of research and exploration in the field of medical imaging. In this study, we adopt a spatial-domain fusion method, since it is intuitive and easy to implement.

The remainder of this paper is organized as follows: the second part introduces the related work, the third and fourth parts describe the research process and results, and the fifth part summarizes the proposed method.
2 Related Work

2.1 Laplacian Support Vector Machine (LapSVM)

The learning performance of the traditional SVM depends to a great extent on the quality and quantity of the training samples; when labeled training data are insufficient, even with abundant unlabeled data, its classification accuracy is low. By transforming the geometric structure information of the marginal distribution of labeled and unlabeled data into a manifold regularization term and adding it to the traditional supervised SVM, the LapSVM algorithm extends the SVM to a semi-supervised learning algorithm.
Let the training dataset contain l labeled data and u unlabeled data, with y_i denoting the label of the i-th labeled sample. The manifold regularization framework can be expressed as:

$$ \min_{f \in H_K} \left( \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f(x_i)) + \gamma_A \|f\|_K^2 + \gamma_I \|f\|_I^2 \right) \quad (1) $$

where V(·) is the loss function and γ_A and γ_I are the coefficients of the two regularization terms. The first term in Eq. (1) controls the empirical risk, expressed as a hinge loss; the second term avoids over-fitting by imposing a smoothness condition on the possible solutions in the reproducing kernel Hilbert space (RKHS); the third term exploits the intrinsic geometric distribution of all the data and is based on manifold learning. An adjacency graph G = (V, E) over the data is used to characterize the intrinsic manifold of the data distribution:

$$ \|f\|_I^2 = \frac{1}{(u+l)^2} \sum_{i,j=1}^{l+u} \left( f(x_i) - f(x_j) \right)^2 W_{ij} = \frac{1}{(u+l)^2} \mathbf{f}^T L \mathbf{f} \quad (2) $$

where f = [f(x_1), ..., f(x_{l+u})]^T, and W_{ij} ∈ W, i, j = 1, ..., u + l, are the edge weights in the data adjacency graph. L = D − W is the graph Laplacian, where D is the degree matrix whose diagonal elements are D_{ii} = Σ_{j=1}^{l+u} W_{ij} and whose remaining entries are 0. From formula (1), the framework of the Laplacian support vector machine (LapSVM) is obtained:

$$ \min_{f \in H_K} \left( \frac{1}{l} \sum_{i=1}^{l} (1 - y_i f(x_i))_+ + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(u+l)^2} \mathbf{f}^T L \mathbf{f} \right) \quad (3) $$

The solution of Eq. (3) is f^*(x) = Σ_{i=1}^{l+u} α_i K(x, x_i). Introducing slack variables ξ_i, i = 1, ..., l, Eq. (3) can be rewritten as:

$$ \min_{\alpha \in R^{l+u}, \xi_i \in R} \left( \frac{1}{l} \sum_{i=1}^{l} \xi_i + \gamma_A \alpha^T K \alpha + \frac{\gamma_I}{(u+l)^2} \alpha^T K L K \alpha \right), $$
$$ \text{s.t. } y_i \left( \sum_{j=1}^{l+u} \alpha_j K(x_i, x_j) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l. \quad (4) $$

Based on the Karush-Kuhn-Tucker (KKT) conditions, the dual of Eq. (4) is:

$$ \max_{\beta \in R^l} \left( \sum_{i=1}^{l} \beta_i - \frac{1}{2} \beta^T Q \beta \right), \quad \text{s.t. } \sum_{i=1}^{l} \beta_i y_i = 0, \quad 0 \le \beta_i \le \frac{1}{l}, \quad i = 1, \ldots, l \quad (5) $$

where Q = Y J K (2γ_A I + (2γ_I/(u+l)^2) L K)^{−1} J^T Y, J = [I 0] is a matrix of size l × (l + u), I is the identity matrix, Y = diag(y_1, y_2, ..., y_l), and K is the kernel matrix of size (l + u) × (l + u).
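As a concrete illustration of these quantities, the sketch below assembles K, L, and the dual matrix Q of Eq. (5) with NumPy and scikit-learn; the RBF kernel and the k-nearest-neighbor adjacency graph are common choices assumed here, not prescribed by the text.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neighbors import kneighbors_graph

def lapsvm_matrices(X, y_labeled, gamma_A, gamma_I, n_neighbors=5):
    """Assemble the quantities of Eqs. (1)-(5). X stacks the l labeled points
    first, then the u unlabeled ones; y_labeled holds the l labels (+1/-1)."""
    n, l = X.shape[0], len(y_labeled)
    K = rbf_kernel(X)                                  # (l+u, l+u) kernel matrix
    W = kneighbors_graph(X, n_neighbors).toarray()
    W = np.maximum(W, W.T)                             # symmetrize the adjacency
    L = np.diag(W.sum(axis=1)) - W                     # L = D - W
    J = np.hstack([np.eye(l), np.zeros((l, n - l))])   # J = [I 0]
    Y = np.diag(y_labeled)
    M = 2 * gamma_A * np.eye(n) + (2 * gamma_I / n**2) * (L @ K)
    Q = Y @ J @ K @ np.linalg.solve(M, J.T @ Y)        # dual matrix of Eq. (5)
    return K, L, Q
```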
According to formula (5), the solution of formula (4) is:

$$ \alpha^* = \left( 2\gamma_A I + \frac{2\gamma_I}{(u+l)^2} L K \right)^{-1} J^T Y \beta^*. $$
2.2 Semi-supervised Classification with Maximum Knowledge Utilization (SSC-EKE)

The semi-supervised classification algorithm with maximum knowledge utilization (SSC-EKE) is a semi-supervised learning algorithm [3] that effectively combines the manifold regularization mechanism of the Laplacian support vector machine with the full mining of useful knowledge in labeled samples and a large number of unlabeled samples to improve classification performance, achieving remarkable results. On the basis of LapSVM, SSC-EKE uses MS and CS to denote the must-link and cannot-link pair sets, respectively, and defines the matrix:

$$ Q_{ij} = Q_{ji} = \begin{cases} 1/|MS|, & \forall (i, j) \in MS \text{ or } (j, i) \in MS; \\ -1/|CS|, & \forall (i, j) \in CS \text{ or } (j, i) \in CS; \\ 0 & \text{otherwise.} \end{cases} $$

Combining this matrix with the LapSVM part gives the joint regularization formula for the manifold and pairwise constraints:

$$ \min_f \left( MPCJFR(f) = (1 - \tau) \mathbf{f}^T L \mathbf{f} + \tau \mathbf{f}^T Z \mathbf{f} = \mathbf{f}^T ((1 - \tau) L + \tau Z) \mathbf{f} \right) \quad (6) $$

where Z = H − Q and H = diag(Q · 1_{(l+u)×1}); similar to the definition of L in LapSVM, f^T L f and f^T Z f have the same value range, and τ ∈ [0, 1) is a trade-off coefficient that controls their individual significance in any data scenario. Following the SVM structural risk minimization, the SSC-EKE objective is obtained from formula (6):

$$ \min_{f \in H_K} \left( \frac{1}{l} \sum_{i=1}^{l} (1 - y_i f(x_i))_+ + \gamma_A \|f\|_K^2 + \gamma_J \mathbf{f}^T ((1 - \tau) L + \tau Z) \mathbf{f} \right) \quad (7) $$

On the basis of Eq. (7), a bias term b is introduced and, analogously to Eq. (4), the problem can be rewritten as:
$$ \min_{\alpha \in R^{l+u}, \xi_i \in R} \left( \frac{1}{l} \sum_{i=1}^{l} \xi_i + \gamma_A \alpha^T K \alpha + \gamma_J \alpha^T K ((1 - \tau) L + \tau Z) K \alpha \right), $$
$$ \text{s.t. } y_i \left( \sum_{j=1}^{l+u} \alpha_j K(x_i, x_j) + b \right) \ge 1 - \xi_i, \quad i = 1, \ldots, l. \quad (8) $$
With Lagrange multipliers β = (β_1, ..., β_l) and the KKT conditions, the dual form of formula (8) is:

$$ \max_{\beta \in R^l} \left( \sum_{i=1}^{l} \beta_i - \frac{1}{4} \beta^T S \beta \right), \quad \text{s.t. } \sum_{i=1}^{l} \beta_i y_i = 0, \quad 0 \le \beta_i \le \frac{1}{l}, \quad i = 1, \ldots, l \quad (9) $$
According to the solution of Eq. (9), the solution of Eq. (8) can be expressed as:

$$ \alpha^* = \frac{1}{2} \left( \gamma_A K + \gamma_J K ((1 - \tau) L + \tau Z) K \right)^{-1} P \beta^*, \quad (10) $$

$$ b^* = \frac{1}{l} \sum_{i=1}^{l} \left( y_i - \sum_{j=1}^{l+u} \alpha_j^* K(x_i, x_j) \right). \quad (11) $$

Ultimately, the classification decision function of the SSC-EKE algorithm takes the form:

$$ f^*(x) = \sum_{i=1}^{l+u} \alpha_i^* K(x, x_i) + b^* $$
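For concreteness, the pairwise-constraint matrices that enter the regularizer (1 − τ)L + τZ of this subsection can be assembled as follows, a small sketch written directly from the definitions of Q, H, and Z above:

```python
import numpy as np

def constraint_matrices(n, must_link, cannot_link):
    """Build Q from the must-link (MS) and cannot-link (CS) pair sets and
    Z = H - Q with H = diag(Q @ 1), as used in Eq. (6)."""
    Q = np.zeros((n, n))
    for i, j in must_link:
        Q[i, j] = Q[j, i] = 1.0 / len(must_link)
    for i, j in cannot_link:
        Q[i, j] = Q[j, i] = -1.0 / len(cannot_link)
    H = np.diag(Q @ np.ones(n))
    return Q, H - Q   # Z = H - Q
```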
2.3 Medical Image Registration Based on Phase Difference

To achieve registration without changing the overall topological information of the image, we use the phase difference between CT image data to minimize the objective function in the differential homeomorphic space, improving accuracy and the registration of large deformations. The phase difference is the wavefront aberration extracted from several different images using wavefront detection technology. The algorithm has three main steps:

(1) Deformation field calculation: transform the phase difference between the floating image and the reference image into a deformation field using orthogonal filters. During each iteration, the update field D_u is computed by a function θ from the reference image f, the floating image m, and the current deformation field D_a:

$$ D_u \leftarrow \theta(f, m \circ D_a) $$

(2) Deformation field superposition: the differential homeomorphism algorithm maps the update field exponentially, and the new deformation field is superimposed on the original one. After calculating the update field D_u, the total displacement D_a is increased:

$$ D_a \leftarrow D_a \circ D_u $$

(3) Deformation field regularization: the main purpose is to obtain a smooth transformation and reduce the influence of image noise on the registration output. This is achieved by applying a low-pass filter to each component of the deformation field; concretely, each component of the deformation field is convolved with a Gaussian
kernel. At the same time, considering the deterministic mapping mentioned above, in order to guarantee high positional certainty, the correspondence between each displacement vector and its determined position is maintained, and the deterministic mapping is regularized as the deformation field mapping.

2.4 Pixel Weighted Average Image Fusion Algorithm

Multimodal medical image fusion builds on the registration of two or more medical images of different modalities and aims to synthesize them [6]. Spatial-domain fusion methods process the pixel values of the images directly, fusing them in the gray space of the image pixels [7–9]. The spatial-domain fusion methods suitable for medical image processing mainly include the pixel gray value selection method, the pixel weighted average method [10], the pixel insertion method, and neural-network-based methods [4]. After comprehensively considering the effect and runtime of these algorithms, we chose the pixel weighted average method [5], which can fuse images of different modalities in real time and expresses the characteristics of the different modalities simply and intuitively. Its mathematical expression is:

$$ F(i, j) = w_1 A(i, j) + w_2 B(i, j), \quad w_1 + w_2 = 1 $$
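This fusion rule is a one-liner in practice; a minimal sketch, assuming the two inputs are already registered, same-size grayscale arrays:

```python
import numpy as np

def weighted_average_fusion(img_a, img_b, w1=0.5):
    """Pixel weighted average fusion: F = w1*A + w2*B with w1 + w2 = 1."""
    w2 = 1.0 - w1
    return w1 * img_a.astype(np.float64) + w2 * img_b.astype(np.float64)
```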
3 Experimental Details

3.1 Generate Simulated CT

This article mainly discusses the accuracy of multimodal medical image registration with simulated CT as the intermediate medium, so we use the currently best-performing method to generate simulated CT of the lower abdomen [8]. The method is briefly described as follows.

The first stage extracts features from the MR images. Because MR images contain some noise, extracting feature data directly would have an unpredictable impact on subsequent experiments; therefore a filter matrix is applied to smooth the MR images and reduce the influence of noise. We obtained abdominal texture features from the Dixon-water, Dixon-fat, Dixon-IP, and Dixon-OP sequences [11]. Considering the low bone and air signals in the mDixon sequence, a mesh generation strategy is used to obtain the location information. Finally, the texture features and position information are combined into seven-dimensional feature vectors.

The second stage generates the simulated CT. The delineated data are read, the hyperparameters are tuned by grid search and cross-validation, and eight SSC-EKE classifiers are trained. After inputting the required image, about 400 points are extracted from it to obtain the average
probability. Combined with the seven-dimensional features of the image, the tissue type of every voxel is estimated by the KNN nearest-neighbor method, and the corresponding CT values are assigned. The values for bone, fat, air, and soft tissue were set to 380, −98, 700, and 32 [12], respectively.

3.2 Image Registration

Based on the simulated CT image generated in Sect. 3.1, this method changes the floating image to the simulated CT, while the reference image is still the real CT image. The phase difference between the simulated CT and the CT image produces the corresponding deformation to be applied to the MR image. The first step transforms the phase difference between the simulated CT and the real CT into a deformation field using orthogonal filters. The second step uses the differential homeomorphism algorithm to map the deformation field exponentially and superimpose the new deformation field on the original one. The third step regularizes the deformation field by convolving it with a Gaussian kernel, obtaining a smooth transformation and reducing the impact of noise.
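A schematic NumPy/SciPy version of this three-step loop is given below. The orthogonal-filter phase-difference computation is not reimplemented here — `phase_update` is a placeholder for it — so this is only a structural sketch of the update/compose/regularize cycle, with the iteration count and smoothing width as assumed parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def warp(image, field):
    """Resample `image` through a displacement field of shape (2, H, W)."""
    grid = np.mgrid[0:image.shape[0], 0:image.shape[1]].astype(float)
    return map_coordinates(image, grid + field, order=1, mode="nearest")

def exp_field(v, squarings=4):
    """Scaling-and-squaring exponential map, keeping the update diffeomorphic."""
    phi = v / 2.0 ** squarings
    for _ in range(squarings):
        phi = phi + np.stack([warp(phi[0], phi), warp(phi[1], phi)])
    return phi

def register(fixed, moving, phase_update, n_iter=50, sigma=1.5):
    """phase_update(fixed, warped) stands in for the orthogonal-filter
    phase-difference step; it should return a velocity field (2, H, W)."""
    D = np.zeros((2,) + fixed.shape)
    for _ in range(n_iter):
        warped = warp(moving, D)                             # current guess
        Du = exp_field(phase_update(fixed, warped))          # steps 1 and 2a
        D = Du + np.stack([warp(D[0], Du), warp(D[1], Du)])  # compose fields
        D = np.stack([gaussian_filter(c, sigma) for c in D]) # step 3: smooth
    return D
```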
4 Experimental Results

4.1 Environment Configuration

The experimental environment and configuration are as follows: the operating system is Windows 11, the CPU is an AMD Ryzen 5 4600H with 16 GB of memory, and the GPU is an NVIDIA GeForce GTX 1650 with 11 GB of memory. The project may run differently under other configurations.

4.2 Performance Indicators

This chapter measures the performance of the proposed method through several internationally recognized technical indicators, shown in the following table (Table 1):

Table 1. Registration specifications.

| Technical specifications | Metric values (reference) |
|---|---|
| LP (Local-Phase) | 1.9874E+06 |
| CC (Cross-correlation) | 0.2808 |
| MI (Mutual-information) | 0.7124 |
The technical indicators used to evaluate the quality of the images after registration have the following specific meanings:
LP index: the local phase difference, used to describe the registration effect between two images; the lower the value, the better the registration.

CC index: the degree of correlation between the image and the reference image and the degree of matching of their relative positions, also used to describe the registration effect between the image and the CT image; the closer the value is to 0, the better the registration.

MI index: the mutual information between the image to be registered and the reference image, also used to describe the registration effect between the MR image and the CT image; the closer the value is to 1, the better the registration.

4.3 Evaluation of Results

We recruited 8 subjects under a protocol approved by the Cleveland Medical Center Institutional Review Board to obtain their lower-abdomen MR and CT data. The corresponding simulated CT was generated with the known method, and multimodal medical image registration was then performed based on the simulated CT, thereby indirectly achieving registration with the CT. Figure 1 shows the three index values of this method and of the traditional multimodal registration methods; it can be observed that the registration method based on simulated CT achieves a smaller local phase difference and larger mutual information than the methods based directly on local phase difference and on maximum mutual information. The simulated-CT-based method is therefore an effective registration method (Fig. 1).
Fig. 1. Algorithm performance index values.
From Fig. 2 we can see that, compared with the two traditional registration methods above, the multimodal registration method based on simulated CT identifies the position and shape of the bones and of the tissues and organs more accurately, and can competently meet the registration requirements of the lower abdomen.
Fig. 2. Comparison of registration results of each method (panels: multimodal registration based on simulated CT; multimodal registration based on local phase difference; multimodal registration based on mutual information).

5 Conclusion In this paper, we introduce simulated CT as an intermediate medium for the multimodal registration of MR and CT images, and select two traditional registration methods for
comparison. The traditional multimodal registration problem is thereby transformed into a single-modal registration problem, and the high precision achievable in single-modal registration is used to indirectly achieve high-precision registration across modalities and to obtain high-quality registered images. After comparing various fusion algorithms, we chose the pixel-weighted average algorithm, which is simple, intuitive, and easy to implement; it can fuse two images with adjustable weights, and the fusion effect is good.
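The pixel-weighted average fusion mentioned above is simply a convex combination of the two registered images; a minimal sketch follows, where the weight alpha is a free parameter and not a value specified in the paper.

```python
import numpy as np

def fuse_weighted_average(img_a, img_b, alpha=0.5):
    """Pixel-weighted average fusion of two registered images.
    `alpha` weights img_a; (1 - alpha) weights img_b."""
    assert img_a.shape == img_b.shape, "images must be registered to the same grid"
    return alpha * img_a.astype(float) + (1.0 - alpha) * img_b.astype(float)
```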
References 1. Alafeef, M., Fraiwan, M.: On the diagnosis of idiopathic Parkinson’s disease using continuous wavelet transform complex plot. J. Ambient Intell. Hum. Comput. 10, 2805–2815 (2019). https://doi.org/10.1007/s12652-018-1014-x 2. Iizuka, S., Simo-Serra, E., Ishikawa, H.: Globally and locally consistent image completion. ACM Trans. Graph. (TOG) 36, 1–14 (2017) 3. Guo, K., Cao, R., Kui, X., Ma, J., Kang, J., Chi, T.: LCC: towards efficient label completion and correction for supervised medical image learning in smart diagnosis. J. Netw. Comput. Appl. 133, 51–59 (2019)
4. Raja, N.S.M., Fernandes, S.L., Dey, N., et al.: Contrast enhanced medical MRI evaluation using Tsallis entropy and region growing segmentation. J. Ambient Intell. Hum. Comput. (2018). https://doi.org/10.1007/s12652-018-0854-8 5. Alvén, J., Norlén, A., Enqvist, O., Kahl, F.: Überatlas: fast and robust registration for multiatlas segmentation. Pattern Recogn. Lett. 80, 249–255 (2016) 6. Fernandez-de-Manuel, L., et al.: Organ-focused mutual information for nonrigid multimodal registration of liver CT and Gd–EOB–DTPA-enhanced MRI. Med. Image Anal. 18(1), 22–35 (2014) 7. Ahmad, S., Khan, M.F.: Multimodal non-rigid image registration based on elastodynamics. Vis. Comput. 34(1), 21–27 (2018) 8. Qian, P., Xi, C., et al.: SSC-EKE: semi-supervised classification with extensive knowledge exploitation. Inf. Sci.: Int. J. 422, 51–76 (2018) 9. Engelen, J.E.V., Hoos, H.H.: A survey on semi-supervised learning. Mach. Learn. 109, 373– 440 (2019) 10. Wald, L.: Some terms of reference in data fusion. IEEE Trans. Geosci. Remote Sens. 37(3), 1190–1193 (1999) 11. Eggers, H., Brendel, B., Duijndam, A.: Dual-echo Dixon imaging with flexible choice of echo times. Magn. Reson. Med. 65(1), 96–107 (2011) 12. Schneider, W., et al.: Correlation between CT numbers and tissue parameters needed for Monte Carlo simulations of clinical dose distributions. Phys. Med. Biol. 45(2), 459–478 (2000). https://doi.org/10.1088/0031-9155/45/2/314
Research on Quantitative Optimization Method Based on Incremental Optimization Ying Chen(B) , Youjun Huang, and Lichao Gao Xiamen Identity Check Network Technology CO., LTD., Xiamen, China {cheny,huangyoujun,gaolc}@xmigc.com
Abstract. Existing automatic mixed-precision quantization algorithms focus on the search algorithm while neglecting two issues: the huge search space and inaccurate performance evaluation criteria. To narrow the search space, this paper analyzes, from the perspective of incremental optimization, the influence of the quantization truncation error and the quantization rounding error on the performance of the quantized model. We find that, for a given model, the quantization truncation error is a constant, while the quantization rounding error is a function of the quantization precision. Based on this, we propose a finite-error incremental-optimization quantization algorithm. To address the inaccurate performance evaluation criteria, and based on an analysis of the quantization loss, we propose a performance evaluation criterion based on the Hessian matrix; Adam's second-order gradient estimate is used as proxy information to reduce the computational cost of the Hessian matrix. The method obtains a model that satisfies the hardware constraints in an end-to-end manner. Rigorous mathematical derivation and comparative experiments demonstrate the soundness of the algorithm, whose performance far exceeds that of current mainstream algorithms. For example, on the ResNet-18 network, while reducing the search space by 10^19x, the computational efficiency of the model performance evaluation criterion is increased 12-fold, and the mixed-precision model loses only 0.3% accuracy while achieving a 5.7x compression gain. Keywords: Neural network quantization · Incremental optimization · Compression and acceleration
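As a rough sketch of the Hessian-proxy idea in the abstract: in PyTorch's Adam, the per-parameter second-moment estimate (exp_avg_sq) can serve as a cheap stand-in for Hessian-based sensitivity, aggregated per parameter tensor. This is an illustrative reading of the abstract, not the paper's exact criterion.

```python
import torch

def layer_sensitivity_from_adam(optimizer: torch.optim.Adam):
    """Approximate per-tensor sensitivity from Adam's second-moment
    estimates (exp_avg_sq), averaged over each parameter tensor."""
    scores = []
    for group in optimizer.param_groups:
        for p in group["params"]:
            state = optimizer.state.get(p, {})
            if "exp_avg_sq" in state:          # populated after optimizer.step()
                scores.append(state["exp_avg_sq"].mean().item())
    return scores                              # one proxy score per parameter tensor
```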
1 Introduction Quantization is a common and well-established algorithm for the compression and acceleration of deep convolutional neural networks. It sets all convolutional layers of a network to a single, uniform low precision and performs the convolutions on low-precision multiply-accumulate units to achieve compression and acceleration. However, different network layers have different sensitivities and different degrees of redundancy under a given quantization precision, and they also behave differently on hardware; these differences are reflected in the performance of the entire network. In general, such a uniform precision setting is not optimal.
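As a concrete illustration of this uniform setting, the following is a minimal sketch of a symmetric uniform quantizer applied with the same bit-width to every layer; this is a textbook-style formulation for illustration, not the exact quantizer used in this paper, and the model loop is a hypothetical usage example.

```python
import torch

def quantize_uniform(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform (fake) quantization of a tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8-bit
    scale = w.abs().max() / qmax                    # a single scale for the whole tensor
    q = torch.round(w / scale).clamp(-qmax, qmax)   # rounding + truncation errors arise here
    return q * scale                                # dequantize back to float

# Uniform precision: the same bit-width for every layer (hypothetical usage).
# for layer in model.modules():
#     if hasattr(layer, "weight"):
#         layer.weight.data = quantize_uniform(layer.weight.data, bits=8)
```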
To solve this problem, mixed-precision quantization came into being. For a given neural network with N layers, assuming that the size of the optional quantization precision space is m, mixed-precision quantization aims to find the optimal quantization precision for each layer. Combined with network architecture search, the traditional mixed-precision quantization algorithm generally proceeds in three steps: (1) design an optional mixed-precision search space; (2) design a performance evaluation index to measure the performance of each mixed-precision model; (3) select an appropriate search algorithm and explore the candidate precision space guided by the performance evaluation criterion. Traditional algorithms [2, 24, 26] usually focus on the design of iterative search algorithms, such as reinforcement learning [15, 24, 25], evolutionary learning [2], and gradient-based updates [26], while the mixed-precision search space is generally set manually. As for the performance evaluation index, existing mixed-precision quantization algorithms generally rely on the Performance Ranking Hypothesis (for given networks A and B, if the validation performance of A is higher than that of B in the early training stage, then after both A and B have converged, the performance of A is usually still better than that of B), and use the model performance after one training epoch as the evaluation criterion. In the step-by-step iteration, the search algorithm and the performance evaluation criterion are used together to find the optimal mixed-precision strategy in the entire search space. Although these methods have improved model performance, two important and urgent problems remain in mixed-precision quantization: (1) the huge mixed-precision search space; (2) imprecise performance criteria. The huge mixed-precision search space makes this an exponential O(m^(2N)) complexity problem, so an effective search space approximation method is urgently needed to speed up the search. The traditional method simply reduces the number of candidate precisions [5] by hand, namely: m