Lecture Notes in Electrical Engineering 808
Qi Liu · Xiaodong Liu · Bo Chen · Yiming Zhang · Jiansheng Peng Editors
Proceedings of the 11th International Conference on Computer Engineering and Networks
Lecture Notes in Electrical Engineering Volume 808
Series Editors Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli Federico II, Naples, Italy Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, Munich, Germany Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China Shanben Chen, Materials Science and Engineering, Shanghai Jiao Tong University, Shanghai, China Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore Rüdiger Dillmann, Humanoids and Intelligent Systems Laboratory, Karlsruhe Institute for Technology, Karlsruhe, Germany Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China Gianluigi Ferrari, Università di Parma, Parma, Italy Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid, Spain Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität München, Munich, Germany Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt Torsten Kroeger, Stanford University, Stanford, CA, USA Yong Li, Hunan University, Changsha, Hunan, China Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA Ferran Martín, Departament d’Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA Sebastian Möller, Quality and Usability Laboratory, TU Berlin, Berlin, Germany Subhas Mukhopadhyay, School of Engineering & Advanced Technology, Massey University, Palmerston North, Manawatu-Wanganui, New Zealand Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Japan Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi “Roma Tre”, Rome, Italy Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China Gan Woon Seng, School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore, Singapore Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Germany Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China Walter Zamboni, DIEM - Università degli studi di Salerno, Fisciano, Salerno, Italy Junjie James Zhang, Charlotte, NC, USA
The book series Lecture Notes in Electrical Engineering (LNEE) publishes the latest developments in Electrical Engineering - quickly, informally and in high quality. While original research reported in proceedings and monographs has traditionally formed the core of LNEE, we also encourage authors to submit books devoted to supporting student education and professional training in the various fields and applications areas of electrical engineering. The series cover classical and emerging topics concerning:
• Communication Engineering, Information Theory and Networks
• Electronics Engineering and Microelectronics
• Signal, Image and Speech Processing
• Wireless and Mobile Communication
• Circuits and Systems
• Energy Systems, Power Electronics and Electrical Machines
• Electro-optical Engineering
• Instrumentation Engineering
• Avionics Engineering
• Control Systems
• Internet-of-Things and Cybersecurity
• Biomedical Devices, MEMS and NEMS
For general information about this book series, comments or suggestions, please contact leontina. [email protected]. To submit a proposal or request further information, please contact the Publishing Editor in your country: China Jasmine Dou, Editor ([email protected]) India, Japan, Rest of Asia Swati Meherishi, Editorial Director ([email protected]) Southeast Asia, Australia, New Zealand Ramesh Nath Premnath, Editor ([email protected]) USA, Canada: Michael Luby, Senior Editor ([email protected]) All other Countries: Leontina Di Cecco, Senior Editor ([email protected]) ** This series is indexed by EI Compendex and Scopus databases. **
More information about this series at http://www.springer.com/series/7818
Qi Liu · Xiaodong Liu · Bo Chen · Yiming Zhang · Jiansheng Peng
Editors
Proceedings of the 11th International Conference on Computer Engineering and Networks
Editors Qi Liu School of Computer and Software Nanjing University of Information Science and Technology Nanjing, Jiangsu, China Bo Chen State Key Laboratory of Radar Signal Processing Xidian University Xi’an, Shaanxi, China
Xiaodong Liu School of Computing Edinburgh Napier University Edinburgh, UK Yiming Zhang School of Civil Engineering and Transportation Hebei University of Technology Tianjin, Tianjin, China
Jiansheng Peng Hechi University Hechi, Guangxi, China
ISSN 1876-1100 ISSN 1876-1119 (electronic) Lecture Notes in Electrical Engineering ISBN 978-981-16-6553-0 ISBN 978-981-16-6554-7 (eBook) https://doi.org/10.1007/978-981-16-6554-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022, corrected publication 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This proceedings volume collects the papers accepted by CENet 2021, the 11th International Conference on Computer Engineering and Networks, held on October 21–25, 2021, in Hechi, China. It contains five parts: Part I Internet of Things and Smart Systems (5 papers); Part II Artificial Intelligence and Applications (41 papers); Part III Medical Engineering and Information Systems (23 papers); Part IV Security and Communication Networks (26 papers); and Part V Communication System Detection, Analysis and Application (87 papers). Each part can serve as an excellent reference for industry practitioners, university faculty, research fellows, graduate students, and undergraduates who need to build a knowledge base of the most current advances and state of practice in the topics covered by these proceedings, enabling them to produce, maintain, and manage systems with high levels of trustworthiness and complexity. We thank the authors for their work and dedication, as well as the reviewers for ensuring the selection of high-quality papers; their efforts made these proceedings possible.
Contents
IOTS Internet of Things and Smart Systems
A Double Incentive Trading Mechanism for IoT and Blockchain Based Electricity Trading in Local Energy Market . . . 3
Bingyang Han, Yanan Zhang, Qinghai Ou, Jigao Song, and Xuanzhong Wang
A Survey on Task Offloading in Edge Computing for Smart Grid . . . 13
Jing Shen, Yongjie Li, Yong Zhang, Fanqin Zhou, Lei Feng, and Yang Yang
Data Fusion of Power IoT Based on GOWA Operator and D-S Evidence Theory . . . 21
Huiping Meng, Jizhao Lu, Fangfang Dang, Yue Liu, Yang Yang, and Binnan Zhao
Edge Task Offloading Method for Power Internet of Things Based on Multi-round Combined Auction . . . 31
Yi Ge, Ying Wang, and Yufan Cheng
VEC-MOTAG: Vehicular Edge Computing Based Moving Target Defense System . . . 42
Bingchi Zhang, Shujie Yang, Tao Zhang, Weixiao Ji, Zhongyi Ding, and Jiahao Shen
AIA Artificial Intelligence and Applications
Short-Term Wind Power Forecasting Based on the Deep Learning Approach Optimized by the Improved T-distributed Stochastic Neighbor Embedding . . . 53
Xing Deng, Feipeng Da, and Haijian Shao
Adaptive Image Steganographic Analysis System Based on Deep Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ge Jiao
66
RETRACTED CHAPTER: An Efficient Channel Attention CNN for Facial Expression Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xingwei Wang, Ziqin Guo, Haiqiang Duan, and Wei Chen
75
Handwritten Digit Recognition Application Based on Fully Connected Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Qintian Zhang, Shenao Xu, and Zhiwei Xu
83
Detection System of Truck Blind Area Based on YOLOv3 . . . . . . . . . . Yang Zhang, Xia Zhu, Yang Bu, Wenjing Ding, and Yilin Lu
90
Driver Fatigue Detection Algorithm Based on SMO Algorithm . . . . . . . 101 Xia Zhu Image Mosaic Technology Based on Harris Corner Feature . . . . . . . . . 111 Xueya Liu, Shaoshi Wu, and Dan Wang Image Semantic Segmentation Based on Joint Normalization . . . . . . . . 121 Jiexin Zheng, Taiwei Qiu, Lihong Chen, and Shengyang Liang DeepINN: Identifying Influential Nodes Based on Deep Learning Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Wei Zhang and Jing Yang Lightweight Semantic Segmentation Convolutional Neural Network Based on SKNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 Guangyuan Zhong, Huiqi Zhao, and Gaoyuan Liu The Research on Image Detection and Extraction Method Based on Yin and Yang Discrete Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 Haini Zeng Research on Short-Term Power Load Prediction Based on Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 Lanlan Yin, Feng Mo, Qiming Wu, and Shuiping Xiong Image Repair Methods Based on Deep Residual Networks . . . . . . . . . . 160 Hongwei Deng, Ziyu Lin, Jinxia Li, Ming Yao, Taozhi Wang, and Hongkang Luo Real-Time Traffic Sign Detection Based on Improved YOLO V3 . . . . . 167 Haini Zeng Design of Ground Station for Fire Fighting Robot . . . . . . . . . . . . . . . . . 173 Minghao Yang, Xizheng Zhang, Sichen Fang, Anran Song, Zeyu Wang, and Zijian Cui
Baby Expression Recognition System Design and Implementation Based on Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Xuanying Zhu, Yaqi Sun, Qingyun Liu, Jin Xiang, and Mugang Lin Handwriting Imitation with Generative Adversarial Networks . . . . . . . 189 Kai Yang, Xiaoman Liang, Qingyun Liu, and Kunhui Wen Epidemic Real-Time Monitor Based on Spark Streaming Real-Time Computing Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196 Jiaxin Yang, Yaqi Sun, Xiaoman Lian, and Xiaoyang He Design and Implementation of Fruit and Vegetable Vending Machine Based on Deep Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 Chengjun Yang and Yong Xu Design and Implementation of License Plate Recognition System Based on Android . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211 Chengjun Yang and Ling Zhou Pseudo-block Diagonally Dominant Matrix Based on Bipartite Non-singular Block Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220 Fangbo Hou Research on Assistant Application of Artificial Intelligence Robot Coach in University Sports Courses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229 Hongtao Pan Research on the Construction of English Teachers’ Classroom Teaching Ability System Based on Artificial Intelligence . . . . . . . . . . . . 238 Qin Miao and Jun Yang Changes and Challenges: Application of Artificial Intelligence Technology in College English Teaching . . . . . . . . . . . . . . . . . . . . . . . . 249 Dan Wang A Median Filtering Forensics CNN Approach Based on Local Binary Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258 Tao Zhu, Haiyan Gu, and Zenan Chen Application of Cluster Analysis in Bitcoin Deanonymization . . . . . . . . . 267 Meng Li Optimization of Prime Decision Algorithm in RSA Algorithm . . . . . . . . 277 Zhenghui Chang and Pengfei Gong Corner Point Recognition and Point Cloud Correction Based on Graham-Scan Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284 Bo Zhang, Yuan Xu, Lei Wang, and Shuhui Bi
Multiband Based Joint Sparse Representation for Motor Imagery Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293 Xu Yin and Ming Meng Policy Gradient Reinforcement Learning Method for Backward Motion Control of Tractor-Trailer Mobile Robot . . . . . . . . . . . . . . . . . . 303 Qiqi Wang, Jin Cheng, and Han Zhang Conditional Distribution Adaptation Toward Zero-Training Motor Imagery Brain-Computer Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312 Xianghong Zhao, Weiming Cai, and Cong Liu Internal Quality Classification of Apples Based on Near Infrared Spectroscopy and Evidence Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 Xue Li, Liyao Ma, Shuhui Bi, and Tao Shen Multi-modal Speech Emotion Recognition Based on TCN and Attention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 Yifan Ye and Jing Chen Edge Perception Strategy Based on Data Fusion and Recurrent Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339 Yize Tang, Xinjia Wang, Junxiao Shi, Yushuai Duan, and Qinghang Zhang DQN-Based Edge Computing Node Deployment Algorithm in Power Distribution Internet of Things . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349 Shen Guo, Peng Wang, Jichuan Zhang, Jiaying Lin, Chuanyu Tan, and Sijun Qin Research on Power-Stealing Behaviors of Large Users Based on Naive Bayes and K-means Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358 Liming Chen, Xuzhu Dong, Baoren Chen, Xiaoping Qiu, Zhengrong Wu, Zhiwen Liu, and Qunying Lei Interference Control Mechanism Based on Deep Reinforcement Learning in Narrow Bandwidth Wireless Network Environment . . . . . . 368 Hao Li, Jianli Guo, Xu Li, Xiujuan Shi, and Peng Yu RLbRR: A Reliable Routing Algorithm Based on Reinforcement Learning for Self-organizing Network . . . . . . . . . . . . . . . . . . . . . . . . . . 378 Liyuan Zhang, Lanlan Rui, Yang Yang, Yuejia Dou, and Min Lei A Computation Task Immigration Mechanism for Internet of Things Based on Deep Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . 387 Yifei Xing, Chao Yang, Hao Zhang, Siya Xu, Sujie Shao, and Shi Wang Action Recognition Model Based on Feature Interaction . . . . . . . . . . . . 397 Dengtai Tan, Changpeng He, and Yiqun Wang
Semantic Segmentation of 3-D SAR Point Clouds by Graph Method Based on PointNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408 Zerui Yu and Kefei Liao MEIS Medical Engineering and Information Systems Study on Monitoring and Early Warning Technology of Tick-Borne Zoonosis in Western Liaoning Province . . . . . . . . . . . . . . . . . . . . . . . . . 421 Shuyu Hu and Xiaogang Liu Research on Energy Cost of Human Body Exercise at Different Running Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430 Lingyan Zhao, Qin Sun, Baoping Wang, and Xiaojun Wang Accurate Localization of Fixed Orthodontic Treatment Based on Machine Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437 Xiaoli Sha Speech Stuttering Detection and Removal Using Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443 Shaswat Rajput, Ruban Nersisson, Alex Noel Joseph Raj, A. Mary Mekala, Olga Frolova, and Elena Lyakso Design of Epidemic Tracing System Based on Blockchain Technology and Domestic Cipher Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452 Chong Leng, Lai Wei, Ziqian Liu, Zhiqiang Wang, and Tao Yang Optimization of Gene Translation Using SD Complementary Sequences and Double Codons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461 Dingfa Liang, Zhumian Huang, Liufeng Zheng, and Yuannong Ye Integrated Helicobacter Pylori Genome Database and Its Analysis . . . . 471 Liufeng Zheng, Mujuan Guo, Dingfa Liang, and Yuannong Ye The Algorithms of Predicting Bacterial Essential Genes and NcRNAs by Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487 Yuannong Ye, Dingfa Liang, and Zhu Zeng Pneumonia Recognition Based on Deep Learning . . . . . . . . . . . . . . . . . 494 Shiting Luo, Yinglin He, Jing Wang, Yuxiao Tang, and Yong Xu A Hierarchical Machine Learning Frame Work to Classify Breast Tissue for Identification of Cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504 J. Anitha Ruth, Vijayalakshmi G. V. Mahesh, R. Uma, and P. Ramkumar An Improved Method for Removing the Artifacts of Electrooculography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516 Huimin Zhao, Chao Chen, Abdelkader Nasreddine Belkacem, Jiaxin Zhang, Lin Lu, and Penghai Li
R-Vine Copula Mutual Information for Intermuscular Coupling Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526 Yating Wu, Qingshan She, Hongan Wang, Yuliang Ma, Mingxu Sun, and Tao Shen A New Feature Selection Method for Driving Fatigue Detection Using EEG Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535 Zaifei Luo, Yun Zheng, Yuliang Ma, Qingshan She, Mingxu Sun, and Tao Shen A New Strategy for Mental Fatigue Detection Based on Deep Learning and Respiratory Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543 Jie Wang, Jilong Shi, Yanting Xu, Hongyang Zhong, Gang Li, Jinghong Tian, Wanxiu Xu, Zhao Gao, Yonghua Jiang, Weidong Jiao, and Chao Tang The Analysis and AI Prospect Based on the Clinical Screening Results of Chronic Diseases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553 Lingfeng Xiao, Yanli Chen, Yingxin Xing, Lining Mou, Lihua Zhang, Wenjuan Li, Shuangbo Xie, and Mingxu Sun Spatio-Temporal Evolution of Chinese Pharmaceutical Manufacturing Industry Based on Spatial Measurement Algorithms . . . 563 Fang Xia, Yanyin Cui, Jinping Liu, and Shuo Zhang Evaluating the Spatial Aggregation and Influencing Factors of Chinese Medicine Human Resources in China: A Spatial Econometric Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576 Fang Xia, Jinping Liu, Yanyin Cui, and Hongjuan Wen Spatial Distribution of Human Resources Allocation Level of Chinese Traditional Medicine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587 Jinping Liu, Fang Xia, Yanyin Cui, Ziying Xu, and Hongjuan Wen The Improvement Path of E-health Literacy of Undergraduates in Jilin Province Based on the Structural Equation Model . . . . . . . . . . 595 Peixu Cui, Fang Xia, Jinping Liu, and Xin Su Comprehensive Evaluation of Innovation Efficiency of Jilin Province Pharmaceutical Manufacturing Industry Based on Radar Map Feature Vector Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604 Wanying Li, Yufang He, Yanyin Cui, Zining Zhang, Fang Xia, and Ziying Xu Research on Gene Coexpression Network Based on RNA-Seq Data . . . . 616 Xiaoqian Wu and Xinghui Song
Information Sharing of Medical Resources for Emergency Rescue Based on Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624 Zhipeng Gao, Heng Fu, Yijing Lin, Huangqi Li, Ze Chai, Haisheng Guo, Dezheng Wang, Yinghan Zhang, Lanlan Rui, and Yang Yang A Remote Health Diagnosis Method Based on Full Voting XGBoost Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634 Yuting Li, Yang Yang, Peng Yu, Ying Yao, and Yong Yan SCN Security and Communication Networks Threat Intelligence Sharing Model and Profit Distribution Based on Blockchain and Smart Contracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645 Huiyang Shi, Wenjie Wang, Ling Liu, Yue Lin, Peng Liu, Weiqiang Xie, He Wang, and Yuqing Zhang Research on Android Malicious URL Detection Based on Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655 Zijian Ma, Zhuoyue Wang, Zhiqiang Wang, Yuheng Lin, and Yingying Du Research on Security Trust Model of P2P Network Based on Improved Search Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662 Guangyong Zheng and Yuming Xu Design and Implementation of Security Vulnerability Sharing Platform Based on Web Crawler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 678 Zhiqiang Wang, Ziyi Wang, Zhuoyue Wang, Zhirui Zhang, and Tao Yang Design Principle and Method of Lightweight Block Cipher Diffusion Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688 Junxia Zhao, Lang Li, and Qiuping Li Research on Application of Data Encryption in Computer Network Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697 Lanlan Yin, Feng Mo, Qiming Wu, and Yin Long Practical Provably Secure Encryption Scheme Based on Hashed Bilinear Pairing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705 Menglin Xiao, Yun Song, and Ningning Wang A New Image Encryption Strategy Based on Arnold Transformation and Logistic Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712 Xiyu Sun and Zhong Chen Time-Aware Missing Traffic Flow Prediction for Sensors with Privacy-Preservation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721 Lianyong Qi, Fan Wang, Xiaolong Xu, Wanchun Dou, Xuyun Zhang, Mohammad R. Khosravi, and Xiaokang Zhou
A Proposal of Digital Image Steganography and Forensics Based on the Structure of File Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731 Chen Liu Research on MySQL Database Recovery and Forensics Based on Binlog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741 Zhuoxi Zhang, Ming Yuan, and Hanwei Qian An Early Warning Model of Cybercrime Based on User Profile . . . . . . 751 Wen Deng, Guangjun Liang, Xuan Zhang, and Yuxuan Shi Anonymous Authentication Technology Review in Vehicle Networking Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 758 Haoyan Zhang, Guangjun Liang, Jiacheng He, and Mingtao Ji Research on Detection of Chinese Microblog Public Opinion Analysis System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 766 Jianfeng Tang and Xiang Xu An Authentication Method Combining Blockchain and Subject-Sensitive Hashing for the Data Sharing of Remote Sensing Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774 Kaimeng Ding, Tingting Jiang, and Haozheng Zhang Smart Grid Data Security Sharing Mechanism Based on Alliance Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 784 Zhijian Si, Dahai Xiao, Chao Yang, Xiaolei Tian, Zhenjiang Lei, and Xiaoning Ma A Non-intrusive Anomaly Detection Method for Distribution Integration Terminal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793 Dongxiao Jiang, Youqing Xu, Chenggang Li, Jiarui Wang, and Yu Wang Integrated Energy Virtual Network Service Fault Diagnosis Algorithm Under Disaster Event Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 803 Libo Cui, Chunwei Guan, Jing Zhao, Yanru Wang, Hui Liu, and Wenjie Ma Research on the Endogenous Security Technology of Polymorphic Smart Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 813 Huiping Meng, Qinghai Ou, Yi Jing, Jigao Song, Chenbin Qiao, and Jie Zhang FEFuzzer: Hybrid Files Fuzzing Tool . . . . . . . . . . . . . . . . . . . . . . . . . . 823 Tengfei Tu, Wei Zhang, Lu Rao, Zhao Li, and Jiani Lu Research on Data Analysis and Electronic Forensics Algorithm of Telecom Fraud Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834 Shunli Zhang and He Zhang
Adversarial Unsupervised Domain Adaptation for Traffic Anomaly Detection in Convergence Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 847 Zhuo Tao, Yang Yang, Longjun Zhao, Zhen Wang, Dandan Cui, and Zhipeng Gao Encrypted Traffic Identification Method Based on Multi-scale Spatiotemporal Feature Fusion Model with Attention Mechanism . . . . . 857 Yonghua Huo, Hongwu Ge, Libin Jiao, Bowen Gao, and Yang Yang Power Terminal Data Security and Efficient Management Mechanism Based on Master-Slave Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 867 Shaoying Wang, Huifeng Yang, Lifang Gao, Qimeng Li, Pengpeng Lv, Xin Lu, and Peng Lin Data Security Sharing Mechanism of Power Equipment Based on Federated Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 875 Pengbo Fang, Chenjun Sun, Yangyang Lian, Zhihui Wang, Fan Xiao, Liandong Chen, and Peng Lin Design of Log Analysis System Based on Deep Learning for Operation System Anomaly Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 884 Jiaao Yu, Yanbin Jiao, Qing Guo, Chong Liang, Lanlan Rui, and Xingyu Chen CSDA Communication System Detection, Analysis and Application Research on Visual Dynamic Tracking Control of SCARA Robot . . . . . 895 Chunyu Zhu, Zhibin Tian, Yue Zhu, and Zhongcheng Shi Silicon Electro-optic Modulator for Photonic Ring Network On-Chip Based on Dual ITO Layer Directional Coupler . . . . . . . . . . . . . . . . . . . 903 Liang Zhixun, Yi Yunfei, Lin Fang, and Fan Yuanyuan CNTK Communication Optimization Based on Parameter Server . . . . . 909 Xinghui Song and Zhanwen Dai Research About the Influence of Digital Social Media on Learning Ability Based on “Segmentation-Outsourcing-Integration” Method . . . . 920 Yang Sun Cross-Network User Matching Based on Association Strength . . . . . . . . 927 Qiuyan Jiang, Daofu Gong, and Fenlin Liu Implementation of Error Control Coding in Flight Test Instrument . . . 937 Tenghuan Ding, Ming Liu, and Qingdong Xu Research on Airborne Power Conversion Based on Phase-Shifting Full-Bridge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 942 Shaoshi Wu and Xueya Liu
Research on Static Missile Attitude Measurement Technology Based on Collinear Perspective Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 951 Xiaobo Guo A Novel Negative Sequence Current Control System for Electrified Railway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 959 Shenao Xu, Qintian Zhang, and Zhiwei Xu Research on Simulation Models of Blue Force Naval Surface Warships Group Anti-submarine Combat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 966 Rui Guo and Haitao Yao The Study of Short Term Wind Power Prediction Based on MV-LSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 974 Dashuang Li Intelligent Fault Section Location for Distribution Network with DG Based on Hierarchical Zoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 985 Dashuang Li Fault Analysis Based on FMECA Megawatt Wind Turbine . . . . . . . . . . 994 Weicai Xie, Fan Peng, Chaozheng Tang, and Shibo Liu Research and Improvement of Image Transmission Integrated Remote Controller for Tracked Robot in Special Environment . . . . . . . 1005 Anran Song, Xizheng Zhang, Zeyu Wang, Sichen Fang, Zijian Cui, and Minghao Yang Routing Algorithm for Wireless Sensor Network Based on GA-LEACH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1012 Liang Zhixun, Fan Yuanyuan, and Yi Yunfei An Overview on Developments and Researches of Axial Flux Wind Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1025 Yuqi Liao and Wenhao Ai Research on Communication and Control Method of Towing Cable Inspection Robot in Urban Underground Pipe Network . . . . . . . . . . . . 1031 Sichen Fang, Zeyu Wang, Anran Song, Zijian Cui, and Minghao Yang The Study for the Effects of Distributed Generation on Power System . . . 1036 Boxiong Li and Shaoping Huang Doubly Fed Wind Power Generation System Based on Sliding Mode Variable Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1044 Weicai Xie, Chaozheng Tang, Fan Peng, and Shibo Liu Research on Temperature Monitoring and Warning System for Power Cable Joints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1053 Zijian Cui, Xizheng Zhang, Sichen Fang, Anran Song, Zeyu Wang, and Minghao Yang
Design of a High Accuracy Color Block Sorting Robot Based on TCS3200 Color Sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1060 Zhenwu Wan, Haifeng Luo, Huaixing Wang, and Liang Huang Design of Warehouse Cooperative Robot System Based on ZigBee . . . . 1073 Linlin Liu, Wenyan Li, Ting Xia, and Zhenwu Wan Design and Development of Intelligent Reading and Writing Posture Reminder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1084 Qiming Wu, Jiahuan Li, and Lanlan Yin Design and Implementation of Intelligent Mechanical Arm . . . . . . . . . . 1090 Qiming Wu, Siyue Yu, Peng Chen, and Luyun Zhang An Improved EZW Algorithm for Image Compression . . . . . . . . . . . . . 1097 Baolin Zhou Construction of University Comprehensive Budget Management Information System Based on Big Data and Cloud Platform . . . . . . . . . 1106 Pingping Ma Discussion on the Integrated Design of Electrical Internet of Things System for Inspection Robots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1116 Wenzhong Xia Evaluation System of Physical Education Students’ Exercise Score . . . . 1125 Hongtao Pan Mathematical Calculation of Inclusion Domain Complex Matrix of Block Eigenvalues Under Two Part . . . . . . . . . . . . . . . . . . . . . . . . . . 1133 Fangbo Hou Research on Educational Informatization Platform Based on Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1140 Guodong Liu Invulnerability Optimization of Communication Network Based on Analog Attack Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1154 Lingling Xia, Xueli Ni, Zhengjun Jing, Jiayin Liu, and Yan Zhang Analysis of Open Water Performance of Integrated Motor Propeller . . . . 1160 Zhiguang Guan, Chao Wang, and Qiuhua Miao A Grain-Level Microstructure Model for Simulating of Crack Evolution Based on the CZM Method . . . . . . . . . . . . . . . . . . . . . . . . . . 1168 Zuoli Li, Qin Sun, Baoping Wang, and Xiangzhen Kong Analysis of Underwater Robot Structure Based on Fluent . . . . . . . . . . . 1178 Qingzhen Chen, Qin Sun, Zhiguang Guan, and Fulin Yu
Adaptive Structure Design of Pipe Cleaning Robot for Household Fresh Air System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1186 Zhang Hongli and Guan Zhiguang Simulation Analysis of Hydraulic Balance Circuit Based on AMESim Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1192 Qin Sun, Qingzhen Chen, Zuoli Li, and Lingyan Zhao Design of Rubber Ring Automatic Assembly Device for Drawer Roller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1201 Baoping Wang, Yongjuan Wang, Qin Sun, and Wei Wang Design of Remote Operated Vehicle Based on STM32 . . . . . . . . . . . . . . 1208 Chao Wang, Zhiguang Guan, and Wei Wang Design of Underwater Robot Control System Based on STM32F407 . . . 1216 Zhang Dongsheng, Wu Hao, Guan zhiguang, and Zhao lingyan Design on Small-Scale Remotely Operated Vehicle . . . . . . . . . . . . . . . . . 1224 Qiuhua Miao, Hao Wu, and Zhiguang Guan Output Feedback Stabilizing Control Design for a Class of Single-Link Robot Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1231 Fuqiang Sun and Xuehua Yan Multi-layer Uneven Clustering for Wireless Sensor Networks . . . . . . . . 1240 Jing Liu and Shoubao Su A Cluster Heads Selection Algorithm of Wireless Sensor Network Based on Cluster Notes Number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1250 Jing Liu and Shoubao Su Design of Aircraft Vibration Measuring System . . . . . . . . . . . . . . . . . . . 1260 Ruiyuan Peng, Guobo Wei, and Ruchang Huang Assembly Sequence Planning Algorithm in Collaborative Environment Based on Web . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1266 Haijun Wang Design of Airborne Thermocouple Temperature Measurement System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1276 Ruiyuan Peng, Guobo Wei, and Ruchang Huang Effect of Online Interaction on College Student Satisfaction in Online Courses: A Chained Mediation Model . . . . . . . . . . . . . . . . . . . . . . . . . . 1282 Jingshuo Liu, Fang Xia, and Zixu Hao Research on Networked Airborne Testing System Architecture Based on DDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1290 Kun Zhao, Ming Liu, and Jian Li
Wireless Sensing Based Gesture Recognition with Edge Computing in Twin Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1297 Xuanzhong Wang, Yanfang Fu, Bingyang Han, Qinghai Ou, and Jigao Song Probe Selection Algorithm of Power Communication Network in Sparse Network Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1307 Xinnan Ha, Ye Wang, Guoli Feng, Run Ma, Xiaobo Li, and Peng Lin Design and Implementation of a Charging Station Transaction System Based on Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1315 Kailin Wang, Yongliang Li, Junwei Ma, Zhenhua Yan, Shaoyong Guo, and Yong Yan Power Communication Multi-service Carrying Method Based on Wi-Fi6 Resource Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1326 Cheng Zhong, Yuebin Wu, Juntao Zheng, Pengcheng Lu, Yi Li, and Sujie Shao Containerized Scheduling Method Based on Kubernetes and YARN in Big Data Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1339 Wenjing Li, Yanru Wang, Wenjie Ma, Liuwang Wang, Dongdong Lv, and Hui Liu A Distributed Software-Defined Content Delivery Network Architecture Based on Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1351 Weichao Gong, Dongyan Zhao, Yidong Yuan, Wuyang Zhang, and Sujie Shao Carrier Network Fault Diagnosis Algorithm Based on Network and Business Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1361 Yutu Liang, Zhan Shi, Ying Zeng, and Song Kang Carrier Network Fault Diagnosis Algorithm Based on Dynamic Bayes Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1371 Zhan Shi, Ying Zeng, Yutu Liang, and Keqin Zhang Carrier Network Fault Diagnosis Algorithm Based on Service Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1381 Zhan Shi, Zhengfeng Zhang, Yutu Liang, and Weichao Gong Link Packet Loss Rate Inference Algorithm Based on Network Characteristics in Carrier Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1392 Zhan Shi, Ying Zeng, Yutu Liang, and Weichao Gong Carrier Network Link Loss Rate Reasoning Algorithm Based on Network Resources and Service Characteristics . . . . . . . . . . . . . . . . 1402 Zhan Shi, Zanhong Wu, Yutu Liang, and Xuchuan Huang
Inference Algorithm of Link Loss Rate Based on Network Resource Characteristics in Dynamic Carrier Network . . . . . . . . . . . . . . . . . . . . . 1412 Yutu Liang, Jiajia Fu, Zhan Shi, and Keqin Zhang Research on Application Method of NB-IOT in Power Consumption Information Collection System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1422 Gang Wang, Tao Ji, Xiangdong Zhang, Wen Hu, Haiyan Xia, Bo Zou, Xiaoping Qiu, and Wenli Wang Virtual Network Resource Allocation Algorithm Based on Reliability and Distribution Strategy Under Network Slicing . . . . . . . . . . . . . . . . . 1435 Ying Zeng, Yuhang Chen, and Zanhong Wu Virtual Network Resource Allocation Algorithm Based on Active Detection in Network Slicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1443 Jiangang Lu, Jiajia Fu, and Jian Zhang A Phase Gradient Metasurface Antenna Working at 5G Band for Electric Power Communication Network . . . . . . . . . . . . . . . . . . . . . 1453 Cheng Zhong, Shujun Zhao, Zhengwen Zhang, and Shaoyong Guo Investigation of Directional Wide-Beam Radial Line Slot Antenna for Smart Grid Fault Detector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1463 Cheng Zhong, Jin Li, Bin Ding, and Shaoyong Guo Supply Chain Credit Evaluation Mechanism Integrating Federated Learning and Blockchain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1471 Qi Ma, Huifeng Yang, Dong Wang, Wei Liu, and Shaoyong Guo Research on Edge-Side Domain Name Caching Algorithm Based on Group Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1481 Siyuan Liu, Feng Qi, Shaoyong Guo, and Linna Ruan Blockchain-Oriented Query Capability Optimization . . . . . . . . . . . . . . . 1489 Kete Wang, Feng Qi, Shaoyong Guo, and Linna Ruan Resource-Aware Reliability Assurance of Service Function Chain . . . . . 1500 Shaojun Zhang, Yutong Ji, Yufan Cheng, Ying Wang, and Peng Yu Research on Network Operation Capability and Benefit Evaluation Method for 5G-Enabled Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1509 Shen Jin, Wei Deng, Ningchi Zhang, Yanru Wang, Chun Yang, and Dandan Guo Research on 5G End-to-End Simulation Test Technology for Electric Power Business in Smart Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1517 Jiakai Hao, Tianxiang Hai, Kang Yin, Ming Jin, Xiaochen Liu, Shen Wang, Yikun Zhao, and Lei Feng
Research on 5G Multipath Concurrent Transmission System and End to End Delay Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . 1525 Yujing Zhao, Qi Wang, Xiaoyong Qi, Lei Feng, Jing Gao, and Peng Yu 5G NR Test Technology Progresses and Challenges . . . . . . . . . . . . . . . . 1532 Jiakai Hao, Guanghuai Zhao, Mingshi Wen, Kang Yin, Tianxiang Hai, Kun Cao, Minzhao Wang, Fanqin Zhou, and Zerui Zhen Design of a Distributed Ledger-Based Reward Architecture for Collaborative Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1539 Jiaxing Wang, Lanlan Rui, Yang Yang, Miaomiao Wang, Shiyou Chen, and Zhili Wang Unbalanced Data Oversampling Method for Traffic Multi-classification in Convergence Network . . . . . . . . . . . . . . . . . . . . . 1549 Qian Zhao, Yang Yang, Longjun Zhao, Zhen Wang, Dandan Cui, and Zhipeng Gao Design and Simulation of Resource Demand Forecasting Algorithm in Vehicular Edge Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1559 Mengxiao Wu, Lanlan Rui, Shiyou Chen, Yang Yang, Xuesong Qiu, and Zhili Wang Resource Scheduling Algorithms for Burst Network Flow in Edge Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1569 Jingyang Yan, LanLan Rui, Yang Yang, Shiyou Chen, and Xingyu Chen Multi-machines and Multi-tasks Scheduling for UAV Power Inspection in Smart Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1579 Jiangang Lu, Jiajia Fu, Jian Zhang, and Keqin Zhang 5G Green Communications: Multigroup Multicasting Transmission with SWIPT in C-RAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1588 Ying Zeng, Jiangang Lu, Zhan Shi, and Song Kang Optimal Computation Resource Allocation Scheme for LEO Satellite Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1598 Jiajia Fu, Zanhong Wu, Yuhang Chen, and Xuchuan Huang 3D-SAR Imaging with Improved Frequency Diverse Array Antenna . . . 1605 Qiaoying Yu, Kefei Liao, Shan Ouyang, Ningbo Xie, and Jifa Shen Primary Node Selection Algorithm of PBFT Based on Anomaly Detection and Reputation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1613 Ruowen Gu, Bin Chen, and Dongyan Huang OAI-Based CU Implementation and Test with Emulators in 5G CU/DU Split Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1623 Guo Wang, Yang Liu, Haitao Liu, Jiaying Zong, and Jingxian Feng
FD-ISAR Translational Compensation Algorithm Based on Observation Subset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1633 Wenying Lian, Kefei Liao, and Xinghua Liu How Asymmetrical Dependency Affects the Robustness in Smart Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1644 Zikai Liang, Kaixuan Wang, and Yaoli Li A Dynamic Task Assignment Strategy for Emitter Reconnaissance and Positioning through Use of UAV Swarms . . . . . . . . . . . . . . . . . . . . 1654 Ruonan Wang, Yu Gu, Zou Zhou, Zhehao Wang, Fangwen Xu, Jian Luo, Lulu Ma, and Hongbing Qiu Mobile Edge Computing for LEO Satellite: A Computation Offloading Strategy Based Improved Ant Colony Algorithm . . . . . . . . . . . . . . . . . . 1664 Bo Wang, Tong Feng, Dongyan Huang, and Xiaohang Li Retraction Note to: An Efficient Channel Attention CNN for Facial Expression Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xingwei Wang, Ziqin Guo, Haiqiang Duan, and Wei Chen
C1
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1677
IOTS Internet of Things and Smart Systems
A Double Incentive Trading Mechanism for IoT and Blockchain Based Electricity Trading in Local Energy Market

Bingyang Han, Yanan Zhang, Qinghai Ou, Jigao Song, and Xuanzhong Wang

Beijing Fibrlink Communications Co., Ltd., Beijing 100071, China
[email protected]
Abstract. In the local energy market, double auction is the most frequently used trading mechanism for blockchain based electricity trading. In its general form, the transaction order and prices depend only on the sellers' bid and the buyers' offer prices, regardless of the energy production efficiency of producers and the electricity consumption efficiency of consumers. As a consequence, the competitiveness of renewable energy is undermined and the amount of wasted electricity increases. With the rapid development of IoT technologies and the smart grid, it has become much easier to obtain the information and status of producers and consumers. We therefore combine IoT technologies with blockchain and propose a double incentive trading mechanism that considers the external costs of producers and the consumption efficiency of consumers in blockchain based electricity trading. More specifically, we put forward a metric called the priority value (PV), which quantifies the external cost of a producer or the efficiency of a consumer, to optimize the electricity transactions. The case study shows that, compared with the traditional trading method, our method gives trading preference to producers and consumers that produce or consume electricity more efficiently and in a more environmentally friendly way. The results also indicate that our method incentivizes the consumption of renewable energy and stimulates electricity producers to improve the utilization efficiency of fossil fuels, which helps to reduce carbon emissions and coal consumption, and encourages users to improve their electricity consumption behavior and save electricity. Keywords: Blockchain · Decentralization · Local energy market · Double auction · IoT technologies · Smart meter
1 Introduction

A local energy market comprises numerous electricity producers and residential consumers in a local community who trade electricity with each other. The conventional centralized trading mechanism, which depends on a third party, is inappropriate for a local electricity market because of the risk of information disclosure and the additional transaction cost. Blockchain, however, allows producers and consumers to trade automatically, fairly, securely, and cost-effectively in the local electricity market.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 3–12, 2022. https://doi.org/10.1007/978-981-16-6554-7_1
Recently, blockchain has received widespread attention in the local energy market, and a growing body of literature focuses on research in this area. Esther proposed a blockchain-based decentralized local energy market transaction model and mechanism [1]. A blockchain-based carbon emission trading mechanism is proposed in [2]. Claudia validates the feasibility of blockchain based distributed demand side management [3]. In [4–6], demand side management modes based on blockchain in microgrids have been proposed. The existing literature has also introduced pilot application projects of blockchain in the local energy market; the two most representative applications are the Brooklyn Microgrid [7] in the United States and Quartierstrom [8] in Switzerland, both of which successfully established blockchain based trading platforms. At present, blockchain-based local energy transactions mainly use double auction [4], which relies only on the price order of the producers' bid prices and the consumers' offer prices to make a deal. This method does not take into account the consumers' energy consumption efficiency, the external cost of electricity generation, or the electricity production efficiency. In fact, the efficiency of electricity consumption varies among consumers: consumers who are conscious of saving energy tend to use electricity more efficiently than others, and they should be rewarded during the transaction procedure. In addition, there are many forms of producers in the local energy market, such as thermal power plants, photovoltaic power stations, and wind power stations, which have different electricity generation efficiencies and external costs. For example, thermal power plants emit greenhouse gases, harmful gases, dust, and other pollutants when burning fossil fuels to generate electricity, which not only pollutes the environment but also harms people's health. Consequently, society has to spend additional effort to control pollution and treat disease, while such external costs are not included in the power production cost. Nowadays, with the widespread deployment of IoT devices and the rapid development of the smart grid, it has become much easier to obtain the data of producers and consumers mentioned above. So why not apply IoT technologies to blockchain-based local energy transactions and use them to address these problems? In this situation, we combine IoT technologies with blockchain and propose a transaction method that considers the priority values of both producers and consumers in blockchain-based decentralized power transactions. By considering consumption efficiency, producers' external costs, and production efficiency, our method allows renewable energy generators and high-efficiency producers to sell electricity with higher priority, and consumers with high consumption efficiency to buy electricity with higher priority, which directly promotes the consumption of clean energy and incentivizes consumers to take measures to save energy.
2 Blockchain-Based Local Energy Market

As shown in Fig. 1, a local energy market is composed of the electricity generators and consumers in a community [1]. The generators use various forms of energy to produce and sell electricity; they include thermal power plants, hydropower plants, photovoltaic power stations, and residents who generate electricity with rooftop photovoltaic panels or wind turbines. The consumers, such as residential users, factories, and shopping malls, purchase and consume electricity for production and daily life. Producers and consumers trade electricity in the local energy market as needed.
In a blockchain-based local energy market, every producer and consumer deploys a computing device that runs a blockchain virtual environment and acts as a blockchain node. Each node keeps a complete copy of the blockchain, so all nodes have equal status in the network and form a P2P (peer-to-peer) network. Transactions are conducted automatically through smart contracts running on the blockchain. Each node broadcasts the results obtained by the smart contract to the other nodes; the nodes verify the data, reach an agreement through the consensus mechanism, and then encapsulate the transaction data into a new block that is stored in the blockchain. The characteristics of blockchain, i.e., decentralization, tamper resistance, and traceability, ensure that transactions are secure and transparent.
Fig. 1. An illustration of blockchain-based local energy market.
3 Methodology

3.1 System Architecture

We consider a local energy market consisting of n generators and m consumers that form a P2P network based on blockchain, in which transactions are executed automatically by smart contracts. The network architecture is shown in Fig. 2. As shown in Fig. 3, each generator and consumer includes a computing device, a smart meter, and a set of IoT terminals in addition to its own power generation or consumption equipment. The computing device is embedded with a blockchain virtual machine so that it becomes a blockchain node, and the smart contracts run on this virtual machine; the whole transaction process is executed automatically through the smart contracts. An interface that sends the data of the smart meter and IoT terminals to the blockchain, and receives data from it, also runs on the computing device. Quotation information is provided by the generators and consumers themselves. Based on historical production or consumption data, the smart meter predicts the amount of electricity to be generated or consumed in the next trading period and sends the prediction to the blockchain through the interface. The IoT terminals collect the amount of electricity W (kW·h) generated per unit time and the fuel C (kg) consumed per unit time to produce it, and this information is also sent to the blockchain through the interface.
Fig. 2. An illustration of local energy market P2P network based on blockchain.
For a consumer, when a transaction is completed, the amount of purchased electricity is written into the consumer's smart meter and added to the remaining electricity balance it records. Each time the consumer uses 1 kW·h of electricity, the smart meter subtracts 1 kW·h from the remaining balance. The IoT terminals send the operation data of the electrical devices to the blockchain, such as the operating time t, the consumption efficiency η, and the power P of each device.
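To make these data flows concrete, the sketch below shows one way the per-period readings of a generator and a consumer could be represented before being submitted to the blockchain interface. It is only an illustration in Python; the record names and fields are assumptions made for this sketch, not part of the system described in the paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GeneratorReading:
    # Reported by a generator's smart meter and IoT terminals each trading period
    predicted_supply_kwh: float   # forecast of electricity to be sold next period
    bid_price: float              # pg, CNY per kW·h, provided by the generator
    output_per_unit_time: float   # W, kW·h generated per unit time
    fuel_per_unit_time: float     # C, kg of fuel consumed per unit time to produce W

@dataclass
class ApplianceLog:
    # Operation data one IoT terminal reports for a single electrical device
    efficiency: float             # η of the device
    power_kw: float               # P of the device
    hours: float                  # operating time t

@dataclass
class ConsumerReading:
    # Reported by a consumer's smart meter and IoT terminals each trading period
    predicted_demand_kwh: float   # forecast of electricity to be bought next period
    offer_price: float            # pc, CNY per kW·h, provided by the consumer
    appliances: List[ApplianceLog]
```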
Fig. 3. The structure of a generator and a consumer
3.2 Transaction Mechanism

Transaction Priority Value. In this paper we propose a new transaction mechanism based on blockchain, taking three factors into account: the electricity consumption efficiency of consumers, the external cost of generators, and the power generation efficiency of generators. We obtain a priority value (PV) of the transaction by combining these three factors with the bid and offer prices. For a generator, its PV equals

PVg = pg + Ce × ηg    (1)

ηg = (Σi Ci / Σi Wi) / ηs    (2)
where pg is the bid price of the generator and Ce is the external cost; for renewable energy Ce equals 0, and for fossil fuel energy it is provided by statistical academies or associations. ηg is the production efficiency of the generator, Wi is the amount of electricity generated by the i-th power generation equipment per unit time, Ci is the fuel consumed by the i-th equipment to generate Wi per unit time, and ηs is the average fuel consumption for this kind of power plant. For a consumer, its PV equals

PVc = pc × ηc / ηc max    (3)

ηc = Σi (ηi Pi ti) / Σi (Pi ti)    (4)
Among them, pc is the consumer's offer price and ηc is the average electricity consumption efficiency of the consumer; ηc max is the highest efficiency value among the m consumers. ηi, Pi, and ti are the efficiency, power, and operation time of the i-th electrical device of the consumer.

Transaction Procedure. Transactions start regularly, once every fixed time period. The smart meters of generators and consumers predict the electricity to be produced or consumed in the next period and send the data to the blockchain. Generators and consumers send their bid and offer prices to the blockchain through the interface. The IoT terminals collect the data of the generation equipment and electrical devices, i.e., the W kW·h of electricity produced per unit time, the C kg of fuel consumed per unit time to generate it, and the operating time t, consumption efficiency η, and power P of the electrical devices. During the transaction process, the smart contract automatically calculates the PVs, sorts the generators by PV in ascending order and the consumers in descending order, and matches them in pairs according to this order. The deal price is the average of the bid and offer prices of the matched pair. The transaction process consists of four phases: data collection, PV calculation, auction, and settlement, as shown in Fig. 4.

Data Collection. First, all nodes broadcast the message that transactions will begin and wait for a period of time. During this period, the blockchain collects data from generators and consumers, i.e., the amount of electricity to be produced by generators, bid price pg, average electricity production W, average fuel consumption C, consumer electricity demand, offer price pc, operation time t, electrical efficiency η, and device power P.

PV Calculation. The smart contract calculates the PV of each generator and consumer according to Eqs. (1)–(4).
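As an illustration of the PV calculation phase, the following Python sketch evaluates Eqs. (1)–(4). The function names and argument layout are assumptions made for this example; this is not the authors' smart-contract implementation.

```python
def generator_pv(bid_price, external_cost, fuel_used, electricity_generated, standard_fuel_rate):
    # Eq. (2): ηg = (Σi Ci / Σi Wi) / ηs;  Eq. (1): PVg = pg + Ce × ηg.
    # For renewable generators the external cost Ce is 0, so PVg equals the bid price.
    if external_cost == 0:
        return bid_price
    eta_g = (sum(fuel_used) / sum(electricity_generated)) / standard_fuel_rate
    return bid_price + external_cost * eta_g

def consumer_pv(offer_price, appliances, eta_c_max):
    # Eq. (4): ηc = Σi ηi·Pi·ti / Σi Pi·ti  (appliances is a list of (ηi, Pi, ti) tuples).
    # Eq. (3): PVc = pc × ηc / ηc max.
    weighted = sum(eta * p * t for eta, p, t in appliances)
    total = sum(p * t for _, p, t in appliances)
    eta_c = weighted / total
    return offer_price * eta_c / eta_c_max
```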
Auction. Based on the smart contract, generators are sorted by PV in ascending order and consumers in descending order, and they are matched in this order to make transactions. The transaction price is the average of the quotations of the matched parties.

Settlement. Consumers automatically pay generators, the consumers' smart meters record the amount of purchased electricity, and generators produce electricity according to the transaction data. The transaction results are then recorded in the blockchain via the consensus mechanism. Finally, the transaction in this period ends and the next transaction begins.
Fig. 4. Transaction process flow chart
The algorithm implementation process of the auction phase is shown in Fig. 5.
Fig. 5. Process for transaction algorithm
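The matching loop of Fig. 5 can be sketched as follows. This is our reading of the flow chart: participants are scanned in PV order, a matched pair trades min(demand, remaining supply) at the average of the two quotations, and a consumer whose offer is below the current bid is skipped; the dictionary keys are illustrative, not from the paper.

```python
def run_auction(generators, consumers):
    """Match generators (ascending PV) with consumers (descending PV); the deal price is
    the average of the matched bid and offer prices. Inputs are mutated for brevity."""
    gens = sorted(generators, key=lambda g: g["pv"])               # low PV sells first
    cons = sorted(consumers, key=lambda c: c["pv"], reverse=True)  # high PV buys first
    deals, i, j = [], 0, 0
    while i < len(cons) and j < len(gens):
        c, g = cons[i], gens[j]
        if c["offer"] < g["bid"]:      # no agreeable price for this pair -> next consumer
            i += 1
            continue
        qty = min(c["demand"], g["supply"])
        deals.append((c["id"], g["id"], qty, (c["offer"] + g["bid"]) / 2))
        c["demand"] -= qty
        g["supply"] -= qty
        if c["demand"] == 0:
            i += 1
        if g["supply"] == 0:
            j += 1
    return deals
```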
4 Case Study

We compare the original double auction with our method through a case study. In this case, there are 6 generators and 8 consumers; their characteristics are shown in Table 1 and Table 2. Generators G1–G3 are thermal power plants and their external costs equal 0.15 CNY/kW·h.¹ G4–G6 are photovoltaic power plants and their external costs equal zero. The results are shown in Fig. 6 and Fig. 7. They show that in the original method, G5 and G6 cannot sell their electricity in the transaction; after considering the external cost, they obtain a higher priority to sell. As for G1, G2, and G3, they not only have a lower transaction priority, but their transaction volume also decreases. For consumers, those with high efficiency pay less in our method than before, while those with low electricity use efficiency pay more. Since electricity consumption efficiency is determined by the efficiency, power, and operation time of the electrical equipment, consumers can choose equipment with high efficiency and low power, or reduce unnecessary electricity consumption.

Table 1. Generators' characteristics

Generator | Bid price, CNY/kW·h | Bid size, kW·h | External cost, CNY/kW·h | Production efficiency (η_g) | PV_g
G1 | 0.43 | 44 | 0.15 | 1.03 | 0.59
G2 | 0.40 | 23 | 0.15 | 1.06 | 0.56
G3 | 0.33 | 50 | 0.15 | 1.05 | 0.48
G4 | 0.33 | 35 | 0 | – | 0.33
G5 | 0.47 | 18 | 0 | – | 0.47
G6 | 0.46 | 24 | 0 | – | 0.46
Table 2. Consumers' characteristics

Consumer | Offer price, CNY/kW·h | Offer size, kW·h | Consumption efficiency (η_c) | PV_c
C1 | 0.63 | 22 | 0.56 | 0.45
C2 | 0.76 | 13 | 0.43 | 0.41
C3 | 0.61 | 10 | 0.79 | 0.61
C4 | 0.53 | 15 | 0.7 | 0.47
C5 | 0.64 | 27 | 0.52 | 0.42
C6 | 0.51 | 11 | 0.65 | 0.42
C7 | 0.83 | 17 | 0.4 | 0.42
C8 | 0.55 | 19 | 0.78 | 0.55
¹ Taking China as an example, the average external cost for a thermal power plant producing 1 kW·h of electricity is 0.12–0.19 CNY; we take the value in this example as 0.15 CNY.
By improving their electricity consumption behavior, consumers obtain higher electricity efficiency and become more energy-saving. On one hand, this method can promote the consumption of new energy, push fossil-fuel generators to improve fuel utilization, reduce carbon emissions, and reduce fossil energy consumption. On the other hand, it can encourage users to optimize electricity usage and save electricity.
Fig. 6. The revenue of generators before and after considering external cost and production efficiency.
Fig. 7. The cost of consumers before and after considering electricity consumption efficiency.
5 Conclusion

Based on the original double auction method, we propose a double incentive trading mechanism that considers the external costs of generators, the consumption efficiency of consumers and the production efficiency of generators in smart-contract-based local energy markets. More specifically, we put forward a metric called the priority value (PV), which combines these three factors with bid/offer prices to measure the priority of generators and consumers comprehensively and thus optimize electricity transactions. Compared with the original method, our method gives higher transaction priority to consumers with high efficiency, to renewable energy generators and to generators with high generation efficiency, and helps them obtain more benefits in transactions. This method can incentivize consumers to improve their electricity consumption efficiency, promote the consumption of renewable energy and improve energy production efficiency.
References 1. Mengelkamp, E., Notheisen, B., Beer, C., Dauer, D., Weinhardt, C.: A blockchain-based smart grid: towards sustainable local energy markets. Comput. Sci. Res. Dev. 33(1–2), 207–214 (2017). https://doi.org/10.1007/s00450-017-0360-9 2. Khaqqi, K., Sikorski, J., Hadinoto, K., Kraft, M.: Incorporating seller/buyer reputation-based system in blockchain-enabled emission trading application. Appl. Energy 209, 8–19 (2018). https://doi.org/10.1016/j.apenergy.2017.10.070 3. Pop, C., Cioara, T., Antal, M., Anghel, I., Salomie, I., Bertoncini, M.: Blockchain based decentralized management of demand response programs in smart energy grids. Sensors 18(2), 162 (2018). https://doi.org/10.3390/s18010162 4. Wang, J., Wang, Q., Zhou, N., Chi, Y.: A novel electricity transaction mode of microgrids based on blockchain and continuous double auction. Energies 10(12), 1971 (2017). https://doi.org/ 10.3390/en10121971 5. Noor, S., Yang, W., Guo, M., van Dam, K., Wang, X.: energy demand side management within micro-grid networks enhanced by blockchain. Appl. Energy 228, 1385–1398 (2018). https:// doi.org/10.1016/j.apenergy.2018.07.012 6. Stephant, M., et al.: A survey on energy management and blockchain for collective selfconsumption. In: 2018 7th International Conference on Systems and Control (ICSC). IEEE (2018) 7. Mengelkamp, E., Gärttner, J., Rock, K., Kessler, S., Orsini, L., Weinhardt, C.: Designing microgrid energy markets. Appl. Energy 210, 870–880 (2018). https://doi.org/10.1016/j.ape nergy.2017.06.054 8. Brenzikofer, A., et al.: Quartierstrom: A Decentralized Local P2P Energy Market Pilot on a Self-Governed Blockchain (2019)
A Survey on Task Offloading in Edge Computing for Smart Grid Jing Shen1 , Yongjie Li1 , Yong Zhang1 , Fanqin Zhou2(B) , Lei Feng2 , and Yang Yang2 1 State Grid Henan Electric Power Company Information Communication Company, Henan,
China 2 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts
and Telecommunications, Beijing, China {fqzhou2012,fenglei,yangyang_2018}@bupt.edu.cn
Abstract. With the rapid development of smart grid, traditional cloud computing architectures struggle to meet the needs of new power applications with low latency and large connectivity in the context of big data. Hence, edge computing has emerged. Edge computing is closer to the edge of the network where data is generated, enabling fast data processing and supporting swift user requests. This paper describes the system architecture of edge computing and the principles of task offloading in smart grid. It finally concludes with a summary of existing issues and future trends. Keywords: 5G · Smart grid · Edge computing · Task offloading
1 Introduction With the rapid development of smart grid, the massive access of terminal nodes and the explosive growth of intelligent devices and new applications lead to the rapid growth of network traffic and service data [1]. Power services such as control, acquisition and mobile put forward higher requirements for task processing and transmission efficiency. As the traditional cloud computing cannot meet the needs of these applications for low latency, researchers propose a new computing mode, called edge computing [2]. Figure 1 shows a typical edge computing system. To realize low-latency and high-efficiency power services, the computing mode transfers the computing load from the remote cloud to the edge nodes of the core network closer to the users. However, the computing resources and network resources of the edge node are limited. Therefore, mobile devices should carefully plan whether the computing tasks are executed locally or unloaded to the edge node or the cloud, so as to achieve the goal of the shortest task completion delay [3, 4]. This is called the computing task offloading problem in the edge computing system. Therefore, research on offloading strategies based on edge computing in smart grids is of great significance [5]. This paper introduces in detail an important research issue in smart grid – computing task offloading, that is, computing task is executed on the device or offloaded to the edge or cloud. In this paper, the optimization objectives, decision variables and algorithm © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 13–20, 2022. https://doi.org/10.1007/978-981-16-6554-7_2
Fig. 1. Edge computing system
design are summarized. Section 2 gives the definition and system architecture of edge computing; Sect. 3 describes the problem of computing task offloading in edge computing; Sect. 4 looks forward to the future research direction of computing task offloading; Sect. 5 gives some conclusions.
2 Edge Computing 2.1 Definition of Edge Computing At present, there is no unified definition of edge computing in a strict sense. The definition of edge computing is shown in Table 1 [6]. The definition of edge computing given by different organizations is not exactly the same, but it expresses a common point of view, that is, to process the data near the network edge [7]. Table 1. Definition of edge computing.
2.2 Architecture of Edge Computing Edge computing takes edge devices (base station, wireless access point, router, etc.) as the bridge between cloud and terminal equipment, and extends cloud services to the edge side of the network [8]. Therefore, the edge computing system architecture mainly includes cloud, edge and equipment. Cloud: the cloud consists of several high-performance servers and storage devices matching it. Its computing and storage capabilities are very strong, and can handle the high complexity of computing requirements. Edge: edge is a device close to the end user and has certain computing and storage capacity, including but not limited to specific edge server, wireless access point, base station, gateway at all levels, routers and switches, etc. The edge devices can meet the real-time needs of users, and also help the cloud devices to filter and analyze data. Device: terminal equipment in smart grid. To ensure that the terminal equipment has a long life cycle, it is necessary to put the computing tasks with high complexity to the edge or the cloud [9]. According to different research focuses, the system architecture of edge computing can be classified into the following three types: cloud edge end three-tier architecture model, cloud edge two-tier architecture model, and edge end two-tier architecture model [10].
3 Task Offloading in Edge Computing

The core problem of edge computing is task offloading. Task offloading means that the edge server allocates a certain amount of computing resources to uploaded applications in order to reduce latency or energy consumption and provide a better user experience [11]. Typically, an initial and critical part of computation offloading and resource allocation is deciding whether to offload, i.e. the offloading decision. After determining whether to offload, the next question to consider is how much and what should be offloaded [12].

3.1 The Classification of Task Offloading

Generally speaking, the offloading decision may be one of the following:
(1) Local Execution: The entire computation process is done locally. This is generally used for tasks with low computational power requirements [13].
(2) Full Offloading: The entire computation is offloaded to the edge server for computation and processing via a wireless channel connecting the base station. This approach, also known as the full offloading problem or the binary offloading problem, assumes that the applications served at the edge cannot be split and that either local computation or offloading the entire task to the edge server must be chosen [14].
(3) Partial Offloading: On the premise that the computation can be split, part of the computation is processed locally and the other part is offloaded to the edge server for processing [15] (Fig. 2).
Fig. 2. Task offloading method comparison in smart grid
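To make the three modes concrete, a toy latency model can be used; all numbers and the parallel local/edge assumption for partial offloading are illustrative assumptions, not values from the surveyed papers.

```python
# Toy comparison of local execution, full offloading and partial offloading.

def local_latency(cycles, f_local):
    return cycles / f_local

def full_offload_latency(data_bits, rate, cycles, f_edge):
    return data_bits / rate + cycles / f_edge          # upload time + edge processing time

def partial_latency(split, data_bits, rate, cycles, f_local, f_edge):
    # A fraction `split` is offloaded; the local and edge parts are assumed to run in parallel.
    local = (1 - split) * cycles / f_local
    edge = split * data_bits / rate + split * cycles / f_edge
    return max(local, edge)

cycles, data_bits = 2e9, 4e6            # 2 Gcycles of work, 4 Mbit of input data (assumed)
f_local, f_edge, rate = 1e9, 8e9, 20e6  # 1 GHz device, 8 GHz edge, 20 Mbit/s uplink (assumed)
print(local_latency(cycles, f_local))                               # 2.0 s locally
print(full_offload_latency(data_bits, rate, cycles, f_edge))        # 0.45 s fully offloaded
print(min(partial_latency(s / 10, data_bits, rate, cycles, f_local, f_edge) for s in range(11)))
```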
3.2 Task Unloading Process Task offloading often includes the following steps: service perception, task splitting, offloading decision-making, task uploading, edge node execution and result return, etc. [16]. (1) Service perception Service perception is the first step in task offloading, that is, to explore server nodes that can provide edge computing in the environment [17]. In a single cell scenario, smart devices can often only explore one edge computing server. In the scenario of multiple edge servers, smart devices can explore multiple edge computing servers that provide services. At this time, they are often faced with server selection decisions when tasks are offloaded. (2) Task splitting In order to improve the efficiency of task offloading and save network bandwidth resources, the method of fine-grained task splitting can often be used to decompose computing tasks into subtasks [18]. By making offloading decisions on subtasks in turn, the computing resources of terminals and edge servers are fully utilized. The granularity of task splitting includes method level, thread level, module level, etc. (3) Offloading decision-making Offloading decision is the core of task offloading. By analyzing the perceived edge server status and local computing status, it is judged whether the task is calculated locally or offloaded to edge server computing. It is often necessary to set optimization goals during the offloading process [19]. In most cases, optimization of time delay and energy consumption are the main factors. Offloading decisions can be divided into static offloading and dynamic offloading according to the characteristics of the environment. Static offloading means that the entire offloading
environment is relatively stable, the network structure is almost unchanged, and there is no disturbance. Dynamic offloading requires consideration of random disturbances in the environment, including random fluctuations in wireless channels, user mobility, and server load conditions. (4) Task uploading Task uploading is to upload computing tasks from intelligent devices to edge server nodes [21]. There are many ways to upload, including wired, 4G/5G, WiFi and so on. In the process of uploading, it usually consumes a certain amount of energy to send tasks and time. (5) Edge node execution and result return After the task is uploaded to the edge server, the server will immediately allocate resources to calculate the task [22]. When the calculation is complete, the result is returned. Since the computing power of the edge server is not as good as that of the cloud computing center, its computation delay should be taken into account. However, the calculation result is often very small, and the return delay can be ignored when the result is returned to the edge device. 3.3 Task Offloading Scenario In the scenario of single user task offloading, it is generally aimed at partial offloading. Reference [23] discussed the influence of environment parameters on task migration performance in the process of task migration. In the process of task migration, greedy optimization was used to optimize the performance in parallel. Reference [24] constructs the system model as Markov process under the influence of random environmental parameters such as user mobility and wireless channel disturbance, and constructs the task offloading decision of mobile devices as Markov decision process (MDP) to make optimization. In the scenario of single cell task offloading in edge computing, it usually faces the competition among multiple users. In Reference [25], for the limited computing resources in the cloud server, a multi-user task offloading algorithm based on online and offline is proposed. The algorithm adopts the fine-grained task partition method, divides the computing task into multiple parts, and processes them in the cloud server and the edge end respectively. In reference [26], the network is divided into three layers, including terminal nodes, edge servers and cloud servers. The service model of the server is based on queuing theory. In this paper, the game theory is used to analyze, and the task offloading is constructed as a non-cooperative game problem and optimized. Because edge servers are often limited in computing resources compared with cloud centers, they cannot deal with complex concurrent tasks in time. Therefore, in edge computing, multiple edge servers can use the wired link to transmit tasks through collaboration and assign complex tasks to multiple edge servers through networking, thus improving the task processing ability of edge computing servers. The task equilibrium decision of edge server can be optimized by multi-agent analysis and heuristic algorithm.
4 Challenges and Future Works Nowadays, the research on edge computing offloading has made great progress, but it is still immature, and there are still important research directions and problems to be solved. For example, in terms of security, data dependence and resource utilization [27]. At the software level of edge computing, an open platform with edge computing function needs to be developed; network collaboration needs the continuous development of wireless communication technology; at the hardware level, the concept of “cloud edge end” collaboration needs to be introduced. The heterogeneity of hardware devices makes the implementation of multilateral collaboration very difficult. Therefore, how to coordinate heterogeneous devices is a challenge to make full use of resources in edge computing system. Due to the limited resources of edge devices, the service deployment scheme to avoid resource waste is also a research hotspot, but the technology is not mature. According to the timeliness of different application requirements, the corresponding services on regional edge nodes need to be created and deleted immediately [28]. Due to the dynamic user requirements and data dependence, services need to have the characteristics of dynamic migration. How to design a management platform that can simultaneously decide the deployment location and status of a large number of services is a research direction that can be explored. As a distributed machine learning architecture, federated learning is designed to achieve efficient machine learning among multiple participants or computing nodes on the basis of protecting data security of terminal devices, data privacy of individual users, and legal operation [29]. Federated learning based on edge computing has the advantages of saving network resources, shortening training delay and protecting data privacy. However, considering the heterogeneity and individual rationality of terminal devices and edge nodes (that is, the pursuit of personal income maximization without the control of the central server), and the constraints of limited computing and communication resources, the implementation of Federated learning based on heterogeneous edge computing system still faces many challenges, which requires the joint efforts of a large number of researchers.
5 Conclusion The offloading of computing tasks in edge computing is a key research problem, that is, whether the computing tasks should be performed locally or offloaded to edge nodes or clouds. Firstly, we introduce the basic concept of edge computing and summarize the system architecture of edge computing. Then, the problem of offloading computing task in edge computing is described in detail. We describe the classification, rationale and strategies for task offloading, summarize the problems in current research and provide an outlook on future research directions. Acknowledgement. This work is supported by the State Grid Henan Electric Power Company Science and Technology Project “Research on Secure Networking Technology and Service Access Simulation of 5G-integrated Energy Internet” (Grant No. 5217Q0210001).
References 1. Zhang, L., Hao, J., Zhao, G., Wen, M., Hai, T., Cao, K.: Research and application of AI services based on 5G MEC in smart grid. In: 2020 IEEE Computing, Communications and IoT Applications (ComComAp), Beijing, China, pp. 1–6 (2020) 2. Abbas, N., et al.: Mobile edge computing: a survey. IEEE Internet Things J. 5(1), 450–465 (2016) 3. Wang, S., et al.: A survey on mobile edge networks: convergence of computing, caching and communications. IEEE Access 99, 1 (2017) 4. Hu, Y.C., et al.: Mobile edge computing—a key technology towards 5G. ETSI White Paper 11(11), 1–16 (2015) 5. Kumar, N., Zeadally, S., Rodrigues, J.J.P.C.: Vehicular delay-tolerant networks for smart grid data management using mobile edge computing. IEEE Commun. Mag. 54(10), 60–66 (2016) 6. Ren, J., et al.: An edge-computing based architecture for mobile augmented reality. IEEE Network 33(4), 162–169 (2019) 7. Liu, J., et al.: Delay-optimal computation task scheduling for mobile-edge computing systems. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 1451–1455. IEEE (2016) 8. Ko, H., Lee, J., Pack, S.: Spatial and temporal computation offloading decision algorithm in edge cloud-enabled heterogeneous networks. IEEE Access 6, 18920–18932 (2017) 9. Cardellini, V., et al.: A game-theoretic approach to computation offloading in mobile cloud computing. Math. Program. 157(2), 421–449 (2015). https://doi.org/10.1007/s10107-0150881-6 10. Wan, J., et al.: Fog computing for energy-aware load balancing and scheduling in smart factory. IEEE Trans. Industr. Inform. 14(10), 4548–4556 (2018) 11. Shi, W.S., et al.: Edge computing: vision and challenges. IEEE Internet Things J. 3(5), 637–646 (2016) 12. Brik, B., Frangoudis, P.A., Ksentini, A.: Service-oriented MEC applications placement in a federated edge cloud architecture. In: ICC 2020 – 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, pp. 1–6 (2020) 13. Jiang, F., Wang, K., Dong, L., Pan, C., Xu, W., Yang, K.: Deep-learning-based joint resource scheduling algorithms for hybrid MEC network. IEEE Internet Things J. 7(7), 6252–6265 (2020) 14. Yang, S., Tseng, Y., Huang, C., Lin, W.: Multi-access edge computing enhanced video streaming: proof-of-concept implementation and prediction/QoE models. IEEE Trans. Veh. Technol. 68(2), 1888–1902 (2019) 15. Huang, M., Liu, W., Wang, T., Liu, A., Zhang, S.: A cloud – MEC collaborative task offloading scheme with service orchestration. IEEE Internet Things J. 7(7), 5792–5805 (2020) 16. Feng, J., Richard Yu, F., Pei, Q., Chu, X., Du, J., Zhu, L.: Cooperative computation offloading and resource allocation for blockchain-enabled mobile-edge computing: a deep reinforcement learning approach. IEEE Internet Things J. 7(7), 6214–6228 (2020) 17. Huang, H., Ye, Q., Du, H.: Reinforcement learning based offloading for realtime applications in mobile edge computing. In: ICC 2020 – 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, pp. 1–6 (2020) 18. Song, F., Xing, H., Luo, S., Zhan, D., Dai, P., Qu, R.: A multi-objective computation offloading algorithm for mobile-edge computing. IEEE Internet Things J. 7(9), 8780–8799 (2020) 19. Zhang, J., et al.: Energy-latency tradeoff for energy-aware offloading in mobile edge computing networks. IEEE Internet Things J. 5(4), 2633–2645 (2018) 20. Lei, L., Xu, H., Xiong, X., Zheng, K., Xiang, W.: Joint computation offloading and multiuser scheduling using approximate dynamic programming in NB-IoT edge computing system. IEEE Internet Things J. 
6(3), 5345–5362 (2019)
21. Zhao, J., Li, Q., Gong, Y., Zhang, K.: Computation offloading and resource allocation for cloud assisted mobile edge computing in vehicular networks. IEEE Trans. Veh. Technol. 68(8), 7944–7956 (2019) 22. Nath, S., Li, Y., Wu, J., Fan, P.: Multi-user multi-channel computation offloading and resource allocation for mobile edge computing. In: ICC 2020 – 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, pp. 1–6 (2020) 23. Liu, K., Liao, W.: intelligent offloading for multi-access edge computing: a new actor-critic approach. In: ICC 2020 – 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, pp. 1–6 (2020) 24. Wang, F., Xu, J., Cui, S.: Optimal energy allocation and task offloading policy for wireless powered mobile edge computing systems. IEEE Trans. Wireless Commun. 19(4), 2443–2459 (2020) 25. Zhang, Q., Gui, L., Hou, F., Chen, J., Zhu, S., Tian, F.: Dynamic task offloading and resource allocation for mobile-edge computing in dense cloud RAN. IEEE Internet Things J. 7(4), 3282–3299 (2020) 26. Wei, Z., Zhao, B., Su, J., Lu, X.: Dynamic edge computation offloading for internet of things with energy harvesting: a learning method. IEEE Internet Things J. 6(3), 4436–4447 (2019) 27. Rui, L., Yang, Y., Gao, Z., Qiu, X.: Computation offloading in a mobile edge communication network: a joint transmission delay and energy consumption dynamic awareness mechanism. IEEE Internet Things J. 6(6), 10546–10559 (2019) 28. Bi, S., Zhang, Y.J.: Computation rate maximization for wireless powered mobile-edge computing with binary computation offloading. IEEE Trans. Wireless Commun. 17(6), 4177–4190 (2018) 29. Wang, S., Chen, M., Saad, W., Yin, C.: Federated learning for energy-efficient task computing in wireless networks. In: ICC 2020 – 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, pp. 1–6 (2020)
Data Fusion of Power IoT Based on GOWA Operator and D-S Evidence Theory Huiping Meng1 , Jizhao Lu1 , Fangfang Dang1 , Yue Liu1 , Yang Yang2(B) , and Binnan Zhao2 1 State Grid Henan Information and Telecommunication Company, Zhengzhou 450000, China 2 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts
and Telecommunications, Beijing 100876, China [email protected]
Abstract. Data fusion can use different sources or different forms of information to describe the object more accurately. Aiming at the possibility that the equipment in power IoT may malfunction or have measurement errors, an anomaly recognition method based on multi-dimensional data fusion of sensor data in power IoT is proposed. Firstly, construct the support rank of data collected by each sensor. Then, the support degree of each system attribute data to the anomaly type is calculated and transformed into evidence. Finally, based on the basic probability assignment function of each piece of evidence, D-S evidence theory combined with ME-GOWA (The Maximum Entropy-Generalized Ordered Weighted Average) operator is used for data fusion, so as to effectively identify the anomaly type. By comparing the accuracy with other algorithms, the effectiveness and certain advantages of the proposed algorithm are clearly proved. Keywords: Data fusion · Power IoT · Anomaly recognition · D-S evidence theory · ME-GOWA
1 Introduction Sensors are widely used in the current ubiquitous power IoT, which collect various types of system attribute data in real time to help the power IoT provide intelligent, reliable and efficient services. Data fusion can associate and combine data from multiple sensor information sources, enhance the perception accuracy of features and gain a more precise description of the sensing object. The mainstream data fusion methods mainly include two categories: probability and statistics methods and artificial intelligence methods. Among them, the probability methods have the following types: One is multi-Bayesian estimation that minimizes the likelihood function of the relevant probability assignment function; The second is weighted average, which uses direct and simple arithmetical operation; The third is D-S evidence theory, which is suitable for the reasoning of uncertain problems. The artificial intelligence methods principally include: neural network based on data processing capacity and automatic reasoning capacity or combinatorial computation © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 21–30, 2022. https://doi.org/10.1007/978-981-16-6554-7_3
utilizing fuzzy set theory. The above methods are widely used in various fields. The D-S evidence theory can represent correct probability, incorrect probability and uncertain probability at the same time, which has advantages when processing data with high uncertainty. Therefore, this paper proposes an anomaly recognition method based on multidimensional data fusion. The innovation points are as follows: (1) Construct the support rank of each system attribute data collected by different sensors and construct the support interval of each proposition. Integrate all information, convert the system attribute data into evidence; (2) Use the improved D-S evidence theory, power IoT anomaly types are effectively identified. First of all, BPA (Basic Probability Assignment) assesses the source of evidence based on the basic probabilities of propositions, thereby generating the belief and plausibility of all propositions. The belief and plausibility of each evidence are set as the lower and upper bounds of the trust interval. Then, the descending order of interval numbers is obtained by the aggregation of ME-GOWA operator. Finally, the aggregation results are sorted to obtain the recognition results of the corresponding anomaly types.
2 Related Works Dempster firstly proposed the evidence theory in 1967. The original theory includes the concept of upper and lower probability, the principle of combining two independent information sources. Later in 1976, Shafer further refined the evidence theory. “Evidence” and “Combination” are the core of D-S evidence theory. “Evidence” refers to data that contains inconclusive information. “Combination” refers to fusion rules. The data fusion formula can combine the information denoted by the data to obtain conclusions with higher credibility. As a result, D-S evidence theory is now widely used in decision support and knowledge fusion systems. D-S evidence theory can effectually dispose the uncertainty of data, and it can also signify the probability of correctness, incorrectness and uncertainty. Therefore, it is well suited to the recognition of power IoT anomalies. One or more clusters of evidence are merged into new evidence based on data fusion formulas. Since the evidence for participation in the fusion is based on the dynamic transformation of the original data, the noise and errors in the original data will be transferred to the evidence obtained, which will affect the final fusion result. The conflict coefficient k in D-S evidence theory cannot legitimately denote conflicts, especially the two BPA are the same at times, but k is not zero. Jiang [1] introduced the concept of correlation coefficient to measure the similarity of two BPAs, and considered the non-intersection and difference between the focus elements, avoiding the problem that the correlation coefficient of BPA is unstable or insensitive. Xing X et al. [2] proposed a weighted evidence combination method based on Mconf. The evidence conflict metric factor Mconf is constructed by a modified evidence distance called mdBPA and a combination conflict called k, and the conflict evidence combination is effectively performed. Wu S et al. [3] proposed a method for obtaining evidence reliability based on relative entropy, which achieves relative reliability by calculating the relative entropy between original evidence and reference evidence.
Jiang W et al. [4] proposed a new method based on evidence distance and uncertainty metrics, which divided the evidence into credible evidence and untrustworthy evidence, and used belief entropy to calculate the amount of information in evidence. Kushwah A et al. [5] proposed a multi-sensor fusion method using temporal evidence theory, and developed an incremental conflict resolution method to introduce time information into a multi-sensor environment, which can be used for activity detection in smart homes. Yang K et al. [6] defined a new evidence distance in combination with the traditional evidence distance and the Jousselme distance, by calculating the reliability of each evidence to determine the relative reliability and weighting factor, and the evidence is classified according to distance parameters and local conflict parameters. Wenhao Bi et al. [7] proposed an evidence conflict measurement method based on Tanimoto metric. The evidence similarity measure is used to obtain the degree of conflict between evidence. Due to some problems in practical applications, the fusion effect of evidence theory is greatly affected. For example, a serious conflict between evidence will lead to a false fusion of evidence theory, and how to obtain a basic probability assignment function according to the actual application environment. Therefore, we need to improve the evidence theory in response to these problems.
3 DFMD In this paper, we propose the DFMD (Data Fusion Based on ME-GOWA Operator and D-S Evidence Theory) for anomaly recognition in the power IoT. The method firstly constructs support ranks for data collected by each sensor. Then, for each type of sensor data, calculate the proportion of data with different ranks, the proportions are applied to transform the sensor data into evidence. Finally, use the ME-GOWA operator to fuse evidence and effectually identify the anomaly type. The concrete steps of DFMD are shown in Fig. 1.
Fig. 1. The overall process of DFMD
According to the possible abnormal behaviors in the power IoT, the system attribute information collected by different sensors is integrated to transform data into evidence, and each piece of evidence is allocated a weight describing its credibility. Conflicting evidence is handled by giving it a smaller weight instead of deleting it, for two reasons. On one hand, when the quantity of evidence held is relatively small, it is impossible to single out the conflicting evidence, and suitable weights help reduce its negative influence on the results; when the quantity of evidence is relatively large, the weight of conflicting evidence is approximately zero and its negative influence can be ignored. On the other hand, it may sometimes be impossible to figure out which evidence is conflicting and which original data contain errors. In addition, noise and redundancy are ubiquitous in datasets, so the data fusion method adopted must be able to fit unknown and complex datasets.

Common anomalies in power IoT equipment include overload, system oscillation, short circuit and component failure, which are represented as propositions in the recognition framework Θ = {A1, A2, · · ·, Ap}. Suppose the power IoT has a certain number of sensors, each of which monitors and records one of the system attributes such as voltage, temperature and vibration. The support rank of each system attribute datum can be calculated as

γ_ij = 4p · (v_ij − v̄_ij) / (max v_ij − min v_ij)    (1)

where γ_ij represents the support rank of the i-th system attribute datum collected by the j-th sensor for the anomaly type, p is the number of propositions, v_ij is the i-th datum collected by sensor S_j, v̄_ij is the average value of the system attribute data collected by sensor S_j, and max v_ij and min v_ij respectively denote the maximum and minimum values of the system attribute data collected by sensor S_j. The evidence generated by the sensor data is denoted E_j, j = 1, 2, · · ·, n, and the BPA of the evidence E_j is expressed as m_j. Considering the conflict of evidence in Zadeh's paradox, a set of uncertainties {(A1, A2), (A2, A3), · · ·, (Ap, A1)} is added to the evidence; the number of elements in this set is p. For example, m_j(A1, A2) expresses the BPA of an evidence pointing to A1 or A2, and m_j(Ap, A1) expresses the BPA of an evidence pointing to Ap or A1. In the expanded set of propositions, the support interval of each proposition is defined as

I_k = [k − 1, k],  k = 1, 2, · · ·, 2p    (2)

The number of system attribute data whose support rank γ_ij falls in the same support interval I_k is counted as the support number c(R). The BPA of a proposition R in the expanded set of propositions is then

m_j(R) = c(R) / Σ_{k=1}^{2p} c(R_k)    (3)
Since the power grid may exhibit various anomalies and the collected data may contain varying degrees of measurement error, specific types of anomalies must be identified to facilitate the implementation of preventive measures by the relevant personnel. In this paper, the improved D-S evidence theory is adopted to acquire the BPA of each piece of evidence generated from the sensor dataset and to perform the data fusion process on the basis of the BPA. Firstly, the BPA evaluates the sources of evidence and generates belief and plausibility functions for all propositions; secondly, the trust interval is constructed using the belief function value and plausibility function value as the lower and upper bounds; then, the descending order of the interval numbers is obtained by ordered weighted average operator aggregation; finally, the aggregation results are also sorted in descending order to obtain the anomaly recognition results.

Firstly, the belief function Bel_mj and the plausibility function Pl_mj of each proposition are given by the m_j of the evidence, so that the trust interval BI_ij of each evidence is obtained:

Bel_mj(A_i) = Σ_{R⊆A_i} m_j(R)    (4)

Pl_mj(A_i) = Σ_{R∩A_i≠∅} m_j(R)    (5)

where A_i and R are propositions in 2^Θ. Therefore, the trust interval of the BPA of the evidence E_j on the proposition A_i is BI_ij = [Bel_mj(A_i), Pl_mj(A_i)]. Calculating the BPA of each sensor's data yields a trust interval matrix BI:

BI = [ BI_11  BI_12  · · ·  BI_1n
       BI_21  BI_22  · · ·  BI_2n
        · · ·  · · ·  · · ·  · · ·
       BI_m1  BI_m2  · · ·  BI_mn ]    (6)

Second, the confidence intervals are sorted in descending order. For the sake of simplicity, the i-th row of the matrix is represented as BI_i: = (BI_i1, BI_i2, · · ·, BI_in). Since BI_ij is the confidence interval of the BPA of the evidence E_j on A_i, BI_i: includes the confidence intervals of all evidence on the proposition A_i. Following the OWA (Ordered Weighted Average) operator, the distance D_ij between the trust interval BI_ij = [Bel_mj(A_i), Pl_mj(A_i)] and [0, 0] is calculated as

D_ij = sqrt( (1/3)·[Bel_mj(A_i) + Pl_mj(A_i)]² + (1/4)·[Bel_mj(A_i) − Pl_mj(A_i)]² )    (7)

The descending order of D_i1 ∼ D_in is expressed as D_iπi(1) ∼ D_iπi(n), where π_i(j) denotes the index of the j-th largest of D_i1 ∼ D_in, j = 1, 2, · · ·, n; obviously D_iπi(1) > D_iπi(2) > · · · > D_iπi(n). A larger D_ij means a larger trust interval of the BPA of the evidence E_j, which gives the order of BI_i:. The sorted interval numbers BI_iπi are then aggregated, and the running product of the interval numbers is calculated as Prod_i(j), j = 1, 2, · · ·, n:

Prod_i(j) = Π_{s=1}^{j} BI_iπi(s)    (8)
(8)
26
H. Meng et al.
Where Prodi (j) is also the interval number. The upper bound is the product of the upper bound of the j-th largest interval number of BIiπi , and the lower bound is the product of the lower bound of the j-th largest interval number of BIiπi . Applying the improved OWA operator named ME-GOWA (The Maximum EntropyGeneralized Ordered Weighted Average) operator, the calculation formula of MEGOWA operator is: wj ∈ [0, 1],
n
wj = 1
(9)
j=1
ME − GOWAw (x1 , x2 , · · · , xn ) =
n 1 λ λ wj x(j)
(10)
j=1
Where x(j) is the j-th largest element in x1 , x2 , · · · , xn . The weight vector of the ME-GOWA operator is mainly determined by two functions based on the uncertainty orness measure constraint, one is the dispersion and the other is the orness, as shown in Eqs. (9) and (10): Disp(W ) = −
n wj ln wj
(11)
j=1
1 (n − j)wj n−1 n
Orness(W ) =
(12)
j=1
The dispersion represents the entropy of the weight vector, and the orness indicates the optimism of the decision maker. The closer the orness is to 1, the closer the ME-GOWA operator is to the "or" operator; the closer it is to 0, the closer the operator is to "and"; the more uniform the weights are, the closer the orness is to 0.5. When the orness constraint is given, i.e. the value of orness(W) is fixed, the weights should be as equal as possible, which is obtained by solving Eq. (13):

min  (1/(2n)) · Σ_{j=1}^{n−1} ( w_j/w_{j+1} − w_{j+1}/w_j )
s.t. (1/(n−1)) · Σ_{j=1}^{n} (n − j) w_j = α,  0 < α < 1    (13)
The weights satisfy orness(W) = α, α ∈ [0, 1]. With α = 0.5, the ME-GOWA weights are used to aggregate the interval-number products Prod_i(j), evaluating each proposition by combining all sources of evidence; the result is denoted BI_i,W:

BI_i,W = (1/n) · Σ_{j=1}^{n} Prod_i(j)    (14)
Finally, the interval number BI_i,W of each proposition (i = 1, 2, · · ·, m) is calculated according to the ME-GOWA operator and sorted in descending order, from which the anomaly recognition result is obtained. For simplicity, the upper and lower limits of BI_i,W are written as BI_i,W^U and BI_i,W^L. The distance D_i between the interval number BI_i,W and [0, 0] is

D_i = d(BI_i,W, [0, 0]) = sqrt( (1/3)·(BI_i,W^U + BI_i,W^L)² + (1/4)·(BI_i,W^U − BI_i,W^L)² )    (15)

The descending order of D_1 ∼ D_m is expressed as D_π(1) ∼ D_π(m), where π(i) denotes the index of the i-th largest of D_1 ∼ D_m, i = 1, 2, · · ·, m; obviously D_π(1) > D_π(2) > · · · > D_π(m). A larger D_i means a larger interval number BI_i,W, which gives the order BI_π(1),W > BI_π(2),W > · · · > BI_π(m),W. The final recognition result of the proposition R_π(i) is

R_π(i) = D_π(i) / Σ_{i=1}^{m} D_π(i)    (16)
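The steps in Eqs. (4)–(16) can be summarized in a short sketch. Propositions are modelled as frozensets of singleton labels; the function names, the equal-weight simplification of the ME-GOWA aggregation (orness = 0.5), and the interval distance follow our reading of Eqs. (7), (14) and (16) rather than any reference implementation.

```python
from math import sqrt

def belief(bpa, prop):
    """Bel(A) = sum of masses of propositions contained in A (Eq. 4)."""
    return sum(m for r, m in bpa.items() if r <= prop)

def plausibility(bpa, prop):
    """Pl(A) = sum of masses of propositions intersecting A (Eq. 5)."""
    return sum(m for r, m in bpa.items() if r & prop)

def interval_distance(lo, hi):
    """Distance of the interval [lo, hi] to [0, 0], per our reading of Eqs. (7)/(15)."""
    return sqrt((lo + hi) ** 2 / 3 + (hi - lo) ** 2 / 4)

def fuse(bpas, singletons):
    """Return a normalized support score per singleton proposition (Eqs. 6-16, sketched)."""
    scores = {}
    for a in singletons:
        prop = frozenset({a})
        intervals = [(belief(m, prop), plausibility(m, prop)) for m in bpas]
        # order the evidence by interval distance (descending) and take running products (Eq. 8)
        intervals.sort(key=lambda iv: interval_distance(*iv), reverse=True)
        prods, lo, hi = [], 1.0, 1.0
        for l, h in intervals:
            lo, hi = lo * l, hi * h
            prods.append((lo, hi))
        # equal ME-GOWA weights (orness = 0.5) reduce Eq. (14) to a plain average
        bi_lo = sum(p[0] for p in prods) / len(prods)
        bi_hi = sum(p[1] for p in prods) / len(prods)
        scores[a] = interval_distance(bi_lo, bi_hi)
    total = sum(scores.values())
    return {a: v / total for a, v in scores.items()}              # Eq. (16)
```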
4 Simulation

4.1 Datasets

The sensor dataset used in this paper contains sensor data sampled from a real power IoT environment. The data are measured by five sensors, and each sensor monitors a total of 100 system attribute data points over different time periods. The normal condition and two types of corresponding power IoT anomalies are selected as the recognition framework Θ = {A1, A2, A3}. Considering the conflict of evidence in Zadeh's paradox, a set of uncertainties {A1A2, A2A3, A3A1} is added to the evidence. The support rank of all data and the support interval of each proposition are then calculated to obtain the propositions' BPA. The BPA of the five pieces of evidence is as follows (Tables 1 and 2):

Table 1. Temperature sensor dataset

Resource | BPA
E1 | m1(A1) = 0.15, m1(A2) = 0.65, m1(A2, A3) = 0.20
E2 | m2(A1) = 0.10, m2(A2) = 0.78, m2(A3) = 0.12
E3 | m3(A1) = 0.29, m3(A2) = 0.53, m3(A3, A1) = 0.18
E4 | m4(A1) = 0.11, m4(A2) = 0.82, m4(A1, A2) = 0.07
E5 | m5(A1) = 0.18, m5(A2) = 0.71, m5(A2, A3) = 0.05, m5(A3) = 0.06
Table 2. Vibration sensor dataset

Resource | BPA
E1 | m1(A2) = 0.12, m1(A2, A3) = 0.75, m1(A3) = 0.07, m1(A3, A1) = 0.06
E2 | m2(A1) = 0.19, m2(A3) = 0.63, m2(A3, A1) = 0.18
E3 | m3(A1) = 0.09, m3(A2) = 0.22, m3(A3) = 0.69
E4 | m4(A1) = 0.13, m4(A2) = 0.11, m4(A2, A3) = 0.05, m4(A3) = 0.71
E5 | m5(A1) = 0.21, m5(A1, A2) = 0.33, m5(A3) = 0.46
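As a usage illustration only, the temperature-sensor BPAs of Table 1 can be fed to the fuse helper sketched after Eq. (16); with these inputs the overload proposition A2 receives the largest support, in line with the results reported below, although the exact values differ because the sketch simplifies the weighting.

```python
A1, A2, A3 = frozenset({"A1"}), frozenset({"A2"}), frozenset({"A3"})
temperature_bpas = [
    {A1: 0.15, A2: 0.65, A2 | A3: 0.20},
    {A1: 0.10, A2: 0.78, A3: 0.12},
    {A1: 0.29, A2: 0.53, A3 | A1: 0.18},
    {A1: 0.11, A2: 0.82, A1 | A2: 0.07},
    {A1: 0.18, A2: 0.71, A2 | A3: 0.05, A3: 0.06},
]
print(fuse(temperature_bpas, ["A1", "A2", "A3"]))   # A2 gets the largest normalized support
```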
4.2 Simulation Results According to the fusion of all data collected by temperature sensors, the classic D-S algorithm [8] is compared with the DFMD, as shown in Fig. 2. The normal condition, overload and short circuit are selected as the recognition framework. The proposition A2 owns the highest support in both algorithms. DFMD’s support for propositions A1 , A2 and A3 are 0.1457, 0.8073 and 0.0469, and its support for proposition A2 is significantly higher than its support for proposition A1 and A3 . Furthermore, DFMD’s support for the correct proposition A2 is 12.20% higher than that of the classic D-S evidence theory. It can be seen from the comparison of the results that the DFMD can correctly process the dataset of system attribute data collected by sensors and obtain accurate fusion results. The result of anomaly recognition is A2 , which is an overload anomaly.
Fig. 2. Temperature sensor dataset fusion result
According to the fusion of all data collected by vibration sensors, the classic D-S algorithm is compared with the DFMD, as shown in Fig. 3. The normal condition, weak impact and component failure are selected as the recognition framework. The proposition A3 owns the highest support in both algorithms. DFMD’s support for propositions A1 , A2
and A3 are 0.1777, 0.0851 and 0.7372, and its support for proposition A3 is significantly higher than that for propositions A1 and A2. In addition, DFMD's support for the correct proposition A3 is 9.96% higher than that of the classic D-S evidence theory. The comparison shows that DFMD can correctly process the system attribute data collected by the sensors and obtain accurate fusion results. The anomaly recognition result is A3, which is a component failure anomaly.
Fig. 3. Vibration sensor dataset fusion result
5 Conclusion

Considering that equipment in the power IoT may malfunction or have measurement errors, this paper employs the data collected by various sensors in the power IoT and proposes DFMD to conduct anomaly recognition. The support rank of each system attribute datum collected by the different sensors and the support interval of each proposition are constructed, and all information is integrated to convert the data into evidence. Then, based on the trust interval generated by the basic probability assignment of each proposition, the ME-GOWA operator is used in the data fusion process and the anomaly type is effectively identified. Simulations are performed on sensor datasets, and the effectiveness and superiority of DFMD are demonstrated by comparing its accuracy with that of other algorithms.

Acknowledgment. This work is supported by the Science and Technology Project of State Grid Henan Information & Telecommunication Company "Research and Application of IoT Terminal Edge Access and Intelligent Protection Technology".
References 1. Jiang, W.: A correlation coefficient for belief functions. Int. J. Approximate Reason. 103, 94–106 (2018) 2. Xing, X., Cai, Y., Zhao, Z., Cheng, L.: Weighted evidence combination based on improved conflict factor. J. Discrete Math. Sci. Cryptogr. 19(1), 173–184 (2016) 3. Wu, S., Chen, G.C.: Combination of conflicting evidence based on relative entropy. Appl. Mech. Mater. 724, 318–322 (2015) 4. Jiang, W., Zhuang, M., Qin, X., Tang, Y.: Conflicting evidence combination based on uncertainty measure and distance of evidence. Springerplus 5(1), 1–11 (2016). https://doi.org/10. 1186/s40064-016-2863-4 5. Kushwah, A., Kumar, S., Hegde, R.M.: Multi-sensor data fusion methods for indoor activity recognition using temporal evidence theory. Pervasive Mob. Comput. 21, 19–29 (2015) 6. Yang, K., Feng, Y.: The combination method of conflict evidence based on classification correction. In: 2016 International Conference on Network and Information Systems for Computers (ICNISC), pp. 213–217 (April 2016) 7. Bi, W., Zhang, A., Yuan, Y.: Combination method of conflict evidences based on evidence similarity. J. Syst. Eng. Electron. 28(3), 503–513 (2017) 8. Yager, R.R.: On the aggregation of prioritized belief structures. IEEE Trans. Syst. Man Cybern. A: Syst. Hum. 26(6), 708–717 (2002)
Edge Task Offloading Method for Power Internet of Things Based on Multi-round Combined Auction Yi Ge, Ying Wang, and Yufan Cheng(B) State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China {geyi,wangy}@bupt.edu.cn, [email protected]
Abstract. As a key technology of power Internet of Things, multi-access edge (MEC) computing promotes the deep integration of Internet of Things (IoT) and smart grid. However, most of the current studies carry out task offloading and resource allocation from the perspective of benefit to the user side, ignoring the motivation of the edge side to provide services. In this paper, a multi-round combined auction algorithm is proposed for the power Internet of Things scenario of multi-terminal business-multi-edge IoT agent, which can meet the needs of different levels of business while maximizing the edge side profits. The experimental results show that this method can effectively improve the edge profit, and has a good effect on the success rate of task offloading and the resource utilization of the edge IoT agent. Keywords: Edge computing · Power Internet of Things · Resource allocation
1 Introduction In recent years, the deep integration of IoT technology and smart grid and the explosive growth of terminal business data have put forward higher requirements for power Internet of Things [1]. MEC is a key technology in the technical architecture of the Internet of Things for power [2]. Its essence is to provide users with computing and other related services close to the source of data generation to save network transmission costs [3]. As a product of edge-side fusion MEC technology in smart grid, edge IoT agent has the functions of gateway, computing and storage. At present, most of the literatures carry out task offloading and resource allocation from the point of view of benefit to users, ignoring the motivation of providing services at the edge. This paper focuses on the power IoT scenario of multi-terminal services-multi-edge IoT agent. Based on the perspective of resource owners, this paper seeks to maximize edge profits while meeting the needs of different levels of business. A multi-round combined auction algorithm is proposed to describe the service relationship between edge IoT agent and terminal service by using auction model. The main contributions of this paper are as follows: (1) Dynamic pricing was carried out based on task urgency, and bidding was conducted with
comprehensive consideration of the available resource capacity of the edge IoT agents and the distance between it and the terminal business. As more tasks were unloaded to the edge for processing, the success rate of business unloading was effectively improved. (2) A pre-allocation strategy of resource blocks considering the edge side profits, the service level and the number of bids is proposed to logically divide the resources owned by the edge IoT agents, which effectively improves the resource utilization rate of the edge IoT agent. (3) Dynamic programming is adopted to realize joint allocation of resources. Compared with the existing MRIAM [4] and DPDA [5] methods, the proposed method in this paper can obtain higher edge side profits.
2 Related Work The resource allocation scheme on the edge has always been the focus of researchers. Literature [6, 7] considered a single edge IoT agent system. [6] assumed that there was only one user and that the state of the wireless channel was time-varying. The goal was to minimize the total energy consumption of the user side through the joint allocation of computing resources and communication resources. [7] considered minimizing the total energy consumption on the user side in the case of multiple users. [8] considers the situation of multiple edge IoT agents, which is more in line with the actual needs. The goal is to minimize the task processing time under a cooperative three-layer computing network. [9] designed an improved branch-and-bound algorithm for the joint optimization strategy of computing offload, subcarrier allocation and computing resource allocation, so as to reduce user energy consumption. However, the above literatures only consider the requirements of the user side for task offloading and resource allocation, ignoring the service motivation of the edge side. Some literature uses economic models to allocate resources to the edge side. [10] designs a resource trading mechanism with incentive compatibility between the terminal side and the edge side, but it only considers the case that one edge device can only serve one user. [11] designed a single-round two-way auction scheme, in which each edge device could handle multiple terminal tasks, but only considered users’ one-dimensional demand for computing resources. [12] designed a user-centered auction-based mechanism without considering the benefits of the edge side. Different from the literatures mentioned above, this paper, based on edge side profits, designs a multi-round combined auction algorithm based on business urgency for the power IoT scenario of multi-terminal service and multi-edge IoT agent to jointly allocate communication resources and computing resources, so as to maximize edge side profits while meeting business needs.
3 Problem Formulation

The system model is composed of multiple terminal services and multiple edge IoT agents. The set of edge IoT agents is expressed as S = {1, 2, . . ., S}. φ_s = (R_s, C_s) represents the resource state of edge IoT agent s, where R_s is the amount of communication resources owned by the s-th edge IoT agent, that is, the number of sub-channels, and C_s is the number of unit computing resources owned by the s-th edge IoT agent. The set of all terminal services is represented by I = {1, 2, . . ., I}. task_i = {d_i, c_i, dl_i} represents the i-th terminal service, where d_i is the amount of service data, c_i is the number of CPU cycles needed to compute the service, and dl_i is the maximum acceptable delay to complete the service. The transmission rate from a terminal service to an edge IoT agent is defined as

r_{i,s} = B_{is} · log₂(1 + p_i g_{i,s} / σ_{is}²)    (1)

Local computing power is defined as the CPU frequency, so the local processing time of terminal business task_i is

t_{i,l} = c_i / f_i^l    (2)

The energy consumption of terminal business task_i processed locally is

e_{i,l} = δ_i c_i    (3)

where δ_i = 10⁻¹¹ · f_i^l represents the energy consumed per CPU cycle [13]. Since the amount of data returned by a task is small, the transmission time only considers the task upload time, and the time to return the result is ignored [6]. The total processing time when offloading is

t_{i,s} = d_i / r_{i,s} + c_i / f_{i,s}    (4)

The energy consumed by terminal service i when offloading to the edge IoT agent is the transmission energy of the task, so the offloading energy consumption is

e_{i,s} = p_i · d_i / r_{i,s} + c_i q_s    (5)

where q_s is the energy consumption required by edge IoT agent s to process one bit of data.
Here, the main question is to which edge IoT agent a terminal service should be offloaded. For the decision of whether to offload at all, it is only required that the running time and energy consumption of the offloaded task do not exceed those of local execution. In other words, offloading can be carried out when Eqs. (6) and (7) are satisfied:

$$t_{i,s} \le t_{i,l} \quad (6)$$

$$e_{i,s} \le e_{i,l} \quad (7)$$
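To make the offloading decision concrete, the short Python sketch below evaluates Eqs. (1)-(7) for one terminal service and one edge IoT agent; the function name, argument names and any default values are illustrative assumptions of this sketch, not prescribed by the paper.

```python
import math

def offload_feasible(d_i, c_i, p_i, g_is, sigma2, B_is, f_local, f_edge, q_s):
    """Check whether offloading task i to edge agent s satisfies Eqs. (6) and (7)."""
    r_is = B_is * math.log2(1 + p_i * g_is / sigma2)      # Eq. (1): uplink rate
    t_local = c_i / f_local                               # Eq. (2): local processing time
    delta_i = 1e-11 * f_local                             # energy per CPU cycle [13]
    e_local = delta_i * c_i                               # Eq. (3): local energy
    t_offload = d_i / r_is + c_i / f_edge                 # Eq. (4): upload + edge computing time
    e_offload = p_i * d_i / r_is + c_i * q_s              # Eq. (5): transmission + edge energy
    return t_offload <= t_local and e_offload <= e_local  # Eqs. (6) and (7)
```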
Combining Eqs. (1), (3), (5) and (7), the minimum bandwidth required when a terminal service chooses to offload is

$$B_{is}^{*} = \frac{p_i d_i}{\left(\delta_i c_i - c_i q_s\right)\log_2\left(1 + p_i h_{is}/\sigma_{is}^2\right)} \quad (8)$$

The number of sub-channels requested when a user offloads a task is

$$n_{is} = B_{is}^{*}/b \quad (9)$$

Combining Eqs. (6) and (8), the minimum computing resource required from edge IoT agent s when a terminal service chooses to offload is

$$f_{is}^{*} = \frac{f_{i,l}\,c_i}{c_i - d_i f_{i,l}\big/\left(B_{is}^{*}\log_2\left(1 + p_i h_{is}/\sigma_{is}^2\right)\right)} \quad (10)$$
Assuming a minimum allocation unit of computing resources, the number of unit computing resources requested by an offloading terminal service is

$$m_{is} = f_{is}^{*}/f_{unit} \quad (11)$$

Let $\alpha_i^c$ denote the terminal service's bidding price for a unit computing resource and $\alpha_i^b$ its bidding price for a unit communication resource. Then the user's resource bid to the edge IoT agent is

$$\mu_{is} = \alpha_i^c m_{is} + \alpha_i^b n_{is} \quad (12)$$

Let $\beta_s^c$ and $\beta_s^b$ denote the unit cost of the edge IoT agent for computing and communication resources, respectively. The cost for edge IoT agent s to allocate resources to the user is

$$\nu_{is} = \beta_s^c m_{is} + \beta_s^b n_{is} \quad (13)$$
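The resource demand and pricing chain of Eqs. (8)-(13) can be sketched in Python as follows; all names are illustrative assumptions, and the ratios are kept fractional exactly as in the formulas above (any rounding to whole sub-channels or unit blocks would be an additional design choice).

```python
import math

def resource_request_and_bid(d_i, c_i, p_i, h_is, sigma2, f_local, q_s, b, f_unit,
                             alpha_c, alpha_b, beta_c, beta_b):
    """Requested sub-channels/compute units (Eqs. 8-11) and the bid/cost values (Eqs. 12-13)."""
    delta_i = 1e-11 * f_local
    spectral_eff = math.log2(1 + p_i * h_is / sigma2)
    B_star = p_i * d_i / ((delta_i * c_i - c_i * q_s) * spectral_eff)           # Eq. (8)
    n_is = B_star / b                                                           # Eq. (9)
    f_star = f_local * c_i / (c_i - d_i * f_local / (B_star * spectral_eff))    # Eq. (10)
    m_is = f_star / f_unit                                                      # Eq. (11)
    bid = alpha_c * m_is + alpha_b * n_is                                       # Eq. (12)
    cost = beta_c * m_is + beta_b * n_is                                        # Eq. (13)
    return n_is, m_is, bid, cost
```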
X = {xis}I×S is defined as the connection matrix, where xis is the decision variable: xis = 1 when a connection is established between terminal service i and edge IoT agent s, otherwise xis = 0. The objective of this paper is to maximize the benefit of the edge IoT agents while satisfying the resource limitations of the edge IoT agents and ensuring the quality of service. Therefore, the planning problem is expressed as

$$\max \sum_{s=1}^{S} U_s = \sum_{s=1}^{S}\sum_{i=1}^{I} x_{is}\left[\left(\alpha_i^c - \beta_s^c\right)m_{is} + \left(\alpha_i^b - \beta_s^b\right)n_{is}\right] \quad (14)$$

$$\text{s.t.}\quad C1:\; \sum_{i=1}^{I} x_{is} m_{is} \le C_s,\;\; \forall s \in S$$

$$C2:\; \sum_{i=1}^{I} x_{is} n_{is} \le R_s,\;\; \forall s \in S$$

$$C3:\; \sum_{s=1}^{S} x_{is} \le 1,\;\; \forall i \in I,\;\; x_{is} \in \{0, 1\}$$
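For reference, problem (14) can be solved exactly for small instances with an off-the-shelf integer programming modeler. The sketch below uses the open-source PuLP package; this is an assumption of the sketch only, since the paper itself solves the allocation per agent with dynamic programming (Sect. 4.3) rather than with a global solver.

```python
import pulp

def solve_offloading_ilp(I, S, m, n, alpha_c, alpha_b, beta_c, beta_b, C, R):
    """Problem (14): maximize edge profit subject to C1-C3. m[i][s], n[i][s] are demands."""
    prob = pulp.LpProblem("edge_profit", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("x", [(i, s) for i in I for s in S], cat=pulp.LpBinary)
    prob += pulp.lpSum(x[(i, s)] * ((alpha_c[i] - beta_c[s]) * m[i][s] +
                                    (alpha_b[i] - beta_b[s]) * n[i][s])
                       for i in I for s in S)
    for s in S:
        prob += pulp.lpSum(x[(i, s)] * m[i][s] for i in I) <= C[s]   # C1: compute capacity
        prob += pulp.lpSum(x[(i, s)] * n[i][s] for i in I) <= R[s]   # C2: sub-channel capacity
    for i in I:
        prob += pulp.lpSum(x[(i, s)] for s in S) <= 1                # C3: at most one agent per service
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {(i, s): int(x[(i, s)].value()) for i in I for s in S}
```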
4 Algorithm Design Based on Auction Model 4.1 Tender Submission Stage After receiving the information of the edge IoT agents, a terminal service determines its preference for each edge IoT agent by considering the agent's available resource capacity and the distance between them. The preference degree is calculated as

$$l_{is} = \rho\frac{R_s}{n_{is}} + \gamma\frac{C_s}{m_{is}} + \lambda\frac{1}{d_{is}} \quad (15)$$

where ρ, γ and λ are the available communication resource factor, the available computing resource factor and the distance factor, respectively, and dis represents the distance between the terminal service and the edge IoT agent. We propose a bidding approach based on business urgency. The urgency of terminal service i is defined as the ratio of the difference between the current time and the generation time to the original time constraint, as shown in Eq. (16):

$$\chi_i = \frac{t_{now} - t_{start}}{dl_i} \quad (16)$$
A terminal service evaluates the unit computing resources and unit communication resources of the edge IoT agent within a range, denoted $[c_L^i, c_H^i]$ and $[b_L^i, b_H^i]$, respectively. The terminal service then prices dynamically based on business urgency, and the bids for unit computing resources and unit communication resources are as follows:

$$\alpha_i^c = c_L^i + \chi_i\left(c_H^i - c_L^i\right) \quad (17)$$

$$\alpha_i^b = b_L^i + \chi_i\left(b_H^i - b_L^i\right) \quad (18)$$
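As a concrete illustration of the urgency-driven pricing in Eqs. (16)-(18), the short Python sketch below computes a terminal service's bids; the function and argument names are illustrative assumptions.

```python
def urgency_bids(t_now, t_start, deadline, c_low, c_high, b_low, b_high):
    """Urgency (Eq. 16) and the resulting dynamic bids (Eqs. 17-18)."""
    chi = (t_now - t_start) / deadline          # business urgency, grows as the deadline nears
    alpha_c = c_low + chi * (c_high - c_low)    # bid for a unit computing resource
    alpha_b = b_low + chi * (b_high - b_low)    # bid for a unit communication resource
    return chi, alpha_c, alpha_b
```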
4.2 Resource Block Pre-allocation Phase In order to realize differentiated service for terminal businesses, the resources of the edge IoT agents are logically divided into P resource blocks, each serving the corresponding level of terminal business. At the same time, the initial resource allocation strategy should jointly consider the bid level and the number of terminal services, because it involves the interests of the edge IoT agents. To sum up, the resource blocks are initialized by the following formulas:

$$C_s^p = \omega\,\frac{\sum_{i=1}^{N_s^p}\alpha_i^c \times m_{is}}{\sum_{i=1}^{N_s}\alpha_i^c \times m_{is}} \times C_s + (1-\omega)\,\frac{p \times N_s^p}{\sum_{p=1}^{P} p \times N_s^p} \times C_s \quad (19)$$

$$R_s^p = \omega\,\frac{\sum_{i=1}^{N_s^p}\alpha_i^b \times n_{is}}{\sum_{i=1}^{N_s}\alpha_i^b \times n_{is}} \times R_s + (1-\omega)\,\frac{p \times N_s^p}{\sum_{p=1}^{P} p \times N_s^p} \times R_s \quad (20)$$
Here, $N_s^p$ represents the number of terminal services of grade p in the request set of edge IoT agent s; the former term weighted by ω is related to the bids of the terminal services, and the latter term weighted by 1 − ω is related to the level and load of the terminal services. 4.3 Winner Determination Stage In order to maximize the benefit obtained by the edge IoT agents, the resource allocation problem of the edge IoT agents is abstracted into a two-dimensional knapsack problem, and dynamic programming is adopted to select the users to serve. Algorithm 1 describes the winner determination algorithm, where $I_s^{req}$ represents the set of terminal businesses that request service from edge IoT agent s, and $I_{un}$ represents the set of terminal businesses that fail to win the bid.
Algorithm 1: Winner determination algorithm based on dynamic programming
Input: request information of the terminal businesses in $I_s^{req}$, resource capacity $(C_s, R_s)$ of edge IoT agent s
Body: fill a two-dimensional knapsack table over the computing capacity $C_s$ and the communication capacity $R_s$; set $x_{is} = 1$ for the selected (winning) terminal businesses and $x_{is} = 0$ otherwise, and add the losing businesses to $I_{un}$
Output: edge IoT agent profit, connection matrix X, unbid terminal business set $I_{un}$
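Because the step-by-step pseudocode is compressed above, the following Python sketch shows one way to implement winner determination as a two-dimensional 0-1 knapsack solved by dynamic programming, maximizing the edge agent's profit (bid minus cost) under the capacity limits of Eq. (14); the discretization of resources into integer unit blocks and all names are assumptions of this sketch, not the authors' exact procedure.

```python
def determine_winners(requests, C_s, R_s):
    """requests: list of (service_id, m_is, n_is, profit) with integer resource demands.
    Returns the winning set maximizing total profit under compute (C_s) and channel (R_s) limits."""
    # dp[c][r] = (best profit, chosen ids) using at most c compute units and r sub-channels
    dp = [[(0.0, frozenset()) for _ in range(R_s + 1)] for _ in range(C_s + 1)]
    for sid, m, n, profit in requests:
        if profit <= 0:
            continue  # serving at a loss never helps the edge agent
        for c in range(C_s, m - 1, -1):          # iterate backwards so each service is used once
            for r in range(R_s, n - 1, -1):
                cand = dp[c - m][r - n][0] + profit
                if cand > dp[c][r][0]:
                    dp[c][r] = (cand, dp[c - m][r - n][1] | {sid})
    best_profit, winners = dp[C_s][R_s]
    losers = {sid for sid, *_ in requests} - winners
    return best_profit, winners, losers
```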
Algorithm 2 describes the multi-round combinatorial auction algorithm proposed in this paper, in which each edge IoT agent corresponds to one round of decisions, so there are S rounds of decisions in total.
Algorithm 2: Multi-round combinatorial auction algorithm based on business urgency
01 Input: the set of terminal businesses, the set of edge IoT agents, the bids of the terminal businesses, the resources of the edge IoT agents, and the unsuccessful set $I_{un}$
02 Initialization
03 for each auction round do
04   for each terminal business i do
05     Calculate the urgency of the business according to Eq. (16) and update its bids for unit computing and unit communication resources
06     Calculate the preference coefficient of each edge IoT agent according to Eq. (15), sort the priorities, determine the sorted set, and send bid requests to the edge IoT agents in this order
07   end for
08   for each edge IoT agent s do
09     After receiving the request information of the terminal services, edge IoT agent s obtains the request set $I_s^{req}$
10     Classify the bid vector by business level
11     Pre-allocate the resource blocks
12     Execute Algorithm 1
13     $E_M = \sum_{s=1}^{S} E_s$ (accumulate the edge profits)
14   end for
15   Update the unsuccessful set $I_{un}$
16   for each terminal business in $I_{un}$ do
17     Update its bid according to Eqs. (17) and (18)
18   end for
19 end for
20 Output: connection matrix X, total profit $E_M$, unbid set $I_{un}$
5 Experimental Evaluation A setting similar to [14, 15] is considered, in which multiple edge IoT agents and multiple terminal services are randomly distributed in a 500 × 500 m² area. The number of sub-channels Rs and the number of unit computing resources Cs owned by each edge IoT agent are randomly selected from {40, 50, 60}. In Eq. (15), ρ, γ and λ are set to 0.4, 0.4 and 0.2, respectively. The main simulation parameters are shown in Table 1. The proposed method is compared with MRIAM and DPDA.
Table 1. Parameter settings
The minimum allocation unit of communication resources: 10 MHz
The minimum allocation unit of computation resources: 2 GHz
Terminal transmit power: 20 dBm
Noise power: −100 dBm
Terminal computing capacity: 0.1–1 GHz
Size of service data: 10–100 kB
CPU cycles required by a task: 100–1000 Megacycles
Bidding price per unit resource: 6–12
Cost price per unit resource: 3–9
Fig. 1. Success rate of offloading
Fig. 2. Resource utilization
Figure 1 shows the success rate of task offloading under the three methods. The offloading success rate of the proposed method is higher than that of the other two methods because, in the tender submission stage, we price dynamically based on task urgency, which effectively avoids some task migration failures. DPDA also uses dynamic pricing, but it is not based on business urgency, while MRIAM uses a fixed-percentage pricing strategy. Figure 2 shows the resource utilization of each edge IoT agent under the three methods; here the resources of the five edge IoT agents A–E increase in turn. The proposed method uses resources more efficiently because the load of the resource blocks is considered during the pre-allocation phase: the edge IoT agent can dynamically adjust the size of each resource block according to the business requests submitted in each round of the auction. DPDA is a single-round auction, while MRIAM divides resource blocks equally without taking the load conditions into account. Figure 3(a) shows how the total profit of the edge side changes with the number of terminal services when the number of edge IoT agents is 5. As can be seen
from the figure, our method achieves higher total profits because it focuses on the total profit of the edge IoT agents in the power IoT and uses dynamic programming to achieve joint resource allocation. DPDA balances the interests of both parties in the resource auction market, while MRIAM uses a Vickrey auction to determine the winner of each round, with the winner paying the second-highest price. The curve in Fig. 3(b), which shows how the total edge profit changes with the number of edge IoT agents, can be interpreted similarly.
Fig. 3. The total profits under the three different methods
6 Conclusion In this paper, we propose a multi-round combinatorial auction algorithm that aims to meet the needs of different levels of business while maximizing edge profits. Based on business urgency, the algorithm prices bids dynamically in the submission stage, which effectively improves the success rate of task offloading. A resource pre-allocation strategy that jointly considers the bids, levels and number of terminal services is proposed to improve the resource utilization of the edge IoT agents, and dynamic programming is used to realize the joint allocation of resources and improve edge profits. Acknowledgments. This work was supported by State Grid Corporation of China Headquarters Technology Project, "The Research and application of key technologies for dynamic deployment of network resources based on cloud-edge collaboration" (5700-202014179A-0-0-00).
References 1. Xie, G., et al.: Data demand model based on ubiquitous power Internet of Things. In: International Conference on Computer Network, Electronic and Automation (ICCNEA), pp. 333–336 (2020) 2. Nie, Z., et al.: Key technologies and application scenario design for making distribution transformer terminal unit being a containerized edge node. Autom. Electr. Power Syst. 44(3), 154–161 (2020)
3. Jiang, X., et al.: A survey on multi-access edge computing applied to video streaming: some research issues and challenges. IEEE Commun. Surveys Tutorials (99), 1 (2021) 4. Zhang, L., et al.: Joint service placement and computation offloading in mobile edge computing: an auction-based approach. In: 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), pp. 256–265 (2020) 5. Sun, W., et al.: Double auction-based resource allocation for mobile edge computing in industrial Internet of Things. IEEE Trans. Ind. Inform. (10), 4692–4701 (2018) 6. Zhao, P., Tian, H., Qin, C., Nie, G.: Energy-saving offloading by jointly allocating radio and computational resources for mobile edge computing. IEEE Access 5, 11255–11268 (2017). https://doi.org/10.1109/ACCESS.2017.2710056 7. Fan, W., Liu, Y., Tang, B., Fan, W., Wang, Z.: Computation offloading based on cooperations of mobile edge computing-enabled base stations. IEEE Access 6, 22622–22633 (2018). https:// doi.org/10.1109/ACCESS.2017.2787737 8. Wang, Y., et al.: Cooperative task offloading in three-tier mobile computing networks: an ADMM framework. IEEE Trans. Veh. Technol. 68(3), 2763–2776 (2019) 9. Yang, X., et al.: Energy efficiency based joint computation offloading and resource allocation in multi-access MEC systems. IEEE Access 7, 117054–117062 (2019) 10. Jin, A., et al.: Auction-based resource allocation for sharing cloudlets in mobile cloud computing. IEEE Trans. Emerg. Top. Comput. 1 (2015) 11. Yue, Y., et al.: A double auction-based approach for multi-user resource allocation in mobile edge computing. In: 2018 14th International Wireless Communications and Mobile Computing Conference (2018) 12. Yang, D., et al.: Crowdsourcing to smartphones: incentive mechanism design for mobile phone sensing. In: International Conference on Mobile Computing and Networking ACM (2012) 13. Wu, F., et al.: A strategy-proof auction mechanism for adaptive-width channel allocation in wireless networks. IEEE J. Sel. Areas Commun. 34(10), 2678–2689 (2016) 14. Xu, J., et al.: Joint service caching and task offloading for mobile edge computing in dense networks. In: IEEE Infocom – IEEE Conference on Computer Communications (2018) 15. Poularakis, K., Llorca, J., Tulino, A.M., et al.: Joint Service Placement and Request Routing in Multi-cell Mobile Edge Computing Networks. IEEE (2019)
VEC-MOTAG: Vehicular Edge Computing Based Moving Target Defense System Bingchi Zhang, Shujie Yang(B) , Tao Zhang, Weixiao Ji, Zhongyi Ding, and Jiahao Shen State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China {zbc3,sjyang,zhangtao17,jwx,2020111592dzy,plus}@bupt.edu.cn
Abstract. Nowadays, with the rise of intelligent vehicles and the gradual maturity of intelligent transportation, the demand of vehicles for communication and computing power in VANETs is increasing rapidly. Therefore, a new network computing mode, vehicular edge computing, is used in the VANET environment to move the processing and analysis of vehicle data closer to the edge, so as to improve network performance and quality of service. However, the large number of distributed edge servers also makes the network structure more complex, which greatly increases the risk of attacks on edge servers deployed near the road, and the collapse of network services would bring potential dangers to traffic safety. Moving Target Defense (MTD), a popular active defense technology, can screen out malicious users and block them to protect the system's service by reconstructing the mapping between agents and clients under a shuffling mechanism. However, due to the dynamic VANET environment, existing MTD techniques are difficult to apply directly. In this paper, we propose a vehicular edge computing based MOTAG system (VEC-MOTAG), which can protect edge computing nodes from DDoS attacks. Different from the traditional mechanism, we formulate a series of constraints to improve the shuffling algorithm. In this way, the system can distinguish malicious users and guarantee the quality of service of the network. Simulations show that the VEC-MOTAG system has good defense capability and can ensure the quality of network service. Keywords: Vehicular edge computing · Moving target defense · MOTAG · Shuffling
1 Introduction Vehicular ad hoc network (VANET) is a new type of point-to-point wireless network used to organize communication and interaction between vehicles (V2V), between vehicles and infrastructure (V2I), and with other types of nodes (V2X) [1]. With the rapidly increasing demand for vehicular network services and the large volume of information to be processed, the vehicular ad hoc network, regarded as an important part of future network traffic, is used to obtain services or share data through communication between
vehicles, pedestrians or infrastructure. In order to improve computing performance and network service quality in the vehicular environment, edge computing is considered an appropriate solution in VANETs. In edge computing, data processing and analysis take place near the terminal equipment, with edge devices acting as intermediaries between the cloud and transportation facilities. Servers (edge nodes) with computing and storage capabilities are deployed near the vehicular network; when computing and storage services are closer to users (at the edge), they provide better quality of service. In addition, powerful telecommunication and computing systems are needed to support network applications in vehicular networks [2]. In the VANET environment, the on-board unit in a vehicle collects data, which is further processed and stored by the edge server. Because the computing devices in ordinary vehicles can hardly meet these requirements, the computing nodes or data centers of edge computing are usually deployed on road side units (RSUs) [3, 4]. Furthermore, the security problems that occur in traditional edge computing and cloud computing, such as DDoS attacks, also arise in the edge computing environment of the VANET, which poses a great potential danger and threat to traffic safety. At present, moving target defense is one of the most effective technologies against DDoS attacks in traditional networks. Stavrou's team proposed MOTAG, which protects the service access of authenticated clients from DDoS attacks by continuously shuffling the assignment of clients to agents to isolate inside attackers from innocent clients [5, 6]. Xu's team proposed using reinforcement learning to guide route mutation in MTD to avoid attacks [7–9]. However, considering the characteristics of VANETs, such as a highly dynamic topology, strict delay requirements, high moving speed and limited transmission range, the traditional shuffle-based MOTAG model and route-mutation-based MTD techniques are not directly applicable. To solve these problems, this paper optimizes the shuffle algorithm in MOTAG according to the characteristics of the VANET and establishes the VEC-MOTAG model. The main contributions of our work are as follows: • The original MOTAG technology is generally used in traditional network service scenarios. In this paper, MOTAG is applied to the edge computing environment of the Internet of Vehicles to protect users from DDoS attacks [10]. • The current VANET environment cannot effectively defend against DDoS attacks or trace the source of the attacks. We introduce a trust mechanism based on a behavior score to evaluate the credibility of a user's identity in the Internet of Vehicles and block untrusted users to ensure the security of the network. • The original shuffle algorithm only considers how to quickly scramble users and assign them to agents, without considering the many constraints in the network. We use SMT to model these constraints in the VANET and to guide the shuffle strategy, which ensures both the security performance and the quality of network service. The rest of the paper is organized as follows. In Sect. 2, the related work is summarized and briefly reviewed. In Sect. 3, we introduce the threat model and the mechanism of the system in detail. Then we propose an edge computing based MOTAG system and
give an improved shuffling algorithm in Sect. 4. Next, we present a series of simulations in Sect. 5, where the results show the defense performance of the system and its influence on the QoS of the VANET environment. Finally, we conclude the paper in Sect. 6 with a glance at future developments.
2 Related Work Considering the highly dynamic nature of the Internet of Vehicles, the existing proxy-switching moving target defense systems were designed for traditional network services and therefore cannot adapt well to the complex and changeable Internet of Vehicles topology. MOTAG [6] uses a set of indirection proxies to relay the data traffic between innocent clients and protected servers. By continuously "moving" secret proxies to new network locations while adjusting the assignment of clients to proxies to isolate inside malicious attackers, it can effectively thwart external attackers' attempts to attack the protected server directly. Although MOTAG enforces additional computing tasks to reduce the attacker's throughput, it also places a considerable burden on innocent clients. The PoW approach is therefore suitable for protecting client authentication, because authentication packets are sent rarely and are more delay-tolerant; however, its high overhead makes it unsuitable for protecting application data communication. Especially in mobile scenarios, the high-frequency computation caused by handover would seriously affect the timeliness of vehicle-to-vehicle and vehicle-to-roadside-unit communication. The cloud DDoS defense proposed by Quan Jia improves on MOTAG: it is intended not only for general Internet services that require identity verification, but also for protecting open Internet services designed for anonymous users. Moreover, only a small number of shuffles can successfully mitigate large-scale DDoS attacks, but each shuffle causes users to perceive a delay of several seconds [5]. SQ-RM [8], DQ-RM [9] and CQ-RM [7] utilize a centralized controller to pre-compute the routing mutation space and can adaptively adjust the learning rate and mutation period. Among them, DQ-RM also uses a nonlinear neural network to approximate the Q-value function, which speeds up convergence when the state-action space is large. However, these methods require the network to be relatively stable; under the violent oscillations of the Internet of Vehicles, a stable routing path cannot be formed.
3 Threat Model and System Overview In this paper, we focus on protecting the vehicular ad hoc network against network flooding attacks. In the edge computing based Internet of Vehicles scenario, the edge devices providing network services are often the main targets of attackers. We assume that attackers disguise themselves as ordinary users and obtain services from edge devices such as RSUs. Under this condition, attackers first launch reconnaissance attacks to obtain IP address and port information in order to pinpoint the target RSUs, and then direct the botnets they control to launch large-scale DDoS attacks against the edge devices. Meanwhile, the attackers usually do not show any overtly malicious behavior during this phase, and they also have
many technical means to hide their reconnaissance intentions. Thus, it is difficult for normal flooding defense systems and traditional intrusion detection systems to trace the source [11]. In the VANET environment, the components of the VEC-MOTAG system include the cloud central servers, the edge servers, and the clients. The cloud central servers have powerful performance for complex work such as solving the constraints, shuffling the clients, rebuilding the mapping between edge servers and clients, and computing credit scores within a short period. The central server communicates only with the edge servers. The edge servers are located near the road to provide nearby network services that guarantee the QoS, and they are usually deployed on RSUs. The clients are mainly vehicles, pedestrians and some infrastructure. The RSUs and the clients have a limited communication range in the wireless network, and vehicles can also act as routers to relay data packets.
Fig. 1. Overview of the VEC-MOTAG architecture
The working mechanism of VEC-MOTAG is shown in Fig. 1. If an RSU wants to provide services for vehicles, it should first determine which vehicles are reachable. There are two ways for vehicles to access RSUs. The first is that the vehicle is within the communication range of the RSU and can establish a connection directly. The second is that the RSU chooses a vehicle within its communication range to act as a router, which forwards the data packets to other vehicles in its own communication range until the destination vehicle receives them. For example, in Fig. 1, v2 can connect with the edge server directly, and v3 can connect with the edge server through v2. In order to ensure the quality of communication, the number of hops should be limited. After obtaining this information, the cloud central server solves the shuffle algorithm together with the constraints that need to be considered in the VANET environment and re-establishes the mapping between clients and edge servers. Meanwhile, the central server assigns a risk mark to each client according to whether the edge server it was assigned to has been attacked. In this way, after several rounds of shuffling, malicious clients accumulate higher risk marks than normal users, and a client whose mark exceeds the threshold is blocked. Furthermore, the risk mark is a cumulative value, so it effectively prevents malicious users from regaining the server's trust by pausing their DDoS attacks intermittently.
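The cumulative risk-scoring and blocking behavior described above can be illustrated with the short Python sketch below; the scoring increment, the threshold and all names are assumptions of this sketch rather than values specified by the paper.

```python
class RiskScorer:
    """Accumulates per-client risk across shuffling rounds and blocks clients over a threshold."""
    def __init__(self, threshold=5.0, penalty=1.0):
        self.threshold = threshold   # block a client once its cumulative mark exceeds this
        self.penalty = penalty       # mark added per round spent on an attacked edge server
        self.marks = {}              # client id -> cumulative risk mark
        self.blocked = set()

    def update(self, assignment, attacked_servers):
        """assignment: client id -> edge server id; attacked_servers: server ids attacked this round."""
        for client, server in assignment.items():
            if server in attacked_servers:
                self.marks[client] = self.marks.get(client, 0.0) + self.penalty
                if self.marks[client] > self.threshold:
                    self.blocked.add(client)   # marks never decay, so intermittent attacks still accumulate
        return self.blocked
```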
4 System Design: Vehicular Edge Computing Based MOTAG System In this section, we propose an improved MOTAG system that can adapt to the complex edge computing based VANET environment. The core of the system is the improved shuffling algorithm. To adapt to the variable VANET environment, we combine a series of constraints with the shuffling algorithm, and the constraints can be formalized with SMT as follows. 4.1 Accessibility Constraint Because our system needs to shuffle the clients and assign them to edge servers that are reachable in order to defend against DDoS attacks, it is necessary to determine and record the edge servers that each client can access. There are two ways for clients to access the edge servers: (1) if the vehicle is within the communication range of an RSU, the client can directly establish a connection with the edge server, and that edge server is considered reachable; (2) if the vehicle is not within the communication range of an RSU, it can use other vehicles within its transmission range to forward the request and repeat this step; if it can reach an edge server within a limited number of hops, that RSU is also regarded as an available edge server. We assume that there are n RSUs acting as edge servers, denoted $s_1, s_2, s_3, \ldots, s_n$, and the set of all vehicles within the VANET service range is denoted $V_t = \{v_1, v_2, v_3, \ldots, v_m\}$, where m is the total number of vehicles. The set of vehicles that vehicle $v_i$ can communicate with is $C_{v_i} = \{v_i^1, v_i^2, v_i^3, \ldots, v_i^k\}$, where k is the number of vehicles it can communicate with and $v_i^d \in V_t$ $(1 \le d \le k)$. The set of vehicles that RSU $s_i$ can connect to directly is $C_{s_i} = \{v_{s_i}^1, v_{s_i}^2, \ldots, v_{s_i}^l\}$, where $v_{s_i}^t \in V_t$ $(1 \le t \le l)$. For the second condition, we need to find the other vehicles that can connect with $s_i$ within p hops. For the first hop, the set of vehicles can be written as

$$C_{s_i}^1 = C_{v_{s_i}^1} \cup C_{v_{s_i}^2} \cup \ldots \cup C_{v_{s_i}^l} \quad (1)$$

Furthermore, the set of vehicles that can access $s_i$ in p hops is

$$C_{s_i}^p = C_{v_{s_i,1}^{p-1}} \cup C_{v_{s_i,2}^{p-1}} \cup \ldots \cup C_{v_{s_i,l}^{p-1}} \quad (2)$$

where $v_{s_i,j}^{p-1}$ represents the j-th vehicle in the set $C_{s_i}^{p-1}$; if p equals 1, the superscript is omitted. Thus, the whole set of vehicles $C_{s_i}^t$ that can access $s_i$ is

$$C_{s_i}^t = C_{s_i}^1 \cup C_{s_i}^2 \cup \ldots \cup C_{s_i}^p \quad (3)$$
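A minimal Python sketch of this multi-hop accessibility computation, in the spirit of Eqs. (1)-(3), is given below; the graph representation and function names are assumptions of this sketch.

```python
def reachable_clients(direct_rsu_neighbors, vehicle_neighbors, max_hops):
    """direct_rsu_neighbors: vehicles an RSU reaches directly (the set C_{s_i});
    vehicle_neighbors: vehicle id -> set of vehicles it can communicate with (C_v);
    returns every vehicle that can reach the RSU within max_hops hops (Eq. 3)."""
    reachable = set(direct_rsu_neighbors)      # vehicles inside the RSU's own range
    frontier = set(direct_rsu_neighbors)
    for _ in range(max_hops):
        # Eq. (2): expand the previous hop's frontier through vehicle-to-vehicle links
        frontier = {nxt for v in frontier for nxt in vehicle_neighbors.get(v, set())} - reachable
        if not frontier:
            break
        reachable |= frontier                  # Eq. (3): union over all hop levels
    return reachable
```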
4.2 QoS Constraints Network service quality is what users care about most, and many factors may affect the QoS in the VANET environment, such as the transmission distance, the number of hops, the load balance and the vehicles' moving direction. Thus, in order to ensure the quality of service, these factors need to be considered in the shuffling process. Here we propose an algorithm to assign a selection probability to each available edge server $s_i$ according to its QoS influence, described as

$$p_i = \alpha w_d^i + \beta w_h^i + \gamma w_r^i \quad (4)$$

where $w_d^i$ represents the impact rate of the transmission distance between the vehicle and edge server $s_i$, $w_h^i$ the impact rate of the number of hops from the vehicle to $s_i$, and $w_r^i$ the impact rate of the angle $\theta_i$ between the moving direction and $s_i$. Assume that the set of RSUs available to the vehicle is $R_i = \{s_1, s_2, \ldots, s_q\}$. Because the signal transmission loss is usually proportional to the third power of the distance, the calculation formula is

$$w_d^i = \frac{\left(m_d^i\right)^3}{\sum_{t=1}^{q}\left(m_d^t\right)^3} \quad (5)$$
where $m_d^i$ is the distance from the vehicle to RSU $s_i$. As for $w_h^i$, we define that the QoS is reduced by a factor ε for each hop, so it can be represented as

$$w_h^i = \frac{\varepsilon^{n_i}}{\sum_{t=1}^{q}\varepsilon^{n_t}} \quad (6)$$
where $n_i$ represents the number of hops from the vehicle to $s_i$. Then, letting c denote the moving direction of the vehicle and $b_i$ the direction from the vehicle to the RSU, $w_r^i$ can be described as

$$w_r^i = \frac{\theta_i}{\sum_{t=1}^{q}\theta_t} \quad (7)$$

where

$$\theta_i = \arccos\frac{b_i \cdot c}{|b_i| \times |c|} \quad (8)$$

In order to ensure that the sum of the probabilities is 1, that is, to satisfy $\sum_{t=1}^{q} p_t = 1$, it must hold that α + β + γ = 1.
4.3 Capacity Constraints Because of the limited bandwidth and performance of the edge servers, the number of users they can serve simultaneously is also limited. Therefore, it is necessary to restrict the number of access users of a single edge server, which can be formulated as

$$u_{current}^i \le u_{max}^i - 1 \quad (9)$$
where $u_{current}^i$ is the number of users that currently have access to edge server $s_i$, and $u_{max}^i$ is the maximum capacity of $s_i$. According to the above constraints, we can work out the shuffle action space and the probability of each candidate edge server being selected in the next slot. In this way, the shuffling algorithm protects against DDoS attacks while guaranteeing the quality of service of the edge servers in the VANET.
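The constraint-guided selection probabilities of Eqs. (4)-(9) can be sketched as follows in Python; the example weight values, the assumption of 2D unit direction vectors and all names are illustrative choices of this sketch, not values fixed by the paper.

```python
import math

def server_selection_probabilities(candidates, move_dir, alpha=0.4, beta=0.4, gamma=0.2, eps=0.5):
    """candidates: list of dicts with keys 'dist', 'hops', 'dir' (unit vector toward the RSU),
    'current_users', 'max_users'. move_dir: the vehicle's unit moving-direction vector.
    Returns the capacity-feasible servers and their selection probabilities (Eqs. 4-9)."""
    usable = [c for c in candidates if c['current_users'] <= c['max_users'] - 1]   # Eq. (9)
    d3 = [c['dist'] ** 3 for c in usable]
    hw = [eps ** c['hops'] for c in usable]
    ang = [math.acos(max(-1.0, min(1.0, c['dir'][0] * move_dir[0] + c['dir'][1] * move_dir[1])))
           for c in usable]                                                         # Eq. (8)
    probs = []
    for i in range(len(usable)):
        w_d = d3[i] / sum(d3)                                          # Eq. (5)
        w_h = hw[i] / sum(hw)                                          # Eq. (6)
        w_r = ang[i] / sum(ang) if sum(ang) else 1.0 / len(usable)     # Eq. (7)
        probs.append(alpha * w_d + beta * w_h + gamma * w_r)           # Eq. (4), with alpha+beta+gamma = 1
    return usable, probs
```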
5 Evaluation In order to better simulate the vehicle behavior and the deployment environment of the Internet of Vehicles, we extract the road information of a block in Beijing from the map. The result is shown in Fig. 2.
Fig. 2. Abstract road diagram
Fig. 3. Blocked ratio results
To evaluate the system performance, we place 100 vehicles and 10 RSUs in this area. The initial positions of the vehicles are randomly generated on the road, the direction of motion is defined according to the direction of the road, and the speed is limited to 20 m/s. The RSUs are fixed at road sites, evenly distributed along the road, and their locations are pre-defined. In our experiments, malicious users are generated randomly from the clients and launch DDoS attacks of random intensity. Since, in the VANET environment, it is meaningless for clients to choose a server without considering the constraints (an inaccessible server might be selected), we compare our VEC-MOTAG with a Non-MOTAG system. The Non-MOTAG model only considers the three constraints and chooses the best server in each slot, but does not perform the shuffling process; its blocking mechanism for malicious users is the same as in the VEC-MOTAG system. The defense results are shown in Fig. 3. We can find that both methods perform well in DDoS defense. In the figure, the Non-MOTAG blocked ratio increases rapidly between 40 and 60 time slots and grows faster than that of VEC-MOTAG. The reason is that the Non-MOTAG strategy only chooses the best server. Thus, in vehicle-intensive areas, the density of potential malicious users increases, which makes the cumulative attack strength on the same edge server grow rapidly. After blocking them, the marks of malicious users in areas with few vehicles rise slowly, so they are banned only after
a long time after most malicious users have been banned. Thanks to the constraint-based shuffling algorithm, the distribution of malicious users over the edge servers is relatively even, so the blocked ratio grows more uniformly and all malicious users are blocked faster.
Fig. 4. Erroneous blocked ratio results
Fig. 5. Total attack strength results
Figure 4 shows the comparison of the erroneous blocked ratio between the two systems. This ratio indicates the proportion of ordinary users who are mistaken for malicious users and blocked. We can find that the VEC-MOTAG model distinguishes malicious users from ordinary users perfectly: when all malicious users are banned, ordinary users are not affected at all, and the false blocking rate always remains 0. While the Non-MOTAG model can quickly block malicious users, many ordinary users are also mistakenly blocked, which greatly degrades the network service quality for ordinary users. Figure 5 indicates the total attack strength in the system. This value reflects the total strength of DDoS attacks on edge servers in the VANET environment; the higher the attack strength, the worse the QoS provided by the edge servers to other clients in the region.
6 Conclusion In this paper, we propose a VEC-MOTAG system that can protect edge computing nodes from DDoS attacks. Different from the traditional shuffling algorithm, we formulate a series of constraints, namely the accessibility constraint, the QoS constraint and the capacity constraint, to improve the shuffling algorithm. In this way, the system can distinguish malicious users and guarantee the QoS of the network. We simulate an edge computing based Internet of Vehicles environment and deploy the system in it. The test results show that the VEC-MOTAG system has good defense capability and can ensure the QoS. Acknowledgment. This work is supported by the BUPT Excellent Ph.D. Students Foundation CX2020123.
References 1. Luo, G., et al.: Software-defined cooperative data sharing in edge computing assisted 5GVANET. IEEE Trans. Mob. Comput. 20(3), 1212–1229 (2021)
2. Cui, J., Wei, L., Zhong, H., Zhang, J., Xu, Y., Liu, L.: Edge computing in VANETs – an efficient and privacy-preserving cooperative downloading scheme. J. Sel. Areas Commun. 38(6), 1191–1204 (2020) 3. Al-Heety, O.S., Zakaria, Z., Ismail, M., Shakir, M.M., Alani, S., Alsariera, H.: A comprehensive survey: benefits, services, recent works, challenges, security, and use cases for SDN-VANET. Access 8, 91028–91047 (2020) 4. Luo, G., et al.: Cooperative vehicular content distribution in edge computing assisted 5GVANET. China Commun. 15(7), 1–17 (2018) 5. Jia, Q., Wang, H., Fleck, D., Li, F., Stavrou, A., Powell, W.: Catch me if you can: a cloudenabled DDoS defense. In: 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp. 264–275 (2014) 6. Jia, Q., Sun, K., Stavrou, A.: MOTAG: moving target defense against internet denial of service attacks. In: International Conference on Computer Communication and Networks (ICCCN), pp. 1–9 (2013) 7. Xu, C., Zhang, T., Kuang, X., Zhou, Z., Yu, S.: Context-aware adaptive route mutation scheme: a reinforcement learning approach. IEEE Internet Things J. 8 (2021) 8. Zhang, T., Kuang, X., Zhou, Z., Gao, H., Xu, C.: An intelligent route mutation mechanism against mixed attack based on security awareness. In: 2019 IEEE Global Communications Conference (GLOBECOM), pp. 1–6 (2019) 9. Zhang, T.: DQ-RM: deep reinforcement learning-based route mutation scheme for multimedia services. In: International Wireless Communications and Mobile Computing (IWCMC), pp. 291–296 (2020) 10. Chai, X., Wang, Y., Yan, C., Zhao, Y., Chen, W., Wang, X.: DQ-MOTAG: deep reinforcement learning-based moving target defense against DDoS attacks. In: International Conference on Data Science in Cyberspace, IEEE, pp. 375–379 (2020) 11. Poongodi, M., Hamdi, M., Sharma, A., Ma, M., Singh, P.: DDoS detection mechanism using trust-based evaluation system in VANET. IEEE Access 7, 183532–183544 (2019). https:// doi.org/10.1109/ACCESS.2019.2960367
AIA Artificial Intelligence and Applications
Short-Term Wind Power Forecasting Based on the Deep Learning Approach Optimized by the Improved T-distributed Stochastic Neighbor Embedding Xing Deng1,2 , Feipeng Da1(B) , and Haijian Shao2 1 School of Automation, Key Laboratory of Measurement and Control for CSE,
Ministry of Education, Southeast University, Nanjing 210096, Jiangsu, China [email protected] 2 School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212003, Jiangsu, China
Abstract. Accurate short-term wind power forecasting is of widespread significance to the stable operation of the power system, particularly after wind power is connected to the grid at large scale. Recurrent neural networks (RNNs) with multiple hidden layers are a common and representative deep learning approach whose forecasting capability can be used to construct accurate and reliable short-term wind power forecasting models. However, an RNN's generalization ability is often limited because the hidden-layer parameters usually depend on human experience. The main purpose of this paper is to map the data into a space where the samples are more separable based on statistical analysis, to correctly analyze the network neurons and optimize the RNN's model architecture, and thereby to improve the accuracy of short-term wind power forecasting. Experiments based on the NREL dataset are used to verify the performance of the proposed RNN-related approaches; the short-term wind power forecasting accuracy over one year is improved by 7.01% and 14.53% for 1- and 2-h-ahead forecasting, respectively, compared with the results achieved by conventional approaches. Keywords: Recurrent neural networks · Hidden-layer cluster · Architecture analysis and optimization
1 Introduction Energy is the essential material basis for human survival [1]. According to estimates by wind power generation experts, approximately 2% of solar radiant energy is converted into wind energy each year, the installed capacity is up to 10 TW, and it is expected to grow even faster in the future. Multiple hybrid energy sources at all levels of the transmission grid can be integrated through precise wind energy forecasting. Wind power forecasting in wind farms can improve the economics of grid planning with renewable
energy grids and the operational security of wind farms, but the instability, deviation and low energy density of wind speed can reduce the reliability of power system operation. As a result, accurate wind speed forecasting is becoming more and more important for the wind power grid and power system operation [1]. Recurrent neural networks (RNNs), with their ability to process recursive time series at arbitrary moments, are usually treated as a computational model and have been broadly utilized in dermatologist-level classification [1, 2], wind power forecasting [3, 4], load forecasting [5, 6] and fault-tolerant prediction [7, 8]. However, improper analysis of the hidden-layer information still seriously hampers the performance of deep learning. For dynamic natural language signals, hierarchical representations of natural language were extracted by a deep learning method by Young et al. [9] to analyze various natural language processing (NLP) tasks; experiments demonstrated that multiple hidden layers benefit the efficiency improvement of NLP, but the model's performance was still constrained, especially because the information of the multilayer hidden layers is not handled quantitatively. Even for data with higher dimensions, deep learning can still effectively improve the prediction accuracy of secondary structure, accessible surface area (ASA), and local and nonlocal structure [10], etc. Moreover, there are various kinds of information modes in the hidden layer because of differences in the input variables, and the predictive accuracy and processing efficiency of deep learning can be further improved if the hidden-layer information is properly handled. Sak et al. [11] designed an asynchronous stochastic gradient descent method combined with clustering analysis to optimize a two-layer deep learning approach as well as the topology of each layer. Experiments showed that the convergence speed of this method is faster than that of a deep feedforward neural network with many parameters; however, this method fails to cover all the hidden-layer information, so it is only effective on a single network model of medium size. Baytas et al. [12] utilized a clustering strategy to analyze the hidden-layer information described by the automatic encoding of the proposed time-aware LSTM network. Evaluation showed that this strategy can effectively optimize the hidden-layer topology, capture the structure of irregular time sequences, and significantly improve the generalization ability of LSTM; however, it only uses the hidden-layer elements to handle information at a fixed time and is insufficient for analyzing temporal dependencies. Shwartz-Ziv et al. [13] considered the hidden-layer information as the "black box" or inner organization of a deep neural network and gave a comprehensive analysis based on the mutual information values derived from the hidden layers. Based on the above discussion, the main issue considered in this paper is the correct analysis of the hidden-layer information of deep learning, in particular, attempting to solve the problem that the coverage of the hidden-layer information decreases over time.
The data [14] are mapped into a feature space where sample separation is easier, based on the evaluation results derived from unsupervised clustering strategies, so as to streamline the number of hidden-layer neurons, optimize the architecture and improve the RNN's forecasting accuracy for short-term wind power.
The rest of this paper is organized as follows. The fundamental theoretical analysis and processing flow of the proposed approach are given in Sect. 2. The experimental evaluation based on NREL data is given in Sect. 3 to demonstrate the performance of the designed steps of the proposed approach, and Sect. 4 summarizes the conclusions of this paper and future work.
2 Proposed Approaches 2.1 Wind Power Forecasting Model Architecture Design The dataset utilized in this paper comes from NREL, and every sample has two fields at the t-th time instant: wind speed (m/s, 80 m) $x_t^{(win)}$ and net power (MW) $y_t^{(pow)}$. The wind power forecasting model is formed as

$$y_{t+k}^{(pow)} = f\left(y_t^{(pow)}, \ldots, y_{t-p_y}^{(pow)}, x_t^{(win)}, \ldots, x_{t-p_x}^{(win)}\right) \quad (1)$$

where $p_y$ and $p_x$ are positive integers characterized by Lipschitz quotients and considered as the lags associated with the wind power $y_t^{(pow)}$ and the wind speed $x_t^{(win)}$. Because the tanh function (or a similar saturating function) is typically used as the nonlinear activation, the following inequality holds:

$$\left\|\frac{\partial h_i}{\partial h_{i-1}}\right\| \le \left\|U^{T}\operatorname{diag}\left(f'(h_{i-1})\right)\right\| \le \gamma_u\gamma_f \le 1 \quad (2)$$

where $h_i$ denotes the hidden-layer features, and $\gamma_u$, $\gamma_f$ are positive constants with respect to the weight matrix U and the nonlinear function $\operatorname{diag}\left(f'(h_{i-1})\right)$. The main purpose of this paper is to define the statistical analysis of the hidden-layer information and then optimize the model's architecture. The processing flow framework of this paper is given in Fig. 1. Firstly, the available inputs are provided to the deep learning model based on model input selection. Secondly, the t-SNE strategy is applied to analyze the distribution probability of the hidden-layer information in RNNs and obtain the right number of hidden-layer neurons for forecasting. Finally, the performance of the proposed approaches is evaluated on the NREL data and compared with deep learning approaches. 2.2 Data Preprocessing of the Wind Power Time Series According to the characteristics of real data, corrupt data must be taken into account in the analysis because such data destroy the structure of the employed data, in particular the time continuity of the time series; to some extent, these data cannot simply be eliminated because they may contain information useful for further analysis. In order to estimate unknown quantities between two known data points or historical data based on the available information, and because relationships between data can provide valuable information about corrupt data or missing values, data interpolation is useful for estimating corrupt data, since the data surrounding the
Fig. 1. The processing flow framework of the proposed approach.
missing information is available. Time series analysis and regression analysis are the two statistical approaches to data interpolation, with methods such as nearest-neighbor interpolation, cubic spline interpolation and piecewise cubic Hermite interpolation. In statistics, the median absolute deviation (MAD) can be used as a robust statistic of the trend of a given time series. MAD-based methods are used to effectively identify the outliers of a time series; the mean is sensitive to the presence of outliers, and MAD filtering can remove such non-robust measurements by replacing the outlier-sensitive mean and standard deviation estimates with the outlier-resistant median. In addition, data normalization can also lead to pivotal quantities; more precisely, the sampling distribution then does not depend strongly on the value of the given data. Because the wind speed $x_t^{(win)}$ has an impact on the wind power $y_t^{(pow)}$, the following data normalization is applied to improve the forecasting accuracy of wind power:
xi − xmin , xmax = xmin , i = 1, . . . N , C ∈ (0, +∞) xmax − xmin
(3)
where x i , i = 1,…,N, represent input data, xmax and xmin represent maximum and minimum value of x i , respectively. Taking into account the accuracy of regression process, the value of given data can be normalized over the range [C, C+1].
Short-Term Wind Power Forecasting Based on the Deep Learning Approach
57
2.3 Metrics Design for Wind Power Forecasting (win)
(pow)
The forecasting relationship between wind speed xt and the output power yt the wind turbine can be described as, ⎧ (win) (win) ⎪ 0, xt < vCI , xt > vCO , ⎪ ⎨ 2 3 vCI (pow) (win) (win) PR = yt xt − v3 −v3 PR , vCI < xt < vR , v 3 −v 3 ⎪ R CI ⎪ ⎩ R CI (win) vR < xt < vCO , PR
of
(4)
where PR is the rated capacity of the wind turbine (kW), vR , vCI and vCO are the rated wind speeds, cut-in wind speeds and cut-out wind speeds, respectively. Cut-in wind speed is the minimum wind speed of wind turbine grid-connected power generation, and cut-out wind speed is the maximum wind speed of wind turbine grid-connected power generation. Wind power generating electricity will not available when the wind speed is less than the cut-in wind speed or greater than the cut-out wind speed. The wind turbine force is rated when the wind speed is greater than or equal to the rated wind speed, and the cut-out wind speed is also less than the cut-out wind speed. Based on the outlined discussion, the wind power forecasting accuracy of wind power is significantly related to the accuracy of wind speed forecasting. T-distributed stochastic neighbor embedding (t-SNE) is one of the best performing data clustering and visualization methods at present, and can be considered as the preferred method to solve the aforementioned problem. However, t-SNE has large memory consumption and high time requirements, and the data set’s divisibility usually estimated by projecting high-dimensional data sets into low-dimensional space, such as two-dimensional or three-dimensional space, always lacks of the objective evidence due to the lack of performance criteria. In addition, the specified distance used to measure the similarity between the current points and corresponding centroids is always difficult to handle due to the dimension of the multi-elements are not consistent with each other. In order to overcome the aforementioned issues, the tensor product between two arbitrary matrices named Kronnecker product is introduced in the following defined cluster assignment probability distribution Q approach to guarantee the soft assignment probabilities stricter, 2 − v+1 2 λpi ⊗ zi 22 + zi − μj v qij = 2 − v+1 2 2 v j λpi ⊗ zi 2 + zi − μj
(5)
where ⊗ indicates that the corresponding elements of the vector are multiplied, λ and v are constants, zi and μj are the embedded data points and the corresponding jth cluster centroid, respectively. pi is the element related to the auxiliary (Target) distribution P, and defined as, (6) pi = exp dist zi , μj ε Tensor is an extended representation of vectors in high-dimensional space, and it is an extremely important form of data analysis in extracting data features. Matrix usually
58
X. Deng et al.
corresponds to the algebraic structure, but the tensor corresponds to the geometry structure, in other words, the latter is more suitable to the data structure of high-dimensional data. In addition, the significant advantage of this method is mainly focused on that not only the appropriate distance can be defined according to the data distribution, but also fast convergence speed can be also implemented by using the various kinds of data sampling methods even if the embedded data set has a higher dimension. The processing algorithm is provided in Algorithm 1. In particular, sampling algorithms suitable for data distribution can be added based on the characteristics of the data and do not adversely affect for the similarity evaluation of the embedding point and the corresponding cluster center, which obviously is conducive to promote the improvement of computational efficiency and improve the accuracy of similarity evaluation. The pseudo-code of the t-SNE with the improved metric is given in Algorithm 1. Algorithm 1. The improved t-SNE clustering algorithm.
Inputs : dataset
=
,
, ...,
Parameters : iterations , constants , ,learning rate Output : low-dimensional data representation
Compute pariwase affinities with perplexity Perp by using
Solution initialization, from Compute low-dimensional affinities qij by using
endfor end
Short-Term Wind Power Forecasting Based on the Deep Learning Approach
59
3 Experiments The information utilized for the exploratory assessment come from the NREL, with the interim from Jan 1, 2004 to Dec 31, 2004. Note that, the information preprocessing, Lipschitz remainder and wavelet investigation are still be utilized to progress the quality of the utilized information, gauge the demonstrate arrange and maintain a strategic distance from the meteorological time series’ nearby temporal highlight that will engendered over time, respectively. The wind rose in one year is appeared in Fig. 2. The clustering analysis and results evaluated by the conventional t-SNE (TSNE) and moved
(a)
(b)
(c)
Fig. 2. Wind rose. Cluster initialization
Cluster initialization
Iteration 10: error is 47.9631
Iteration 10: error is 47.3684
Iteration 100: error is 0.9477
Iteration 100: error is 0.4476
Fig. 3. The clustering result of the hidden-layers information.
60
X. Deng et al.
forward SNE (ITSNE) are appeared in Fig. 3, and the corresponding convergence speed of the cost function is provided in Fig. 4.
Fig. 4. The convergence speed comparison of the cost function.
Table 1. The performance evaluation Iteration
TSNE error
ITSNE error
Error floating (%)
10
47.96
47.36
−1.25
20
46.92
46.25
−1.43
30
39.31
38.43
−2.24
40
38.65
37.92
−1.89
50
1.28
0.66
−48.44
60
1.20
0.60
−50.00
70
1.10
0.54
−50.91
80
1.04
0.50
−51.92
90
0.99
0.47
−52.53
100
0.94
0.44
−53.19
Aver.Err
17.94
17.32
−6.30
Cost (s)
328.96
612.79
86.28
Wind rose related to wind speed denotes a succinct view that wind speed and direction are typically distributed in the whole. The maximum probability of occurrence of wind speed is mainly focused on low wind speed and the corresponding proportion is more than 16% (January-April), 23% (May-August) and 25% (September-December). In addition, the corresponding wind speed direction is northwest, southeast and north. The outlined statistical results indicate that both of the power and direction of the wind speeds in different months are differently distributed. The model order estimation is provided in Table 1 and shown in Fig. 5 (Table 2).
Short-Term Wind Power Forecasting Based on the Deep Learning Approach
61
Fig. 5. The model order estimation.
Table 2. The performance evaluation Variables
Description
Model order via Lipschitz
Model order via trend estimation
Considered model order
X1
70 m WS
45
5
5
X3
50 m WS
4
10
4
X5
30 m WS
3
13
3
X7
10 m WS
45
7
7
Based on the results that reported in Table 1. The improved algorithm has faster convergence speed and lower prediction error, which reduces the average error by 6.3% and the cost function value by 86.28%. Surmised number of the hidden-layers and comparing neurons can cover the complete the data required for modeling, 6 can be treated as the number of covered up layer neurons. Both Levenberg-Marquardt (LM) and Quasi-Newton strategies (BFGS) are utilized in RNNs. This paper considered the k = 6,12 steps (corresponding to 1 and 2 h) ahead of real wind-power estimating. 80% and 20% of the all dataset are treated as the training and testing samples. The final wind power forecasting error is provided in Table 3. In Table 5, LM-NN: Multilayer Perceptron with LM learning methods; FNN: Fuzzy Neural Network; NARX: Nonlinear autoregressive exogenous neural network model; LM-RNN: RNNs with LM training methods; BF-RNN: RNNs with BFGS training methods; PA-LM- RNN: the optimized model of LM-RNN; PA-BF-RNN: the optimized model of BF-RNN. K-fold cross-validation (K-CV) and “meshgrid search” are utilized for parameters selection in Support Vector Regression. K-CV is that each subset of k folds subset is used for testing one time, then other dataset are employed for training, the number of cross validation of total dataset is k. “Meshgrid search” is to try every possible parameters pairs (c, g) values. The best accuracy of parameters pairs (c, g) can be investigated based on K-CV, where c and g are respectively penalty factor and kernel function parameters.
62
X. Deng et al. Table 3. Forecasting error
Seasons
Steps
ET
LM-RMSE
PA-LM-RMSE
BF-RMSE
PA-BF-RMSE
Spring
6
1647.96
0.2220
0.2152
0.2075
0.1946
Summer
1661.76
0.2041
0.1987
0.1172
0.1086
Autumn
1689.96
0.2350
0.2240
0.1655
0.1599
Winter
1665.33
0.2721
0.2269
0.2585
0.2271
Spring
1651.23
0.6154
0.5261
0.5526
0.4809
Summer
12
1705.68
0.3124
0.3089
0.3406
0.2688
Autumn
1678.93
0.4447
0.3791
0.4548
0.3677
Winter
1719.54
0.5235
0.4351
0.4336
0.3652
ET: Slipped by time in seconds by the proposed approach; LM-RMSE: RMSE using the RNNs with LM preparing on the testing tests; PA-LM-RMSE: LM-RMSE gotten by the optimized show; BF-RMSE: RMSE utilizing the RNNs with BFGS preparing on the testing tests; PA-BF-RMSE: BF-RMSE obtained by the optimized model.
The state trajectories of penalty factor and kernel function parameters are shown in Fig. 6. The forecasting results obtained by the SVR are given in Table 4. The number of the input-hidden-output layers are 1-2-1. 6 neurons with respect to 4 selected features in the input layers. The number of hidden nodes is experimentally set according to the empirical formula. The learning rate and convergence goal are both 0.01. Tansig and purelin are separately chosen as the activation function for hidden and the output layer in the neural network, respectively. 6-steps ahead wind power forecasting results in August, 2004 is shown in Fig. 6.
Fig. 6. State trajectories of the penalty factor and kernel function parameters in SVR.
Table 4. Forecasting results based on SVR

| Kernel function | BCVM       | Bc | Bg       | Foin  | RM          | RS       | Et         |
|-----------------|------------|----|----------|-------|-------------|----------|------------|
| RBF (Filtered)  | 0.00117006 | 64 | 0.03125  | 14654 | 0.000380821 | 0.996187 | 307.904170 |
| SF (Filtered)   | 0.00115364 | 81 | 0.037037 | 18103 | 0.000378552 | 0.996208 | 542.848084 |
BCVM: Best Cross Validation Mean squared error, Bc: Best c, Bg: Best g, Foin: Finished optimization iteration number, RM: Regression Mean squared error, RS: Regression Squared correlation coefficient, Et: Elapsed time in seconds.
The RMSE obtained by the optimized RNN in Spring, Summer, Autumn and Winter, compared with the RNN, has been reduced by 3.06%, 2.65%, 4.68% and 16.61% (6-steps ahead, LM-learning method), 6.22%, 7.34%, 3.38% and 12.15% (6-steps ahead, BFGS-learning method), 14.51%, 1.12%, 14.75% and 16.89% (12-steps ahead, LM-learning method), and 12.98%, 21.08%, 19.15% and 15.77% (12-steps ahead, BFGS-learning method). Compared with the LM-NN, FNN and NARX models, the best forecasting accuracy has been improved by about 10.59%, 7.85% and 4.51% (6-steps ahead) and 14.52%, 9.15% and 4.76% (12-steps ahead). The comparison of experimental results shows that, compared with the traditional model prediction methods, the proposed method can effectively improve the prediction performance of the model by optimizing the hidden-layer topology of the forecasting model (Fig. 7).
Fig. 7. 1 h (6-steps ahead) wind power forecasting results in August, 2004.
Table 5. Forecasting results comparison

| Methods       | LM-NN  | FNN    | NARX   | LM-RNN | BF-RNN | PA-LM-RNN | PA-BF-RNN |
|---------------|--------|--------|--------|--------|--------|-----------|-----------|
| RMSE 6-steps  | 0.2785 | 0.2511 | 0.2177 | 0.2330 | 0.1872 | 0.2162    | 0.1726    |
| RMSE 12-steps | 0.5158 | 0.4621 | 0.4182 | 0.4740 | 0.4454 | 0.4123    | 0.3706    |
4 Conclusions
This paper analyzes the hidden-layer information of RNNs by utilizing the improved t-SNE: t-SNE is used to analyze the distribution of the hidden-layer information and to optimize the structure of the RNNs. Since proper handling of the hidden-layer information can clearly lessen the risk of over-fitting that is usually caused by too many neuron nodes, the number of hidden-layer neurons is streamlined and the generalization capacity of the RNNs is improved. Finally, the performance of the proposed approach is evaluated on the data from NREL and compared with the well-known learning approaches. The experimental evaluation shows that analyzing the hidden-layer information of RNNs can effectively streamline the number of hidden-layer neurons, optimize the structure and improve the generalization capacity of RNNs.
Acknowledgement. This research is supported by National Natural Science Foundation of China (No. 61806087, 61902158).
References
1. Esteva, A., et al.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115–130 (2017)
2. Hinton, G.: Deep learning-a technology with the potential to transform health care. JAMA 320(11), 1101–1102 (2018)
3. Deng, X., Shao, H.: Deep learning approach with optimized hidden-layers topology for short-term wind power forecasting. Energy Eng. 117(5), 279–287 (2020)
4. Liu, H., Mi, X.W., Li, Y.F.: Wind speed forecasting method based on deep learning strategy using empirical wavelet transform, long short term memory neural network and Elman neural network. Energy Convers. Manage. 156, 498–514 (2018)
5. Shi, H., Xu, M., Li, R.: Deep learning for household load forecasting-A novel pooling deep RNNs. IEEE Trans. Smart Grid 9(5), 5271–5280 (2018)
6. Kong, W., Dong, Z.Y., Hill, D.J., Luo, F., Xu, Y.: Short-term residential load forecasting based on resident behaviour learning. IEEE Trans. Power Syst. 33(1), 1087–1088 (2018)
7. Shao, H., Deng, X.: AdaBoosting neural network for short-term wind speed forecasting based on seasonal characteristics analysis and lag space estimation. Comput. Model. Eng. Sci. 114(3), 277–293 (2018)
8. Coelho, I.M., Coelho, V.N., Luz, E.J.D.S., Ochi, L.S., Guimarães, F.G., Rios, E.: A GPU deep learning metaheuristic based model for time series forecasting. Appl. Energy 201, 412–418 (2017)
9. Young, T., Hazarika, D., Poria, S., Cambria, E.: Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 13(3), 55–75 (2018)
10. Shao, H., Deng, X., Jiang, Y.: A novel deep learning approach for short-term wind power forecasting based on infinite feature selection and recurrent neural network. J. Renewable Sustainable Energy 10(4), 043303 (2018)
11. Sak, H., Senior, A., Beaufays, F.: Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Fifteenth Annual Conference of the International Speech Communication Association, Singapore, Sep 14–18 (2014)
12. Baytas, I.M., Xiao, C., Zhang, X., et al.: Patient subtyping via time-aware LSTM networks. In: 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Halifax, Nova Scotia, Canada, Aug 13–17 (2017)
13. Shwartz-Ziv, R., Tishby, N.: Opening the black box of deep neural networks via information. arXiv preprint arXiv:1703.00810 (2017)
14. National Renewable Energy Laboratory G D S W. https://www.nrel.gov/gis/wind.html [EB/OL]
Adaptive Image Steganographic Analysis System Based on Deep Convolutional Neural Network Ge Jiao1,2(B) 1 College of Computer Science and Technology, Hengyang Normal University, Hengyang
421002, Hunan, China 2 Hunan Provincial Key Laboratory of Intelligent Information Processing and Application,
Hengyang 421002, Hunan, China
Abstract. Adaptive steganography embeds the message into hard-to-detect noise areas or complex texture areas of the image, so steganalysis methods based on hand-designed features need a very complex feature extraction algorithm to detect the stego image. In view of the advantages of deep learning in the automatic extraction of image features and its high detection accuracy, an image steganalysis algorithm is designed using a deep convolutional neural network, and an image steganalysis system based on deep learning is developed. The system realizes the functions of neural network structure analysis, image steganography and verification, steganalysis and so on. The system simplifies the operation process of algorithm comparison, reduces the complexity of algorithm performance evaluation, and verifies the feasibility of the proposed algorithm.
Keywords: Image steganographic analysis · Feature extraction · Deep learning · Convolutional neural networks
1 Introduction
In order to counter adaptive steganography, steganalysis is divided into special steganalysis and general steganalysis according to the applicability of the algorithm. Special steganalysis mainly comprises steganalysis algorithms designed for a specific steganographic algorithm, such as the RS method [1], the SPA method [2] and the WS method [3]. Due to the great limitations of this kind of steganalysis algorithm, even if the detection accuracy is high, it still faces being made obsolete by the times. In recent years, various kinds of steganography have emerged, and general steganalysis technology with machine learning as the main technique has stepped onto the stage. Traditional machine learning methods use high-dimensional and complex statistical feature extraction, the most representative of which is the SRM rich-model feature extraction method proposed by Fridrich and Kodovsky et al. [4–6]. This method needs to consider quite complex statistical characteristics, so scholars need to continuously increase the design effort, which is time-consuming and laborious.
In order to solve the problems of the rich model method, many scholars use deep learning to conduct image steganalysis. In 2015, Qian et al. [7] proposed a steganalysis framework based on convolutional neural networks, which realized feature extraction by using a Gaussian nonlinear activation function. In 2016, Xu et al. [8] improved the network structure proposed by Qian by introducing a batch normalization layer to prevent the network from falling into local optima, and tested it on the S-UNIWARD and HILL steganography algorithms; the detection accuracy was no less than that achieved by SRM. In order to deepen the network and learn deeper features, Wu et al. [9] introduced residual networks to image steganalysis; the key feature of residual networks is the skip connection, which extracts effective statistical features well and achieves a better detection effect. In 2017, Ye et al. [10] proposed a 10-layer network whose preprocessing layer introduced a high-pass filter (HPF) and adopted the truncated linear unit (TLU) as the activation function, which is better able to extract the weak steganographic signal; even when deepened to dozens of layers the network does not suffer from degradation and can learn features more effectively than a shallow network. In 2018, Yedroudj et al. [11] proposed the Yedroudj-Net network adopting the concept of AlexNet, which kept all the high-pass filter kernels of SRM, with the weights of these kernels excluded from back propagation during training. In 2019, Zhu et al. [12] proposed the Zhu-Net network, which used 25 3 × 3 filter kernels and 5 5 × 5 filter kernels to replace the original 30 5 × 5 kernels in preprocessing, thus reducing the parameters of the preprocessing layer and making the model easier to fit. Deng et al. [13] for the first time introduced global covariance pooling into deep-learning-based steganalysis, and in the training process they used an iterative square-root calculation to accelerate the fitting of the network. Therefore, using a deep learning model to design an image steganalysis algorithm with strong analysis ability for adaptive image steganography is of great significance to ensuring the security of image information.
2 Adaptive Image Steganography Analysis Method
Most steganalysis algorithms based on deep learning use convolutional neural networks for feature extraction. This paper combines the characteristics of deep learning and steganography algorithms and, based on the CNN framework, proposes an adaptive image steganalysis algorithm based on deep convolutional neural networks. The deep network can extract steganalysis features more effectively, reduce the cost of manually designed features in traditional steganalysis, and improve the accuracy of steganalysis.
2.1 Deep Convolutional Neural Network Structure
The self-adaptive image steganalysis model based on the deep convolutional neural network is shown in Fig. 1. The first layer of the model structure is the high-pass filter (HPF) layer, which is a special convolution layer with a convolution kernel size of 5 × 5. The HPF layer can speed up the convergence of the CNN model. The second to seventh layers are convolutional layers, and the output of the previous convolutional layer is used as
the input of the latter convolutional layer. Each convolutional layer contains convolution, nonlinear activation and pooling operations. In order to make the noise residual extracted by the HPF layer 0-symmetric, an absolute activation layer is added to the first convolutional layer; in order to solve the gradient explosion and zero-gradient problems, a batch normalization layer is used before TanH and ReLU to get more accurate training results; for the nonlinear activation functions, TanH is selected for the first two layers and ReLU is selected for the other convolutional layers and the fully connected layer, with the activation results sent to the pooling part; the pooling operation uses average pooling, and the last convolution layer uses global pooling to reduce the dimensionality. The eighth layer is the classification layer, which is composed of a fully connected layer and an activation layer. The classification label is generated by the activation function Softmax.
Fig. 1. Image steganography analysis model.
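The following Keras sketch roughly mirrors the structure described above (fixed 5 × 5 KV high-pass kernel, an ABS layer on the first convolution, batch normalization before TanH/ReLU, average pooling, global pooling before a two-way softmax). The number of filters in each convolutional layer and the pooling sizes are assumptions, since they are not listed in the extracted text.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, activation, use_abs=False, pool=True):
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    if use_abs:
        x = layers.Lambda(tf.abs)(x)        # ABS layer: make the residual distribution 0-symmetric
    x = layers.BatchNormalization()(x)      # BN before the nonlinearity
    x = layers.Activation(activation)(x)
    if pool:
        x = layers.AveragePooling2D(pool_size=3, strides=2, padding="same")(x)
    return x

inputs = layers.Input(shape=(256, 256, 1))
# HPF layer: a fixed (non-trainable) 5x5 convolution holding the KV kernel.
x = layers.Conv2D(1, 5, padding="same", use_bias=False, trainable=False, name="hpf")(inputs)
x = conv_block(x, 8, "tanh", use_abs=True)   # layer 1: ABS + BN + TanH
x = conv_block(x, 16, "tanh")                # layer 2: TanH
x = conv_block(x, 32, "relu")                # layers 3-6: ReLU
x = conv_block(x, 64, "relu")
x = conv_block(x, 128, "relu")
x = conv_block(x, 256, "relu", pool=False)
x = layers.GlobalAveragePooling2D()(x)       # global pooling after the last convolution
outputs = layers.Dense(2, activation="softmax")(x)   # cover vs. stego
model = tf.keras.Model(inputs, outputs)

kv = np.array([[-1, 2, -2, 2, -1], [2, -6, 8, -6, 2], [-2, 8, -12, 8, -2],
               [2, -6, 8, -6, 2], [-1, 2, -2, 2, -1]], np.float32) / 12.0
model.get_layer("hpf").set_weights([kv.reshape(5, 5, 1, 1)])
```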
2.2 Steganalysis Method
The adaptive image steganalysis method based on the deep convolutional neural network includes the following steps:
Step 1: The grayscale images are embedded with the adaptive image steganography algorithm to make the stego image data set. The S-UNIWARD and HILL algorithms with an embedding rate of 0.4 are used to embed the secret information in the images. The obtained images and the original images form a data set, which is divided into a training set and a test set according to a certain proportion.
Step 2: The grayscale images of the data set are passed through the high-pass filtering layer to obtain the residual images. The residual images are used to train the deep convolutional neural network model, and the network parameters and structure are then constantly adjusted to continue the training. Finally, the optimal deep convolutional neural network model is screened out.
The high-pass filtering layer preprocesses the image to get the residual image. The purpose of high-pass filtering is to enhance the signal-to-noise ratio in the image and suppress the influence of the image content, so as to help the network learn more effective
features. Let X be a feature image of size m × n to be extracted; the calculation formula for the corresponding noise residual is

Res = X ⊗ F_KV    (1)

where ⊗ is the convolution operation and F_KV is the filter kernel used for preprocessing, which can be expressed as:

F_KV = \frac{1}{12} \begin{pmatrix} -1 & 2 & -2 & 2 & -1 \\ 2 & -6 & 8 & -6 & 2 \\ -2 & 8 & -12 & 8 & -2 \\ 2 & -6 & 8 & -6 & 2 \\ -1 & 2 & -2 & 2 & -1 \end{pmatrix}    (2)

The KV kernel is used to extract the high-frequency feature map of the image to obtain the residual image. The verification set is used to evaluate the deep convolutional neural network model: the model detects the image data of the verification set, its fit is judged from its performance on the verification set, and the network parameters and structure are then continuously adjusted and training continued until the best deep convolutional neural network model is screened out.
Step 3: Select the grayscale image to be detected, and then extract the high-frequency features of the input image using the 5 × 5 F_KV high-pass filtering kernel.
Step 4: Input the extracted high-frequency features into the optimal deep convolutional neural network model for stego image detection, and output the detection results. The high-frequency features of the image are input into the optimal deep convolutional neural network model; after 6 convolutional layers, the learned features are transferred to the fully connected layer, whose functional form is:

Y_j^n = \sum_i X_i^{n-1} * W_{i,j}^n + b_j^n    (3)
where Y_j^n represents the j-th feature map of the n-th (fully connected) layer, X_i^{n-1} represents the i-th feature map of the (n-1)-th layer, W_{i,j}^n represents the learnable weights between feature maps i and j, and b_j^n is the learnable bias parameter of the j-th feature map of the n-th layer. The Softmax activation function receives the output of the fully connected layer and generates a classification label, which has the form:

y_i = \frac{e^{x_i}}{\sum_{j=1}^{2} e^{x_j}}    (4)
The value of i is 1 or 2, indicating that there are two types of classification. x_i and y_i respectively represent the input and output of neuron i. Softmax-Loss is used as the objective loss function; it is a combination of Softmax and multi-class logistic regression, and its functional form is:

L_loss = -\log y_i,  i = 1, 2    (5)
In the training process of the whole network, parameters of the convolutional layer and the full connection layer are optimized to minimize the target loss function, and the effective extraction of steganographic analysis features is realized. Finally, detection results are obtained by classification, with 1 as the steganographic image and 0 as the original image.
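As a small illustration of the residual extraction in Eqs. (1)-(2), the following NumPy/SciPy sketch convolves a grayscale image with the KV kernel; the "same" output size and symmetric boundary handling are implementation choices that the paper does not specify.

```python
import numpy as np
from scipy.signal import convolve2d

# KV high-pass kernel of Eq. (2), scaled by 1/12.
F_KV = np.array([[-1,  2,  -2,  2, -1],
                 [ 2, -6,   8, -6,  2],
                 [-2,  8, -12,  8, -2],
                 [ 2, -6,   8, -6,  2],
                 [-1,  2,  -2,  2, -1]], dtype=np.float64) / 12.0

def noise_residual(x):
    """Res = X (*) F_KV, Eq. (1): suppress image content and keep high-frequency noise."""
    return convolve2d(x.astype(np.float64), F_KV, mode="same", boundary="symm")

img = np.random.randint(0, 256, size=(256, 256))   # stand-in for a 256 x 256 grayscale block
res = noise_residual(img)
print(res.shape)
```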
3 Model Training
3.1 Experimental Data and Platform
The deep learning platform was Google TensorFlow, the GPU model was NVIDIA GTX1080, and the dataset was BossBase. BossBase contains 10,000 512 × 512 grayscale images. In order to meet the test requirements, each image in the dataset is partitioned into 256 × 256 blocks, and the S-UNIWARD and HILL steganography algorithms are then adopted to embed data into the 40,000 carrier images with an embedding rate of 0.4 bpp. Therefore, this dataset contains a total of 40,000 pairs of cover and steganographic images. During model training, 20,000 pairs of cover images and stego images were randomly selected as the training set, and the remaining 20,000 pairs were used as the test set. The parameters were set as follows: the minibatch size was 64, the maximum number of iterations was 1000, the learning rate was initialized to 0.001, the gradient-descent momentum was 0.9, and the weight attenuation rate was 0.001. The BN layer is used in the network; because BN improves the generalization ability of the network, dropout and the L2 regularization term are removed. The training set is reshuffled randomly every time a training pass is completed. For the HPF layer, the learning rate is set to 0, so that its parameters are fixed and not updated. The process of model training is shown in Fig. 2.
Fig. 2. Model training process.
3.2 Experimental Results and Analysis
In the study of image steganography, the HPF layer can accelerate the convergence speed of the network. When the number of training iterations reaches 630, the model proposed in this paper reaches its optimum. As the number of iterations increases, the accuracy of the model constructed in this paper with the HPF layer is much higher than that without the HPF layer, as shown in Table 1.
Table 1. Comparison of loss and accuracy with and without HPF layer.

|                   | Loss   | Accuracy |
|-------------------|--------|----------|
| HPF layer         | 21.578 | 0.753625 |
| Without HPF layer | 44.305 | 0.4994   |
In order to make the statistical model consider the symmetries in the noise residuals, an absolute activation (ABS) layer is added to the first convolution layer. The experimental comparison of the deep convolutional neural network before and after removing the ABS layer is shown in Table 2. As can be seen from Table 2, the accuracy of the network without the ABS layer fluctuates greatly, and the network is extremely unstable and difficult to converge.

Table 2. Comparison of loss and accuracy with and without ABS layer.

|                   | Loss   | Accuracy |
|-------------------|--------|----------|
| ABS layer         | 21.578 | 0.753625 |
| Without ABS layer | 24.262 | 0.749325 |
The TanH activation function maps real numbers to the range [-1, 1], but because its gradient is small (basically less than 1), passing through a deeper neural network with the chain rule of back propagation may cause the gradient to vanish. The ReLU activation function has only two gradient values, which largely avoids the disappearance of the gradient. In order to compare the performance of the two activation functions in the network model and to select the best one, a comparative experiment was designed. The experimental results show that choosing the TanH function for the first and second layers and the ReLU function for the third to sixth layers is better than using TanH for all layers. The results are shown in Table 3.

Table 3. Comparison of loss and accuracy under different activation functions.

| Activation function | Loss   | Accuracy |
|---------------------|--------|----------|
| TanH + ReLU         | 21.578 | 0.753625 |
| TanH                | 26.912 | 0.7497   |
The deep convolutional neural network and a traditional three-layer CNN are used to carry out steganalysis on images embedded with the HILL and S-UNIWARD adaptive steganography algorithms at three different embedding rates. The loss and precision values are shown in Table 4.

Table 4. Comparison of loss and accuracy under different models.

| Algorithm | bpp | D-CNN   | 3layer_CNN |
|-----------|-----|---------|------------|
| HILL      | 0.1 | 40.466% | 44.349%    |
| HILL      | 0.2 | 44.348% | 44.349%    |
| HILL      | 0.4 | 17.672% | 44.348%    |
| S-UNIWARD | 0.1 | 41.036% | 44.349%    |
| S-UNIWARD | 0.2 | 44.347% | 44.349%    |
| S-UNIWARD | 0.4 | 17.943% | 44.348%    |

Table 4 illustrates that, in steganalysis, an appropriate increase in the number of convolutional layers is conducive to extracting features with stronger robustness. High-level convolution layers can effectively extract a wider range of information from the image, which is conducive to obtaining a more effective representation for steganalysis. The feature extraction method using global statistical information can more effectively show the changes before and after steganography.
4 System Design
4.1 System Framework Design
In order to verify the effectiveness of the image steganalysis algorithm based on deep learning and to reduce the complexity of evaluating the performance of image steganalysis algorithms, an image steganography and analysis system is developed using Tkinter, OpenCV, Matlab and background data-interaction technologies. The architecture of the system is shown in Fig. 3. The system integrates the functions of image steganography, steganography algorithm comparison, neural network structure analysis, image steganography detection based on deep learning, etc., which greatly simplifies the operation process of algorithm performance analysis. The system mainly includes three function modules: image steganography and verification, steganography analysis and verification, and network structure analysis. Among them, image steganography and verification include image steganography and algorithm
Fig. 3. System framework.
verification; steganography analysis and verification include information extraction and steganography analysis; network structure analysis includes network parameter structure and network model comparison.
4.2 System Function
(1) The steganography and verification module completes the image steganography and the analysis and verification of the steganography effect. The steganography function part mainly includes the functions of reading an image, entering the information to be encrypted, saving the key, saving the stego image and viewing the original image. The steganography algorithm verification part mainly includes comparison of the steganographic effect of algorithms, embedding position analysis, histogram comparison and parameter analysis of the peak signal-to-noise ratio and the pixel change rate.
(2) The image steganalysis and verification module completes the functions of image steganalysis and information extraction. Steganalysis mainly includes reading pictures or folders, single-image analysis, batch analysis, and display of the analysis process and analysis results. Information extraction mainly includes the functions of selecting the stego image, inputting the embedding key, displaying the stego image, displaying the extracted information and decrypting the ciphertext.
(3) The network structure analysis module carries out comparative analysis of the network parameters of the deep convolutional neural network used in the steganalysis module. Network structure analysis includes two parts, network model analysis and network parameter analysis: the comparison with traditional shallow convolutional neural networks includes training time, accuracy and loss comparison and display of the network structure, while the comparative analysis of parameter settings mainly includes the presence or absence of the HPF layer, the presence or absence of the pooling layer, different activation functions, and the presence or absence of the ABS layer.
(4) The specific process by which the system implements network structure analysis is as follows:
a) The Linux server is connected to the cloud server and the network model is trained;
b) In the training process, the generated accuracy and loss values are uploaded to the cloud server in real time, and the weight and bias models produced during training are saved locally;
c) Network data are uploaded and downloaded through the cloud server;
d) The system interface connects to the database and calls data from the cloud server;
e) Parameters are analyzed and the data visualized.
5 Conclusion
The image steganography analysis model designed with a deep convolutional neural network can improve the efficiency of image steganalysis and the accuracy of steganography detection. On the basis of this algorithm, an image steganography analysis system based on deep learning is developed. The system integrates image steganography, steganography algorithm comparison, neural network structure analysis, image steganography detection based on deep learning and other functions, greatly simplifying the operation process of algorithm performance analysis for users.
Acknowledgement. This work is supported by the Scientific Research Fund of Hunan Provincial Education Department (19B082), the Science and Technology Development Center of the Ministry of Education-New Generation Information Technology Innovation Project (2018A02020), the research supported by Science Foundation of Hengyang Normal University (19QD12), the Science and Technology Plan Project of Hunan Province (2016TP1020), the Application-oriented Special Disciplines, Double First-Class University Project of Hunan Province (Xiangjiaotong [2018] 469), the Hunan Province Special Funds of Central Government for Guiding Local Science and Technology Development (2018CT5001), the Subject Group Construction Project of Hengyang Normal University (18XKQ02), the First Class Undergraduate Major in Hunan Province – Internet of Things Major (Xiangjiaotong [2020] 248, No. 288).
References
1. Fridrich, J., Goljan, M., Du, R.: Detecting LSB steganography in color, and gray-scale images. IEEE Multimedia 4, 22–28 (2001)
2. Dumitrescu, S., Wu, X., Wang, Z.: Detection of LSB steganography via sample pair analysis. In: Petitcolas, F.A.P. (ed.) IH 2002. LNCS, vol. 2578, pp. 355–372. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36415-3_23
3. Fridrich, J., Goljan, M.: On estimation of secret message length in LSB steganography in spatial domain. International Society for Optics and Photonics, pp. 23–34 (2004)
4. Holub, V., Fridrich, J.: Designing steganographic distortion using directional filters. In: 2012 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 234–239. IEEE (2012)
5. Fridrich, J., Kodovsky, J.: Rich models for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 3, 868–882 (2012)
6. Sedighi, V., Cogranne, R., Fridrich, J.: Content-adaptive steganography by minimizing statistical detectability. IEEE Trans. Inf. Forensics Secur. 2, 221–234 (2015)
7. Qian, Y., Dong, J., Wang, W., Tan, T.: Deep learning for steganalysis via convolutional neural networks. In: Media Watermarking, Security, and Forensics 2015, pp. 9409–94090J. International Society for Optics and Photonics (2015)
8. Xu, G., Wu, H.Z., Shi, Y.Q.: Structural design of convolutional neural networks for steganalysis. IEEE Signal Process. Lett. 5, 708–712 (2016)
9. Wu, S., Zhong, S., Liu, Y.: Deep residual learning for image steganalysis. Multimedia Tools Appl. 9, 10437–10453 (2018)
10. Ye, J., Ni, J., Yi, Y.: Deep learning hierarchical representations for image steganalysis. IEEE Trans. Inf. Forensics Secur. 11, 2545–2557 (2017)
11. Yedroudj, M., Comby, F., Chaumont, M.: Yedroudj-net: an efficient CNN for spatial steganalysis. In: Proc. of the 2018 IEEE Int'l Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 2092–2096. IEEE (2018)
12. Zhang, R., Zhu, F., Liu, J., Liu, G.: Depth-wise separable convolutions and multi-level pooling for an efficient spatial CNN-based steganalysis. IEEE Trans. Inf. Forensics Secur. 15, 1138–1150 (2019)
13. Deng, X., Chen, B., Luo, W., Luo, D.: Fast and effective global covariance pooling network for image steganalysis. In: Proc. of the ACM Workshop on Information Hiding and Multimedia Security, pp. 230–234 (2019)
RETRACTED CHAPTER: An Efficient Channel Attention CNN for Facial Expression Recognition Xingwei Wang , Ziqin Guo , Haiqiang Duan , and Wei Chen(B)
Department of Electronic Engineering, School of Information Science and Technology, Fudan University, Shanghai 200433, China [email protected]
Abstract. In order to overcome the difficulty of extracting facial expression features with neural network models, and the problems of complicated training and parameter redundancy caused by stacking deep network structures, this paper proposes a CNN model that introduces an attention mechanism. Taking facial expression images as the object, research on facial expression recognition based on convolutional neural networks is carried out. Based on the construction of natural expression feature views in the natural environment, an automatic data enhancement technology for deep convolutional neural networks is introduced, and the attention mechanism is combined to adjust the weights of different channels; a deep learning model and a facial expression recognition mechanism driven by texture information extraction are established. At the same time, we propose a combined loss function. The effectiveness of this method is verified on the FER2013, FERplus, CK+, SFEW and RAF-DB data sets, and good results have been achieved.
Keywords: Convolutional neural network · Expression recognition · Attention mechanism
1 Introduction
Facial expression recognition algorithms can be divided into traditional machine learning methods and deep learning-based methods. Deep learning methods can obtain more complex and abstract features, so their performance is improved compared with traditional machine learning methods. In the facial expression recognition (FER) problem, the feature representation finally learned by the neural network model has a great influence on the final classification accuracy. Many expression recognition models based on deep learning are built on the single-branch CNN framework [1–3]. Compared with the traditional FER algorithms, the performance has been greatly improved, but it is still difficult to make an accurate determination from the original overall image alone. Facial expression is one of the most common and natural ways for human beings to convey emotional state and intention. Mehrabian's research shows that in interpersonal communication the information conveyed by facial expression accounts for a very large proportion, up to 55%, with 38% coming from the speaker's tone and only 7% depending on the speaker's content [4, 5]. It can be seen that facial expression plays an indispensable role in the process of information exchange between people. It is of great practical significance to build a system that can automatically analyze facial expressions in fields such as medical treatment, education and driverless vehicles [6]. With the development of hardware and technology, researchers have almost solved the problem of non-spontaneous facial expression recognition in the experimental environment, and have begun to move towards spontaneous facial expression recognition in the natural environment. In view of the success of deep learning in recent years, many researchers have trained deep models on public data sets [7–9]. At present, the research on natural expression recognition focuses on solving two problems: first, the balance between the recognition accuracy and the computational efficiency of the deep learning model; second, illumination, head posture and occlusion, which are independent of facial expression.
2 Related Work
General facial expression recognition methods include four steps, namely face detection, face correction, feature extraction and expression recognition [10]. However, due to different expression definition methods and different expression data forms (2D, 3D and thermal images [11]), there are some differences in the execution process (see Fig. 1).
Fig. 1. Expression recognition process
There are three modes of facial expression recognition data, namely two-dimensional, three-dimensional and thermal imaging. Facial expression feature extraction is mainly based on texture information, and the utilization rate of color information is low [12]. In addition, 3D images need higher-dimensional geometric operations, and thermal images contain insufficient texture information. Therefore, the two-dimensional gray image is the most important research object [13].
There are two main definitions of facial expression. The first is continuous action units. There are about 44 action units based on the facial action coding system, which researchers later supplemented. Through the combination of single or multiple action units, several facial expressions are further formed. Because of the computational complexity, most researchers take the expressions composed of a limited set of action units as the recognition object. The second is the discrete basic expressions. At the beginning, there were six basic expressions: anger, disgust, fear, joy, sadness and surprise, as shown in Fig. 2. Gao Wen and others [14] described the six basic expressions in detail according to their actual characteristics and built a model based on them. Basic expressions are widely studied because they are easy to understand and easy to experiment with. Later, researchers added the neutral (expressionless) face as the seventh basic expression [15, 16]. The proposed method is also based on these seven basic expressions.
Fig. 2. Six basic emotions. From left to right: disgust, fear, joy, surprise, sadness, anger
3 Method
Facial expression recognition is an important step towards natural and harmonious human-computer interaction. At present, the application of human-computer interaction is still in its infancy: the machine cannot accurately understand human emotions and needs, and can only complete the specified steps according to the program. This leads to weak interaction between the machine and the user, and an essential cause is the poor performance of facial expression recognition algorithms.
3.1 Face Detection and Correction
In the face detection part, we use a support vector machine combined with the histogram of oriented gradients. Firstly, the feature vectors are constructed by calculating the gradient directions of local regions, and then all the feature vectors are input into the classifier. If the output result is positive, the face position is returned; specifically, the coordinates of the upper-left corner and the lower-right corner of the detected rectangle. Compared with other methods, this method balances accuracy and computation speed better, and is more suitable for online recognition applications. The pixel gradients of the image are calculated as follows:

f_x(x, y) = f(x + 1, y) - f(x - 1, y)    (1)

f_y(x, y) = f(x, y + 1) - f(x, y - 1)    (2)

m(x, y) = \sqrt{f_x(x, y)^2 + f_y(x, y)^2}    (3)

\theta(x, y) = \arctan\left( f_x(x, y) / f_y(x, y) \right)    (4)

Where m and θ are the gradient magnitude and direction, respectively.
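As a small illustration of Eqs. (1)-(4), the NumPy sketch below computes the pixel gradients, the gradient magnitude and the gradient direction of a grayscale image; the border handling is an arbitrary choice and is not specified in the text.

```python
import numpy as np

def pixel_gradients(img):
    """Gradient magnitude and direction of a grayscale image, following Eqs. (1)-(4)."""
    f = img.astype(np.float64)
    fx = np.zeros_like(f)
    fy = np.zeros_like(f)
    fx[:, 1:-1] = f[:, 2:] - f[:, :-2]   # f(x + 1, y) - f(x - 1, y), Eq. (1)
    fy[1:-1, :] = f[2:, :] - f[:-2, :]   # f(x, y + 1) - f(x, y - 1), Eq. (2)
    m = np.sqrt(fx ** 2 + fy ** 2)       # magnitude, Eq. (3)
    theta = np.arctan2(fx, fy)           # direction arctan(fx / fy), Eq. (4)
    return m, theta

m, theta = pixel_gradients(np.random.randint(0, 256, size=(48, 48)))
print(m.shape, theta.shape)
```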
In the face correction part, we use a millisecond-level regression-tree method: gradient boosting is used to train several regression trees, and the decision-tree ensemble is then used to compute 68 landmarks including the eye, nose and mouth contours.

3.2 Improved Network Framework Based on DenseNet
Dense convolutional neural network (DenseNet) is a unique convolutional neural network (CNN) architecture, which can minimize the trainable parameters through dense connection mode and many dimension reduction layers (Fig. 3).
Fig. 3. Dense convolutional neural network
In fact, the dense convolutional neural network has two key hyperparameters, namely the growth rate K and the number of dense blocks n. The growth rate represents the number of convolution-layer filters, which determines how fast the number of feature maps grows. We use 2 × 2 average pooling instead of 2 × 2 maximum pooling, because it forces the correspondence between feature maps and categories and is more adapted to the convolution structure. Maximum pooling discards three-quarters of the information, while average pooling takes all the information into account; in addition, average pooling is robust to spatial transformations of the input because it sums the spatial information. Mean normalization is in effect a generalization function, which can prevent the dense connections from over-fitting. In order to further improve the performance of the network, we introduced ECA-Net. ECA-Net is a local cross-channel interaction strategy without dimensionality reduction, which can be implemented efficiently by fast one-dimensional convolution. In addition, we propose a function of the channel dimension to determine the kernel size of the one-dimensional convolution adaptively, which represents the coverage of the local cross-channel interaction. The ECA module can be flexibly integrated into existing CNN architectures. The overall network structure of the algorithm in this paper is shown in the figure below (Fig. 4).
Fig. 4. The improved network structure
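A minimal PyTorch sketch of an ECA-style channel attention module is shown below for illustration; the adaptive kernel-size mapping (with γ = 2 and b = 1) and the channel count in the example are assumptions taken from the general ECA idea, not values stated in this paper.

```python
import math
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """Efficient channel attention: local cross-channel interaction via a 1-D convolution."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1                      # odd kernel size adapted to the channel dimension
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.avg_pool(x)                            # (N, C, 1, 1) channel descriptor
        y = self.conv(y.squeeze(-1).transpose(-1, -2))  # 1-D conv across channels, no dimension reduction
        y = y.transpose(-1, -2).unsqueeze(-1)
        return x * self.sigmoid(y)                      # reweight the channels

x = torch.randn(2, 64, 48, 48)
print(ECALayer(64)(x).shape)
```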
3.3 Loss Function
In the algorithm model proposed in this paper, we use a combined loss function. It consists of two parts: the softmax loss and the center loss. The definition of the softmax loss is as follows:

L_{softmax} = -\sum_{i=1}^{m} \log \frac{e^{\omega_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{\omega_j^T x_i + b_j}}    (5)
Where m represents the batch size, n represents the number of categories, x_i is the face feature vector, y_i is the category label, and ω and b represent the weight and bias. The softmax loss can realize the classification between different categories, but it cannot control the intra-class variation within the same category. On the other hand, the center loss can represent the intra-class distance of a certain category:

L_c = \frac{1}{2} \left\| x_i - c_{y_i} \right\|_2^2    (6)

Where x_i represents the face feature vector and c_{y_i} is the class center. The overall loss in this paper is defined as follows:

L = -\sum_{i=1}^{m} \log \frac{e^{\omega_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{\omega_j^T x_i + b_j}} + \frac{\alpha}{2} \left\| x_i - c_{y_i} \right\|_2^2    (7)

Where α is a hyperparameter that adjusts the weight of the two parts of the loss.
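For illustration, a minimal PyTorch sketch of the combined loss in Eqs. (5)-(7) follows; the feature dimension, the seven-class output and the value of α are placeholders, and the class centers are simply learned by back-propagation rather than by any particular center-update rule from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftmaxCenterLoss(nn.Module):
    def __init__(self, feat_dim=128, num_classes=7, alpha=0.01):
        super().__init__()
        self.alpha = alpha
        self.fc = nn.Linear(feat_dim, num_classes)                        # omega, b in Eq. (5)
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))   # c_{y_i} in Eq. (6)

    def forward(self, features, labels):
        l_softmax = F.cross_entropy(self.fc(features), labels)            # Eq. (5)
        l_center = 0.5 * (features - self.centers[labels]).pow(2).sum(1).mean()  # Eq. (6)
        return l_softmax + self.alpha * l_center                          # Eq. (7)

criterion = SoftmaxCenterLoss()
features = torch.randn(8, 128)        # face feature vectors x_i from the backbone
labels = torch.randint(0, 7, (8,))    # expression labels y_i
print(criterion(features, labels).item())
```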
4 Experiment
4.1 Experimental Platform
Our model is trained and processed on an NVIDIA GeForce GTX 1080Ti graphics card. It has 3584 CUDA units, 11 GB of GDDR5X memory, a 1582 MHz boost frequency and 11.5 TFlops of single-precision floating-point performance. On the software side, our algorithm is implemented with Python 3.6 and the PyTorch deep learning toolkit.
4.2 Experimental Results
We trained three DenseNet models on the FER2013, FERPLUS and FERFIN datasets. For the discrete case, the setting of hyperparameters is slightly different (Table 1).
Table 1. Three densely convolutional neural model architectures. (Each model starts with a 3 × 3 convolution layer on 48 × 48 feature maps, followed by dense blocks of stacked (1 × 1, 3 × 3) convolution pairs at feature sizes 48 × 48, 24 × 24, 12 × 12 and 6 × 6, with transition layers of a 1 × 1 convolution and 2 × 2 average pooling between them, and ends with a softmax classification layer. DenseNet-1, DenseNet-2 and DenseNet-3 differ in the number of (1 × 1, 3 × 3) pairs per dense block, between 6 and 24, and in the softmax output dimension, 7 or 10 classes.)
On the FER2013 dataset, the verification accuracy of DenseNet-3 is 76.62%, which exceeds the 71.16% of the first-place team in the challenge. We believe there are two reasons why DenseNet-3 can achieve this result without using any ensemble method and with very few parameters. First, feature reuse increases the input size of subsequent convolution layers and enables later layers to learn new features while receiving the prior knowledge of the network. Second, dense connections and bottleneck layers greatly reduce the parameters of the network, making the extracted features more compact and more distinctive. On the FERPLUS dataset, the accuracy of DenseNet-2 is 90.58%, which is 4.69% higher than Barsoum's VGG13, while DenseNet-2 has 41 times fewer parameters than VGG13. When using DenseNet-1, this number is 92 times, and the accuracy is reduced by 0.52%. On FERFIN, the same DenseNet-2 achieves 90.89%
verification accuracy, which validates the assumption that there are noise categories. Because the categories in the database are clearer, the dense convolutional neural network has learned more robust features. All results are listed in Table 2.

Table 2. Experiment results of accuracy

| Models     | FER2013 | FERPLUS | FERFIN |
|------------|---------|---------|--------|
| DenseNet-1 | 75.911% | 89.06%  | 89.25% |
| DenseNet-2 | 77.55%  | 90.58%  | 90.89% |
| DenseNet-3 | 76.62%  | 90.67%  | 91.90% |
5 Conclusion
Facial expression recognition is a complex and non-traditional image processing research due to the particularity of face image data. At the same time, human beings usually show a collection of multiple emotions when expressing emotions, which greatly increases the difficulty of recognition in actual application scenarios. Many researchers believe that dynamic data can extract more useful features to recognize spontaneous facial expression, which is a future research topic worthy of attention. In the future work, we plan to detect spontaneous emotions by considering time information, while still introducing lightweight algorithm to ensure the real-time application of the model in practical applications.

References
1. Wu, H.P., Lu, Z.Y., Zhang, J.F., Li, X., Zhao, M.Y., Ding, X.D.: Facial expression recognition based on multi-features cooperative deep convolutional network. Appl. Sci. 11(4), 1428 (2021)
2. Indira, D.N., Sumalatha, L., Markapudi, B.R.: Multi facial expression recognition (MFER) for identifying customer satisfaction on products using Deep CNN and Haar Cascade Classifier. IOP Conf. Ser. Mater. Sci. Eng. 1074(1), 012033 (2021)
3. Cai, Y.X., Gao, J.W., Zhang, G., Liu, Y.G.: Efficient facial expression recognition based on convolutional neural network. Intell. Data Anal. 25(1), 139–154 (2021)
4. Hazourli, A.R., Djeghri, A., Salam, H., Othmani, A.: Multi-facial patches aggregation network for facial expression recognition and facial regions contributions to emotion display. Multimedia Tools Appl. 80(9), 13639–13662 (2021)
5. Sikkandar, H., Thiyagarajan, R.: Deep learning based facial expression recognition using improved Cat Swarm Optimization. Journal of Ambient Intelligence and Humanized Computing 12(2), 3037–3053 (2020). https://doi.org/10.1007/s12652-020-02463-4
6. Multimedia; Findings on multimedia reported by investigators at Erciyes University. Static facial expression recognition using convolutional neural networks based on transfer learning and hyperparameter optimization. J. Eng. (2020)
7. Cen, S.X., Yu, Y., Yan, G., Yu, M., Yang, Q.: Sparse spatiotemporal descriptor for microexpression recognition using enhanced local cube binary pattern. Sensors 20(16), 4437 (2020)
8. Bao, J., Wei, S.S., Lv, J.F., Zhang, W.L.: Optimized faster-RCNN in real-time facial expression classification. In: Proceedings of 2019 2nd International Conference on Communication, Network and Artificial Intelligence (CNAI 2019). Advanced Science and Industry Research Center, Science and Engineering Research Center, p. 8 (2019)
9. Hu, Q.D., Shu, Q., Bai, M.Z., Yao, X.M., Shu, K.X.: FERCaps: a capsule-based method for face expression recognition from frontal face images. In: Proceedings of 2019 International Conference on Power, Energy, Environment and Material Science (PEEMS 2019). Advanced Science and Technology Application Research Center, p. 6 (2019)
10. Ly, T.S., Do, N.T., Kim, S.H., Yang, H.J., Lee, G.S.: A novel 2D and 3D multimodal approach for in-the-wild facial expression recognition. Image Vis. Comput. 92, 103817 (2019)
11. Signal Processing; Study Data from Guangdong University of Technology Update Understanding of Signal Processing. Occlusion expression recognition based on non-convex low-rank double dictionaries and occlusion error model. Electron. Newswkly. (2019)
12. Fu, Y.F., Ruan, Q.Q., Luo, Z.Y., Jin, Y., An, G.Y., Wan, J.: FERLrTc: 2D+3D facial expression recognition via low-rank tensor completion. Signal Process. 161, 74–88 (2019)
13. Pollux, P.M.J., Matthew, C., Guo, K.: Gaze patterns in viewing static and dynamic body expressions. Acta Psychol. 198, 102862 (2019)
14. Kang, K., Ma, X.: Convolutional Gate Recurrent Unit for Video Facial Expression Recognition in the Wild, p. 6. Engineering Society of China (2019)
15. Michael Revina, I., Sam Emmanuel, W.R.: Face expression recognition with the optimization based multi-SVNN classifier and the modified LDP features. J. Vis. Commun. Image Represent. 62, 43–55 (2019)
16. Xiao, Y., Wang, D., Hou, L.: Unsupervised emotion recognition algorithm based on improved deep belief model in combination with probabilistic linear discriminant analysis. Pers. Ubiquit. Comput. 23(3–4), 553–562 (2019)
Handwritten Digit Recognition Application Based on Fully Connected Neural Network Qintian Zhang1(B) , Shenao Xu2 , and Zhiwei Xu1 1 Hunan Provincial Key Laboratory of Wind Generator and Its Control,
Hunan Institute of Engineering, Xiangtan 411101, China 2 School of Mechanical and Automotive Engineering, South China University of Technology,
Guangzhou 510640, China
Abstract. At present, the handwritten digit recognition problem has received much attention because it has a large, standard and easy-to-use mature data set, the MNIST data set, and simple 0–9 digit recognition has been regarded as an entry problem in the field of computer vision. The paper first introduces the characteristics and applications of handwritten digit recognition. Then the traditional research methods and their shortcomings are pointed out and the concepts of deep learning are introduced. Taking the convolutional neural network as an example, the key technical characteristics of the convolutional neural network are introduced in detail. Finally, an example explains the application of the convolutional neural network in handwritten digit recognition.
Keywords: MNIST · Deep learning · Convolutional neural network · Handwritten digits
1 Introduction
At present, computer-based handwritten digit recognition is widely used and is one way for humans to interact with computers. There are many methods for handwritten digit recognition, and the performance of a recognition method usually depends on attributes such as the size of the digits, the writing style, and the recognition rate. One of the main challenges of handwritten digit recognition is the inconsistency of personal handwriting styles (i.e., digit size and style) and of the devices that collect the handwriting. Therefore, it is necessary to have a system that can automatically recognize handwriting patterns with a high recognition rate. In the past, many recognition methods have been proposed for handwritten digit recognition, such as digit recognition with a fully connected neural network and the gradient descent method. In this discussion, the use of fully connected neural network technology can reduce the number of classification errors on handwritten digits, reduce the time it takes to complete the recognition, and increase the accuracy of the model to 97%.
The concept of Artificial Neural Networks (ANN) originated in 1943. Warren McCulloch and Walter Pitts [1] first created an ANN calculation model based on mathematics and algorithms, called the M-P model. This model describes the mathematical theory and network structure of artificial neurons by simulating the principles and processes of biological nerve cells, and proves that a single neuron can realize logical functions, thus opening the era of ANN research. The structure of a basic ANN consists of three components, the input layer, the hidden layer and the output layer, and it is usually a fully connected neural network (FCNN). Full connection means that each neuron of the current layer is connected to all neurons of the previous layer, that is, the output of the previous layer of neurons is used as the input of the current layer of neurons; each connection has a weight, and there is no connection between neurons in the same layer. The FCNN structure diagram is shown in Fig. 1:
Fig. 1. FCNN structure diagram
A Deep Neural Network (DNN) is an ANN with more than one hidden layer. DNNs are prone to the vanishing gradient problem, in which the parameters of the front hidden layers are updated more slowly than those of the back hidden layers, so that as the number of hidden layers increases, the accuracy of the model decreases. In order to solve the vanishing gradient problem, functions such as ReLU can be used instead of the Sigmoid function as the activation function. The basic structure of current DNNs uses the ReLU function as the activation function.
2 Key Technologies of Fully Connected Neural Network in Handwritten Digit Recognition
2.1 Activation Function
The activation function is a very important part of the neural network. If there were no activation function, the output of the neural network would always be just a linear combination of its inputs, so the role of the activation function is to add some kind of non-linear mapping. The following are several common activation functions.
2.2 Sigmoid Function
The sigmoid function maps the input to the range (0, 1): larger values are mapped towards 1 and smaller values towards 0. It is intuitively consistent with the distinction between the active and inhibited states of neurons. Mathematical form:

f(z) = \frac{1}{1 + e^{-z}}

f'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \left( 1 - \frac{1}{1 + e^{-z}} \right) = f(z)(1 - f(z))    (1)
2.3 ReLU Function
Mathematical form:

f(z) = max(0, z)

f'(z) = \begin{cases} 0, & z \le 0 \\ 1, & z > 0 \end{cases}    (2)
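A small NumPy sketch of the two activation functions and their derivatives in Eqs. (1) and (2), for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # f'(z) = f(z)(1 - f(z)), Eq. (1)

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)  # f'(z) = 0 for z <= 0, 1 for z > 0, Eq. (2)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), sigmoid_grad(z), relu(z), relu_grad(z))
```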
The ReLU function has the advantages of fast calculation and simple derivation. Unlike the Sigmoid function, the ReLU function has no gradient saturation zone, so there is almost no gradient dispersion. The output of some neurons may be 0, which increases the sparsity of the network and reduces over-fitting. In this paper, the ReLU function is used as the activation function.
2.4 Loss Function
When our input data passes through the neural network, we get a set of output data. To measure the quality of the model, give it a score, and define the ultimate objective to optimize, we need a loss function. The loss function compares our output value with the true value to obtain the loss value (loss). In order to make
the model better and fit the real situation, we need to find suitable network weights that minimize the output loss. The loss most commonly used for classification problems is cross entropy. Cross entropy measures the degree of difference between two probability distributions of the same random variable; in machine learning, it expresses the difference between the true probability distribution and the predicted probability distribution. The smaller the value of the cross entropy, the better the prediction of the model. Cross entropy is commonly paired with softmax in classification problems: softmax processes the output so that the predicted values of the classes sum to 1, and the loss is then calculated through cross entropy.
2.5 The Core Code of the Fully Connected Neural Network in the Application of Handwritten Digit Recognition
The fully connected network consists of three basic layer types, namely the input layer, the hidden layers and the output layer. There are five hidden layers in this paper, all of which are fully connected. The network maps the 28*28 pixel values through linear transformations into 10 outputs. The specific code is as follows (Fig. 2):
Fig. 2. The specific code of the core part of the fully connected neural network
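Because the network code itself is shown only as an image (Fig. 2), the following Keras sketch reproduces the described structure: a 28 × 28 input flattened and passed through five fully connected hidden layers with ReLU into a 10-way softmax trained with cross entropy. The hidden-layer widths and the optimizer are assumptions and are not read from Fig. 2.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28*28 pixel values as input
    tf.keras.layers.Dense(512, activation="relu"),    # five fully connected hidden layers
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10-class output for digits 0-9
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", # softmax + cross entropy
              metrics=["accuracy"])

# (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# model.fit(x_train / 255.0, y_train, epochs=10, validation_split=0.1)
```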
3 MNIST Data Set
In this discussion, the MNIST handwritten digit database is used as the experimental data. There are 60,000 pictures in the MNIST database used as training data, and another 10,000 pictures are used as test data. Each picture shows a number from 0–9, and the corresponding label of each training sample is included, where the label set contains the 10 classes 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. This data set was developed by the National Institute of Standards and Technology (NIST), and each image in the database is a 28 × 28 pixel grayscale image. The training data set is shown in Fig. 3.
Fig. 3. MNIST partial data example
3.1 Analysis of Results
Data is an inseparable part of the learning process; it provides the guiding ideas of the experiment and the basis for establishing an accurate CNN. In the case study of this discussion, 30,000 samples were randomly selected from the 60,000 samples as training examples, another 3,000 samples were used for verification, and the last 10,000 test samples were used as test examples. Experiments show that after 10 iterations, the loss rate and accuracy rate tend to stabilize. The final accuracy rate is more than 97%. The experimental results are shown in Fig. 4.
Fig. 4. The correct rate and loss value of the model after 10 rounds of training
References
1. Wang, Y.S., Wang, X.K.: A contraband detection system and deployment method based on convolutional neural network. Technol. Innov. Appl. 23, 136–138 (2020)
2. Cao, H.J., Wu, Z.M.: Linear filtering analysis and simulation of noisy images. Inf. Technol. Inform. 8, 50–52 (2017)
3. Yan, T., Zhou, Q.: Deep Learning Algorithm Practice (Based on Theano and TensorFlow). Electronic Industry Press, Beijing (2020)
4. Lai, X.W.: Application of TensorFlow reading data in simple image recognition. Mod. Inf. Technol. 3(12), 98–99 (2019)
5. He, S.: Application of convolutional neural network in handwritten digit recognition. Comput. Knowl. Technol. 16(21), 13–15 (2020)
6. Wei, F., Shan, L.: Research on handwritten digit recognition technology based on CNN optimization
7. Zhang, T., Yang, J., Song, W., et al.: Improved convolutional neural network model design method. Comput. Eng. Des. 40(7), 1885–1890 (2019)
8. Ma, Y.Y., Shi, J.R.: Convolutional neural network and its application in handwritten digit recognition. J. Hubei Inst. Technol. 37(6), 66–72 (2017)
9. Yin, X.W., Wang, Z.Z., Meng, Q.L., et al.: Research on handwritten digit recognition based on improved LeNet-5. Inf. Commun. 32(3), 17–18 (2019)
10. Lv, H.: Design of handwritten digit recognition system based on convolutional neural network. Intell. Comput. Appl. 9(2), 54–56, 62 (2019)
11. Xing, M.: Design and implementation of handwritten digit recognition model based on TensorFlow. Electron. Technol. Softw. Eng. 2, 56 (2019)
12. Dai, H., Chen, H.M., Li, Z.S.: Digital recognition based on convolutional neural network. J. Guizhou Normal Univ. (Nat. Sci. Ed.) 35(5), 96–101 (2017)
13. Chen, Y., Li, Y.Y., Yu, L., et al.: Handwritten digit recognition system based on convolutional neural network. Microelectron. Comput. 35(2), 71–74 (2018)
14. Zhang, Q.H., Wan, C.X.: Overview of convolutional neural networks. J. Zhongyuan Inst. Technol. 28(3), 82–90 (2017)
15. Zheng, Y.P., Li, G.Y., Li, Y.: A review of application research of deep learning in image recognition. Comput. Eng. Appl. 55(12), 20–36 (2019)
16. Zhang, Q., Zhang, R.M., Chen, B.: Overview of research on image recognition based on deep learning. J. Hebei Acad. Sci. 3(15), 28–36 (2019)
17. Liu, J.W., Xie, H.J., Luo, X.L.: Research progress in the application of generative confrontation networks in various fields. Acta Autom. Sin. 45(10), 1–38 (2019)
18. Zhang, P., Cui, M.T., Xie, Q., et al.: Research on plant image recognition method based on deep convolution generative adversarial network. J. Southwest Univ. Natl. (Nat. Sci. Ed.) 45(2), 185 (2019)
Detection System of Truck Blind Area Based on YOLOv3 Yang Zhang(B) , Xia Zhu, Yang Bu, Wenjing Ding, and Yilin Lu Jinling Institute of Technology, Nanjing 211169, China
Abstract. This paper presents a blind area detection system for trucks based on the YOLOv3 algorithm, addressing the problem that drivers cannot fully observe the surrounding environment because of the truck's multi-directional blind areas. Firstly, the system modifies the loss function and the number of anchor boxes to realize multi-scale detection. It also improves the feature extraction network, so that detection accuracy and speed are greatly improved. In addition, the system uses a residual neural network as the feature extraction layer to make the prediction output module faster. Finally, YOLOv3 draws on the idea of the feature pyramid network to predict multi-scale feature maps: at three different scales, each cell on each scale predicts three bounding boxes. The simulation results show that the blind spot detection system based on YOLOv3 can achieve real-time detection and improve the accuracy of blind spot detection while the detection speed is maintained. Keywords: Target detection · YOLOv3 · Van blind area · Machine vision · Loss function
1 The Introduction The truck has multiple visual blind areas [1–4] because of its structure [5], so that drivers cannot fully observe the surrounding environment and traffic accidents easily occur. Therefore, developing a reliable and safe detection system for the truck blind area [6–8] is of great significance to the traffic safety of road vehicles. In this paper, combined with a scientific research project, the YOLOv3 algorithm [9–11] is applied; it greatly improves detection accuracy and detection speed and realizes accurate and rapid detection. The designed system detects the truck blind area from multiple angles and in all respects and issues timely warnings, providing real driving convenience to the truck driver and a higher safety guarantee for highway traffic.
2 YOLOv3 2.1 Introduction to YOLOv3 Algorithm The full name of the YOLO algorithm [12, 13] is You Only Look Once, which was first proposed by Joseph Redmon et al. Its principle is to take the whole picture as
the input of the neural network and to directly output the position and category information of the regressed target boxes in the output layer. The biggest innovation of the YOLO algorithm is that object detection is treated as a regression problem and an end-to-end structure is realized. The YOLOv1 algorithm improves on CNN-based detectors in detection speed [14], but it still has defects such as low target localization accuracy, poor detection of small targets and a large amount of computation. YOLOv2 improves on YOLOv1 with reference to the SSD network [15]: it uses 3 × 3 convolution kernels and doubles the number of channels after each pooling operation, providing some improvement in detection speed, but it still has shortcomings. The YOLOv3 algorithm builds on YOLOv1 and YOLOv2: it realizes multi-scale detection, modifies the loss function [16–18] and the number of anchor boxes, and improves the feature extraction network, so that detection accuracy and speed are greatly improved. It uses a residual neural network (Darknet-53) as the feature extraction layer to make inference faster. In the prediction output module, YOLOv3 draws on the FPN (Feature Pyramid Network) [19] to predict multi-scale feature maps, that is, at three different scales, each cell on each scale predicts three bounding boxes, as shown in Fig. 1.
Fig. 1. Bounding box prediction
According to YOLO9000, dimension clusters are used to predict bounding boxes. The network predicts four coordinates t_x, t_y, t_w, t_h for each bounding box, where (c_x, c_y) is the offset of the cell relative to the upper left corner of the image, (p_w, p_h) are the width and height of the corresponding anchor box at that scale, and the anchor width and height are pre-set. The predicted box corresponds to the following equations:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w e^{t_w}
b_h = p_h e^{t_h}    (1)
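Equation (1) can be applied directly to the raw network outputs. The NumPy sketch below decodes a single predicted box; the cell index, anchor size and stride used in the example call are assumed values for illustration only.

```python
# Decode one YOLOv3 bounding box according to Eq. (1).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """tx..th: raw network outputs; (cx, cy): cell offset from the top-left
    corner; (pw, ph): anchor width/height in pixels; stride: cell size."""
    bx = (sigmoid(tx) + cx) * stride   # box center x in image coordinates
    by = (sigmoid(ty) + cy) * stride   # box center y
    bw = pw * np.exp(tw)               # box width
    bh = ph * np.exp(th)               # box height
    return bx, by, bw, bh

# Example with assumed values: cell (7, 5) on the 13x13 scale (stride 32),
# anchor of size 116x90 pixels.
print(decode_box(0.2, -0.1, 0.3, 0.1, 7, 5, 116, 90, 32))
```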
During training, the sum of squared errors is usually used as the loss. If the ground-truth value of some predicted coordinate is t̂*, the gradient is the ground-truth value minus the predicted value, t̂* − t*, and the ground-truth value can easily be derived from the equations above. YOLOv3 uses logistic regression to predict an objectness score for each bounding box: if a bounding box overlaps the ground-truth object more than any other bounding box, its score is 1. Each bounding box therefore has both a dimension (size) prior and a position prediction: the width and height are predicted as offsets from the cluster centroids, and the sigmoid function is used to predict the center coordinates of the box relative to the location of the filter application. In Fig. 2 the grid cell is represented by its upper-left coordinate (c_x, c_y), the dashed box represents the anchor box, and the blue box represents the offset of the target candidate box relative to the anchor. The target candidate box lies in the grid cell (c_x, c_y), the anchor box is centered on the cell, p_w and p_h are the width and height of the anchor box, b_w and b_h are the width and height of the target candidate box relative to the anchor box, and σ is the logistic (sigmoid) function, σ(x) = 1/(1 + e^{−x}).
Fig. 2. Schematic diagram of target candidate box
2.2 YOLOv3 Network Structure YOLOv3 consists of three modules, as shown in Fig. 3: the Darknet-53 feature extraction module, the FPN (Feature Pyramid Network) module and the prediction branch module. The Darknet-53 feature extraction module is composed of five residual stages, and each residual stage reduces the output feature size to half of the original size. The output feature maps of residual stages 3, 4 and 5 are used as the input of the FPN structure.
Fig. 3. Structure diagram of YOLOV3
In the FPN feature pyramid module, the basic DBL unit consists of a convolution (Conv) layer, a Batch Normalization layer and a Leaky ReLU activation layer. The FPN module carries the feature information of three prediction branches, for large-, medium- and small-size target detection. In the prediction branch structure, the FPN is mainly used to generate fused Features 1, 2 and 3 containing multi-scale feature information, and then DBL units and 1 × 1 convolution layers are used to generate the final outputs (Output 1, 2 and 3) of the model.
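As a rough illustration of the DBL unit (convolution, batch normalization, Leaky ReLU) and the 1 × 1 prediction convolution described above, a schematic Keras version follows; the filter counts, input size and class number are placeholders rather than the original Darknet configuration.

```python
# Schematic DBL (Conv + BatchNorm + LeakyReLU) unit and a prediction branch,
# as used by the FPN structure described above. Sizes are illustrative.
import tensorflow as tf

def dbl(x, filters, kernel_size, strides=1):
    x = tf.keras.layers.Conv2D(filters, kernel_size, strides=strides,
                               padding="same", use_bias=False)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.LeakyReLU(0.1)(x)

def prediction_branch(x, num_anchors=3, num_classes=80):
    # Each cell predicts num_anchors boxes: 4 coordinates + objectness + classes.
    x = dbl(x, 256, 3)
    return tf.keras.layers.Conv2D(num_anchors * (5 + num_classes), 1)(x)

inputs = tf.keras.Input(shape=(52, 52, 256))   # e.g. the small-object branch
outputs = prediction_branch(dbl(inputs, 128, 1))
model = tf.keras.Model(inputs, outputs)
model.summary()
```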
3 System Introduction 3.1 System Architecture Diagram Figure 4 is the architecture diagram of the YOLOv3-based blind spot detection system for freight cars. First, images are collected by the cameras and transmitted to the processor. OpenCV is used to preprocess each image, which is then passed through a Tiny-YOLOv3 detector trained on the COCO data set. After filtering, the image is analyzed and the detection markers are overlaid on it (Fig. 4).
Fig. 4. System architecture diagram
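The capture-and-overlay pipeline of Fig. 4 can be sketched with OpenCV's DNN module as below; the Tiny-YOLOv3 configuration/weight file names, the input size and the exit key are assumptions, and the confidence filtering and NMS step is only indicated by a comment.

```python
# Sketch of the capture/detection loop in Fig. 4: read frames from a camera,
# run Tiny-YOLOv3 through OpenCV's DNN module, and overlay the detections.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3-tiny.cfg", "yolov3-tiny.weights")
layer_names = net.getUnconnectedOutLayersNames()
cap = cv2.VideoCapture(0)                      # camera index is an assumption

while True:
    ok, frame = cap.read()
    if not ok:
        break
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True)
    net.setInput(blob)
    outputs = net.forward(layer_names)
    # ... filter detections by confidence, apply NMS, draw boxes on frame ...
    cv2.imshow("blind-area view", frame)
    if cv2.waitKey(1) == 27:                   # Esc to quit
        break
cap.release()
```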
3.2 The System Design First, cameras are installed around the truck (Figs. 5 and 6). When the truck starts, the camera equipment is powered on and the cameras photograph the blind areas around the truck; the images are transmitted to the processor via Bluetooth. The processor runs a series of algorithms to judge whether there is anything dangerous to the truck driver inside the blind areas. If such an object is detected, the in-cab display switches to the blind-area view containing the object, and the in-cab speaker broadcasts the object's name, position, whether it is moving and how fast. If the van is in reverse, a display in the cab shows the scene behind the van. Figure 7 shows the working flow chart of the freight car blind spot detection system.
Fig. 5. System application scene view
Fig. 6. Post-view of system application scenario
Fig. 7. Working flow chart of freight car blind spot detection system
4 The System Test 4.1 The Test Environment The operating system used in this experiment is Ubuntu 16.04, the hardware is a Raspberry Pi 4B, the system environment is openEuler 20.09, the graphics card is an NVIDIA GTX 2060 8G, and the experiment is run on the GPU. The development environment is PyCharm, and the development kits used are OpenCV, NumPy and Imutils. 4.2 Test Indicators In the training of this experiment, different parameters are set according to the change of the visualized training loss curve in order to obtain multiple training models; Precision, Recall, AP, mAP and detection speed are used as indicators to measure the models. Among them, the detection speed is very important to meet the real-time requirements of target detection in the freight car blind area. Loss Function In statistics, the loss function is often used for parameter estimation and represents the difference between the estimated value and the true value of a data instance. In machine learning, the loss function L(Y, f(x)) is used to estimate the degree of difference between the prediction f(x) of the training model and the true value Y; in general, the smaller the value of the loss function, the better the robustness of the trained model. The YOLOv3 loss function consists of the target location loss,
the confidence loss and the target classification loss, denoted Loss_coor, Loss_conf and Loss_cls:

Loss = Loss_coor + Loss_conf + Loss_cls    (2)
The target location loss Loss_coor takes the mean square error (MSE) as its objective function. Firstly, the IOU between each predicted region and the corresponding ground-truth region is calculated, where Area(A) is the area of the ground-truth region, Area(B) is the area of the predicted region, ∩ denotes intersection and ∪ denotes union. The IOU of the two regions is the ratio of the intersection area between the ground-truth region and the predicted region to the area of their union; regions whose IOU exceeds a preset threshold are retained:

IOU = |Area(A) ∩ Area(B)| / |Area(A) ∪ Area(B)|    (3)
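A direct implementation of Eq. (3) for two axis-aligned boxes, assuming the (x1, y1, x2, y2) corner representation:

```python
# IoU of two axis-aligned boxes given as (x1, y1, x2, y2), following Eq. (3).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ≈ 0.143
```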
In Loss_coor, mask is the probability that the prediction box contains a target, x is the abscissa of the center point of the target prediction area, y is the ordinate of the center point, w is the width of the area and h is the height of the area; i is the index of the prediction box, j is the index of the prediction branch, n is the total number of prediction boxes corresponding to the real box, the subscript p denotes a value of the prediction box and the subscript t denotes a value of the ground-truth box:

Loss_coor = Σ_{i=1}^{3} Σ_{j=1}^{n} mask × {[(x_p)_j − (x_t)_j]² + [(y_p)_j − (y_t)_j]² + [(w_p)_j − (w_t)_j]² + [(h_p)_j − (h_t)_j]²}    (4)

Precision. The precision is the ratio between the number of correctly detected targets and the total number of detected targets, and it measures the accuracy of the detection model. Its calculation formula is:

P = TP / (TP + FP)    (5)
Recall Rate. The recall rate is the ratio between the number of correctly detected targets and the total number of manually labeled real targets, and is a measure of the miss rate of the detection model. Its calculation formula is:

R = TP / (TP + FN)    (6)
Mean Average Precision (mAP). The precision–recall (PR) curve is drawn from the corresponding precision and recall values. The average precision (AP) is the area under the PR curve, obtained by integrating the precision–recall function; it averages the detection correctness over multiple test sets:

AP = ∫_0^1 P(R) dR    (7)

The mAP is obtained by averaging the average precision (AP) over all classes:

mAP = Σ AP / N    (8)

Detection Speed (FPS, Frames Per Second). For a detection model, the detection speed is an important index, and target detection in the truck blind area must run in real time. In this experiment, the detection speed of the model is computed by timing the model on the test data set, and the unit is frames per second. 4.3 The Experimental Process In this paper, the YOLOv3 model is used to detect and visualize objects in the blind area of freight cars on the test set, as shown in Figs. 8 and 9. In the project experiment, a highway scene is simulated and a camera is installed on a model truck to track and recognize target objects such as bicycles, pedestrians and obstacles, as shown in Figs. 10, 11, 12 and 13.
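The indicators of Eqs. (5)–(8) used in the evaluation above can be summarized in a short helper; the detection counts in the example call are assumed values, and the AP integral of Eq. (7) is approximated numerically from a sampled PR curve.

```python
# Precision, recall and (approximate) average precision from Eqs. (5)-(8).
import numpy as np

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0          # Eq. (5)

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0          # Eq. (6)

def average_precision(precisions, recalls):
    """Numerical integration of the PR curve (Eq. 7); recalls must be sorted."""
    return float(np.trapz(precisions, recalls))

def mean_average_precision(ap_per_class):
    return sum(ap_per_class) / len(ap_per_class)        # Eq. (8)

# Example with assumed detection counts for one class.
print(precision(tp=80, fp=10), recall(tp=80, fn=20))
```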
Fig. 8. Visual detection of objects in the blind area of freight cars
Fig. 9. Application of YOLOV3 algorithm in the image
Fig. 10. Human vehicle detection
Fig. 11. Small object detection
Fig. 12. Low object detection
Fig. 13. Large object detection
5 Conclusion With the rapid development of the logistics industry in the Internet era, road transportation has become the pillar of transportation, and the existence of truck blind areas causes great harm to truck driving and road traffic. In this paper, the YOLOv3 algorithm is applied to a tracking and automatic detection system for the truck blind area. Experimental simulation and practical application verify the feasibility of the system and its rapidity, accuracy and robustness in blind area detection. The cost and implementation of the technology are close to the market and reasonable. Of course, the system also has some shortcomings, such as a low recognition rate for small objects, so the algorithm needs to be further optimized in the future. From the perspective of future development, the blind spot detection system for freight cars will be widely used.
References 1. Yang, S.: Research and implementation of vehicle blind zone warning based on monocular vision. Zhejiang University (2015)
2. Zhao, Z., Zhang, Y.: Overview of vehicle blind zone monitoring system. Autom. Electr. Appliances 10, 2–11 (2018) 3. Liu, H.: Research on vehicle detection and tracking algorithm in blind zone. Beijing University of Technology, March 2017 4. Sun, W.: Research on causes mechanism and countermeasures of big truck traffic accidents. Intelligent Computer and Application (2014) 5. Yuan, J., Chai, L.: Vehicle blind zone detection system. Intell. Comput. Appl. 6, 12–26 (2017) 6. Liu, Y.: Research on object recognition of intelligent driving vehicle based on machine vision. Jilin University, pp. 4–22 (2017) 7. Sandler, M., Howard, A., Zhu, M., et al.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2018) 8. Han, Y.: Research on motion vehicle detection and tracking algorithm based on video image. Harbin Institute of Technology (2015) 9. Chandra, V., Sarkar, P.G., Singh, V.: Mitral valve leaflet tracking in echocardiography using custom Yolo3 (2018) 10. Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. Comput. Vis. Pattern Recogn. 18(12), 4308 (2018) 11. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018) 12. Huang, J., Li, L.: Research on binocular recognition method based on improved YOLO. Comput. Digital Eng. 64(04), 808–811 (2018) 13. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. In: 2017 IEEE Conference on Pattern Recognition and Computer Vision, pp. 6517–6525. IEEE, New York (2017) 14. Ren, S., He, K., Ross, G., et al.: Faster R-CNN: towards real -time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017) 15. Li, F., Meng, L.: Pedestrian detection algorithm based on feature pyramid SSD. J. North China Univ. Sci. Technol. (Nat. Sci. Ed.) 12(15) (2020) 16. Yang, K., Xu, Y., An, X.: Vehicle detection method based on deep learning. Comput. Netw. 44(19), 58–61 (2018) 17. Zhang, X., Gao, H.: Deep learning based autonomous driving technology overview. J. Tsinghua Univ. (Sci. Technol.) (2018) 18. Fen, X., Feng, X.: Pedestrian detection based on motion compensation and HOG/SVMClassifier. In: 2013 5th International Conference on Intelligent Human-Machine Systems and Cybernetics (2013) 19. Wang, F., Wang, L., Zhang, R., Zhao, Y., Wang, Q.: Pedestrian detection algorithm based on fusion FPN and Faster R-CNN. Data Acquisit. Process. 5(15) (2019)
Driver Fatigue Detection Algorithm Based on SMO Algorithm Xia Zhu(B) Jinling Institute of Technology, Nanjing 211169, China
Abstract. In this paper, a real-time detection technology for the driver fatigue state is proposed, which can effectively judge whether the driver has entered a fatigue state. A real-time driver fatigue detection algorithm based on machine vision is studied, and a fatigue state detection algorithm based on the SMO selection strategy is proposed. The SMO algorithm is used to detect facial features in video, especially the eye area, and a CNN network structure is used for the training test. The experimental results show that this method can effectively detect fatigue and give feedback to the driver by sounding a warning to prompt the driver to rest, thus reducing the accidents caused by fatigue driving. Keywords: Face location · Feature parameters · Sequential minimal optimization · Fatigue detection
1 Introduction According to the Statistical Communiqué of the People's Republic of China on the 2019 National Economic and Social Development, released by the National Bureau of Statistics on February 28, 2020, the death rate of road traffic accidents in China in 2019 was 1.80 per 10,000 vehicles, a decrease of 6.7%, and the number of casualties still ranks first in the world [1]. There are many causes of these traffic accidents, among which fatigue driving is the least easy to control. Fatigue driving usually happens on the highway at night. According to statistics, traffic accidents caused by fatigue driving account for 20%–30% of road traffic accidents and about 45% of serious traffic accidents. Because of this huge proportion, fatigue driving has become one of the most important causes of traffic accidents. Effectively judging the driver's fatigue condition in order to reduce the probability of accidents is therefore of great significance. Long ago, due to the limitations of testing tools and methods, only the driver's physiological information could be used to detect fatigue; although detection in this way is complex, the results are very accurate. In 1994, Japan's Pioneer developed a detection and warning system that uses the heart rate to test whether the driver is tired; the method is inconvenient because the measuring device must be in contact with the driver, which
brings the driver considerable inconvenience and is complicated and tedious; however, because the driver's physiological information is measured directly, the detection result is very accurate. In 2001, Qiang Ji et al. from Rensselaer Polytechnic Institute in New York, USA, designed a camera-based hardware system to locate human eyes by improving the PERCLOS method; the system detects the pupils and comprehensively determines whether the eyes are fatigued according to the detected pupil information. In 2003, the Australian company Seeing Machines developed a driver condition monitoring system based on a tiny sensor installed on the dashboard; with this system the driver's head and facial features can be obtained, and from these features it is judged whether the driver is dozing or glancing left and right, so as to determine fatigue and feed the result back to the driver. In 2015, Volvo installed a driver safety warning system on the XC60 series for the driver's safety: cameras on the windshield and rear-view mirror monitor the distance between the vehicle and the lane markings in real time, the system judges from the current trajectory whether the car is in a normal driving state, and when the driving state becomes abnormal it alerts the driver by an alarm to reduce traffic risks. Domestic universities have also made considerable achievements. In March 2018, Han Zheng from Wuhan University of Science and Technology designed an improved random-forest cascade-regression method to detect facial feature points; the essence of the algorithm is to examine the driver's eye and mouth regions and to integrate blinking, the degree of mouth opening while yawning, and their frequencies to judge the fatigue state [1]. In May 2018, Gao Yuan from Xi'an University of Technology proposed a fatigue state detection algorithm fusing multiple fatigue characteristics; like Han Zheng's work it uses eye and mouth information, but it also incorporates the driver's head movement, integrating the three kinds of information to judge fatigue [2]. In June 2018, Li Dewu from Hunan University of Technology, considering that illumination causes errors in face image segmentation, took advantage of the clustering characteristics of skin color in chroma space and adopted a face localization method based on skin color segmentation [3]. In May 2019, Li Qingchen from Zhengzhou University designed a fatigue state detection system integrating multiple fatigue indicators; the system reduces the deviation caused by the actual test environment and performs fatigue detection based on a combination of facial, head, illumination and other information [4]. There are many definitions of the fatigue state. For the human body, fatigue means that the ability to work is reduced and the ability to respond is weakened, and most of these manifestations are reflected in the face. The fatigue state detection adopted in this paper judges whether the driver is fatigued by the duration of the closed state of the eyes.
Because this paper uses the closed state of the human eye to judge the driver's fatigue state, feature extraction focuses on locating the human eye and extracting its information. The main steps of eye feature extraction are to accurately detect the face and precisely locate the position
of the human eye. The approach of this paper is to use the powerful image processing libraries of machine vision and to train the SMO algorithm on a large number of open- and closed-eye samples, so as to accurately extract the location information of the human eye.
2 Fatigue State Recognition Based on Machine Vision 2.1 Face Feature Location In face positioning, the first step is to detect the positions of the feature points, which lie mainly on the left side of the nose, the lower side of the nostrils, the pupils, the lower side of the upper lip and so on. After obtaining these feature point positions, the face can be corrected by position-driven deformation. The face localization method adopted in this paper uses the Detect Face function in the OpenCV library, which extracts the face information in a picture and the key feature points of the face based on its own training results. With the help of this function, the image data collected by the camera are first converted to grayscale and then processed, giving a precise localization of the face. In addition to the face information, this function also provides the information of each key feature point in the face, such as the eyes, nose and mouth, which can be extracted for different parts according to individual needs for more accurate detection and processing of the face. 2.2 Extraction of Eye Feature Parameters Based on SMO Selection Strategy In face detection, the feature point training based on the SMO selection strategy uses four features: the position of the eyebrows, the central point of the chin, the position of the left cheek and the position of the right cheek. The SMO algorithm is used to classify the training features. SMO is an extreme case of decomposition algorithms: it selects only two multipliers at a time, optimizes them, updates the SVM according to the optimized results, and observes the changes caused by the update [7]. In the actual image processing, updating is an accompanying operation that evaluates in time the progress of the optimization and the accuracy of the processing steps. SMO first works out the constraints and then solves the original constrained minimization problem. For simplicity, denote the first multiplier by α1 and the second by α2. Since there are only two multipliers, the constraint relationship between them is easy to express; the constraint situation is shown in Fig. 1. It can be seen from Fig. 1 that, because of the box constraints, the required multipliers all lie inside the box shown in Fig. 2; combined with the linear equality constraint, the multipliers lie on a diagonal segment, and the purpose of the SMO algorithm is to find the optimal solution on this segment. When the two samples have unequal labels (y1 ≠ y2), the boundary constraints of the line segment are:

L = max(0, α2 − α1), H = min(C, C + α2 − α1)
(1)
Fig. 1. Boundary constraints
If the two labels are equal (y1 = y2), the boundary constraints of the corresponding line segment are:

L = max(0, α2 + α1 − C), H = min(C, α2 + α1)
(2)
Because of the change of the boundary constraints, the relation originally applied along the diagonal is no longer applicable, and the corresponding quantity becomes:

η = K(x1, x1) + K(x2, x2) − 2K(x1, x2)    (3)

In general, the objective function constrained along the direction of the linear equality constraint has a minimum, and is only meaningful when η is greater than 0. In that case, the minimizer along the constraint direction is computed by:

α2_new = α2 + y2(E1 − E2)/η    (4)
Here E_i = u_i − y_i denotes the error of the i-th training sample. The constrained minimum is then obtained by clipping:

α2_new,clipped = H,        if α2_new ≥ H
                 α2_new,   if L < α2_new < H
                 L,        if α2_new ≤ L    (5)

Assuming s = y1·y2, the first Lagrange multiplier, not yet computed, can be expressed through the clipped second multiplier:

α1_new = α1 + s(α2 − α2_new,clipped)    (6)
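Equations (1)–(6) describe one joint update of a pair of multipliers. A minimal sketch of that update, with a generic kernel function K and assumed example values, could look like this:

```python
# One SMO pair update following Eqs. (1)-(6): bounds, eta, the unclipped
# alpha2, clipping, and the corresponding alpha1. K is any kernel function.
def smo_pair_update(a1, a2, y1, y2, E1, E2, C, K, x1, x2):
    s = y1 * y2
    if y1 != y2:                                   # Eq. (1)
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:                                          # Eq. (2)
        L, H = max(0.0, a2 + a1 - C), min(C, a2 + a1)
    eta = K(x1, x1) + K(x2, x2) - 2 * K(x1, x2)    # Eq. (3)
    if eta <= 0:
        return a1, a2                              # degenerate case, skip pair
    a2_new = a2 + y2 * (E1 - E2) / eta             # Eq. (4)
    a2_new = min(max(a2_new, L), H)                # Eq. (5), clipping
    a1_new = a1 + s * (a2 - a2_new)                # Eq. (6)
    return a1_new, a2_new

# Example with a linear kernel and assumed values.
dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
print(smo_pair_update(0.1, 0.2, 1, -1, 0.5, -0.3, 1.0, dot, (1, 0), (0, 1)))
```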
Of course, this is not absolute. The objective function may fail to be positive definite; for example, if the kernel K does not satisfy Mercer's conditions, there will be uncertainty in the computed objective function and it is
impossible to correctly judge its nature. Even if the kernel K satisfies the required conditions, an objective value of zero cannot be excluded, which means that no progress is obtained from the chosen pair; for example, this happens when many training samples share the same input vector. For SMO to handle such cases, additional quantities are evaluated at the two ends of the constraint segment:

f1 = y1(E1 + b) − α1 K(x1, x1) − sα2 K(x1, x2)    (7)

f2 = y2(E2 + b) − sα1 K(x1, x2) − α2 K(x2, x2)    (8)

L1 = α1 + s(α2 − L)    (9)

H1 = α1 + s(α2 − H)    (10)

ψ_L = L1 f1 + L f2 + (1/2)L1² K(x1, x1) + (1/2)L² K(x2, x2) + s L L1 K(x1, x2)    (11)

ψ_H = H1 f1 + H f2 + (1/2)H1² K(x1, x1) + (1/2)H² K(x2, x2) + s H H1 K(x1, x2)    (12)
The selection strategy of the SMO algorithm uses an outer loop over the whole training set to perform a global search: the number of samples that violate the KKT conditions is computed, and this count determines whether further optimization is needed. If no sample violates the conditions, no optimization is required and the current solution is already optimal; otherwise the violating multipliers must be optimized, so that eventually all samples satisfy the KKT conditions. After the first multiplier is selected, the second multiplier is chosen: SMO selects the one that yields the smallest objective value, so that optimizing from the smallest value improves efficiency. If the conditions are still not satisfied after this optimization, SMO searches all non-boundary samples for a multiplier that minimizes the objective function; if that also fails, the search covers the entire training set, in order to find a multiplier that satisfies the conditions, that is, to find the best classification. 2.3 Training Test Based on the Opening and Closing Degree of Human Eyes Positive and negative sample data sets need to be constructed during training. This paper establishes an eye state classifier; the eyes have two states, open and closed, and it is the closed state that is detected. The training data are prepared grayscale images of the left and right eyes: 70% of the grayscale samples are used as training data and the remaining 30% as test data (Fig. 3).
Fig. 2. Training library of closed eye state
Fig. 3. Training library of open eye state
With the above two training libraries, the model is trained: the program loads the open-eye and closed-eye libraries in turn to train the open and closed states of the human eye respectively, and finally the trained classifier is tested.
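A compact way to reproduce the 70%/30% training procedure described above is to flatten the grayscale eye patches and train an SVM whose underlying solver (libsvm) is SMO-based; the directory layout, patch size and SVM parameters below are assumptions.

```python
# Sketch of the eye-state classifier training: grayscale eye patches are
# flattened into vectors, split 70/30, and fed to an SVM.
import glob
import cv2
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def load_patches(pattern, label, size=(24, 24)):
    xs, ys = [], []
    for path in glob.glob(pattern):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        xs.append(cv2.resize(img, size).flatten() / 255.0)
        ys.append(label)
    return xs, ys

x_open, y_open = load_patches("eyes/open/*.png", 1)       # assumed paths
x_closed, y_closed = load_patches("eyes/closed/*.png", 0)
X = np.array(x_open + x_closed)
y = np.array(y_open + y_closed)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```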
3 Simulation Experiment After training the SVM classifier, the state of the human eyes can be judged from grayscale eye images. Real-time images are then collected, the eye information in each frame is passed to the SVM, and the real-time eye state is fed back by comparison with the previously trained samples. The computer used in the simulation experiment has an Intel i7 CPU, 8 GB DDR3 RAM and Windows 10 64-bit. The video clips are 2 s–4 s long, the frame rate is 25 fps, and the resolution is 720 × 480. Through the trained SVM it is now possible to accurately locate the position of the human
eyes and compare it with the state database of human eyes to judge the closed state of human eyes. As shown in Fig. 5 below, the red circle shows the positioning information of human eyes. The video window on the right side of the figure shows the real-time information of human eye state, which can be divided into left eye state and right eye state (Fig. 4).
Fig. 4. The open state of human eyes
Fig. 5. Fatigue state feedback
After the above detection of the open-eye state, fatigue state detection is carried out to verify whether correct feedback on the eye state can be given within a predetermined time, and predetermined prompt measures are taken for people
in a fatigued state. The prompt measures are a system prompt tone and a pop-up prompt. The test results are shown in Fig. 6. Figures 7, 8, 9 and 10 show the performance of the fatigue state detection compared with the method of reference [1]. First, in terms of training accuracy, the method in this paper is significantly higher than the method in reference [1], by more than 10%. In this paper, the model size is reduced during model testing, that is, the number of learnable parameters in the model is reduced. During the iterations, the performance on the training data is monitored, and training is stopped when a threshold is reached to prevent over-optimization on the data. In this way, the accuracy of state classification is significantly improved.
Fig. 6. Training loss and verification loss of reference [1]
Fig. 7. Training accuracy and verification accuracy of reference [1]
Fig. 8. Training loss and verification loss of this paper
Fig. 9. Training accuracy and verification accuracy of this paper
4 Conclusion To reduce the occurrence of traffic accidents, the fatigue detection system proposed in this paper is particularly important; such a system can play a decisive role in future traffic protection and provide a better direction and technology for the future development of science and technology. A fatigue detection algorithm based on the driver's eye information is proposed in this paper. By detecting the behavioral characteristics of the driver's eye state, using the SMO algorithm and a training library with a sufficient amount of eye information, accurate feedback on the eye state can be obtained, so that the driver's fatigue state can be judged within a certain time.
References 1. Zhang, C.: Research on driving fatigue detection based on PERCLOS. Wuhan Zhicheng Times Cultural Development Co., Ltd. Proceedings of the 4th International Conference on Vehicle, Mechanical and Electrical Engineering (ICVMEE 2017), vol. 5. Wuhan Zhicheng Times Cultural Development Co., Ltd (2017)
2. Jiang, M.: Seed-free solid-state growth of large lead-free piezoelectric single crystals: (Na1/2K1/2)NbO3. J. Am. Ceramic Soc. (10) (2015) 3. Zhang, H.: A new DDA model for kinematic analyses of rockslides on complex 3-D terrain. Bull. Eng. Geol. Environ. (2) (2018) 4. Cheng, Y., et al.: Evaluation and comparison of the Toxic Effects of MgO NPs, ZnO NPs, α-Fe2O3 NPs, γ-Fe2O3 NPs, and Fe3O4 NPs on the remediation for cadmium-related effects in wheat seedlings. Water, Air Soil Pollut.: Int. J. Environ. Pollut. (9) (2020) 5. Adam, F., Drew, D.: Field-based validations of a work-related fatigue model based on hours of work. Transp. Res. Part F: Psychol. Behav. (1) (2001) 6. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. (1) (1997) 7. Freund, Y.: Boosting a weak learning algorithm by majority. Inf. Comput. (2) (1995)
Image Mosaic Technology Based on Harris Corner Feature Xueya Liu(B) , Shaoshi Wu, and Dan Wang Chinese Flight Test Establishment, Xi’an, Shaanxi, China
Abstract. Image mosaic technology can obtain images with ultra-wide viewing angles without reducing the resolution. With the increasing demand for panoramic images, image mosaic technology has become a focus of computer vision research. This paper studies image mosaic technology based on the Harris corner feature. The extraction and description of corner features and the process of image mosaic by computing the homography matrix are introduced in detail. Finally, experiments are carried out on an existing data set and on actually captured data, and the splicing of scenes under different perspectives can be observed directly in the results. The experiments show that the method works well on actual images, but when the scene perspective changes greatly, that is, when more than ten images are stitched, boundary fusion technology needs to be introduced. Keywords: Harris · Feature extraction · Homography matrix · Image registration · Image mosaic
1 Introduction Image mosaic technology [1] integrates overlapping images from several identical sensors, or from different perspectives of different sensors, into one large seamless image. When an ordinary camera is used to obtain a wide-field scene image, the resolution of the camera is fixed: the larger the scene, the lower the resolution of the image. Image mosaic technology can obtain images with ultra-wide viewing angles without reducing the resolution. With the increasing demand for panoramic images, image mosaic technology has increasingly become a research focus of computer vision [2]. It has been widely used in space exploration, remote sensing image processing, medical image analysis, virtual reality, super-resolution reconstruction and other fields [3, 4]. Image registration and image fusion are the two key technologies of image mosaic. Image registration is the basis of image fusion, and the computational cost of image registration algorithms [5, 6] is generally very large, so the development of image mosaic technology largely depends on innovation in image registration. Registration and mosaic of adjacent images is the key to panorama generation. Research on image registration has a long history, and the main methods are as follows: the method based on the minimum brightness difference between two
images, and the feature-based method [7]. The most commonly used method is feature point matching based on a feature template. This method allows the images to be spliced to have a certain tilt and deformation, overcomes the requirement that the optical axes be consistent when the images are acquired, and tolerates some color difference between adjacent images. Panorama mosaic mainly includes the following four steps: image pre-stitching, that is, determining the approximate relative position of two adjacent images, which lays a foundation for the search for feature points; extraction of feature points, that is, finding the feature points to be matched after the rough overlap position is determined [8]; image matrix transformation and mosaic, that is, establishing the image transformation matrix [9] from the matched points and performing the mosaic; and finally, smoothing of the image. Image splicing is the basic step for further image understanding, and the quality of the splicing directly affects subsequent work, so a good image splicing algorithm is very important. In this paper, the Harris corner feature [10] is used to achieve image registration and image fusion, completing the whole image mosaic process.
2 Theory and Technology 2.1 Image Registration Due to differences in viewing angle, shooting time, resolution, illumination intensity and sensor type, the images to be stitched often differ in translation, rotation, scale, perspective deformation, chromatic aberration, distortion and occlusion of moving objects. The purpose of registration is to find the transformation model that best describes the mapping relationship between the images to be stitched. Commonly used spatial transformation models include translation, rigid, affine and projective transformations. The projective (8-parameter) model can be described in matrix form:

[x', y', 1]^T = [m1 m2 m3; m4 m5 m6; m7 m8 1] [x, y, 1]^T = M [x, y, 1]^T    (1)
The meaning of each parameter in the matrix M is shown in Table 1.

Table 1. Meaning of each parameter in the projection transformation matrix M
  m3 — horizontal displacement
  m6 — vertical displacement
  m1, m2, m4, m5 — scale and rotation
  m7, m8 — horizontal and vertical scaling
According to the meaning of each parameter and the characteristics of the different transformation models, the parameter matrix of each transformation model can be obtained by simplifying the matrix M accordingly. The translation relation between images is easy to detect and register, but detecting rotation and scaling is more difficult, and many new image registration algorithms are designed for this. Feature-based image mosaic uses the salient features of images to estimate the transformation between them, rather than using all of the image information; such salient features include image feature points (corners or key points), contours and some invariant moments. A registration algorithm based on the Harris feature is adopted in this experiment. 2.2 Harris Image Feature Extraction The Harris corner is a typical feature, which originates from human perceptual judgment. Its core idea is to slide a local window over the image and look for areas where the gray level changes greatly, as shown in Fig. 1.
Fig. 1. Schematic diagram of window sliding: (a) flat region, (b) linear region, (c) corner area
In image I(x, y), the autocorrelation function is used to calculate the change produced when the window is shifted by (Δx, Δy):

r(Δx, Δy) = Σ_w w(x_i, y_i)[I(x_i, y_i) − I(x_i + Δx, y_i + Δy)]²    (2)

In the formula, w(x, y) is the window function, and w(x_i, y_i) is the weight of each point in the window, either a constant or a Gaussian weighting function, as shown in Fig. 2.
Fig. 2. Schematic diagram of window form (left: constant window, right: Gaussian function)
When the translation (Δx, Δy) is very small, a first-order Taylor expansion of I(x_i + Δx, y_i + Δy) gives:

I(x_i + Δx, y_i + Δy) = I(x_i, y_i) + I_x(x_i, y_i)Δx + I_y(x_i, y_i)Δy + O(Δx², Δy²) ≈ I(x_i, y_i) + I_x(x_i, y_i)Δx + I_y(x_i, y_i)Δy    (3)

where I_x and I_y are the image gradients in the x and y directions. The autocorrelation function can then be simplified to Formula (4):

r(Δx, Δy) = Σ_w w(x_i, y_i)[I(x_i, y_i) − I(x_i + Δx, y_i + Δy)]²
          ≈ Σ_w w(x_i, y_i)[I_x(x_i, y_i)Δx + I_y(x_i, y_i)Δy]²
          = [Δx Δy] M(x, y) [Δx Δy]^T    (4)

where

M(x, y) = Σ_w w(x_i, y_i) [ I_x²  I_x I_y ; I_x I_y  I_y² ] = [ A  C ; C  B ]
Let λ1 and λ2 be the two eigenvalues of the matrix M(x, y). The eigenvalues have the following relationship with corners, lines and flat regions in the image: (1) flat region: λ1 and λ2 are both relatively small, λ1 ≈ λ2, and r(Δx, Δy) is small in both the x and y directions; (2) straight line: λ1 and λ2 differ greatly, satisfying λ1 ≫ λ2 or λ1 ≪ λ2, and r(Δx, Δy) is large in one direction and small in the other; (3) corner point: λ1 and λ2 are both relatively large, λ1 ≈ λ2, and r(Δx, Δy) is large in both the x and y directions. According to the above description, the eigenvalues of M(x, y) could be computed explicitly and the judgment could then be
made. However, in practice we usually do not need the specific eigenvalues; instead, a corner response value R is computed by a simpler method:

R = det M − α (trace M)²    (5)

In the formula, det M = λ1 λ2 = AB − C² is the determinant of M(x, y), trace M = λ1 + λ2 = A + B is the trace of M(x, y), and α is a constant, generally between 0.04 and 0.06.

2.3 Calculation of Homography Matrix Drawing on the idea of RANSAC in image registration, three matching point pairs are first selected at random and the model parameters are estimated; the errors of the other matching points under the estimated model are then computed. When the error of a matching point is below a given threshold, that point supports the current model parameters. If more than 2/3 of the candidate matching points support the current model, the three selected matching pairs are considered reasonable, and the model parameters are recalculated from all supporting matching points as the final parameters; otherwise, another three matching pairs are randomly selected and the previous steps are repeated. The control points selected in the reference image correspond to the control points in the image to be registered. Let the affine transformation matrix be:

H = [ a00  a01  tx ; a10  a11  ty ; 0  0  1 ]    (6)

When solving the affine transformation matrix, the main task is to solve for the two parameter vectors p1 and p2:

p1 = [ x1 y1 1 ; x2 y2 1 ; x3 y3 1 ]⁻¹ [ x1' ; x2' ; x3' ],   p2 = [ x1 y1 1 ; x2 y2 1 ; x3 y3 1 ]⁻¹ [ y1' ; y2' ; y3' ]    (7)

Solving these two vectors and substituting them into the formula yields the affine transformation matrix H. The coordinates of the four vertices of the image to be registered in its original coordinate system are then transformed to obtain the four vertex coordinates of the image to be stitched in the reference image coordinate system; these four vertex coordinates are used in image fusion to determine the boundaries of the new image:

CP = [ a00  a01  tx ; a10  a11  ty ; 0  0  1 ] × [ 0  h2  0  h2 ; 0  0  w2  w2 ; 1  1  1  1 ]    (8)
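The corner response of Eq. (5) is available directly in OpenCV; the sketch below assumes an input image file name and uses typical values for the block size, Sobel aperture and the constant α (called k in OpenCV).

```python
# Corner response R = det(M) - k * trace(M)^2 from Eq. (5), computed with
# OpenCV's built-in Harris detector.
import cv2
import numpy as np

img = cv2.imread("frame1.jpg")                      # assumed input image
gray = np.float32(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
R = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corners = np.argwhere(R > 0.01 * R.max())           # simple thresholding
print("detected corner candidates:", len(corners))
```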
3 Image Mosaic Process 3.1 Feature Point Capture First, Harris key points are extracted. The Sobel operator is used to calculate the image brightness gradients in the X and Y directions, and a Gaussian function with σ = 1.5 is used to smooth the gradients to reduce the influence of noise. If the cumulative brightness change within a small window is examined, moving the window up, down, left or right in a region where the image changes gently does not change the cumulative value significantly; at an object edge, the change along the edge direction is not obvious; near a key point, even a slight movement of the window strongly changes the cumulative brightness value. 3.2 Adaptive Non-maximum Suppression Because many key points are obtained in the previous step, using them directly would lead to a large amount of computation and increase the error. The next step is therefore to remove most of the key points, keeping only distinctive ones, and to make the retained key points evenly distributed over the entire image. Adaptive non-maximal suppression (ANMS) is used to select a specific number of key points. The idea of ANMS is to start with a suppression radius r of infinity; as r decreases, other key points whose own radius is smaller than that of the current center point and which fall within radius r are added to the queue, and the search stops when the number of key points in the queue reaches the preset value. 3.3 Description of Key Points A moderate Gaussian blur is applied to the image, and a 40 × 40 pixel region centered on each key point is taken. The region is down-sampled to 8 × 8, producing a 64-dimensional vector, which is then normalized. Each key point is thus represented by a 64-dimensional vector, and a 500 × 64 feature matrix is obtained for each image. 3.4 Matching of Key Points Key points are matched using the Random Sample Consensus (RANSAC) algorithm. Taking one image as the baseline, 8 points are randomly selected from it and the 8 corresponding matching points are found in the other image; a homography is computed from these 8 point pairs. The remaining feature points in the reference image are projected onto the other image by this homography, and the number of matching points is counted. These steps are repeated 2000 times to obtain the homography with the most consistent point pairs. At this point, the projective transformation between the two images has been found.
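The paper implements the RANSAC loop of Sect. 3.4 by hand; given matched key-point coordinates, cv2.findHomography with the RANSAC flag is a compact equivalent. The point arrays below are placeholders standing in for real matches.

```python
# RANSAC estimation of a homography from matched key-point coordinates.
import cv2
import numpy as np

src_pts = np.random.rand(50, 1, 2).astype(np.float32) * 480   # placeholder matches
dst_pts = src_pts + 5.0                                        # placeholder matches

H, inlier_mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
print("homography:\n", H)
print("inliers:", int(inlier_mask.sum()))
```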
3.5 The Composition of New Images Before projecting the images, a new blank canvas is created. The upper, lower, left and right boundaries of the two-dimensional coordinates of the two projected images are compared, and the extreme value of the boundary in each direction is selected as the size of the new image. At the same time, the intersection region of the two images is calculated. In the overlap region, two weight templates are built according to the cross-dissolve method: across the overlap interval, the pixel weights of the three channels of one image decrease while those of the other image increase.
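A minimal sketch of the cross-dissolve weighting over the overlap region follows; the strip width and pixel values are placeholders.

```python
# Cross-dissolve blending of the overlap region: the weight of the left
# image falls linearly from 1 to 0 across the overlap while the weight of
# the right image rises from 0 to 1.
import numpy as np

def cross_dissolve(left, right):
    """left, right: overlapping strips of equal shape (H, W, 3)."""
    w = left.shape[1]
    alpha = np.linspace(1.0, 0.0, w)[None, :, None]   # descending weights
    return (alpha * left + (1.0 - alpha) * right).astype(left.dtype)

overlap_l = np.full((100, 60, 3), 200, dtype=np.uint8)   # placeholder strips
overlap_r = np.full((100, 60, 3), 50, dtype=np.uint8)
blended = cross_dissolve(overlap_l, overlap_r)
```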
4 Experiment 4.1 Image Mosaic of Existing Data Sets
Fig. 3. Final processing results of the first group of data
Fig. 4. Final processing results of the second group of data
It can be seen that for the three groups of free-perspective data the mosaic effect is satisfactory: the transformation does not cause excessive deformation of the images, and the color, brightness and saturation are well blended, which accords with the subjective perception of the human eye (Figs. 3, 4 and 5).
Fig. 5. Final processing results of the third group of data
4.2 Actual Shot Image
Fig. 6. Final processing results of the fourth group of data
Fig. 7. Final processing results of the fifth group of data
In the process of image mosaic, a decreasing-weight method is used to fuse the overlapping area of the images, and the overlapping area of the two images is combined according to
Fig. 8. Final processing results of the sixth group of data
a certain weight. In the experiment, since each image is spliced after an affine transformation, the image boundary is padded with 0-valued pixels in the previous splicing step, so black borders appear in the splicing results. For example, the rightmost boundary in Figs. 7 and 8 is superimposed as a black border onto the image in subsequent stitching. Therefore, when the angle of view of the images to be stitched changes greatly and there is a large transformation, the seams in the stitching results become more obvious (Fig. 6).
5 Conclusions Image mosaic technology resolves the contradiction between image field of view and image resolution and can obtain images with an ultra-wide viewing angle without reducing the resolution. With the increasing demand for panoramic images, image mosaic technology has become a focus of computer vision research. In this paper, the Harris feature combined with the Sobel operator is used to detect and extract image features; adaptive non-maximal suppression (ANMS) is used to select a specific number of key points; the Random Sample Consensus (RANSAC) algorithm is used to select matching points and compute the homography matrix; and in the overlap region of two images, two templates built by the cross-dissolve method complete the edge processing of the mosaic. The experimental results show that image matching and mosaic using the Harris corner feature works well on actually captured images. However, when the scene perspective changes greatly, that is, when more than ten images are stitched, boundary fusion technology should be introduced. Therefore, the next research of this paper will focus on processing the edge regions in large-perspective scene mosaics.
References 1. Brown, M., Szeliski, R., Winder, S.: Multi-image matching using multi-scale oriented patches. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, CVPR 2005, vol. 1, pp. 510–551. IEEE, 2005 2. Zhang, F., Liu, F.: Parallax-tolerant image stitching. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2014) 3. Liebelt, J., Schmid, C.: Multi-view object class detection with a 3d geometric model. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1688–1695. IEEE (2010)
4. Perrotton, X., Sturzel, M., Roux, M.: Implicit hierarchical boosting for multi-view object detection. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 958–965. IEEE (2010) 5. Magri, L., Fusiello, A.: Robust multiple model fitting with preference analysis and low-rank approximation. In: Proceedings of the British Machine Vision Conference, F (2015) 6. Patel, M.S.: Feature based multi-view image registration using SURF. In: International Symposium on Advanced Computing and Communication (2015) 7. Zhang, F., Liu, F.: Parallax-tolerant image stitching. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 958–965. IEEE (2014) 8. Chojnacki, W., Szpak, Z.L., Brooks, M.J., et al.: Enforcing consistency constraints in uncalibrated multiple homography estimation using latent variables. Mach. Vis. Appl. 26(2–3), 401–422 (2015) 9. Szpak, Z.L., Chojnacki, W., Eriksson, A., et al.: Sampson distance based joint estimation of multiple homographies with uncalibrated cameras ✩. Comput. Vis. Image Underst. 125(8), 200–213 (2014) 10. Harris C.: A Combined corner and edge detector. In: Proceedings of the Alvey Vision Conference, F (1988)
Image Semantic Segmentation Based on Joint Normalization Jiexin Zheng1(B) , Taiwei Qiu2 , Lihong Chen3 , and Shengyang Liang1 1 School of Electronic and Control Engineering, Chang’an University, Xi’an 710064, China 2 Highway School, Chang’an University, Xi’an 710064, China 3 College of Electronic Engineering, South China Agricultural University, Guangzhou, China
Abstract. Image semantic segmentation is an important research direction in image processing, computer vision and deep learning. Semantic segmentation is to classify the image pixel by pixel, so that the original image is divided into semantic segmentation images with specific pixel marks, which is the most challenging in image processing. Based on DSC-JFP (depthwise separable convolutionjoint feature pyramid) model, ASPP model and auxiliary network are removed to improve the real-time performance of semantic segmentation. Combined with batch normalization and instance normalization, parallel batch and instance normalization (PBIN) and cascaded batch and instance normalization (CBIN) methods are proposed to improve the effect of semantic segmentation. The experimental results also show that the proposed method improves the real-time performance of semantic segmentation while ensuring the effect of semantic segmentation. Keywords: Deep learning · Semantic segmentation · Joint normalization
1 Introduction Image Semantic Segmentation, in easy-to-understand terms, is to allow the computer to perform regular segmentation based on the semantics in the image. “Semantics” here refers to the understanding of the content and meaning of the image. “Segmentation” refers to segmenting different objects in a given image from the perspective of pixels, and assigning a label to each pixel in the image [1]. Marking means to mark an attribute for each category, such as “horse”, “aircraft” and so on. Early research in the field of computer vision did not comprehend images at the pixel level according to the way humans perceive, but only stayed at finding elements such as edges or gradients. Later, the semantic segmentation technology gathered pixels belonging to the same category to realize the dense prediction of the image, thus broadening its application field. Since target detection is closely related to video analysis and image understanding, it has attracted a lot of research attention in recent years. Traditional target detection methods are based on artificially designed features and shallow trainable architecture. However, this structure combines multiple low-level image features from the target detector and scene classifier with the high-level context, and their performance tends to stagnate [2]. The rapid development of deep learning has introduced tools that can © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 121–127, 2022. https://doi.org/10.1007/978-981-16-6554-7_13
learn semantic, high-level and deeper features, overcoming the problems of the traditional architectures. These models differ in network architecture, training strategy, and optimization objective.
2 Related Work In recent years, due to the rapid development of deep neural networks, image semantic segmentation has made rapid progress. The following are some achievements in the field of image semantic segmentation in recent years. FCN [3] is the first work of deep learning applied to image semantic segmentation tasks. It has made an important innovation based on the previous convolutional neural network: FCN replaced the last three fully connected layers of the CNN network with three convolutional layers. Thereby solving the problem of image segmentation at the semantic level. The proposal of FCN opened up ideas for scholars studying semantic segmentation. Inspired by FCN, a series of FCN-based networks were derived. Hypercolumns [4] is similar to FCN, also use jump connections to explore the context features of the middle layer for high-resolution prediction. PSPNet [5] adds a global pooling branch to extract contextual information. In the Dilated-Net paper, a convolution filter that can expand the receptive field is designed to obtain contextual information. In the Zoom-Out [6], a feed forward network using hand-made hierarchical context features is proposed. In order to better restore the information of the original image, the researchers have proposed an encoder-decoder net (EDN) [7]. Unlike FCNs, EDNs have a nearly symmetrical network structure. EDNs consist of two parts: the encoder uses continuous convolution and pooling to extract features, and the resolution of the feature map will gradually decrease. The decoder gradually restores the detailed information and resolution of the original image through up-sampling. There is usually some kind of connection between the encoder and the decoder. The encoding side of SegNet [8] and DeconvNet [9] remembers the position of the maximum value during pooling, and then fills in this value back to the original position when sampling on the decoding side, and fills in other positions with zeros. In order to make better use of the information in the middle layer, U-Net [10] connects the features of the encoding end to the corresponding decoding end through the jump connection layer. Therefore, for semantic segmentation, its implementation requirements are very high. The study of semantic segmentation is bound to bring major breakthroughs in computer vision. At present, the fields of automatic driving, biomedicine, and remote sensing road detection are still undergoing continuous trials to improve the effect of semantic segmentation to make breakthroughs in its application. Therefore, the research and development of semantic segmentation has important practical significance.
3 Method 3.1 Network Structure The idea of coding structure proposed in this paper comes from DSC-JFP(depthwise separable convolution-joint feature pyramid) model. The difference is that there is no
joint low-level feature. This part directly adds the decoding part by extracting global information, and removes the ASPP module, as shown in Fig. 1. The outputs of the last three layers of ResNet101 are f1, f2 and f3, with dimensions [(h, w), 512], [1/2 (h, w), 1024] and [1/4 (h, w), 2048] respectively. Each convolution layer of the coding structure consists of convolution, normalization and ReLU. Among them, the 3 × 3 convolutions are separable atrous (dilated) convolutions without dropout, while the 1 × 1 convolutions use a dropout of 0.3. Features of f1 are extracted by a 3 × 3 separable atrous convolution with a dilation rate of 2, giving the first output. Then, f2 is reduced in depth from 1024 to 512 by a 1 × 1 convolution and passed through a 3 × 3 separable atrous convolution with a dilation rate of 4; bilinear interpolation then resizes the feature map to the size of f1, giving the second output. Next, f3 is reduced in depth from 2048 to 512 by a 1 × 1 convolution, passed through a 3 × 3 separable atrous convolution with a dilation rate of 8, and resized to the size of f1 by bilinear interpolation, giving the third output. Finally, the three outputs are concatenated into f4, the output of the coding structure, whose dimension is [(h, w), 1536].
Fig. 1. The network structure
The network structure of this paper is shown in the figure above. Figure 1 shows the coding structure on the right and the decoding structure on the left. First, f1 is taken as low-level information, global average pooling is used to extract global information, and bilinear interpolation restores the feature map to the size of f1. Second, this is concatenated with f4, the encoder output; the resulting feature map has the same size as f1 and a depth of 2048. A 3 × 3 separable convolution then reduces the depth to 256, followed by a dropout of 0.3. Finally, a 1 × 1 convolution reduces the depth to 21, the number of object categories in the Pascal VOC 2012 dataset used in this paper, and bilinear interpolation restores the feature map to the size of the input image. In the training phase, the SoftMax cross-entropy function is used as the loss function.
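Since the paper provides no code, the following PyTorch sketch is only an illustration of the encoder-decoder just described; the module names, the exact placement of BatchNorm/ReLU, and the use of PyTorch itself are assumptions, and the ResNet101 feature maps f1, f2, f3 are assumed to be supplied by a standard backbone.

```python
# Illustrative sketch of the JFP-style encoder and the decoder described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sep_conv(cin, cout, dilation=1, dropout=0.0):
    # Depthwise 3x3 (optionally atrous) followed by a pointwise 1x1, with BN + ReLU.
    layers = [nn.Conv2d(cin, cin, 3, padding=dilation, dilation=dilation, groups=cin, bias=False),
              nn.Conv2d(cin, cout, 1, bias=False),
              nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
    if dropout > 0:
        layers.append(nn.Dropout2d(dropout))
    return nn.Sequential(*layers)

class JFPHead(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        self.b1 = sep_conv(512, 512, dilation=2)                      # branch on f1
        self.r2 = nn.Sequential(nn.Conv2d(1024, 512, 1), nn.ReLU(inplace=True), nn.Dropout2d(0.3))
        self.b2 = sep_conv(512, 512, dilation=4)                      # branch on f2
        self.r3 = nn.Sequential(nn.Conv2d(2048, 512, 1), nn.ReLU(inplace=True), nn.Dropout2d(0.3))
        self.b3 = sep_conv(512, 512, dilation=8)                      # branch on f3
        self.fuse = sep_conv(512 * 4, 256, dropout=0.3)               # f4 (1536) + pooled f1 (512)
        self.cls = nn.Conv2d(256, num_classes, 1)

    def forward(self, f1, f2, f3, in_size):
        s = f1.shape[2:]
        o1 = self.b1(f1)
        o2 = F.interpolate(self.b2(self.r2(f2)), size=s, mode='bilinear', align_corners=False)
        o3 = F.interpolate(self.b3(self.r3(f3)), size=s, mode='bilinear', align_corners=False)
        f4 = torch.cat([o1, o2, o3], dim=1)                           # [(h, w), 1536]
        glob = F.interpolate(F.adaptive_avg_pool2d(f1, 1), size=s, mode='bilinear', align_corners=False)
        x = self.cls(self.fuse(torch.cat([f4, glob], dim=1)))         # depth 2048 -> 256 -> 21
        return F.interpolate(x, size=in_size, mode='bilinear', align_corners=False)
```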
3.2 Normalization Method The normalization method currently used in semantic segmentation networks is almost always BN [11]. BN normalizes each channel over a batch of images, so it loses much of its effect when the batch size is too small. IN [12], which is widely used in style transfer tasks, instead normalizes each image individually and is therefore unaffected by the batch size. The two methods thus have complementary characteristics.
Fig. 2. Diagram of BN, IN and PBIN: (a) BN; (b) IN; (c) PBIN
Among them, the definition of the PBIN method is given by the following formula, in which the feature channels are divided equally between BN and IN for normalization.

$$y_{PBIN} = \begin{cases} y_{BN}\left(x_i^{C_1}\right), & C_1 \in \left[1, \frac{C}{2}\right] \\ y_{IN}\left(x_i^{C_2}\right), & C_2 \in \left[\frac{C}{2}+1, C\right] \end{cases}$$

where C is the number of channels, split into two independent parts: C_1 indexes the first half of the channels, which are normalized by BN, and C_2 indexes the other half, which are normalized by IN. Concatenating the two results gives y_PBIN, the PBIN output.
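As a concrete illustration (the paper does not specify an implementation framework), PBIN can be written as a small PyTorch module that splits the channels in half and concatenates the BN and IN results:

```python
# Minimal PBIN sketch following the formula above (framework choice is an assumption).
import torch
import torch.nn as nn

class PBIN(nn.Module):
    def __init__(self, num_channels):
        super().__init__()
        self.half = num_channels // 2
        self.bn = nn.BatchNorm2d(self.half)                     # first half of the channels
        self.inorm = nn.InstanceNorm2d(num_channels - self.half, affine=True)  # second half

    def forward(self, x):
        x_bn, x_in = torch.split(x, [self.half, x.size(1) - self.half], dim=1)
        return torch.cat([self.bn(x_bn), self.inorm(x_in)], dim=1)
```

Under the same assumptions, the cascaded variant (CBIN) would instead apply BN and then IN in sequence over all channels.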
4 Experiment 4.1 Dataset The dataset used in this article is derived from the Pascal VOC 2012 dataset [13]. Two versions are used. The first includes 1464 training images, 1446 validation images and 1456 test images; the second, augmented with the Pascal boundary detection data, is expanded to 10582 training images, 1446 validation images and 1456 test images. Ground-truth (GT) images are not published for the test set. The image resolution in the dataset varies from roughly 300 to 500 pixels. The label images include a background category, giving 21 different categories in total, with different colors representing objects of different categories; label images are published for the training and validation sets. The network is trained on the training set, quantitative evaluation is performed on the validation set, and the semantic segmentation results are compared visually on the test set.
4.2 Analysis When the model trained on the Pascal VOC 2012 dataset is used for the visualization experiments, the semantic segmentation results generated by the proposed methods are shown in Fig. 3, where (a) is the original image, (b) is the result of the BN method, (c) is the result of the IN method, and (d) is the result of the PBIN method.
Fig. 3. Visualization results of Pascal VOC 2012 dataset
It can be found that for these types of objects, the methods in this paper can basically segment the main target, and the overall effect of the PBIN method is relatively good. As shown in the figure, in the segmentation of the bird, the PBIN result is the closest to the contour in the original image. Under the same training set, the mIoU value of the PBIN method in this paper is 81.24%, which is better than the other methods, indicating that the proposed joint normalization method has certain advantages.
Table 1. Comparison of mIoU indicators of related methods

Methods | mIoU (%) | Backbone network
LadderDenseNet | 78.01 | DenseNet
MsNet-4 | 75.80 | ResNet101
DeeplabV3 | 78.51 | ResNet101
DUpsampling | 79.67 | Xception
Ours + BN | 78.78 | ResNet101
Ours + IN | 79.13 | ResNet101
Ours + PBIN | 81.24 | ResNet101
5 Conclusion This paper draws on the DSC-JFP (depthwise separable convolution-joint feature pyramid) model, removes the ASPP module, improves the encoding and decoding structure, and combines the two normalization methods BN and IN to propose an image semantic segmentation method based on joint normalization. The method combines low-level global features with high-level features, restores the size of the feature map, further removes the auxiliary network, and applies the SoftMax cross-entropy function as the loss for network training. The experimental results show that the joint normalization method can improve the effect of semantic segmentation: on the mIoU index, the PBIN method is slightly higher than the BN and IN methods. On the whole, this paper also improves the segmentation of images with poor illumination and complex details, but there is still considerable room for improvement.
References 1. Yan, J., Zhong, Y., Fang, Y., Wang, Z., Ma, K.: Exposing semantic segmentation failures via maximum discrepancy competition. Int. J. Comput. Vis. 129(5), 1768–1786 (2021). https:// doi.org/10.1007/s11263-021-01450-2 2. Budak, U., Çıbuk, M., Cömert, Z., Sengür, ¸ A.: Efficient COVID-19 segmentation from CT slices exploiting semantic segmentation with integrated attention mechanism. J. Digit. Imaging 5, 1 (2021). https://doi.org/10.1007/s10278-021-00434-5 3. Zhang, J., Xiaoli, D., Xie, Y., Jianjia, D., Fuyong, H., Zeyu, Z.: A semantic segmentation method for buffer layer defect detection in high voltage cable. In: E3S Web of Conferences, vol. 233 (2021) 4. Jin, Z., Zhang, Z., Ott, J., Gu, G.X.: Precise localization and semantic segmentation detection of printing conditions in fused filament fabrication technologies using machine learning. Addit. Manufact. 37, 101696 (2021) 5. Kose, K., et al.: Segmentation of cellular patterns in confocal images of melanocytic lesions in vivo via a multiscale encoder-decoder network (MED-Net). Med. Image Anal. 67, 101841 (2021)
6. Liu, Y., Sun, M., Lu, G.: Medical image semantic segmentation algorithm based on machine learning. Int. J. Educ. Manage. 5(4) (2020) 7. Mamoon, S., Manzoor, M.A., Zhang, F., Zakir, A., Lu, J.F.: SPSSNet: a real-time network for image semantic segmentation. Front. Inf. Technol. Electron. Eng. 21(12), 1770 (2020) 8. Maxwell Aaron E., et al.: Semantic segmentation deep learning for extracting surface mine extents from historic topographic maps. Remote Sens. 12(24) (2020) 9. Huang, L., He, M., Tan, C., Jiang Du, Li, G., Yu, H.: Jointly network image processing: multi-task image semantic segmentation of indoor scene based on CNN. IET Image Process. 14(15) 2020 10. Strohmann, T., et al.: Semantic segmentation of synchrotron tomography of multiphase Al-Si alloys using a convolutional neural network with a pixel-wise weighted loss function. Sci. Rep. 9(1), 1-9 (2019) 11. Strohmann, T., et al.: Semantic segmentation of synchrotron tomography of multiphase Al-Si alloys using a convolutional neural network with a pixel-wise weighted loss function. Sci. Rep. 9(4). 12. Wu, J.-M., Liu, Y.-C., Chang, D.T.H.: SigUNet: signal peptide recognition based on semantic segmentation. BMC Bioinform. 20(3) (2019) 13. Zhang, Z., Zhang, B.: Deep feature attention and aggregation for real-time head semantic segmentation. AEIC academic exchange information center, Asia-Pacific institute of innovation and economics. In: Proceedings of International Conference on AI and Big Data Application (AIBDA 2019).AEIC Academic Exchange Information Center, Asia-Pacific Institute of Innovation and Economics: International Conference on Humanities and Social Science Research, vol. 5 (2019)
DeepINN: Identifying Influential Nodes Based on Deep Learning Method Wei Zhang and Jing Yang(B) College of Computer Science and Technology, Harbin Engineering University, Harbin, China [email protected]
Abstract. Because of the lack of useful structural information, node identification methods based on node ranking method suffer from the low resolution problem and overlapping problem. This prevents these methods from getting a wider spreading scale. To solve this problem, we propose a framework based on network embedding method to identify influential nodes. First, we propose a node similarity measurement method based on network embedding methods. This measurement can distinguish nodes by capturing comprehensive information and improve the resolution. Based on this approach, we present a node sampling algorithm based on community structure. This strategy can select out nodes preliminarily and deal with the overlapping problem from the macro view. Then, a node screening strategy is proposed to select influential nodes in each community from the micro view. This strategy can further solve the overlapping problem. Finally, we can get a set of influential nodes by combining these three approaches together. Experimental results on CiteSeer and Blogcatalog dataset show that the performance of our method is better than IMSN. Our method can expand the spreading size. Keywords: Low resolution problem · Overlapping problem · Community structure · Node screening strategy · Network embedding
1 Introduction A tremendous amount of real-life activities can be described as the spreading process on the complex networks, such as disease spreading behavior among people, information diffusion through social networks, electricity transmission between buildings. Identifying a set of influential spreaders is a vital step to monitor the spreading process. It can help us to contain infectious disease [1], enhance the effect of marketing strategy [2], prevent paralysis of the power system [3] and control the public opinion [4]. Because of the wide application prospects, the node identification has received considerable attention recently. With unique fast and efficient advantages, methods based on node ranking become the mainstream of node identification. The first task of these methods is to distinguish which node is more influential. Degree [5] is good at identifying hub nodes. However, it suffers from the low resolution problem. Some scholars [6–9] take other information into account to solve this problem. Hampered by a lack of multiple information, the resolution ability is still limited. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 128–137, 2022. https://doi.org/10.1007/978-981-16-6554-7_14
The second task in this area is to expand the spreading scale by screening nodes. Some researchers [9–12] hold the view that the influence overlapping sphere exists only in the neighborhood of individuals, and hence select nodes according to their neighbor structure. These methods deal with the overlapping problem from a single, simple perspective, so it is impossible for them to avoid the overlapping problem entirely; worse, they are still plagued by low resolution. In summary, there are two main problems in the existing methods based on node ranking: (1) the low resolution problem caused by the lack of information; (2) the influence overlapping problem. Based on the above facts, we propose a framework based on network embedding to identify a set of influential nodes, called DeepINN for short. Network embedding can capture comprehensive structural features efficiently. With the help of this technique, we introduce more useful information to distinguish nodes; by selecting nodes from both macro and micro views, we can deal with the overlapping problem and further improve the resolution. The main contributions of this paper are as follows. (1) A node similarity measurement method based on network embedding is proposed to improve the resolution using comprehensive information. (2) A node sampling algorithm based on community structure is presented to select nodes from the macro view. (3) A node screening strategy is proposed to select influential nodes in each community from the micro view to further solve the overlapping problem.
2 Related Work 2.1 Node Identification Method Based on Node Ranking Method To identify influential nodes in complex networks, some researchers focus on the local structural feature to measure the node influence. Degree [5] regards the number of neighbors as the influence of nodes. High efficiency makes it be one of the most popular methods. Unfortunately, there are too many nodes with the same degree value in the networks. Therefore, it suffers from the low resolution. In order to distinguish the nodes, scholars try to introduce more features to measure the influence. H-index uses [8] the degree value of neighbors to select important nodes. Grassi et al. [13] identify the head of crime based on different Betweenness measures. IC ranking method [14] combines the node location with the iteration information of k-shell to distinguish nodes. Although, some achievements have been made. The low resolution problem still bothers scholars. These methods have good performance to find influential individuals. Nevertheless, we cannot use them directly to obtain the wide spreading scale. Because they do not consider the overlap sphere between nodes. In order to get large spreading size, VoteRank [9] selects nodes with local structural information by simulating voting activity in real life. Essentially, it depends on the degree value to measure the node influence. Hence, it has also the low resolution problem. Meanwhile, the overlapping problem is not fully considered in its node selection strategy. To measure the node influence accurately, IMSN [10] fuses Degree, k-shell and
information entropy to rank nodes. Then, it reduces the overlapping area by removing the neighbors with high node set similarity from the network. Because of simple structure information, this node screening strategy cannot distinguish neighbors further. As a result, the overlapping problem cannot be solved completely. 2.2 Network Embedding Method As we mentioned above, the previous node identification method try to integrate more structural features to distinguish and mine influential nodes. However, the mining ability of them is limited. The main reason is that the employed structural feature is not enough for different kinds of networks. In recent years, network embedding method is widely applied in many fields, such as social computing [15], clustering [16]. DeepWalk [17] is the first to use deep learning techniques for network embedding. It collects node path information with random walks and regards them as sentences to learn the representations. Considering the limitation of DeepWalk, LINE [18] learns the node representations with the one-hop and twohop area of nodes. It tries to use first and second order to stand for the node structural feature. Node2vec [19] combines depth-first strategy with breadth-first search strategy to integrate local path information with global path information. Because these methods can get rid of the relational matrix, the running time of them is short. With the help of them, preserving the local and global structural features of node in limited timeframes is possible for node identification method.
3 DeepINN 3.1 Node Similarity Based on Network Embedding Node similarity is used to remove neighbor nodes of seed node from the sample space. This way, we can avoid two adjacent nodes selecting as seeds at the same time. It is a common way to reduce the overlapping sphere between two influential nodes. The traditional node similarity calculates similarity depending on the number of common neighbors between nodes. Common neighbor is a kind of local structural feature. This feature just considers the one-hop area of nodes. There are many nodes with same common neighbors in the networks. Therefore, the low resolution is a problem for traditional node similarity measurement.
In order to distinguish the similarity between nodes, we use node representations to represent the node structural feature. Based on the network embedding methods, we can get the node representations easily. By this way, we can get comprehensive node features. Then, we calculate the cosine similarity between different node representations to improve the resolution. The formula is as follows:

$$\cos(f(u), f(v)) = \frac{f(u)\cdot f(v)}{|f(u)|\,|f(v)|} = \frac{\sum_{i=1}^{d} f(u)_i\, f(v)_i}{\sqrt{\sum_{i=1}^{d} f(u)_i^2}\ \sqrt{\sum_{i=1}^{d} f(v)_i^2}} \qquad (1)$$
In Eq. (1), f (u), f (v) denotes the node embedding vectors of node u, v respectively. 3.2 Node Sampling Algorithm Based on Community Structure To reduce the overlapping sphere between seed node, we use the influence of communities to search important node. At the macro level, community is kind of node. If a community is powerful, the members of this community are more influential than nodes in the other communities. Hence, the community detection method, Louvian [20] is applied in this part to get the node community attribute. Then, we coarsen the original network based on the community structure. We regard members of the community as a new node and take the relationship between communities as the new links between the new nodes. Finally, we can get a new network with fewer nodes. The overlapping problem still exists between communities. To identify influential community and get large spreading size, we try to avoid this problem by node similarity. Based on the new network, we rank the node with Degree, and select them according to the node similarity. After screening, several nodes will be retained. The community members which they are mapping are the sample space of seed nodes. Algorithm 1 describes this process. As shown in Algorithm 1, we get new network G’ based on community structure in Step 1. In Step 2 we put this network into the arbitrary network embedding method to get the node representations, ϕG . Subsequently, we select influential nodes in a heuristic process based on Degree. When we select a node as seed, the node whom similarity value is greater than or equal to the threshold will be removed from the candidate list, Degree_list in Step 4–11. This selection process will stop until we get the expected quantity of nodes, or there is no node in Degree_list. Finally, we can get a set of influential nodes. These nodes correspond to the sample space of vital nodes in the original network.
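The following Python sketch illustrates the idea of Algorithm 1 rather than reproducing it: NetworkX's greedy modularity communities stand in for Louvain, the node embeddings are assumed to be precomputed by any network embedding method (DeepWalk, Node2vec, LINE, ...), the community vectors are approximated by the mean of their members' vectors, and the similarity threshold theta is an assumed parameter name.

```python
# Hedged sketch of community-based sampling (not the authors' code).
import numpy as np
import networkx as nx
from networkx.algorithms import community

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def coarsen_by_community(G):
    # Community detection stands in for Louvain; each community becomes one coarse node.
    comms = list(community.greedy_modularity_communities(G))
    node2comm = {n: i for i, c in enumerate(comms) for n in c}
    Gc = nx.Graph()
    Gc.add_nodes_from(range(len(comms)))
    for u, v in G.edges():
        cu, cv = node2comm[u], node2comm[v]
        if cu != cv:
            Gc.add_edge(cu, cv)
    return Gc, comms

def sample_influential_communities(G, embeddings, k, theta=0.9):
    Gc, comms = coarsen_by_community(G)
    # A community vector approximated by the mean embedding of its members.
    comm_vec = {i: np.mean([embeddings[n] for n in c], axis=0) for i, c in enumerate(comms)}
    ranked = sorted(Gc.nodes(), key=Gc.degree, reverse=True)   # Degree ranking on Gc
    selected = []
    for c in ranked:
        if len(selected) == k:
            break
        # Skip communities too similar to one already selected (overlap control).
        if all(cosine(comm_vec[c], comm_vec[s]) < theta for s in selected):
            selected.append(c)
    # The members of the selected communities form the sample space of seed nodes.
    return [set(comms[c]) for c in selected]
```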
3.3 Node Screening Strategy Based on Influential Communities Based on Algorithm 1, we can get the community structure of the original network G and the influential community set, SC. These two kinds of information can guide the algorithm to choose the final seed set as well as get rid of the overlapping problem in micro view. The process of node selection strategy is shown in Algorithm 2. As shown in Algorithm 2, we put the original network G into the arbitrary network embedding method to get its representations, ϕG . In step 2, we rank the node similarity in descending order to conduct the relevant vector for each node. By counting the node frequency in relevant vectors, we can get the node frequency vector, FL. The detail of this procedure refers to the reference [21]. In Step 4–Step 16, we sample the node in each community in SC to form the candidate node set, FSS. For each member of the community, we do the same sampling measure as the Algorithm 1. We can obtain a candidate nodes set for each community in SC. Then, we merge these candidate node set to get the node set FSS. Finally, the most influential nodes are chosen as seed set based on the node frequency vector, FL in Step 18–19.
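A similarly hedged sketch of the screening step, reusing the cosine helper and precomputed embeddings from the sketch above, is given below; the node-frequency vector FL is only approximated here by a summed-similarity score.

```python
# Sketch of per-community node screening and frequency-based final selection.
def screen_seeds(G, embeddings, community_sets, k, theta=0.9):
    candidates = []
    for members in community_sets:
        ranked = sorted(members, key=G.degree, reverse=True)
        kept = []
        for n in ranked:
            # Keep a node only if it is not too similar to nodes already kept.
            if all(cosine(embeddings[n], embeddings[m]) < theta for m in kept):
                kept.append(n)
        candidates.extend(kept)
    # Frequency proxy: how strongly a candidate relates to the other candidates.
    freq = {n: sum(cosine(embeddings[n], embeddings[m]) for m in candidates if m != n)
            for n in candidates}
    return sorted(candidates, key=freq.get, reverse=True)[:k]
```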
4 Experimental Results 4.1 Data Preparation Two real world networks are used to evaluate the performance of our methods. They are CiteSeer [22] and Blogcatalog [23]. In CiteSeer network, nodes represent authors of papers and edges represent reference relationships between papers. Blogcatalog is a social network from Blogcatalog website.
Table 1. Statistics of network datasets.

Network | N | E | ⟨k⟩ | D | ⟨C⟩
CiteSeer | 3312 | 4732 | 1.390 | 10 | 0.072
Blogcatalog | 10312 | 333983 | 32.388 | 10 | 0.232

N, the number of nodes; E, the number of edges; ⟨k⟩, the average degree of the network; D, the diameter of the network; ⟨C⟩, the average clustering coefficient.
In this network, bloggers are the vertexes. The social relationships between users are the edges. The information about these two datasets is shown in Table 1. 4.2 Parameter Settings of the Network Embedding Methods We combine the following network embedding methods with DeepINN to conduct the experiment: DeepWalk—DeepWalk utilize random walk to collect the node structural feature. It regards sequences of walk as sentences, and put them into Skip-gram model to learn the node representations. In the experiment, the number of random walks is set as 80. The walk length t is 10 and the size of window is 5. We set the representation size d as 128. Node2vec—Based on DeepWalk, Node2vec combine DFS with BFS to do the random walk. By this way, it can capture more structural features. In our experiment, the number of random walks, walk length, window size and the representation size are the same with the DeepWalk group. The in-out parameter and the return parameter are set as 1.0. LINE—LINE expands the neighborhood structure to the two-hop area of the node. Then it combines maps nodes to the vector space based on the density of node relationships. It combines first-order similarity with second-order similarity as the node structural features. In this experiment, the representation size d is set as 64, and the iteration time is 50. 4.3 Results Besides the methods we have mentioned in Sect. 4.2, IMSN is also applied to the experiment as the compared method. To verify the validity of our method, we use IC model [5] to examine the spreading size for different methods. The results are averaged over 1000 independent runs. And the active probability of IC model is set as 0.5. The results are shown in Fig. 1. Figure 1 shows the curve of spreading size with different size of seed node set. As shown in Fig. 1(1), with the growth of the seed node set, these four methods increase steadily for Citeseer network. When K is small, the spreading size of DeepINN(Node2vec), DeepINN(LINE) and DeepINN(DeepWalk) is larger than IMSN. While K becomes bigger, the situation changes. DeepINN(LINE) and DeepINN(DeepWalk) still maintain the original superiority status. There has been a decline in spreading size of DeepINN(Node2vec).
Fig. 1. The spreading size of different methods for two networks: (1) CiteSeer, (2) Blogcatalog (K is the size of the seed node set).
According to Table 1, the average degree of CiteSeer is 1.390 and the average clustering coefficient is 0.072. This indicates that the degree value of nodes in this network is low and close. IMSN synthesizes Degree, k-shell, and information entropy with the weight factor. Although it combines local structure with global structure, it decimates the effect of Degree and k-shell. Relative to methods based on a single local or global structural feature, IMSN is insensitive to the hub node or bridge node. Hence, when K is small, the spreading size of IMSN is smaller than the other three methods. DeepINN(DeepWalk) focus on the node surrounding environment. The feature it captures is a kind of local structural feature. Hence, it has advantage to find hub nodes with amount of neighbors. As a result, DeepINN(DeepWalk) gets the largest spreading size when K is small. DeepINN(Node2vec) and DeepINN(LINE) merge the global structure into algorithm based on DeepWalk. This makes them insensitive to the hub node. Hence, their spreading size is smaller than DeepINN(DeepWalk) when seed node set is small. These two methods contain more structural information than IMSN. Thus, their results are higher than IMSN. When K becomes bigger, the bridge nodes launch its effect. With the effect of k-shell, bridge node can be detected by IMSN easily. DeepINN(Node2vec) is insensitive to the bridge node. Therefore, when the scale of seed node set is large, the spreading size of DeepINN(Node2vec) is smaller than IMSN. As shown in Fig. 1(2), when K is in the interval [0, 10], all methods keep a high increasing rate in Blogcatalog network. By contrast, the increasing rate of IMSN is slower than the other three methods. With the increasing of K, the curve of spreading size goes to be flat for all the methods. The spreading size of our methods keep ahead all the time. DeepINN(DeepWalk), DeepINN(Node2vec) and DeepINN(LINE) use network embedding method to select nodes. Node representation that they get contains much more useful structural information. This leads to their high resolution. Hence, they get advantages when K is small. According to Table 1, we can know that the average degree of Blogcatalog is 32.388. This indicates that the hub node with amount of neighbors exist in this network. Our three methods can easily target the hub node and obtain wide spreading size. Hence, when k is small, their increasing rate is high. When K becomes
larger, the degree value between nodes is close. This makes the increasing curve flat. The node selection strategy plays a vital role in this interval. Our method considers the overlapping sphere of communities as well as the same problem between nodes. However, IMSN just takes the problem in micro view. As a result, the spreading size of DeepINN(DeepWalk), DeepINN(Node2vec) and DeepINN(LINE) is larger than IMSN when K is big. Above all, our method can get larger spreading size. The node similarity based on network embedding can improve the resolution with more useful information. Our node sampling strategy and node selection strategy can reduce the overlapping sphere between nodes and communities. These three approaches help us to expand the spreading size.
5 Conclusion The lack of useful structural information makes node identification methods based on node ranking suffer from the resolution problem and overlapping problem. In this paper, we propose a node identification framework based on network embedding method, DeepINN to address this problem. DeepINN calculates the node similarity based on the network embedding method. It differentiates the nodes with the similar structure and improves the ranking result. Then, DeepINN divides the sampling space based on community structure. This strategy reliefs the overlapping problem from a macro view. Finally, DeepINN selects out seed node according the community structure from a micro view. It reduces the overlapping sphere between seed nodes. Based on the real network datasets, the experimental results have shown that the spreading size of DeepINN is larger than IMSN. In the future, we will consider emotion and behavior information based on node embedding methods to identify influential nodes. Acknowledgment. This paper is supported by the National Natural Science Foundation of China (No. 61672179, 61370083 and 61402126), the Natural Science Foundation of Heilongjiang (No. F2015030), the Distinguished Young Scholars of Heilongjiang (No. QC2016083), the Postdoctoral Science Foundation of Heilongjiang (No. LBH-Z14071).
References 1. Kostkova, P., Mano, V., Larson, H.J., Schulz, W.S.: Who is spreading rumours about vaccines? Influential user impact modelling in social networks. In: Proceedings of the 2017 International Conference on Digital Health, pp. 8–524. Association for Computing Machinery (2017) 2. Arrami, S., Oueslati, W., Akaichi, J.: Detection of opinion leaders in social networks: a survey. In: De Pietro, G., Gallo, L., Howlett, R.J., Jain, L.C. (eds.) KES-IIMSS 2017. SIST, vol. 76, pp. 362–370. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-59480-4_36 3. Resende, M.G.C.: Handbook of Optimization in Telecommunications. Optimization & Its Applications. Springer, Berlin (2008) 4. Dinh, T.N., Nguyen, D.T., Thai, M.T.: Cheap, easy, and massively effective viral marketing in social networks: truth or fiction? In: Proceedings of the 23rd ACM conference on Hypertext and social media, pp. 165–174. Association for Computing Machinery, USA (2012)
5. Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, pp. 137–146. ACM, New York, NY, USA (2003) 6. Chen, D., Lü, L., Shang, M.S., Zhang, Y.C., Zhou, T.: Identifying influential nodes in complex networks. Phys. A 391(4), 1777–1787 (2012) 7. Wen, T., Deng, Y.: Identification of influencers in complex networks by local information dimensionality. Inf. Sci. 512, 549–562 (2020) 8. Lü, L., Zhou, T., Zhang, Q.M., et al.: The H-index of a network node and its relation to degree and coreness. Nat. Commun. 7, 10168 (2016) 9. Zhang, J.X., Chen, D.B., Dong, Q., et al.: Identifying a set of influential spreaders in complex networks. Sci. Rep. 6(6), 27823 (2016) 10. Sheikhahmadi, A., Nematbakhsh, M.A.: Identification of multi-spreader users in social networks for viral marketing. J. Inf. Sci. 43(3), 412–423 (2017) 11. Alshahrani, M., Zhu, F., Sameh, A., et al.: Efficient algorithms based on centrality measures for identification of top-K influential users in social networks. Inf. Sci. 517, 88–107 (2020) 12. Li, W., Zhong, K., Wang, J., et al.: A dynamic algorithm based on cohesive entropy for influence maximization in social networks. Expert Syst. Appl. 169, 114207 (2020) 13. Grassi, R., Calderoni, F., Bianchi, M., Torriero, A.: Betweenness to assess leaders in criminal networks: new evidence using the dual projection approach. Soc. Netw. 56, 23–32 (2019) 14. Wang, Z., Du, C., Fan, J., Yan, X.: Ranking influential nodes in social networks based on node position and neighborhood. Neurocomputing 260, 466–477 (2017). S0925231217308354 15. Keikha, M.M., et al.: Community aware random walk for network embedding. Knowl.-Based Syst. 148, 47–54 (2018) 16. Xie, J., Girshick, R., Farhadi, A.: Unsupervised deep embedding for clustering analysis. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 48 (2016) 17. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ‘14). Association for Computing Machinery, New York, NY, USA, pp. 701–710 (2014) 18. Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web (WWW ‘15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, pp. 1067–1077 (2015) 19. Grover, A., Leskovec, J.: Node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ‘16). Association for Computing Machinery, New York, NY, USA, pp. 855–864 (2016) 20. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. 2008(10), P10008 (2008) 21. Keikha, M.M., Rahgozar, M., Asadpour, M., Abdollahi, M.F.: Influence maximization across heterogeneous interconnected networks based on deep learning. Expert Syst. Appl. 140, 112905 (2020) 22. Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., Eliassi-Rad, T.: Collective classification in network data. AI Mag. 29(3), 93–106 (2008) 23. Tang, L., Liu, H.: Relational learning via latent social dimensions. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ‘09). 
Association for Computing Machinery, New York, NY, USA, pp. 817–826 (2009)
Lightweight Semantic Segmentation Convolutional Neural Network Based on SKNet Guangyuan Zhong , Huiqi Zhao(B)
, and Gaoyuan Liu
Shandong University of Science and Technology, Tai’an 271019, Shandong, China [email protected]
Abstract. Semantic segmentation plays a very important role in computer vision. It can be used in many real-world applications, such as virtual reality and augmented reality, robotics and autopilot technology. The existing models have large amount of network parameters and high complexity, and can not fully extract the context information of the image. In order to solve the problem of high complexity and poor real-time performance of DenseNet, the backbone network of DenseAspp model, this paper proposes an image semantic segmentation method based on Shufflenetv2. The lightweight convolutional neural network Shufflenet-v2 is used to replace DenseNet as the backbone network of the segmentation model to extract features, which effectively reduces the amount of parameters and calculation of the model and improves the real-time performance of the segmentation algorithm. SkNet, a selective convolution kernel mechanism, enables each neuron to adaptively select the size of receptive field according to the multi-scale information of input features, thus improving the segmentation accuracy. Keywords: Semantic segmentation · SkNet · Shufflenet-v2 · DenseASPP
1 Introduction With the development of Internet and camera technology, image has become the carrier of a large number of information, which contains a large number of interesting objects and redundant information. In order to reduce human resources and realize automation, it is a hot topic to use computer to help human analysis and understanding images. Image semantic segmentation can separate people’s interested goals from the background, which is one of the key technologies to realize scene understanding, which is of great significance to people’s life and social development. In deep learning, convolution network architecture, such as VGG-16, RESNET, Xception, DenseNet, can automatically extract the effective features of images through end-to-end training, which avoids the manual extraction of image features [1]. It can obtain higher accuracy than traditional methods in image classification, and is more and more widely used in image semantic segmentation tasks. The image semantic segmentation method based on CNN can separate the foreground and background in an image by training the full convolution neural network architecture, and can identify the object of interest and describe its boundary, which is more accurate than the traditional segmentation method. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 138–145, 2022. https://doi.org/10.1007/978-981-16-6554-7_15
Although FCN has greatly promoted the development of convolutional neural network in image semantic segmentation task, the current segmentation algorithm is obviously far from reaching the standard of commercial implementation [2]. Therefore, if we want to apply semantic segmentation technology to intelligent monitoring, AR glasses and other mobile machine vision systems, researchers need to further improve the segmentation accuracy, so this paper must further optimize the semantic segmentation algorithm.
2 Related Work At present, most deep learning techniques for semantic segmentation are based on full convolutional networks (FCN). In 2014, Long et al. [3] proposed FCN method for image semantic segmentation. Based on the vgg-16 network, the last three fully connected layers are replaced by the convolution layer, and the image of any size can be input and the corresponding feature map can be output. After that, the feature map is sampled and restored to the original image size, and dense pixel level labels are obtained. Badrinalayanan et al. [4] proposed SegNet based on FCN. The network is an encoder decoder structure, which is characterized by establishing a position index for the maximum value when the encoder part is pooled to the maximum. Each up sampling layer of the decoder corresponds to a maximum pooling layer in the encoder. In the process of up sampling, the position index is used to recover the position of pixels in the original image. Similarly, Hyeonwoo et al. [5] proposed the deconvnet model to improve the FCN, learning a deconvolution network which is completely symmetrical to the FCN network, and using the combination of up pooling and deconvolution to complete the up sampling, which can better reflect the details of the object and achieve better segmentation results. The U-Net network [7] for biomedical image processing directly adds central clipping and merging operations at the low and high levels, and realizes the up sampling of different levels of features, so that the up sampling layer can refer more to the information of the encoder down sampling middle layer, and better achieve the effect of restoring the details of biomedical images. In 2018, Maoke Yang [6] and others proposed the DenseAspp model based on DenseNet, which combines the ASPP in deep lab with the dense connection in densenet to form the denseaspp (DASPP) module, which has larger receptive field and denser sampling points, and is used in the street scene with high resolution. The backbone network of DASPP model is DenseNet. A large number of feature stitching operations take up extra space, so it takes a large amount of video memory and takes a long time in the actual training. When the expansion rate of convolution with holes d > 24, the segmentation accuracy of the network decreases slightly, which limits the use of convolution with holes to continue to expand the receptive field. The main way to improve the real-time performance is to reduce the training and prediction time by reducing the network parameters and simplifying the network structure, but how to ensure its effectiveness is also a problem to be solved. In this paper, based on CNN, focusing on the shortcomings of the segmentation model DenseAspp, targeted research work is carried out to improve the segmentation network in training time-consuming, accuracy needs to be further improved.
3 Method Starting from the full convolution neural network, many convolution neural networks used for classification are used as the backbone network of image semantic segmentation based on convolution neural network, which achieves better results than traditional image segmentation. However, different networks have different complexity, and the network with higher accuracy often has poor real-time performance. DenseNet is used in the backbone network of DenseAspp network, which has a large number of parameters, and the frequent feature merging operation consumes extra time and memory space. 3.1 The Network Structure To solve the above problems, this paper proposes a method of using Shufflenetv2 as the backbone of DenseAspp network to improve the DenseAspp network, so as to reduce the amount of network parameters and calculation, improve the real-time performance of the network and reduce the complexity of the network. Shuffletv2 [8, 9] is an effective and efficient convolutional neural network architecture. In order to reduce the amount of parameters and computation of image semantic
Fig. 1. The network structure: (a) basic unit; (b) down-sampling unit
segmentation network, reduce the complexity of the network and improve the real-time performance of the segmentation network, this paper proposes an image semantic segmentation model that builds on the DenseASPP model with Shufflenetv2 as its backbone. In this model, the global pooling layer and the fully connected layer of Shufflenetv2 are removed, and the rest of the network is used as the backbone of the segmentation network. The network then consists of a convolution layer and a pooling layer plus three feature extraction modules, and the output stride is 32. Each feature extraction module is composed of one down-sampling unit and several basic units. The down-sampling unit feeds the input features into two branches with different structures; each branch contains a 3 × 3 depthwise separable convolution with a stride of 2 for down-sampling, and the output feature maps of the two branches are then concatenated, so the number of output channels of this unit is twice that of the input. The structure is shown in Fig. 1, where DWC denotes depthwise separable convolution. To further increase the receptive field, the dilation rate of the 3 × 3 convolutions in the last module of the backbone is set to 2. After the backbone, the rest of the structure follows the original DenseASPP model. The 3 × 3 convolutions in Shufflenetv2 are all depthwise separable convolutions, so using it as the backbone of the segmentation network greatly reduces the number of parameters and the amount of computation compared with DenseNet.
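A sketch of such a down-sampling unit is given below; it is written in PyTorch purely for brevity (the experiments in Sect. 4 are reported with TensorFlow), and the exact ordering of BatchNorm/ReLU inside each branch is an assumption.

```python
# ShuffleNetV2-style down-sampling unit: two stride-2 depthwise branches, concat, shuffle.
import torch
import torch.nn as nn

def dwc(c, stride=1, dilation=1):
    # Depthwise 3x3 convolution (DWC) followed by BatchNorm.
    return nn.Sequential(
        nn.Conv2d(c, c, 3, stride=stride, padding=dilation, dilation=dilation,
                  groups=c, bias=False),
        nn.BatchNorm2d(c))

def pw(cin, cout):
    # Pointwise 1x1 convolution with BatchNorm + ReLU.
    return nn.Sequential(nn.Conv2d(cin, cout, 1, bias=False),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

def channel_shuffle(x, groups=2):
    n, c, h, w = x.size()
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

class DownUnit(nn.Module):
    """Both branches down-sample with a stride-2 depthwise 3x3; concatenating them
    doubles the number of channels relative to the input."""
    def __init__(self, cin):
        super().__init__()
        self.left = nn.Sequential(dwc(cin, stride=2), pw(cin, cin))
        self.right = nn.Sequential(pw(cin, cin), dwc(cin, stride=2), pw(cin, cin))

    def forward(self, x):
        return channel_shuffle(torch.cat([self.left(x), self.right(x)], dim=1))
```

The basic unit of Fig. 1 differs only in that it splits the input channels in half, passes one half through stride-1 convolutions while the other half is left unchanged, and concatenates them back before the channel shuffle.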
Fig. 2. SkNet
SkNet [10] is a non-linear method to fuse the features from different nuclei to achieve different size adjustment of receptive field. It includes three operations: split operation generates multiple channels with different nuclear sizes, which are related to different receptive field sizes of neurons. Fuse operations combine information from multiple channels to obtain a global and understandable representation for weight selection. The select operation fuses feature maps with different core sizes according to the selected weights.
142
G. Zhong et al.
By introducing SKNet into the network, the network can automatically adjust the proportion of different receptive fields in the training process, so as to improve the network performance. The network architecture proposed in this paper is shown in the following (Table 1). Table 1. Network architecture with SKNet Layer
Output size kernel size Stride Dilation rate
Input
512 × 512
Number Output channels 3
Conv1
256 × 256
3× 3
2
MaxPool
128 × 128
3 × 3
2
Stage2
64 × 64 64 × 64
Stage3
1
1
24
2 1
1 1
1 3
116
32 × 32 32 × 32
2 1
1 1
1 7
232
Stage4
32 × 32 32 × 32
1 1
2 1
1 3
464
DASPP
32 × 32
3× 3
1
(3, 6, 12, 18, 24) 1
784
Conv5
32 × 32
1×1
1
1
1
256
Conv6
32 × 32
1×1
1
1
1
SKNet Unsample 512 × 512
1
class_num
The 3 × 3 convolutions in shufflenetv2 are all deeply separable convolutions, which can be used as the backbone of the segmentation network. Compared with DenseNet, it can greatly reduce the amount of parameters and computation.
4 Experiment 4.1 Experimental Environment The hardware environment of the experiment includes Ge force GTX 1080 Ti graphics card and 64 g memory. The framework uses Python and tensorflow [11]. The experiment was carried out on Cityscapes data set. 4.2 Experimental Results and Analysis The model is trained on Cityscapes [12], and the best model is selected as the training result. The experimental results are shown in Table 2. Table 2 shows the accuracy of Bi SeNet [13], Deeplab [14] and the proposed method on cityscapes dataset, as well as the parameters of each model. The experimental results
Table 2. Comparison of experimental results on the Cityscapes data set

Methods | mIoU | Parameter quantity
BiSeNet | 68.4% | 5.8M
Deeplab | 70.4% | 262.1M
Ours | 70.5% | 1.5M
Ours + SKNet | 71.3% | 1.6M
show that the accuracy of this method is 70.5%, and the accuracy of this method combined with SKNet is 71.3%. Compared with other models, the parameters of this method are greatly reduced, which can greatly improve the calculation efficiency. The experimental results show that the method is effective (Fig. 3).
Fig. 3. Visualization of prediction results on Cityscapes
Combined with SKNet, the proposed method segments the truck in the first image more completely, classifies it correctly and produces no noise points. In the second picture, the method combined with SKNet reduces the cases where slender columns are cut off, and the sky and buildings are also segmented more accurately. In the third picture, the model without SKNet mistakenly classifies part of the terrain as plants, while the method in this paper combined with SKNet correctly
predicts most areas of the terrain. It shows that the improved method can reduce the classification errors of large and similar targets caused by the lack of context information, and the improved method is effective.
5 Conclusion In this paper, the research work is carried out to solve the problem that DenseAspp has a large number of parameters, consumes time and memory, and can not balance the accuracy and real-time. Based on the literature review, a semantic image segmentation method based on Shufflenetv2 is proposed. Firstly, this paper introduces the principle and improvement of Shufflenetv2. Secondly, it describes how to combine Shufflenetv2 with DenseAspp to deal with the task of semantic segmentation, and gives the overall network structure. SKNet is introduced to further improve the performance of the model. Finally, experiments are carried out on two standard datasets. The experimental results are compared with other real-time segmentation networks. The results show that the proposed method can effectively improve the real-time performance of the network and reduce the complexity of the model.
References 1. Huang, Y., Wang, Q., Jia, W., Lu, Y., Li, Y., He, X.: See more than once: Kernel-sharing atrous convolution for semantic segmentation. Neurocomput. 443, 26–34 (2021) 2. Zhang, Y., Sun, X., Dong, J., Chen, C., Lv, Q.: GPNet: gated pyramid network for semantic segmentation. Pattern Recogn. 115, 107940 (2021) 3. Sediqi, K.M., Lee, H.J.: A novel upsampling and context convolution for image semantic segmentation. Sensors 21(6), 2170 (2021) 4. Yan, J., Zhong, Y., Fang, Y., Wang, Z., Ma, K.: Exposing semantic segmentation failures via maximum discrepancy competition. Int. J. Comput. Vis. 129(5), 1768–1786 (2021). https:// doi.org/10.1007/s11263-021-01450-2 5. Xu, Z., Zhang, W., Zhang, T., Li, J.: HRCNet: high-resolution context extraction network for semantic segmentation of remote sensing images. Remote Sens. 13(1), 71 (2020) 6. Miyamoto, R., et al.: Visual navigation based on semantic segmentation using only a monocular camera as an external sensor: special issue on real world robot challenge in Tsukuba and Osaka. J. Robot. Mech. 32(6), 1137–1153 (2020) 7. Zhou, D., et al.: Robust building extraction for high spatial resolution remote sensing images with self-attention network. Sensors 20(24), 7241 (2020) 8. Wu, T., Tang, S., Zhang, R., Cao, J., Zhang, Y.: CGNet: a light-weight context guided network for semantic segmentation. IEEE Trans. Image Process. Publ. IEEE Sig. Process. Soc. 30, 1169–1179 (2020) 9. Pozzer, S., Rezazadeh Azar, E., Dalla Rosa, F., Chamberlain Pravia, Z.M.: Semantic segmentation of defects in infrared thermographic images of highly damaged concrete structures. J. Perform. Constr. Facil. 35(1), 04020131 (2021) 10. Pemasiri, A., Nguyen, K., Sridharan, S., Fookes, C.: Multi-modal semantic image segmentation. Comput. Vis. Image Understanding 202, 103085 (2021) 11. Feng, J., Liu, Y.-S., Gong, L.: Junction-aware shape descriptor for 3D articulated models using local shape-radius variation. Sig. Process. 112, 4–16 (2015)
12. Dewanto, V., Aprinaldi, A., Ian, Z., Wisnu, J.: A novel knowledge-compatibility benchmarker for semantic segmentation. Int. J. Smart Sens. Intell. Syst. 8(2), (2015) 13. Gritti, T., Damkat, C., Monaci, G.: Semantic video scene segmentation and transfer. Comput. Vis. Image Underst. 122, 172–181 (2014) 14. Pei, D., Li, Z., Ji, R., Sun, F: Efficient semantic image segmentation with multi-class ranking prior. Comput. Vis. Image Underst. 120, 81–90 (2014)
The Research on Image Detection and Extraction Method Based on Yin and Yang Discrete Points Haini Zeng(B) School of Artificial Intelligence and Smart Manufacturing, Hechi University, Yizhou 546300, China [email protected]
Abstract. The low-level targets in the image analysis method have problems of instability and difficulty in the subsequent grouping. In this paper, the line shape, surface shape, and line shape with complex structure in the complex scene are taken as the research object. Weber’s theorem and Yin and Yang’s discrete point grouping calculation method are proposed to design a set of three-layer detection systems based on feature point, straight line segment, and centerline. The system can accurately extract the centerline of the stripe region and enhance the robustness of the scene graph by using the Yin-Yang discrete point sampling graph. Keywords: Shape detection · Yin-Yang discrete point calculation · Edge extraction
1 Introduction The human visual system [1, 2] (HVS) uses various visual features to extract and detect targets from noisy images. It can obtain sufficient visual elements from a minimal number of learning samples to complete corresponding visual tasks. HVS is very good at quickly detecting and extracting shapes from complex scenes. According to the different stable features used in target detection, static target detection methods can be divided into four categories based on gray-scale features [3], shape features [4], texture features [5], and local features [6]. The detection based on gray-scale features is greatly affected by light, background, and noise. According to the boundary or region of a single target obtained by edge extraction and the prior knowledge of this type of target, the detection based on shape features requires direct detection and location. However, in many complex scenes, the regions or edges extracted from the image of the same target have random irregular changes, so the prior knowledge and the generalization ability of the shape template are restricted. Aiming at the problem of instability of the low-level features of the target caused by the interference of complex scenes or the change of the target itself in the image shape detection and extraction, this paper adopts the discrete point sampling graph as the grouping object. It proposes an image analysis method of Yin and Yang discrete points that can quickly detect and extract the shape. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 146–152, 2022. https://doi.org/10.1007/978-981-16-6554-7_16
2 Design of the Sampling Method of Yin and Yang Discrete Points
2.1 Yin and Yang Discrete Points Sampling
Yin-Yang discrete point sampling is an intermediate representation that both enhances the hierarchical characteristics of the target and reduces the difficulty of later grouping. This paper classifies uneven gray areas such as roads, image edges and corners into the Yin sampling set, and uniform gray areas such as terraces into the Yang sampling set. According to Weber's theorem [8], the just-noticeable difference of a stimulus is proportional to the original stimulus, which can be written as

\frac{\Delta I}{I} = K   (1)

In formula (1), I is the original stimulus value, \Delta I is the increment of the stimulus that causes a noticeable difference, and their ratio is a constant K called the Weber coefficient. Based on the sampling model [8], the value f(P_k) computed by formula (2) indicates the degree of perception at the current sampling point P_k: if it is greater than the Weber coefficient K, the perception is strong, otherwise it is weak.

f(P_k) = \frac{\frac{1}{|C_r|}\sum_{P_i \in C_r} G_i - \frac{1}{|C_{R-r}|}\sum_{P_j \in C_{R-r}} G_j}{\frac{1}{|C_{R-r}|}\sum_{P_j \in C_{R-r}} G_j}   (2)

where P_k is the current sampling point; C_r and C_{R-r} denote the inner circle and the surrounding ring of a concentric circle with radii r and R, respectively; P_i is a sampling point inside C_r, P_j a sampling point in the ring C_{R-r}, and G_i, G_j their gray values. In the Yin/Yang sampling map, every sampling point that can be perceived is set to 1 and is called a Yin/Yang discrete sampling point; otherwise it is set to 0. The specific classification is as follows.

Yin discrete sampling point:
g(P_k) = \begin{cases} 1, & f(P_k) > K \\ 0, & f(P_k) \le K \end{cases}   (3)

Yang discrete sampling point:
g(P_k) = \begin{cases} 1, & f(P_k) \le K \\ 0, & f(P_k) > K \end{cases}   (4)
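A minimal Python sketch of this sampling rule is given below. It reads formula (2) as the relative difference between the mean gray value inside C_r and in the ring C_{R-r}, and formula (4) as the complement of formula (3); the function name, the discrete radii and the loop-based implementation are illustrative assumptions rather than the paper's MATLAB code.

```python
import numpy as np

def yin_yang_maps(gray, r=1, R=3, K=0.5):
    """gray: 2-D array of gray values; returns binary Yin and Yang maps."""
    h, w = gray.shape
    yy, xx = np.mgrid[-R:R + 1, -R:R + 1]
    dist = np.hypot(yy, xx)
    inner = dist <= r                       # inner circle C_r
    ring = (dist > r) & (dist <= R)         # concentric ring C_{R-r}
    yin = np.zeros((h, w), dtype=np.uint8)
    for i in range(R, h - R):
        for j in range(R, w - R):
            patch = gray[i - R:i + R + 1, j - R:j + R + 1].astype(float)
            mean_in = patch[inner].mean()
            mean_ring = patch[ring].mean()
            f = (mean_in - mean_ring) / (mean_ring + 1e-6)   # formula (2)
            yin[i, j] = 1 if f > K else 0                    # formula (3)
    yang = 1 - yin                                           # formula (4) as the complement
    return yin, yang
```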
According to formula (3) and formula (4), a set of discrete points of Yin and Yang can be obtained to form a Yin and Yang sampling map. 2.2 Parameter Tuning of Discrete Point Sampling Method According to formula (3) and the Yin-Yang discrete point sampling diagram, there are three important parameters in the Yin-Yang discrete point sampling model, i.e., the fixed grid sampling radius r, the outer ring radius R and the Weber constant K.
(1) Sampling radius r The sampling radius r affects the density between discrete points in the sampling map. As the sampling radius r becomes larger, the discrete points become sparser. The sampling radius also affects the ability of the sampling image to describe the details of the original image. As the sampling radius r decreases, the ability to describe the details of the original image becomes stronger, and the amount of calculation increases. In Figs. 1(b), 1(c) and 1(d), as the sampling radius r changes from 1 to 3, the number of sampling points decreases. Taking into account the actual picture size, this article takes r = 1 for all subsequent applications based on the Yin discrete point sampling map. (2) Outer ring radius R The radius R of the outer ring determines the strength of the strip area in the Yin discrete point sampling map. The larger the value of the radius R is, the place where the discrete points are generated will become farther from the real edge, and the strip area formed by them will become more prominent. Comparing Figs. 1(e) and 1(f), it can be seen that as the radius R increases, the bar-shaped area of the Yin sampling graph becomes more obvious. The larger the radius R is, the stronger the ability to suppress noise and texture interference is, and the longer the calculation time is. Through experiments, the algorithm in this paper takes R = 0.5.
Fig. 1. The influence of different parameters on the results of the Yin sampling diagram. (a) Original map of the overpass road; (b) r = 1, R = 0.5, K = 0.5; (c) r = 2, R = 0.5, K = 0.5; (d) r = 3, R = 0.5, K = 0.5; (e) r = 1, R = 0.1, K = 0.5; (f) r = 1, R = 1, K = 0.5; (g) r = 1, R = 0.5, K = 0.1; (h) r = 1, R = 0.5, K = 1.
(3) Weber coefficient K The Weber coefficient K not only affects the ability of the Yin sample image to describe the details of the original image, but also has the ability to suppress noise interference. Increasing the value of K will weaken the ability to describe the original image in detail, but will enhance the ability to suppress noise interference. From the comparison of Fig. 1(g) and 1(h), it can be seen that the image detail information will be lost as the value of K increases. Through experiments, the Weber coefficient K is usually taken as 0.5 in the Yin sampling graph. If the image is too dark for the details to be distinguished, the K value can be reduced to enhance the details. If the image has a lot of noise and too much detail, the interference can be reduced by increasing the K value. 2.3 Analysis of Anti-noise Interference Ability The proposed Yin-Yang discrete point sampling model based on the different intensity of illumination changes has a stronger ability to suppress noise and texture interference than traditional edge extraction operators, and is more adaptable to illumination changes.
Fig. 2. Comparison of edge algorithms under noise interference and the Yin-Yang discrete point sampling result. (a) Original image of the overpass road; (b) Canny edge algorithm; (c) Maximum class threshold algorithm; (d) Discrete point sampling algorithm.
Figure 2 compares the edge algorithms under noise interference with the Yin-Yang discrete point sampling result. Figure 2(b) uses the Canny edge operator [5], which detects real weak edges and is not easily disturbed by noise. However, paying too much attention to weak edges makes the pixels in this complex scene chaotic, so the continuous point sequences formed by these pixels are not fully consistent with the direction of the target edge. Figure 2(c) uses the maximum class threshold method, which separates the background and the target well, but line-shaped and surface-shaped targets must be processed separately: the frame is extracted from the line and the contour from the surface, and the two are then combined to form the detection target. In Fig. 2(d), the Yin-Yang discrete point sampling method expresses the target information in the image through the strip regions gathered by the discrete points, so that the target information in the image
is enhanced in the form of strips, which helps suppress detail noise and texture interference. The line and area information of the target can then be obtained by grouping these strip regions. In terms of complexity, discrete point sampling is therefore much lower than the Canny operator and the maximum class threshold segmentation method. 2.4 Constructing Independent Regions of Discrete Graphs Because of noise interference, the Yin-Yang discrete point sampling map often contains burrs and holes, so a filter operator is needed. The independent-region filter proposed in this paper examines the 8-neighborhood of each pixel in the sampling map: if more than 5 of the 8 neighboring pixels are 1, the pixel is set to 1, otherwise it is set to 0. This operator can be applied repeatedly to remove burrs and holes so that the Yin and Yang sampling maps form independent regions.
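The independent-region filter just described (a pixel becomes 1 when more than five of its eight neighbours are 1, and the operator may be applied repeatedly) can be sketched with a neighbourhood convolution; the use of SciPy and the default iteration count here are assumptions for illustration.

```python
import numpy as np
from scipy.ndimage import convolve

def independent_region_filter(binary_map, iterations=2):
    """binary_map: 2-D array of 0/1 Yin or Yang sampling points."""
    kernel = np.ones((3, 3), dtype=int)
    kernel[1, 1] = 0                                   # count only the 8 neighbours
    out = np.asarray(binary_map, dtype=int)
    for _ in range(iterations):
        neighbours = convolve(out, kernel, mode="constant", cval=0)
        out = (neighbours > 5).astype(int)             # keep pixels with more than 5 neighbours set
    return out
```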
3 Research on Shape Detection and Extraction Based on Yin and Yang Discrete Points
The Yin-Yang discrete point sampling analysis first selects an appropriate sampling radius, sampling operator and discrimination threshold according to the size, type and degree of interference of the target to be detected. The multi-scale sampling operator is then used to sample the image and obtain the Yin-Yang sampling map, and the map is filtered several times if needed to enhance the saliency of the target. Finally, the Yin-Yang discrete point grouping method constructs the corresponding detection targets from the Yin and Yang sampling maps. The target detection algorithm is as follows:
Step 1: Select the relevant parameters according to the target to be detected, and apply formulas (2), (3) and (4) to generate the Yin and Yang sampling maps;
Step 2: Traverse the image with sampling radius r = 1 to obtain the coordinates of the Yang discrete points;
Step 3: Determine the target regions of the Yang discrete point sampling map according to the connectivity of the Yang discrete points and the gray-scale range;
Step 4: Traverse, clockwise or counterclockwise, the non-8-connected points in each region of the Yang discrete point sampling map to obtain the contour of the region;
Step 5: Go to Step 6 after the whole image has been traversed, otherwise return to Step 2;
Step 6: Remove contours with small areas and output the target shapes.
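Steps 3-6 of this algorithm amount to connected-component grouping followed by contour tracing and small-area suppression. The sketch below is one possible OpenCV-based realisation; the area threshold and the cv2 routines are illustrative choices, not details taken from the paper.

```python
import cv2
import numpy as np

def extract_target_contours(yang_map, min_area=50):
    """yang_map: 2-D 0/1 array of Yang discrete sampling points."""
    mask = np.asarray(yang_map, dtype=np.uint8) * 255
    # Step 3: group connected Yang points into candidate target regions
    n_labels, labels = cv2.connectedComponents(mask, connectivity=8)
    kept = []
    for label in range(1, n_labels):
        region = (labels == label).astype(np.uint8)
        # Step 4: trace the outer contour of each region
        contours, _ = cv2.findContours(region, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Step 6: discard contours whose area is too small
        kept += [c for c in contours if cv2.contourArea(c) >= min_area]
    return kept
```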
4 Experimental Results The algorithm is simulated and tested using MATLAB 2019a. Discrete point sampling involves the setting of the discrete point sampling parameters and the setting of
discrete point grouping parameters. The sampling radius of discrete point sampling, the discrimination threshold of the Yin and Yang sampling set and the number of filtering times of discrete point filtering can be parameterized, so that relevant parameters can be input according to the needs to obtain better detection effect.
Fig. 3. Yin and Yang discrete detection result graph. (a) Original image of road edge detection; (b) Discrete point Yin sampling map; (c) Original image of terraced field detection; (d) Discrete point Yang Sampling map.
In this paper, the actual road is used as an example to extract the target road. As shown in Fig. 3(a) and 3(b), comparing the road with the original image, the basic information detection is completed. Since the discrete point sampling image is obtained based on the degree of change of the sampling point and the gray value of the neighborhood, some areas with obvious gray level changes in the original image will also be misjudged into the Yin sampling image. This makes some misjudgment points appear in the Yin sampling image, but it can still realize the basic contour detection of the target to be detected. The result of the Yang sampling map of the terrace area detection is shown in Fig. 3(c) and 3(d). Compared with the original image, the basic information detection of the terrace area is completed. Since the discrete point sampling image is obtained based on the degree of change of the sampling point and the gray value of the neighborhood, some areas of the original image with insignificant gray level changes will also be misjudged into the Yang sampling image. This makes the detection effect of densely terraced areas not ideal, but it can still achieve the basic area detection of the target to be detected.
5 Conclusion In this paper, edges are judged according to the degree of gray-level change, and the Yin-Yang discrete point sampling method is used to solve the problem of low-level feature instability in target detection. The Yin and Yang discrete points grouped in the N-dimensional direction are used to solve the difficult problem of subsequent
grouping. Through the experimental simulation of the Yin and Yang discrete points sampling model, the algorithm in this paper realizes the rapid detection and extraction of line shape and surface shape. Acknowledgment. This work was supported by youth project School Level scientific research project Hechi University (No. 2018XJQN010).
References 1. Bosse, S., Maniry, D., Muller, K.R., et al.: Deep neural networks for no-reference and fullreference image quality assessment. IEEE Trans. 27(1), 206–219 (2018) 2. Wang, H., Fu, J., Lin, W., et al.: Image quality assessment based on local linear information and distortion-specific compensation. IEEE Trans. 26(2), 915–926 (2017) 3. Mahmood, A., Khan, S.: Correlation-coefficient-based fast template matching through partial elimination. Image Process. IEEE Trans. 21(4), 2099–2108 (2012) 4. Lin, Z., Davis, L.S.: Shape-based human detection and segmentation via hierarchical parttemplate matching. Pattern Anal. Mach. Intell. IEEE Trans. 32(4), 604–618 (2010) 5. Chiranjeevi, P., Sengupta, S.: New fuzzy texture features for robust detection of moving object . Sign. Process. Lett. IEEE 19(10), 603–606 (2012) 6. Lee, J., Roh, K., Wagner, D., et al.: Robust local feature extraction algorithm with visual cortex for object recognition. Electron. Lett. 47(19), 1075–1076 (2011) 7. Hu, H., Pang, L., Tian, D., et al.: Perception granular computing in visual haze- free task. Expert Syst. Appl. 41(6), 2729–2741 (2014) 8. Shen, J.: On the foundations of vision modeling: I. Weber’s law and Weberized TV restoration. Phys. Nonlinear Phenom. 175(3), 241–251 (2003) 9. Fan, Y.N., Lang, B.: An object shape-matching method using contour orientation feature. Comput. Technol. Dev. 28(4) (2018) 10. Zhu, Z.H., Wang, G.Y., Liu, J.G., Chen, Z.: Fast and robust 2D-shape extraction using discretepoint sampling and centerline grouping in complex images . Image Process. IEEE Trans. 22(12), 4762–4774 (2013) 11. Papari, G., Petkov, N.: Adaptive pseudo dilation for Gestalt edge grouping and contour detection .Image Process. IEEE Trans. 17(10), 1950–1962 (2008) 12. Zhu, Z.H., Yang, B.: Using feature discrete-point computing in handwritten documents line segmentation. Comput. Eng. Appl. 51(8), 148–152 (2015) 13. Zhu, Z.X.: Image analysis method based on yin-yang discrete point sampling model. Huazhong University of Science and Technology (2014) 14. Zhu, Z.X., Wang, G.Y.: A fast potential fault regions locating method used in inspecting freight cars. J. Comput. 9(5), 1266–1273 (2014)
Research on Short-Term Power Load Prediction Based on Deep Learning Lanlan Yin, Feng Mo(B) , Qiming Wu, and Shuiping Xiong Hechi University, Yizhou 546300, China
Abstract. Short-term power load prediction is not only a critical part of power system dispatching but also an essential task for power marketing, grid planning, and other management departments. Deep learning is an artificial intelligence approach that has gained extensive attention in recent years. This paper selects three of the most representative deep learning recurrent neural network models and studies their performance for short-term power load prediction, covering power load data preprocessing, feature selection for the prediction models, determination of model parameters, and a comparison of the prediction characteristics of shallow and deep networks, in order to explore and demonstrate the applicability of deep learning neural networks to short-term power load prediction and ultimately improve prediction accuracy. Keywords: Power load · Deep learning · Prediction
1 Introduction With the further deepening of the power system reform, the establishment and improvement of the power market, the orderly release of power generation plans, and the formation of the main body of the power sales market, the power system will gradually change its power plans and production methods, making them work in a more independent and open market environment [1]. Therefore, power companies need more accurate load prediction results to design effective dispatching and operation plans and improve their market competitiveness. To improve market competitiveness and production efficiency, short-term load prediction has received increasing attention, and power companies have been looking for effective theories and methods. In recent years, with the rise of deep learning, breakthroughs have been made in artificial neural network algorithms, more advanced network structures and training algorithms have emerged, and probabilistic prediction methods have also appeared in the field of load prediction. At the same time, accurate short-term load prediction results can generate substantial economic benefits: relevant studies have found that a 1% reduction in the average prediction error of short-term power loads can reduce costs. Short-term prediction is usually measured in hours or days. It can predict the power load demand of a particular day, week, or month in the future, and it can also indicate the load demand of a specific day in the near term. Short-term prediction can standardize and guide the daily operation and management activities of power companies, making daily power consumption plans more reasonable,
increasing the system's load capacity, and improving the operating efficiency of power grids [2]. Short-term power load prediction is significantly affected by weather, emergencies, and other factors; it provides a foundation for arranging daily power consumption and daily shutdown and startup, and it is an essential task for the day-to-day operation of power grids. In the past thirty years, thousands of articles related to load prediction have been published at home and abroad in various top journals and conferences. Most studies in the literature focus on deterministic short-term load prediction and long-term load prediction based on time-point prediction. The so-called point prediction of power load refers to a load prediction result for a single future period, such as several months, weeks, or hours. With the development of China's energy Internet technology and the construction of intelligent power grids, the application of power big data is becoming more and more extensive, although there is still no precise definition of big data in the power industry. In the context of China's grid construction, a large number of smart meters and supporting test equipment have been put into use, and massive amounts of power data have been collected in a timely manner [3]. These data, which run through the various links of power production, are closely integrated to form big power data.
2 Deep Learning Models Deep learning is built on the gradual optimization theory of neural networks. It directly simulates the learning process of the human brain's neural networks, abstracting how the brain responds to external stimuli into learning networks, and, combined with network topological structures, it realizes the processing of complex information. Currently, the most frequently used neural networks are artificial neural network algorithms (ANN), convolutional neural network algorithms (CNN), and recurrent neural network algorithms (RNN). According to the number of hidden layers, neural networks can be divided into two categories: shallow neural networks, i.e., networks with only one hidden layer, such as ANN; and deep neural networks, i.e., networks containing more than two hidden layers, such as CNN and RNN. At present, the BP neural network algorithm is a widely used artificial neural network algorithm. It is based on the error backpropagation theory proposed by James in 1982, which uses the weights of the connection matrices in multiple hidden layers, together with continuous activation functions, to represent the strength of the relationships between neurons. Different network processing methods also correspond to different ways of expressing and processing network information [4]. A perceptron with a single hidden layer can already handle nonlinear problems. Convolutional neural networks are neural networks with a special topology designed for specific kinds of data. They are popular deep learning algorithms and have been widely applied to unstructured data such as images. Because original images can be fed directly into these networks, less pre-processing of the unstructured data is needed and processing is faster. Traditional neural networks such as ANN and CNN share the same problem: information cannot be carried over persistently between layers. For example,
when you have a conversation, you must keep your speech continuous: you complete the conversation through the other person's questions or the topic, and talking only to yourself does not convey your information to others. The brain therefore keeps thinking coherently about the content of the conversation, but traditional neural networks cannot do this, which is an issue they still need to solve.
RNN Models. RNN can solve the problem of information persistence. It has a special loop structure for processing sequence data, which appears in its neuron unit components [5]. Parameters are shared across the recurrent parts, which means the network can be unrolled to analyze sequences of different forms. RNN develops a multi-layer network into a cyclic network: the output of the current step is related to the previous steps, and RNN calculates the current output from the memory of the previous step, i.e., the hidden-layer nodes are connected across time. The input unit of the hidden layer receives both the output of the input layer and the hidden output of the previous RNN time step. RNN can process sequences of any length, but long sequences increase the time complexity, so in practice only the first several relevant signals are processed.
Long Short-Term Memory Models (LSTM). In 1997, Hochreiter and Schmidhuber first proposed the long short-term memory model (LSTM), which aims to solve the problem that neural networks did not have a good memory function [6]. Since then, many scholars have optimized and improved the LSTM model, which has received increasing attention and has been widely used in practice. The LSTM model is roughly the same as the RNN in terms of network structure, but the hidden-layer parameters are computed differently. The cell in LSTM is regarded as a unit structure, and each cell is a black box that stores the current state. An LSTM cell has three gate structures, the input gate (i), the forget gate (f) and the output gate (o), as well as a cell state vector c. The information flow in each cell is controlled through these gates: the input gate decides which signals enter the cell, the forget gate resets part of the cell state, and the output gate decides what is passed on.
Bi-Directional Long Short-Term Memory Models (BLSTM). BLSTM quickly and repeatedly computes over the power information and stores the result values for later calculations [7]. BLSTM carries memory units in both the forward and backward directions; when the data are obtained, the memory unit vectors integrate the context information and store the result. BLSTM then combines the memory units of both directions to form the hidden-layer BLSTM vector and the output-layer vector. In other words, the context information of all questions and answers is included in the calculations [8], and the output label is checked against the question-sentence database to identify the correct answers. The decoding side of the BLSTM model integrates the hidden-layer state of the last time slice. This method is relatively simple and easy to implement.
However, it is possible to lose the semantic information of the sentence beginning, which reduces the accuracy of the question-and-answer system. To make the integration efficiency of the hidden layer state of the decoding side higher, the attention mechanism is
introduced, which computes a weighted combination to obtain more accurate answers. The specific process is as follows. During the encoding stage, the semantics of the keyword feature vector is not lengthened, but the semantic representation capability of core phrases is improved, so that more accurate calculations are achieved after the decoding side is completed and answers more consistent with the questions can be obtained. The attention mechanism is introduced at the input end of the BLSTM model and takes two inputs: the hidden-layer state vectors h^S_{0:M} at the encoding side of the BLSTM model, and the hidden-layer state h^T_{l-1} at the decoding side. After the relevant attention weight vectors are calculated, the context vector z_l is obtained. The calculation is as follows:

\beta_{lm} = b^T \tanh(W h^T_{l-1} + U h^S_m)   (1)

\alpha_{lm} = \frac{\exp(\beta_{lm})}{\sum_{m=0}^{M} \exp(\beta_{lm})}   (2)

z_l = \sum_{m=0}^{M} \alpha_{lm} h^S_m   (3)
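For illustration, the following NumPy sketch evaluates formulas (1)-(3) for a single decoding step; the dimensions and random inputs are placeholders, not the authors' implementation.

```python
import numpy as np

def attention_context(h_enc, h_dec_prev, W, U, b):
    """h_enc: (M+1, d) encoder states h^S_0..h^S_M; h_dec_prev: (d,) decoder state h^T_{l-1}."""
    # formula (1): additive attention scores beta_{lm}
    beta = np.array([b @ np.tanh(W @ h_dec_prev + U @ h_m) for h_m in h_enc])
    # formula (2): softmax normalisation to attention weights alpha_{lm}
    alpha = np.exp(beta - beta.max())      # max-shift only for numerical stability
    alpha = alpha / alpha.sum()
    # formula (3): context vector z_l as the weighted sum of encoder states
    return (alpha[:, None] * h_enc).sum(axis=0)

rng = np.random.default_rng(0)
d, M = 8, 5
z_l = attention_context(rng.normal(size=(M + 1, d)), rng.normal(size=d),
                        rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                        rng.normal(size=d))
```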
3 Prediction Results and Analysis of Power Short-Term Load Models In experiments, precision, recall, and F value are usually used for evaluation. In order to verify the short-term power load prediction performance of the RNN (recurrent neural network) model, the LSTM (long short-term memory) model, and the Attention-based BLSTM neural network model, this paper selected the pre-processed power load data of the public transformer of a low-voltage station area in a certain region of Zhejiang Province for the first 7 days of August 2018 for training. The following main prediction tests were carried out; the real data of the first 7 days of August are shown in Appendix I: (1) predict the load of the next day (August 8, 2018) and of the next week (August 8, 2018 to August 14, 2018); (2) Appendix I shows the rolling load predictions day by day (with the training data increased day by day) and week by week (with the training data increased week by week) starting from August 8. From the short-term load prediction results for one day and one week, it can be seen that the relative prediction error of the RNN model is very large, the average prediction accuracy is not high, and the relative error fluctuates greatly. The reason is that the gradient vanished during the prediction process, resulting in a decline in prediction accuracy. In order to suppress the vanishing gradient, an LSTM prediction model was established on the basis of the RNN prediction model. From the prediction results, compared with the RNN model, the relative error fluctuation of the LSTM prediction model is relatively stable. There is no
decline in prediction accuracy after a period of time. The prediction performance is better than that of the RNN prediction model. However, the mean absolute percentage error (MAPE) was 6.09%, showing that the load prediction accuracy of the LSTM model is not very high. In order to further improve the performance of the LSTM prediction model, this paper introduced an Attention mechanism and built an LSTM neural network prediction model based on the Attention mechanism. As for the four evaluation indicators, the mean square error (MSE) was 21.587%, the mean absolute error (MAE) was 1.285%, the mean absolute percentage error (MAPE) was 3.776%, and the root mean square error (RMSE) was 30.529%. The comparison of the prediction performance evaluation indicators for one day (August 24, 2018) and one week (August 24–August 30, 2018) between the RNN (recursive or recurrent neural network) models, LSTM (long short-term memory) models and BLSTM models based on the Attention mechanism is shown in Table 1 and Table 2.

Table 1. Comparison of prediction evaluation indicator results of three deep learning models for one day

| Model | MSE | MAE | MAPE | RMSE |
| RNN | 0.176692388 | 0.026544631 | 0.088495182 | 0.249880771 |
| LSTM | 0.106050999 | 0.016089026 | 0.059750627 | 0.149978761 |
| LSTM based on Attention mechanism | 0.0841958 | 0.011966185 | 0.038652461 | 0.119070842 |

Table 2. Comparison of prediction evaluation indicator results of three deep learning models for one week

| Model | MSE | MAE | MAPE | RMSE |
| RNN | 0.506995445 | 0.03033632 | 0.087610011 | 0.716999834 |
| LSTM | 0.362606871 | 0.021206124 | 0.060936665 | 0.512803555 |
| LSTM based on Attention mechanism | 0.215875499 | 0.012854648 | 0.037768861 | 0.305294058 |
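The four indicators reported in Tables 1 and 2 can be computed from a predicted load series and the true series as in the short NumPy sketch below; the example values are illustrative only.

```python
import numpy as np

def load_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = np.mean(err ** 2)                      # mean square error
    mae = np.mean(np.abs(err))                   # mean absolute error
    mape = np.mean(np.abs(err / y_true))         # mean absolute percentage error
    rmse = np.sqrt(mse)                          # root mean square error
    return {"MSE": mse, "MAE": mae, "MAPE": mape, "RMSE": rmse}

# illustrative values only
print(load_metrics([3.1, 2.9, 3.4, 3.8], [3.0, 3.0, 3.3, 3.9]))
```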
It can be seen from Table 1 that the Attention-based BLSTM neural network prediction model performs well for one day (August 24, 2018): its mean square error (MSE) was 8.419%, which is 9.25% and 2.186% lower than that of the RNN (recurrent neural network) prediction model and the LSTM (long short-term memory) prediction model, respectively; its mean absolute error (MAE) was 1.196%, which is 1.458% and 0.412% lower than that of the RNN and LSTM prediction models, respectively; its mean absolute percentage error (MAPE) was 3.865%, which is 4.984% and 2.11% lower than that of the RNN and LSTM prediction models, respectively;
and its root mean square error (RMSE) was 11.907%, which is 13.081% and 3.09% lower than that of the RNN and LSTM prediction models, respectively. It can be seen from Table 2 that the Attention-based BLSTM neural network prediction model also performs well for one week (August 24–August 30, 2018): its mean square error (MSE) was 21.587%, which is 29.112% and 14.674% lower than that of the RNN and LSTM prediction models, respectively; its mean absolute error (MAE) was 1.285%, which is 1.749% and 0.836% lower, respectively; its mean absolute percentage error (MAPE) was 3.777%, which is 4.984% and 2.317% lower, respectively; and its root mean square error (RMSE) was 30.529%, which is 41.171% and 20.751% lower than that of the RNN and LSTM prediction models, respectively. On the basis of these prediction results, the Attention-based BLSTM neural network prediction model can better mine the key information in historical load data series and the feature relationships between time series data, which has a significant impact on time series prediction problems. Deep learning recurrent neural networks clearly adapt well to the power load prediction problem, and the proposed Attention-based BLSTM neural network prediction model is shown to have high prediction accuracy and strong stability.
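A compact Keras sketch of the kind of bidirectional LSTM regressor with a simple additive attention head discussed above is shown below; the window length, layer sizes and attention form are assumptions for illustration and do not reproduce the authors' exact architecture or hyper-parameters.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_blstm_attention(seq_len=96, n_features=1):
    inp = layers.Input(shape=(seq_len, n_features))
    h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(inp)   # (batch, seq, 128)
    score = layers.Dense(1, activation="tanh")(h)                           # per-step score
    alpha = layers.Softmax(axis=1)(score)                                   # attention weights over time
    context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, alpha])
    out = layers.Dense(1)(context)                                          # next-step load
    return Model(inp, out)

model = build_blstm_attention()
model.compile(optimizer="adam", loss="mse")
```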
4 Conclusion In short-term load prediction for power systems, it is essential to connect the load characteristics that affect power demand with the actual environmental characteristics of the power stations, analyze the internal laws of load development, and further analyze the external factors that influence short-term load prediction. A more comprehensive analysis of these factors is also crucial for further improving the accuracy of short-term power load prediction. Fund Project. 2019 Guangxi Basic Research Ability Improvement Project for Young and Middle-Aged University Teachers (2019KY0640).
References 1. Zhang, L., Wang, H., Liu, W., Liu, M.: Short term load forecasting based on high dimensional data and deep learning. Sci. Technol. Bull. 37(03), 55–59+66 (2021) 2. Yang, Z., Ding, S., Ye, M., Li, J., Xue, S., Wu, H.: Short term load forecasting model based on variational mode decomposition and deep learning. Electrical measurement and instrumentation
3. Li, X., Lian, D., Jing, C., Li, R., Xu, Z., Yue, S.: Short term power load forecasting based on wekpca and deep learning. Power Inf. Commun. Technol. 18(10), 34–41 (2020) 4. Yao, D., Wu, Y., Lei, L., Yan, S., Wu, W., Hong, D.: Short term load forecasting based on deep learning. Foreign Electron. Measure. Technol. 39(01), 44–48 (2020) 5. Ma, T., Wang, C., Peng, L., Guo, X., Fu, M.: Short term Load Forecasting Considering Demand Response and deep structure multi task learning. Electric. Measure. Instrum. 56(16), 50–60 (2019) 6. Chen, Z., Sun, L.: Short term power load forecasting method based on deep learning LSTM network. Electron. Technol. 47(01), 39–41 (2018) 7. Liang, C., Zhen, W., Gang, W.: Application of LSTM network in short-term power load forecasting under deep learning framework. Power Inf. Commun. Technol. 15(05), 8–11 (2017) 8. Dong, H., Cheng, P., Li, L.: Application of deep learning algorithm in short-term load forecasting of power system. Electric. Times (02), 82–84 (2017)
Image Repair Methods Based on Deep Residual Networks Hongwei Deng1,2(B) , Ziyu Lin1 , Jinxia Li1 , Ming Yao1 , Taozhi Wang1 , and Hongkang Luo1 1 Hengyang Normal University, Hengyang 421002, China 2 Hunan Provincial Key Laboratory of Intelligent Information Processing and Application,
Hengyang 421002, China
Abstract. In recent years, deep learning has shown significant advantages in image restoration. Compared with the traditional repair method, the image repair method based on deep learning can better solve the problem of image missing blur, but it will also cause the problem of local color difference of the repair image. In this paper, an image repair model based on deep residual network is proposed, which is divided into repair module and mitigation module. The repair module uses part of the convolution network to repair the missing area of image blur, the mitigation module uses the deep residual network to adjust the color difference of the repaired image, and the two modules coordinate with each other to make the image repair effect closer to the real image. The experimental results show that the method proposed in this paper has better effect on image repair. Keywords: Depth residual network · Image repair · Image missing blur · Local chromatic aberration of the image
With the continuous development of economy and society, image restoration has gradually become a popular research content, which plays an important role in the field of computer vision. With the emergence of deep learning, image repair has been gradually applied in many fields. Image repair is to fill missing and obscured images to get closer to the real image. Traditional image repair methods are filled by selecting image blocks with similar features from other areas of the image through some low-level features in the image. This method of approximate copying and pasting has a good effect in simple background repair tasks. But in real life, the scene of the image is complex and changeable, some pictures even have special scenes and targets, at this time, relying on the same picture to find the characteristic point of the method is difficult to play the original role. Therefore, it is particularly important to study and explore new image repair methods.
1 Related Work With the development of computer technology and the increasing computing power, deep neural network plays an increasingly important role in image repair. The deep neural network proposed by Wu [1] has a good performance in the ability to express and extract features, and improves with the increase of the depth of the network. He, Zhang © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 160–166, 2022. https://doi.org/10.1007/978-981-16-6554-7_18
[2] found that deeper networks are not always better: once the accuracy of a network reaches its limit, continuing to increase the depth causes the repair error rate to rise sharply. To solve these problems, He et al. proposed the deep residual network, which alleviates the gradient explosion caused by increasing depth, improves accuracy, and makes the image repair effect more prominent.
2 Proposed Method In this paper, a new image repair network model is proposed, consisting of two independent generative adversarial network (GAN) modules. The repair module PConv-GAN combines partial convolution with an adversarial network and is used to repair irregularly masked regions; through the constant competition between the generator and the discriminator, the clarity and texture structure of the resulting image become closer to the original image. To address the local color difference left by the repair module, this paper designs the mitigation module Rem-GAN, which combines a deep residual network with an adversarial network; by using residual blocks to connect the outputs and training the mitigation module, the information of the non-missing area of the image is preserved, the texture structure of the non-missing area remains consistent, and the local chromatic aberration and pseudo-border problems are eliminated. The method framework flow is shown in Fig. 1.
Fig. 1. Image repair flowchart (begin, image preprocessing, repair module, mitigation module, end)
2.1 Repair Module The repair module model, as shown in Fig. 2, contains a generator and a discriminator.
Fig. 2. Repair module model (encode blocks: partial convolution, BN, ReLU; decode blocks: up-sampling, concatenation, partial convolution, BN; final convolution output)
Each convolution block of the generator in the repair module consists of a partial convolution layer, a batch normalization layer, and a ReLU layer. Partial convolution layers learn the image content better, while batch normalization layers improve convergence speed and generalization. The encoding stage uses eight convolution blocks; the decoding stage up-samples the image and uses eight decoding blocks. Each decoding block consists of a skip-connection (concatenation) layer, an up-sampling layer, a batch normalization layer and a partial convolution layer; except for the first and last partial convolution layers, batch normalization layers are inserted between the other layers. To effectively fuse high-level semantic information with low-level spatial local information for image repair, the generator concatenates the output feature map with the corresponding feature map in the decoding block. The discriminator is also made up of eight convolution blocks configured in line with the generator; except after the first convolution layer, a batch normalization layer is used between each convolution layer and the Leaky ReLU layer, the last partial convolution layers are replaced with two fully connected layers, and the final verdict is output. 2.2 Mitigation Module The repaired image may still show local color differences, so this paper presents a mitigation model based on a deep residual network; its architecture is shown in Fig. 3. Feature extraction in the generator network is divided into two steps. The first stage is preliminary extraction by the front-end convolution layers, which use multi-scale dilated convolutional residual blocks to capture the image's multi-scale depth
characteristics. Using residual blocks to extract features from four receptive fields of different sizes increases the feature extraction ability of the network and its capacity to learn multi-scale semantics. The second part is the residual connection block, which concatenates the outputs of the four dilated convolution blocks from the first part, feeds the result into a convolution layer to consolidate the extracted features, and finally applies the residual connection.
Fig. 3. Mitigation module (input convolution and ReLU; four parallel DSConv-BN-ReLU branches; concatenation, convolution, BN, element-wise sum, output convolution)
2.3 Loss Function
2.3.1 Repair Module
The purpose of the repair module is to ensure that the color and spatial position of each reconstructed pixel stay as close as possible to the original color and texture of the image. The total loss of the repair module is defined in formula (1) and consists of the repair loss of the non-masked area, the repair loss of the masked area, the perceptual loss, the style loss, the adversarial loss, and the total variation loss:

L_{ans}^{input} = 2L_i + 12L_j + 0.04L_k + 100(L_{type}^{1} + L_{type}^{2}) + 100L_p + 0.3L_q   (1)
The weight of each loss item is determined after analysis of the results of 50 separate experiments. In this paper, the Manhattan distance between the repair image and the
non-masked area of the real image is used as the repair loss, and different convolutional feature layers of several pretrained networks are used to obtain the perceptual loss between the repaired image and the real image, with the perceptual loss of the area to be repaired given extra weight. After 50 independent experimental comparisons, the final experiments use the pool1, pool2 and pool3 layer outputs of VGG16, together with the conv1, pool2 and pool3 layer outputs of a second pretrained network, as the perceptual layers of the generator for computing the perceptual loss. The parameters of the pretrained networks are not involved in the training and are only used to compute the loss values. The perceptual losses obtained from the two pretrained networks are weighted to form the final perceptual loss. To make the repaired content close to the real image in style, this paper defines two style losses, and the total variation loss is defined in formula (2):

L_q = \sum_{(i,j)\in G,(i,j+1)\in G} \| I_{same}^{i,j+1} - I_{same}^{i,j} \|_{1} + \sum_{(i,j)\in G,(i+1,j)\in G} \| I_{same}^{i+1,j} - I_{same}^{i,j} \|_{1}   (2)
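One reading of the total variation term in formula (2) is the sum of L1 differences between horizontally and vertically neighbouring pixels, which can be written directly with tensor slicing; the use of the L1 norm for both directions is an assumption of this sketch.

```python
import tensorflow as tf

def total_variation_loss(img):
    """img: (batch, H, W, C) repaired image I_same; L1 differences of neighbouring pixels."""
    d_h = img[:, :, 1:, :] - img[:, :, :-1, :]    # horizontal neighbours (i, j+1) - (i, j)
    d_v = img[:, 1:, :, :] - img[:, :-1, :, :]    # vertical neighbours (i+1, j) - (i, j)
    return tf.reduce_sum(tf.abs(d_h)) + tf.reduce_sum(tf.abs(d_v))
```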
2.3.2 Mitigation Module
The purpose of the mitigation module loss function is to preserve the real and reasonable parts of the image as much as possible while improving the areas where local chromatic aberration exists. The image produced by the repair module is used as the input image; after it passes through the mitigation network, the total loss of the mitigation module is composed of the content loss, the perceptual loss and the adversarial loss, and the weight of each loss item was finally determined through 50 independent experimental comparisons:

L_{ans}^{output} = 40L_{san} + L_k + 0.75L_p   (3)
The content loss is defined as a weighted mean absolute error, and the perceptual loss is defined similarly to that of the repair module, except that the pretrained networks used are a VGG-19 network and a DenseNet network pretrained on ImageNet, with the block3_conv4 layer of VGG-19 and the pool2_conv layer of DenseNet as the perceptual layers. The adversarial loss is defined in formula (4):

L_p = -\frac{1}{N}\sum_{i=0}^{N-1}\left[ D_{output}(I_{output}(x_i)) \log\left(D_{output}(I_{true}(x_i))\right) + \left(1 - D_{output}(I_{output}(x_i))\right) \log\left(1 - D_{output}(I_{true}(x_i))\right) \right]   (4)
N represents the total number of training samples in each batch; I_{output}(x_i) and I_{true}(x_i) represent the i-th optimized image sample and real image sample in each batch. The repair module and the mitigation module are trained in turn: the input image is first sent to the repair module for training, and the output of the repair module is then fed into the mitigation module for training.
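Formula (4) can be transcribed literally as a batch-averaged cross-entropy over the discriminator outputs; the sketch below does exactly that, with a clipping constant added only to avoid log(0) (the function name and tensor shapes are illustrative).

```python
import tensorflow as tf

def adversarial_loss(d_fake, d_real):
    """Literal transcription of formula (4).
    d_fake = D_output(I_output(x_i)), d_real = D_output(I_true(x_i)), both in (0, 1)."""
    eps = 1e-7
    d_real = tf.clip_by_value(d_real, eps, 1.0 - eps)      # avoid log(0)
    term = d_fake * tf.math.log(d_real) + (1.0 - d_fake) * tf.math.log(1.0 - d_real)
    return -tf.reduce_mean(term)
```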
3 Analysis of Experimental Results In this experiment, the repair module and the mitigation module are implemented with TensorFlow's Keras framework, using TensorFlow-gpu 2.0 and Keras 2.2.4. The models presented in this paper were trained and tested on the Places2 and CelebA Faces datasets, respectively. 200,000 images were randomly selected from each dataset as the training set and 4,000 images as the test set. The size of the images and masks is set to 256 × 256. The initial weights of the generator in the image repair network adopt the initialization method proposed in [2]. Training uses an NVIDIA Tesla P100-PCIE-16G GPU, the mini-batch size is set to 4, the total number of training iterations is 100,000, the optimizer is Adam with a decay of 1e−7, and the learning rate is set to 0.0002. As shown in Fig. 4, from the results of the repaired images, the repaired areas produce reasonable texture structure and correct contextual semantics, improve the
Fig. 4. Image repair results
local chromatic aberration problem, and greatly enhance the visual effect and image quality, reflecting the advantages of the proposed method for repairing images with irregular masks.
4 Conclusions An image repair model based on a deep residual network is proposed in this paper, which can fill in missing image regions accurately. Compared with GL, this model uses a deep residual network in the mitigation module to connect the convolutional network with residual connection blocks, which better resolves the local chromatic aberration of the repaired image so that the repaired result blends in naturally. The images produced by this model are closer to the real image than those of other generative adversarial network models. Acknowledgment. This research was supported by Scientific Research Fund of Hunan Provincial Education Department (18A332), the Science and Technology Plan Project of Hunan Province (2016TP1020), the Application-Oriented Special Disciplines, Double First-Class University Project of Hunan Province (Xiangjiaotong [2018] 469), Internet of things, a first-class undergraduate major in Hunan Province (Xiangjiaotong [2020] 248, No: 288), National College Students' innovation and entrepreneurship training program (s202010546008), and Innovation and entrepreneurship training program for college students in Hunan Province (20203227).
References 1. Zifeng, W., Shen, C., van den Hengel, A.: Wider or deeper: revisiting the ResNet model for visual recognition. Pattern Recogn. 90, 119–133 (2019). https://doi.org/10.1016/j.patcog.2019. 01.006 2. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Conference on Computer Vision and Pattern Recognition, pp. 770–778. IEEE, USA (2016) 3. Zheng, C., Cham, T., Cai, J.: Pluralistic image completion. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1438–1447. IEEE, USA (2019) 4. Yu, J., Lin, Z., Yang, J., et al.: Generative image in painting with contextual attention. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 5505–5514. IEEE, USA (2018) 5. Liu, G., Reda, A., Shih, K., et al.: Image inpainting for irregular Holes using partial convolutions. ACM Trans. Graph. 9(3), 37–51 (2018) 6. Harley, A., Derpanis, K., Kokkinos, I., et al.: Segmentation-aware convolutional networks using local attention masks. In: IEEE International Conference on Computer Vision, pp. 22–29. IEEE, Italy (2017) 7. Ledig, C., Theis, L., Huszar, F., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Conference on Computer Vision and Pattern Recognition, pp. 105–114. IEEE, USA (2017) 8. Zeng, Y., Fu, J., Chao, H., et al.: Learning pyramid-context encoder network for high-quality image inpainting. In: Conference on Computer Vision and Pattern Recognition, pp. 486–1494. IEEE, USA (2019) 9. Zhou, B., Lapedriza, A., Khosla, A., et al.: Places: a 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40(6), 1452–1464 (2018)
Real-Time Traffic Sign Detection Based on Improved YOLO V3 Haini Zeng(B) School of Artificial Intelligence and Smart Manufacturing, Hechi University, Yizhou 546300, China [email protected]
Abstract. The automatic recognition of traffic signs is crucial for autonomous driving. This paper aims to solve the low recognition rate of traffic signs caused by complex background interference. The study improves the detection accuracy of the trained YOLOv3 model by adjusting the relevant model parameters, retraining on the data set, and finding an appropriate threshold value, finally achieving accurate recognition of traffic signs. Keywords: YOLO v3 model · Traffic signs · Complex background
1 Introduction The detection of traffic signs identifies which traffic signs appear in a scene image and is an essential part of the application of autonomous driving technology. The recognition of traffic signs in road traffic environment perception systems is the basis for subsequent reconstruction, path planning and decision-making, so recognizing traffic signs is of great importance. In recent years, many studies on detecting traffic sign targets in static scenes have been conducted. They use hand-designed features, or split the traffic signs into local parts and extract additional features for those parts, to obtain more accurate detection results. However, hand-designed feature extraction limits the choice of features, and the selection of an appropriate classifier also has a critical impact on the detection results. Such approaches do not generalize well when the hand-crafted features are applied to more complex scenes or changing targets. To achieve better detection results, this paper uses the YOLO v3 object detection algorithm for traffic sign detection in complex backgrounds, without relying on manual features, and for detailed differentiation of traffic signs.
2 YOLO V3 Target Detection Algorithm In 2015, Redmon et al. [1, 2] proposed the YOLO (You Only Look Once) algorithm, an end-to-end trainable structure in which the image is divided into a grid of cells and each
grid cell is responsible for a candidate box prediction, and the coordinates of the obtained candidate boxes (anchor boxes) and the category probabilities are output. The YOLO network then uses a preset threshold to discard candidate boxes with low confidence, and applies non-maximum suppression to remove redundant candidate boxes and obtain the detection results. The backbone of YOLO v3 [6] is darknet-53, as shown in Fig. 1. The darknet-53 network makes extensive use of residual units so that the network depth can be increased without overfitting; down-sampling is performed by convolutions with a stride of 2, making it a fully convolutional network. The convolution kernel sizes of the YOLO v3 network are 3 × 3 and 1 × 1, which increases the operation rate and reduces the complexity. The original 416 × 416 image is convolved into three feature maps of different scales through 32-fold, 16-fold and 8-fold down-sampling. The first scale is a 13 × 13 feature map, obtained after convolution in the darknet-53 structure, which carries the generated prediction-box information. The second scale fuses the 26 × 26 feature map from the second-to-last stage of the backbone with the feature map of the same size obtained by up-sampling the first scale. The third scale fuses the 52 × 52 feature map from the backbone with the feature map of the same size obtained by up-sampling the second scale. The YOLO v3 network structure and multi-scale feature fusion improve the detection capability for small targets.
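The thresholding and non-maximum suppression step described above can be sketched as follows; the score threshold, IoU threshold and (x1, y1, x2, y2) box format are illustrative assumptions.

```python
import numpy as np

def box_iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def threshold_and_nms(boxes, scores, score_thr=0.5, iou_thr=0.45):
    """boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,) confidences."""
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= score_thr]
    kept = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) < iou_thr for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]
```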
Fig. 1. Network structure of YOLO v3
3 Data Collection and Data Enhancement 3.1 Data Collection In this paper, the publicly available Chinese Traffic Sign Detection Benchmark (CCTSDB) released by Zhang et al. [13, 14] is used. The dataset is divided into a training set and a test set at a ratio of 5:1, with 15,000 and 3,000 images respectively, and contains six major categories: speed limit signs, other ban signs, unban signs, indication signs, warning signs and other signs. This paper mainly focuses on four types of signs: indication, speed limit, stop and crosswalk. 3.2 Data Enhancement In order to improve the detection accuracy of the model, prevent over-fitting, and account for traffic signs being defaced and deformed after years of wind and sun, this paper augments the traffic sign data by adding noise, Gaussian blur, and rotation. The K-means clustering algorithm is used to cluster the real target borders of the labeled dataset, with the area intersection-over-union (IOU) as the rating index, to obtain the initial candidate target borders for the traffic sign targets in the training dataset; that is, when the IOU is not lower than 0.5, the predicted candidate borders are taken as the initial candidate target borders. IOU is expressed as follows:

IOU = \frac{area(box_p \cap box_i)}{area(box_p \cup box_i)}   (1)

where box_p denotes the prediction box and box_i denotes the real target box. The distance between each real target box and the candidate (centroid) box is then expressed as:

Dis(box, centroid) = 1 - IOU(box, centroid)   (2)
Where IOU (box, centroid ) denotes the average intersection ratio of the two borders in the training dataset. The initial candidate boxes are used as the initial network parameters of the YOLOv3 model. The learning rate is set to 0.0001 and the batch_size is set to 256, and the training data are input to the YOLO v3 network for training. The weights and bias values of the convolutional layers of the YOLO v3 network are adjusted continuously for training. The loss function values of the training data are output:
loss = \lambda_{coord} \sum_{i=0}^{K \times K} \sum_{j=0}^{M} I_{ij}^{obj} (2 - w_i \times h_i) \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right]
 + \sum_{i=0}^{K \times K} \sum_{j=0}^{M} I_{ij}^{obj} \left[ \hat{c}_i \log(c_i) + (1 - \hat{c}_i) \log(1 - c_i) \right]
 + \lambda_{(no)obj} \sum_{i=0}^{K \times K} \sum_{j=0}^{M} I_{ij}^{(no)obj} \left[ \hat{c}_i \log(c_i) + (1 - \hat{c}_i) \log(1 - c_i) \right]
 + \sum_{i=0}^{K \times K} I_{ij}^{obj} \sum_{c \in classes} \left[ \hat{p}_i(c) \log(p_i(c)) + (1 - \hat{p}_i(c)) \log(1 - p_i(c)) \right]   (3)

where \lambda_{coord} and \lambda_{(no)obj} denote the coordinate loss coefficient for traffic signs and the confidence loss coefficient for grid cells without traffic signs, respectively; K \times K denotes the number of grid cells of the input image and M the number of predicted borders per cell; (x_i, y_i, w_i, h_i, c_i) denote the center coordinates, width, height and confidence of the predicted traffic sign, and (\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i, \hat{c}_i) those of the real traffic sign; I_{ij}^{obj} indicates that the j-th candidate border of the i-th grid cell detects the current target, and I_{ij}^{(no)obj} indicates that it does not; p_i(c) and \hat{p}_i(c) represent the predicted and true probability that the traffic sign in the i-th grid cell belongs to class c; c denotes a class and classes the set of all classes. Training continues until the loss on the training dataset is less than or equal to a preset threshold Q_1, or until a predefined number of training iterations N is completed. The network obtained when training stops is used as the final YOLO v3 network.
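The anchor selection described in Sect. 3.2, i.e. k-means over the labelled box sizes with the 1 − IOU distance of formula (2), can be sketched as below; the number of clusters and the width/height-only IoU are common assumptions for YOLO-style anchor clustering rather than details stated in the paper.

```python
import numpy as np

def wh_iou(wh, centroids):
    """IoU between box sizes and centroid sizes, using widths/heights only."""
    inter = np.minimum(wh[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centroids[None, :, 1])
    union = wh[:, 0:1] * wh[:, 1:2] + (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """wh: (N, 2) widths and heights of the labelled ground-truth boxes."""
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), size=k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmin(1.0 - wh_iou(wh, centroids), axis=1)   # distance of formula (2)
        for c in range(k):
            if np.any(assign == c):
                centroids[c] = wh[assign == c].mean(axis=0)
    return centroids
```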
4 Analysis of Experimental Simulation Results To improve the detection accuracy of the trained YOLO v3 model, the momentum parameter and the weight decay parameter are first selected for tuning among the many parameters of the model. Adjusting the weight decay parameter controls the influence of model complexity on the loss function, which mainly prevents overfitting and improves the generalization ability of the model. The momentum parameter accelerates the convergence of the model, especially in the presence of gradient anomalies. In this paper, a weight decay of 0.0005 and a momentum of 0.94 are selected. On this basis, a detection threshold of 0.15 is set, and the model's average accuracy improves from 77.8% on the training dataset and 72.6% on the test dataset to 82.2% and 80.3%, respectively. The average detection time of the YOLO v3 algorithm on the CCTSDB dataset is 23 ms. Traffic sign images selected from the test set are then detected with the trained network model, as shown in Fig. 2, which shows that the trained model detects traffic signs well.
Fig. 2. Graph of test results
5 Conclusion

In this paper, a YOLO v3 traffic sign recognition and detection system is proposed to address the low recognition rate of traffic signs caused by complex background interference. First, the YOLO v3 target detection algorithm is used to label the images in the Chinese traffic sign dataset according to different categories; the final detection accuracy of the YOLO v3 model is adjusted by modifying the parameters of the model; suitable thresholds are found to improve the generalization ability of the model; finally, the images in the test set are detected using the trained model. The detection accuracy of the YOLO v3 model trained in this paper is 82.2%, and the detection speed is 40 FPS, which meets the real-time detection requirement.

Acknowledgment. This work was supported by the school-level youth scientific research project of Hechi University (No. 2018XJQN010).
References 1. Redmon, J., Divvala, S., Girshick, R., et al.: You only look once: unified, real-time object detection. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788. IEEE Computer Society, Washington, DC (2016) 2. Redmon, J., Farhadi, A.: YOLO9000:better,faster,stronger. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017) 3. Houben, S., Stallkamp, J., Salmen, J., et al.: Detection of traffic signs in real-world images: the German traffic sign detection benchmark. In: Proceedings of the 2013 International Joint Conference on Neural Networks, pp. 1–8. IEEE, Piscataway (2013) 4. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015) 5. Lin, T.Y., Dollár, P., Girshick, R., et al.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125. Honolulu, HI, USA (2017)
6. Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525 (2017) 7. Yan, J., Lei, Z., Wen, L., Li, S.Z.: The fastest deformable part model for object detection. In: Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA (2014) 8. Zhang, J.M., Xie, Z.P., Sun, J., et al.: A cascaded R-CNN with multiscale attention and imbalanced samples for traffic sign detection. IEEE Access 8, 29742–29754 (2020) 9. Jiang, J.H., Bao, S.L., Shi, W.X., et al.: Improved traffic sign recognition algorithm based on Yolo V3 algorithm. Comput. Appl. 40(8), 2472–2478 (2020) 10. Bai, S.L., Yin, K.X., Zhu, J.Q.: Traffic sign detection algorithm based on lightweight yolov3. Comput. Modern. 2020(9), 83–88+94 (2020) 11. Zeng, Z., Wang, P., Liu, W., et al.: Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression, pp. 12993–13000. AAAI (2020) 12. Zhang, J.M., Huang, M.T., Jin, X.K., et al.: A real time chinese traffic sign detection algorithm based on modified YOLOv2. Algorithms 10(4), 127–140 (2017) 13. Zhang, J.M., Jin, X.K., Juan, X., et al.: Spatial and semantic convolutional features for robust visual object tracking. Multimedia Tools Appl. 79(8), 15095–15115 (2020)
Design of Ground Station for Fire Fighting Robot Minghao Yang(B) , Xizheng Zhang, Sichen Fang, Anran Song, Zeyu Wang, and Zijian Cui Hunan Institute of Engineering, Xiangtan 411104, Hunan, China
Abstract. The appearance of fire fighting robots brings great help to firefighters. The ability of a fire fighting robot to perform tasks depends not only on the performance of the robot equipment itself, but also on the operator's remote control and command of the robot through the ground station. The fire-fighting robot system consists of the robot running part and the ground station part. The two main functions of the ground station are monitoring and control. The monitoring content includes the robot equipment status and real-time images. Control mainly refers to robot operation: robot start and stop, rotation, water spraying, video recording and other commands can be issued from the ground station. Therefore, the research and design of the ground station system, as a part of the fire fighting robot system, has an important role and practical significance. This paper mainly introduces the basic structure of the fire fighting robot ground station, including image transmission, data transmission and the power supply system, and designs the image transmission scheme and a long-endurance power supply with its protection module to meet actual needs. Keywords: Ground Station · Multiple Image transmission · Long endurance power
1 Introduction

When a fire occurs, the scene is complex: the environment of the fire site (such as the temperature, the fire situation, the content of flammable and explosive gas, the location of obstacles, etc.) is not clear, and fire fighters cannot accurately judge the situation; rashly entering the fire site is likely to bring great harm to the personal safety of rescue personnel. To solve this problem, people have developed various kinds of fire fighting robots, which are equipped with advanced fire fighting equipment to assist or even replace fire fighting personnel in rescue and relief tasks. The ground station is the main part of the control of the fire fighting robot. Operators control the fire fighting robot through the peripherals of the ground station system (control buttons) and the ground station software; they operate according to the real-time video and handle the scene situation. The fire fighting robot system is composed of the robot running part and the ground station part, and the two ends respectively include image sending and receiving, so an image transmission system is also included. The overall system structure is shown in Fig. 1.
Fig. 1. Block diagram of overall structure of fire fighting robot system
The research objective of this paper is to design a fire fighting robot ground station system that can meet the requirements of fire missions. A fire fighting robot is quite different from other common robots: it is positioned for industrial application, which requires a longer endurance time in dealing with fires, clearer images to be sent back, detailed scene information for fire fighting personnel, and map display in complex terrain. In short, fire applications put forward higher requirements for the research and development of the robot system in all aspects, so the design of the corresponding fire robot ground station system must start from all aspects of these needs.
2 Ground Station Design

The in-depth study of the functions and technologies related to the ground station system can not only ensure the functional realization of the ground station and the stability of the system, but also guarantee the stability and normal operation of the fire fighting robot and improve the coordinated operation of the fire fighting robot system. Therefore, the development of the ground station system must be regarded as a key research direction, whether for the realization of the real-time image and control functions of the ground station or for the overall completion of the fire fighting robot system. The following paragraphs begin with the basic composition of the ground station, including its development history, typical ground station profiles and functional composition.

The ground station was first developed along with unmanned aerial vehicle (UAV) systems, and appeared in order to better meet the requirements of UAV operation, reconnaissance and other functions. The ground station is a comprehensive system that integrates functions such as real-time collection and analysis of telemetry data, timed sending of remote control
instructions, and dynamic display of operation status. It is usually composed of a display control platform and communication equipment. Because UAVs involve such cutting-edge technologies as image processing, wireless transmission, advanced control and multi-sensor fusion, and have broad application prospects, their research has become a hot spot worldwide. As an important part of small UAV systems, the ground station has also become a hot research topic; scientific research institutions, companies, organizations and universities in many countries and regions regard it as an important research field.

2.1 Basic Composition of Ground Stations

Generally speaking, the hardware of a ground station is composed of an operating system module, a liquid crystal display, a wireless remote control, a power supply system, an image transmission module and a data transmission module. Small consumer ground stations are relatively simple and consist of mobile devices that play high-definition video, plus wireless remote controls. The ground station of a special robot such as the fire fighting robot also includes console software, which displays the status and functions of the robot in real time, controls the walking and spraying of the robot, and shows through the camera the internal situation of the fire and whether there are trapped people. The specific devices and overall block diagrams are shown in Fig. 2 and Fig. 3:
Fig. 2. Ground station console
Fig. 3. Block diagram of the hardware part of ground station
2.2 Operating System Module

The operating system module includes a computer and a control panel. The computer contains the operating system; the data, images and other information transmitted back are displayed through the interface, and the parameters can be adjusted to change the running state of the robot. The control panel is a manual operation board, mainly including switches, function indicator lights, a keyboard, a small screen, keys and a joystick. The switches control the start and stop of each functional part, the screen displays state parameters, the keys control functions such as opening the water cannon and taking photos, and the joystick is used to adjust the movement and rotation of the robot.

2.3 Wireless Communication Module

Wireless transmission covers the transmission of control signals, data and images between the ground station and the robot: the ground station issues control instructions, the robot controls the motor, water cannon and camera according to these control signals, and the robot acquires data and images and returns them to the ground station as data and image signals. The control signal and the data signal can be transmitted by the same wireless device, so the wireless communication module can be divided into a data transmission module and an image transmission module. Its working principle is shown in Fig. 4.

Fig. 4. Robot wireless communication

Both the data transmission module and the image transmission module have an independent set of wireless transmitter and wireless receiver, each with a non-interfering transmission frequency band. Once the working frequency band is set, signal transmission between the wireless modules can be realized. In wireless data transmission, the wireless transmission module can be used as both transmitter and receiver, and the signal is transmitted and received bidirectionally between the two modules. In wireless image transmission, the transmitter is connected to the camera and the receiver is connected to the display; the camera records video in real time, and the image information is transmitted to the receiver in one direction by the transmitter.

The data transmission module is mainly used to send and receive data between the ground station and the fire robot. The transmitted data describe the state of the fire robot and some of its equipment, including the GPS state and the data collected by other sensors. The frequency range is generally 433 MHz, 915 MHz or 2.4 GHz; among these bands, 433 MHz is mostly used, especially in civil applications, because of its longer wavelength and stronger penetration. The transmitting power of the data transmission module is 800 mW, and with a suction cup antenna it has a long communication distance.

The image transmission module adopts 5.8 GHz wireless image transmission. Compared with 2.4 GHz wireless image transmission, the 5.8 GHz frequency band carries fewer signals and is not easily interfered with, so 5.8 GHz wireless image transmission has strong anti-interference ability. In a complex urban environment, 5.8 GHz image transmission is more conducive to signal reception and transmission. The transmission distance of 5.8 GHz is longer than that of 2.4 GHz, the relative rain fade is smaller, and the stability is high, which is suitable for long-distance transmission in the field. For a fire protection application, whether urban or suburban, 5.8 GHz image transmission is the better choice.
3 Project Design

3.1 Wireless Communication Module Design

The data transmission module adopts the high-performance LoRa spread spectrum chip SX1262, whose working frequency band is 433 MHz and whose transmitting power is 100 mW. LoRa spread spectrum provides a longer communication distance. The module has an external LNA and an independent noise reduction circuit to improve the receiving sensitivity, and the maximum communication distance can reach 3000 m. The module is shown below (Fig. 5):
Fig. 5. Data transmission module
The transmitter of the image transmission module adopts the TS832, with a transmission frequency of 5.8 GHz, a transmission power of 600 mW, 48 frequency points, a 700-line high-definition lens, a 140-degree wide angle, and a resolution of 976 (H) × 496 (V). The receiver is the RC832; the AV output video cable is connected to the acquisition card, which is connected to the motherboard, and the audio cable is connected to the earphone port of the housing. This set of devices is expected to reach a range of more than 1000 m. The modules are shown below (Fig. 6):
Fig. 6. Image transmission module
The body of the fire fighting robot vehicle needs to be connected with three cameras: at the front of the vehicle, at the rear of the vehicle and on the water cannon. Therefore, three sets of image transmission devices are required, set to three different frequency points so that the three sets of devices do not interfere with each other.

3.2 Power Module Design

The total external power of the whole system is about 80 W and the external power supply voltage is 12 V, so the current is about 6.7 A. With a selected voltage of 12 V and a battery capacity of about 45 Ah, the battery life is about 7 h. The capacity of the battery remains above 85% after 800 cycles; at this time the capacity is above 38.25 Ah and the battery life is still up to about 6 h. A battery discharge protection controller is connected at the battery output port, set to alarm when the voltage drops to 10.5 V and to stop supplying power to the load when the voltage drops to 10.3 V. In the energy-saving mode (when the voltage has not dropped to 10.5 V), the digital alarm screen runs for 15 s. If the voltage is lower than 10.5 V, the energy-saving mode is exited and the buzzer starts to sound. The battery continues to discharge; when the voltage is less than 10.3 V, power is supplied to the load for another 3 s and then disconnected. When there is no load drawing current, the battery voltage rises. If the voltage is 10.5 V at this time, the discharge ends; if the voltage is greater than 10.5 V, the load is supplied again after a delay of 2 min, and the cycle continues until the discharge is over. A voltage regulator power module is also set, with 12 V input, 12 V output and an output current of 0–10 A to meet the load current; the maximum output power is up to 120 W, which meets the load requirements (Figs. 7 and 8).
Fig. 7. Battery discharge protection controller
Fig. 8. Voltage regulator
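As a quick sanity check of the power budget described in Sect. 3.2, the short Python sketch below reproduces the arithmetic; the input values are taken from the text above, and the computed endurance figures come out just below the rounded 7 h and 6 h quoted there.

```python
# Back-of-the-envelope check of the power-budget figures quoted in Sect. 3.2.
power_w = 80.0          # total load power
voltage_v = 12.0        # supply voltage
capacity_ah = 45.0      # nominal battery capacity

current_a = power_w / voltage_v                  # about 6.7 A load current
endurance_h = capacity_ah / current_a            # about 6.7 h, quoted as roughly 7 h

aged_capacity_ah = capacity_ah * 0.85            # >= 85 % capacity after 800 cycles
aged_endurance_h = aged_capacity_ah / current_a  # about 5.7 h, quoted as still roughly 6 h

print(f"load current   : {current_a:.2f} A")
print(f"new battery    : {endurance_h:.1f} h")
print(f"after 800 cyc. : {aged_endurance_h:.1f} h")
```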
3.3 Overall Design Diagram (Fig. 9)
Fig. 9. Overall design circuit diagram
4 Conclusion

The research of this paper is to design a fire fighting robot ground station system that can meet the requirements of fire missions. It has a longer endurance time in dealing with
fire, and the transmitted and captured images are clearer and multi-directional, so as to provide detailed scene information for fire fighting personnel. The launching targets of the water cannon are visible, which helps to better eliminate the disaster.

Acknowledgment. This work was supported in part by the National Natural Science Foundation of China under Grant 61673164, in part by the Hunan Provincial Natural Science Foundation under Grants 2020GK2089 and 2020JJ6024, and in part by the Key Project of Hunan Educational Department under Grant 19K025.
References 1. Pan, Y., Zhou, L., Ni, T.: Fire situation analysis in China. Fire Technology and Product Information 2. Haitao, L., Min, W., Lingbin, S.: Development and application prospect of fire fighting robot. China Public Safety 01(1), 161–164 (2007) 3. Yanjun, L., Jiwei, L., Xiaodong, Z.: Analysis and research on the development of UAV ground station. J Shenyang Aerosp. Univ. 31(03), 60–64 (2014) 4. Yan, Z.: Overview of the development of UAV ground station. Avionics Technol. 41(01), 1–6 (2010) 5. Songru, H., Chao, Z., Jia, Y., Pingfa, J.: Analysis and research on the development of civil UAV ground station. Dig. Technol. Appl. 37(10), 227–229 (2019) 6. Zhao, Q.: Control System Design of a New Type of Tracked Fire Fighting Robot. Nanchang University (2020) 7. Xiafu, L., Qizhong, C., Zheng, L.: The invention relates to a communication control terminal design for a small unmanned ship ground station. Ship Electric Technol 39(08), 1–4 (2019) 8. Yang, Q.: Design and Implementation of Ground Station System of Small Industrial UAV. University of Electronic Science and Technology of China (2017)
Baby Expression Recognition System Design and Implementation Based on Deep Learning Xuanying Zhu, Yaqi Sun(B) , Qingyun Liu, Jin Xiang, and Mugang Lin College of Computer Science and Technology, Hengyang Normal University, Hengyang, China [email protected]
Abstract. With the increasing demand for social security, the performance of expression recognition is becoming more and more important for our social life. However, current expression recognition technology performs poorly in accuracy and speed, especially for the challenge of baby expression recognition. In this paper, we propose a method for baby expression recognition, and design and implement a baby expression recognition system based on a deep learning model. In the system, we build a convolutional neural network model and train it on a baby expression dataset. In the method, we use forward propagation of the neural network, which can load a picture directly and output the recognition result. The method also provides various functions for baby expression recognition. Finally, the experiments show that our method balances the requirements of accuracy and speed well. Keywords: Deep learning · CNN · Neural networks
1 Introduction

Artificial intelligence expression recognition based on deep learning is currently valuable for applications, and research projects on face expression recognition [1] are increasing. A face expression recognition system generally consists of two major parts: 1) feature extraction; 2) classifier design. Feature extraction refers to the extraction of recognizable features [3] from face images. Currently, the two commonly used feature extraction methods are geometric structure-based feature extraction and appearance-based feature extraction. The so-called classifier design is to build a recognition system that can classify expressions based on the extracted face features. The convolutional neural network (CNN) [4] is the most frequently used deep learning method in face recognition. Its main advantage is that deep learning methods can be trained with a large amount of data: we can learn the variations of various representations in the training set to achieve face recognition. This approach replaces the previous need to design special feature variables to accommodate different types of intra-class variation, for example lighting conditions, pose and facial expressions. The main drawback of the deep learning approach is the need for a large training set that contains almost all kinds of variation, so that the model can generalize to almost all samples.
2 Related Work

2.1 Deep Learning Based on Neural Networks

Convolutional neural networks (CNNs) are the most commonly used class of deep learning methods for face recognition [5]. The main advantage of deep learning methods is that a large amount of data can be used for training, and expression representations can be learned from the training data: instead of designing specific features that are robust to different types of intra-class differences (e.g., lighting, pose, facial expression, age, etc.), these can be learned from the data. The main shortcoming of deep learning methods is that they need to be trained on very large datasets, and these datasets need to contain enough variation to generalize to unseen samples. In addition to learning discriminative features, neural networks can perform dimensionality reduction and can be trained as classifiers or with metric learning methods. CNNs [7] are considered end-to-end trainable systems that do not need to be combined with any other specific method. Expression recognition systems typically consist of the building blocks shown in Fig. 1. The selection of the loss function for training CNN methods has recently been the most active area of research in face recognition; the learned embedding needs to satisfy the condition in Eq. 1.
Fig. 1. Expression recognition system module
$$ \|f(x^a) - f(x^p)\|_2^2 + \alpha < \|f(x^a) - f(x^n)\|_2^2 \quad (1) $$
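As an illustration of the condition in Eq. (1), the following Python sketch checks the margin constraint for a triplet of embeddings and also shows the hinge-style training surrogate commonly derived from it; the margin value and the toy embeddings are illustrative assumptions, not values from this paper.

```python
import numpy as np

def triplet_margin_satisfied(f_a, f_p, f_n, alpha=0.2):
    """Check Eq. (1): anchor-positive distance plus margin alpha must stay
    below the anchor-negative distance."""
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return d_ap + alpha < d_an

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Hinge form commonly used for training: it is zero once Eq. (1) holds."""
    return max(0.0, np.sum((f_a - f_p) ** 2) - np.sum((f_a - f_n) ** 2) + alpha)

# toy embeddings (anchor, positive = same expression, negative = different one)
rng = np.random.default_rng(0)
a, p, n = rng.normal(size=(3, 128))
print(triplet_margin_satisfied(a, p, n), triplet_loss(a, p, n))
```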
2.2 Expression Detection and Correction

In the expression detection part, we use a support vector machine combined with the histogram of oriented gradients. First, feature vectors are constructed by calculating the gradient directions of local regions, and then all the feature vectors are input into the classifier. If the output result is positive, the face position is returned, specifically the coordinates of the upper-left and lower-right corners of the detected rectangle. Compared with other methods, this method balances accuracy and computation speed better and is more suitable for online recognition applications. The details of calculating the pixel gradients in the image are shown in Eq. 2, in which m and θ are respectively the
magnitude and direction.

$$
\begin{aligned}
f_x(x, y) &= f(x+1, y) - f(x-1, y) \\
f_y(x, y) &= f(x, y+1) - f(x, y-1) \\
m(x, y) &= \sqrt{f_x(x, y)^2 + f_y(x, y)^2} \\
\theta(x, y) &= \arctan\big(f_x(x, y) / f_y(x, y)\big)
\end{aligned}
\quad (2)
$$

In the expression correction part, we use the millisecond-level ensemble method proposed in the literature, in which gradient boosting is used to train several regression trees, and the resulting decision tree ensemble is used to locate 68 landmarks, including the eye contours, the nose and the mouth contour.

2.3 Face Recognition Technology

Logistic Regression Loss Function. Logistic regression, also called logistic regression analysis, is a generalized linear regression analysis model. In logistic regression, we define a loss function different from the squared error, one that plays a similar role but gives us a convex optimization problem. The loss function is L(ŷ, y) = −(y·log(ŷ) + (1 − y)·log(1 − ŷ)). During learning, when y = 0, making the loss smaller means that log(1 − ŷ) should be larger, so the loss function pushes ŷ to be as small as possible (and, symmetrically, as large as possible when y = 1). We then define a cost function, Eq. 3, which measures the performance on the full training sample, where ŷ is the predicted output value derived by the logistic regression algorithm using a specific set of parameters w and b:

$$ J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L\big(\hat{y}^{(i)}, y^{(i)}\big) \quad (3) $$

When training the logistic regression model, we have to find the appropriate parameters w and b that make the cost function J as small as possible. In this sense, logistic regression can be considered a very small neural network.

Gradient Descent. After defining the cost function, we can see that it is convex, so it has a global minimum, which means we can find the solution with the minimum cost (the optimal solution). The problem now becomes how to find this optimal solution, i.e., the most suitable parameters w and b. The derivative represents the slope of the function at a certain point; in the process of finding the optimal solution, the parameters are updated by following this slope. Before gradient descent, the parameter values are initialized (to decide from which point the descent starts); because the cost function is convex, descending from either side reaches the optimal solution, so the initial parameters can be large or small. The update formula is Eq. 4, and the iteration continues until it arrives at the optimal solution. At this point the standard neural network is trained and predictions can be made based on the parameters obtained.

$$ w := w - \alpha\, dw \quad (4) $$
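The following minimal Python sketch ties the pieces of Sect. 2.3 together: the cross-entropy cost of Eq. (3) and the plain gradient-descent update of Eq. (4) on a toy binary problem. The learning rate, step count and synthetic data are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, y):
    """Cross-entropy cost J(w, b): mean of the per-sample loss over the training set."""
    y_hat = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def gradient_descent(X, y, lr=0.1, steps=1000):
    """Plain gradient descent, w := w - a*dw as in Eq. (4)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        y_hat = sigmoid(X @ w + b)
        dw = X.T @ (y_hat - y) / len(y)   # dJ/dw
        db = np.mean(y_hat - y)           # dJ/db
        w -= lr * dw
        b -= lr * db
    return w, b

# toy binary problem
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w, b = gradient_descent(X, y)
print(cost(w, b, X, y))
```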
3 System Design

3.1 General Interface Design

In addition to improving the performance and speed of the baby expression recognition [2] algorithm carried by the system, we design a user-friendly system interface, which not only optimizes the recognition effect but also makes the system easy to deploy on mobile devices. The general framework of the deep learning-based baby expression recognition system is shown in Fig. 2. In addition to the usual digital image processing operations on face images, the system is fully functional and can perform face detection, alignment and recognition. Users can call each function through a combination of drop-down menus and keystrokes. The system is generally divided into a login interface and function pages.
Fig. 2. System structure
3.2 Objective Comparison

The representative MTFL dataset from recent years is selected. The dataset contains 12,995 faces with 5 key-point annotations; in addition, it offers information on gender, whether the subject is smiling, and head posture. The model is obtained by building a convolutional neural network and training it on the face expression dataset. After the model is obtained, the recognition result can be directly obtained by loading an image and running forward propagation through the neural network [6].
3.3 System Module

The system is divided into a user module [8], an image recognition module, a real-time monitoring module, an expansion module, and a backend management module. Image recognition requires a neural network, so before recognition we need to build a convolutional neural network, train it on the dataset, and store the trained network in a model file for invocation. Based on the trained neural network model, the predicted values can be obtained by performing forward computation on the loaded static images. The real-time monitoring module allows the user not only to recognize still images of the baby, but also to call the camera to monitor the baby in real time. The real-time monitoring module requires the user's camera to be on and of sufficient clarity. The image refresh interval of the real-time monitoring module of this software is 0.1 ms, so the results displayed in the real-time monitoring module are very timely. Users can clearly see the specific emotional changes of the baby at each moment, which is conducive to the physical and mental health of the baby. We can see this in Fig. 3.
Fig. 3. User module
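The real-time monitoring loop can be sketched as follows with OpenCV. The Haar cascade shipped with OpenCV is used here only as a stand-in for the HOG+SVM detector of Sect. 2.2, and `predict_expression` is a hypothetical placeholder for the trained CNN's forward pass; neither is the system's actual implementation.

```python
import cv2

def predict_expression(face_img):
    # placeholder for the trained CNN's forward pass (hypothetical)
    return "smile"

cap = cv2.VideoCapture(0)  # the user's camera must be on and sufficiently clear
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
        label = predict_expression(gray[y:y + h, x:x + w])
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("baby expression monitor", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```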
3.4 Software Testing

In order to further test the performance of the proposed deep learning-based baby recognition system, we use pictures containing multiple faces and videos with heavy foot traffic to test the accuracy and speed, so as to evaluate the algorithm on multi-face detection, alignment and recognition tasks. The test results show that the system accomplishes the expected functions well and that the interface is easy to operate and user-friendly. After logging into the software, the user enters the homepage and clicks Baby Recognition on the main page to switch to the baby recognition interface, as shown in Fig. 4. Opening the camera enters the video monitoring module for real-time expression recognition, as can be clearly seen
from Fig. 5. Through comparison, it can be seen that the new algorithm proposed in this paper can attend to more faces of poor quality and is not prone to missed detections, which meets the demands of face detection. The algorithm equipped in this system can detect more faces and align them better, finding the locations of facial key points more accurately while recognizing faster. Therefore, the baby recognition algorithm of this system balances the needs of speed and accuracy well.
Fig. 4. Baby identification interface
Fig. 5. Image recognition interface
4 Conclusion

This paper designs and develops a baby expression recognition system based on deep learning. The graphical user interface is built with Python Qt 5, and the logistic loss function, gradient descent, binary classification and other algorithms are used to achieve precise and rapid recognition of baby expressions. Tests on videos and pictures containing baby faces show that the system can be applied well in real life, with fast response times, simple operation and intuitive presentation of results, which is convenient for users. Moreover, the system adopts a lightweight design and can be easily carried on different mobile devices. However, in complex environments the sensitivity of face tracking is not high enough. In the future, more accurate algorithms can be used to achieve multi-face recognition and, through simultaneous recognition of baby faces, to raise alarms for abnormal baby expressions, so as to better monitor the baby's situation and give more accurate help in time.

Acknowledgments. This work was supported by National Natural Science Foundation of China (61772179), Hunan Provincial Natural Science Foundation of China (2020JJ4152, 2019JJ40005), the Science and Technology Plan Project of Hunan Province (2016TP1020), Scientific Research Fund of Hunan Provincial Education Department (18A333), Double First-Class University Project of Hunan Province (Xiangjiaotong [2018]469), Postgraduate Scientific Research Innovation Project of Hunan Province (CX20190998), Degree & Postgraduate Education Reform Project of Hunan Province (2019JGYB266, 2020JGZD072), Industry University Research Innovation Foundation of Ministry of Education Science and Technology Development Center (2020QT09), Hengyang technology innovation guidance projects (2020jh052805, Hengcaijiaozhi [2020]-67), Postgraduate Teaching Platform Project of Hunan Province (Xiangjiaotong [2019]370–321).
References 1. Zhong, G., Liu, G., Du, X.: Facial expression recognition method based on convolution neural network. Int. Core J. Eng. 7(05), 511–515 (2021) 2. Gioia, N.J., et al.: The child emotion facial expression set: a database for emotion recognition in children. Front. Psychol. 12(1), 1–9 (2021) 3. Yao, L., Hongbin, P., Da-Wen, S.: Efficient extraction of deep image features using convolutional neural network (CNN) for applications in detecting and analysing complex food matrices. Trends Food Sci. Technol. 113(07), 193–204 (2021) 4. Sirui, Z., et al.: A two-stage 3D CNN based learning method for spontaneous micro-expression recognition. Neurocomputing 448(06), 276–289 (2021) 5. Tamilselvi, M., Karthikeyan, S.: Hybrid framework for a robust face recognition system using EVB_CNN. J. Cases Inf. Technol. 23(3), 43–57 (2021) 6. Akhand, M.A.H., Roy, S., Siddique, N., Kamal, M.A.S., Shimamura, T.: Facial emotion recognition using transfer learning in the deep CNN. Electronics 10(9), 1036–1036 (2021) 7. Yang, L., Yang, B., Gu, X.: Adversarial reconstruction CNN for illumination-robust frontal face image recovery and recognition. Int. J. Cogn. Inf. Natural Intell. 15(2), 18–33 (2021) 8. Nour, N., Elhebir, M., Viriri, S.: Face expression recognition using convolution neural network (CNN) models. Int. J. Grid Comput. Appl. 11(4), 1–11 (2020)
Handwriting Imitation with Generative Adversarial Networks Kai Yang, Xiaoman Liang(B) , Qingyun Liu, and Kunhui Wen College of Computer Science and Technology, Hengyang Normal University, Hengyang, China [email protected]
Abstract. Handwriting imitation is a challenging and interesting deep learning topic. This paper proposes a method to imitate handwriting style through style transfer. We propose a neural network model based on conditional generative adversarial networks (cGAN) for handwriting style transfer, and improve the loss function on the basis of the GAN. Compared with other handwriting imitation methods, the effect and efficiency of handwriting style transfer are significantly improved. The experiments show that the shapes of the generated Chinese characters are clear, and the analysis of the experimental data shows that generative adversarial networks perform excellently in handwriting style transfer. The generated text images are closer to real handwriting and achieve better performance in terms of handwriting imitation. Keywords: Handwriting imitation · Style transfer · Generative adversarial networks · Conditional adversarial networks
1 Introduction

Neural networks are learned from imitating the signal transmission of human brain neurons. In recent years, neural networks and deep learning have received more and more attention and made great achievements in many fields. Since 2016, deep learning has been applied to a new field called style transfer [1]. Style transfer takes a content picture and a style picture as input and generates a new picture after feature extraction and reconstruction by the neural network; the result has the content structure of the content picture and the artistic texture and style of the style picture. This kind of research has become a very popular topic in artificial intelligence. Neural style transfer is widely used in many fields to solve a variety of problems, such as video stylization, text style transfer and super-resolution. Handwriting, meanwhile, is one of the most important ways of communication for human beings. Chinese characters have a history of thousands of years, and the latest official GBK standard contains a total of 27,533 Chinese characters. Everyone has their own handwriting style, but creating a personal font library of 27,533 Chinese characters is a very arduous and tedious task. This paper proposes a method for handwriting imitation using style transfer with a conditional generative adversarial network [4, 5]. Compared with previous font transfer methods, the GAN achieves good
performance in font style transfer. However, because of the complexity of Chinese characters, the methods in previous studies of font style transfer did not perform well on Chinese characters. In this paper, we use an L1-loss-based GAN to solve this problem; compared with other loss functions, L1 retains the local detail structure better (Fig. 1).
Fig. 1. Comparison of generated fonts and source fonts; the handwriting style on the left is the target font.
2 Related Work

Neural style transfer has achieved excellent results in various fields; what is impressive is that neural networks often perform beyond people's expectations. Many research institutions and laboratories have carried out extensive and in-depth research on neural style transfer [1], among which texture style transfer is one of the most popular research topics. The term Generative Adversarial Network (GAN) was first proposed in 2014 by Ian Goodfellow of the University of Montreal and his colleagues [2]. A generative adversarial network can be regarded as a generative model that learns the probability distribution of the samples from a large number of data samples through the encoder. Because of the excellent learning and generation ability of generative adversarial networks, many variants have been developed since the GAN's birth in 2014. Among them, a well-known one is the conditional generative adversarial network (cGAN) [4, 5], proposed in the same year. Compared with the original GAN, the cGAN adds conditional constraints to the generator and discriminator, and it can be seen as a step from unsupervised generative adversarial networks to supervised ones, which established a solid basis for subsequent development and further research. In 2015, because researchers wanted to visualize what the generative network has learned, deep convolutional generative adversarial networks (DCGANs) [10] were proposed to clarify the representation information of the generated images. The purpose of this paper is to find a new method for font style transfer; after comparison with some common methods, we found that cGAN is the most suitable. Recently, some researchers such as Radford et al. [10] and Fogel et al. [11] also proposed methods for handwriting transfer, but they are not suitable for Chinese character handwriting transfer.
3 Architecture

We take the encoder and decoder structure of cGAN [5] into consideration; as a result, our generator is composed of a U-net [9] network structure, and our discriminator uses a
convolutional PatchGAN [4] classifier. First, the input to the network consists of the images generated from the two fonts obtained by preprocessing and the random noise image produced by the generator. The U-net network of the generator extracts the feature map of the character image through the convolutional layers and max-pooling layers of the encoder, and the decoder expands the feature map back up through deconvolution (Fig. 2).
Fig. 2. The GAN network structure we used; G denotes the generator and D denotes the discriminator.
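A minimal PyTorch sketch of such a generator is given below: a small U-Net whose encoder downsamples with strided convolutions and whose decoder upsamples with transposed convolutions, concatenating same-resolution encoder features through skip connections (see also Fig. 3). The depth and channel sizes are illustrative assumptions, not necessarily the exact configuration used in the experiments.

```python
import torch
import torch.nn as nn

def down(c_in, c_out):
    # encoder step: stride-2 convolution halves the feature map
    return nn.Sequential(nn.Conv2d(c_in, c_out, 4, 2, 1),
                         nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2))

def up(c_in, c_out):
    # decoder step: transposed convolution doubles the feature map
    return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, 2, 1),
                         nn.BatchNorm2d(c_out), nn.ReLU())

class UNetGenerator(nn.Module):
    """Three-level U-Net: skip connections concatenate encoder and decoder
    features of the same resolution."""
    def __init__(self):
        super().__init__()
        self.e1, self.e2, self.e3 = down(1, 64), down(64, 128), down(128, 256)
        self.d1, self.d2 = up(256, 128), up(128 + 128, 64)
        self.out = nn.Sequential(nn.ConvTranspose2d(64 + 64, 1, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        s1 = self.e1(x)
        s2 = self.e2(s1)
        s3 = self.e3(s2)
        y = self.d1(s3)
        y = self.d2(torch.cat([y, s2], dim=1))      # skip connection
        return self.out(torch.cat([y, s1], dim=1))  # skip connection

g = UNetGenerator()
print(g(torch.randn(1, 1, 64, 64)).shape)  # torch.Size([1, 1, 64, 64])
```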
The real target font image is input to the discriminator, and the random noise image generated by the generator from the source font is compared with the encoded real image inside the generator. The constant loss function is used to optimize the generated random noise image to bring it closer to the label picture:

$$ L_{constant} = \frac{1}{M} \sum_{i=1}^{n} \big( T^{a}_{i(w,h,c)} - F^{b}_{i(w,h,c)} \big)^2 \quad (1) $$
We not only need to constrain the loss between the image generated by the generator and the label image, but also the loss between the generated image and the source image, so we set up the L1 loss function to monitor the quality of the generated image. We also use a total variation (TV) loss in the generator to eliminate the blur noise caused by different fonts or scribbled handwriting.

$$ L_{1} = \frac{1}{n} \sum_{i=1}^{n} \big| T^{b}_{i(w,h,c)} - F^{b}_{i(w,h,c)} \big| \quad (2) $$
In a GAN, one of the most important requirements is that the image generated by the generator must fool the discriminator, so Goodfellow et al. [2] proposed the cheat loss to maximize the probability of the generated image fooling the discriminator:

$$ L_{cheat} = -\frac{1}{n} \sum_{i=1}^{n} \big[ y_i \cdot \log(p_i) + (1 - y_i) \cdot \log(1 - p_i) \big] \quad (3) $$
where y_i represents the label of sample i and p_i represents the probability that sample i is predicted to be positive.
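For reference, the three loss terms of Eqs. (1)-(3) can be sketched in a few lines of Python; here M is taken as the number of elements of the encoded feature map, and the toy arrays stand in for encoded or raw glyph images. This is a hedged illustration, not the training code itself.

```python
import numpy as np

def constant_loss(t_enc, f_enc):
    """Eq. (1): squared distance (averaged over the M elements) between the
    encoded real image and the encoding of the generated image."""
    return np.sum((t_enc - f_enc) ** 2) / t_enc.size

def l1_loss(t_img, f_img):
    """Eq. (2): pixel-wise absolute difference, which preserves local detail."""
    return np.mean(np.abs(t_img - f_img))

def cheat_loss(y, p, eps=1e-12):
    """Eq. (3): binary cross-entropy the generator minimises so that fake
    images are scored as real by the discriminator."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# toy arrays standing in for a target glyph and a generated glyph
rng = np.random.default_rng(0)
target, fake = rng.random((64, 64)), rng.random((64, 64))
print(constant_loss(target, fake), l1_loss(target, fake),
      cheat_loss(np.ones(8), rng.random(8)))
```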
The discriminator, in turn, must become better and better at judging the difference between the picture generated by the generator and the real handwritten picture. The competition between the generator and the discriminator makes the generated pictures more and more convincing. We continuously train the model, saving the parameters obtained from training, and finally obtain a neural network that can generate other characters based on the handwriting images (Fig. 3).
Fig. 3. The encoder and decoder in the GAN neural network adopt the U-net structure to connect the encoder and decoder of the same layer through skip connection
4 Experiment

We hope that the cGAN model we trained can generalize and generate fonts of various complex styles. Because of the particularity of handwriting, it is difficult to imitate an arbitrary person's handwriting style. Therefore, during training, we used a variety of different fonts, in order to allow the model to learn as many Chinese character styles as possible.

Experiment Environment. All the experiments run on a single Nvidia GTX1080 GPU and an Intel i7 3.7 GHz CPU with Ubuntu 16.04, which is efficient for our tasks.

Running Time. It takes about 36 h on a single Nvidia GTX1080 GPU to train the model over 40 epochs, which covers about 32,000 examples.
4.1 Data Set

Training this GAN neural network requires a large number of pictures of Chinese characters, and finding so many complete Chinese character libraries is a problem. At present, there is no outstanding public dataset for the research topic of Chinese character handwriting imitation. One method is to input a large amount of handwriting into font generation software to obtain the required font images. Another way is to purchase the official copyright
of the font to obtain the required font library. In general, the work of preparing such a dataset is huge. In our work, the dataset used to train the cGAN network is the FangZheng Chinese font library. The fonts input to the network should cover as many characters as possible: if the input source font and the target (label) font cannot both render a certain Chinese character, the model is likely to crash. It is recommended to save the parameters before each training run of the network. In addition, preprocessing is necessary to pickle the data into binary and keep it in memory during training (Fig. 4).
Fig. 4. Some inappropriate samples for pickling into binary, which will cause the model to fail.
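A possible preprocessing sketch is shown below: each character is rendered from a source .ttf and a target .ttf into paired grayscale images, and the pairs are pickled into one binary file. The font paths, output file name and character list are hypothetical; the sketch only illustrates the pairing-and-pickling step described above.

```python
import pickle
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_char(ch, ttf_path, size=256):
    """Render one Chinese character from a .ttf font into a grayscale array."""
    img = Image.new("L", (size, size), color=255)
    font = ImageFont.truetype(ttf_path, int(size * 0.8))
    ImageDraw.Draw(img).text((size * 0.1, size * 0.1), ch, fill=0, font=font)
    return np.asarray(img)

# hypothetical font paths and character list; both fonts should be able to
# render every character used, otherwise the (source, target) pair is dropped
SOURCE_TTF, TARGET_TTF = "source_font.ttf", "target_font.ttf"

pairs = []
for ch in "你好世界":
    try:
        pairs.append((ch, render_char(ch, SOURCE_TTF), render_char(ch, TARGET_TTF)))
    except OSError:
        continue  # skip fonts that cannot be opened

with open("train.pkl", "wb") as f:  # pickled binary loaded into memory for training
    pickle.dump(pairs, f)
```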
4.2 Experiment Result

Our experimental goal is to train a network model that can imitate human handwriting. After training on about 20 fonts, we used the handwriting images of some famous people as input to the GAN network for inference (Fig. 5), and the results obtained were better than expected.
Fig. 5. The result of using Mao's handwriting-style images as the input; the style is learned from the Fangzheng Lishu font library.
Because handwriting style is complicated, and the spacing between characters is difficult to split into individual glyphs, this affects the effect of the experiment to a certain extent. However, our purpose is to use deep learning to imitate any handwriting style, so we also chose a more complex dataset to show the effect (Fig. 6).
Fig. 6. The result of using Fangzheng Kaiti characters as the source font.
When we experiment with fonts and handwriting of similar structure, we often get better results, with clear shapes and no blurring noise (Fig. 7).
Fig. 7. Some samples using a similar font style and handwriting.
The reason for the different results may be that the difference in structure or style between the two fonts is too large, making it difficult for the constant loss to converge to an ideal value.
5 Conclusion

In this paper, we propose a new handwriting imitation method based on deep learning neural networks, and constrain the gap between the generated handwritten font and the source font through the conditional generative adversarial network (cGAN). This is a new way of imitating handwriting style. We set up a generator and a variety of loss functions for the discriminator to constrain the thickness and length of the strokes of the handwritten font, and then trained with a variety of different styles of Chinese characters. Our generator uses the encoder and decoder of the U-net network, and both the encoder and decoder take the form of convolutional neural networks. The experiments show that good results can be achieved when both the encoder and decoder of the U-net are trained with the L1 loss. Fortunately, we have achieved excellent experimental results.
6 Expectation

On the research topic of handwriting imitation, we have done much, but the work is far from complete. Handwriting imitation has many interesting directions that can be expanded. For example, if researchers in the future can train a model that imitates handwriting styles across all kinds of languages and fonts, that would be a great achievement. On the other hand, our network model has some flaws in training on the dataset; for example, different text sizes may also cause bad results. If the network structure can be optimized in future work to improve or solve this situation, it will be great progress for deep-learning-based handwriting imitation and text style transfer.

Acknowledgements. This work was supported by National Natural Science Foundation of China (61772179), Hunan Provincial Natural Science Foundation of China (2020JJ4152, 2019JJ40005), the Science and Technology Plan Project of Hunan Province (2016TP1020), Double First-Class University Project of Hunan Province (Xiangjiaotong [2018]469), Postgraduate Scientific Research Innovation Project of Hunan Province (CX20190998), Degree & Postgraduate Education Reform Project of Hunan Province (2019JGYB266, 2020JGZD072), Industry University Research Innovation Foundation of Ministry of Education Science and Technology Development Center (2020QT09), Hengyang technology innovation guidance projects (Hengcaijiaozhi [2020]-67), Postgraduate Teaching Platform Project of Hunan Province (Xiangjiaotong [2019]370-321).
References 1. Gatys, L.A., Ecker, A.S., Bethge, M.: Image style transfer using convolutional neural networks. In: CVPR (2016) 2. Goodfellow, I., et al.: Generative adversarial nets. In: NIPS (2014) 3. Gatys, L.A., Ecker, A.S., Bethge, M., Hertzmann, A., Shechtman, E.: Controlling perceptual factors in neural style transfer. In: Proceedings of the IEEE Conference on Computer Vision Pattern Recognition, CVPR (2017) 4. Isola, P., Zhu, J.Y., Zhou, T., et al.: Image-to-image translation with conditional adversarial networks. In: IEEE Conference on Computer Vision & Pattern Recognition. IEEE (2016) 5. Gauthier, J.: Conditional generative adversarial nets for convolutional face generation. Class Project for Stanford CS231N: Convolutional Neural Networks for Visual Recognition, Winter Semester (2014) 6. Zhao, H.H., Rosin, P.L., Lai, Y.K., et al.: Image neural style transfer with global and local optimization fusion. IEEE Access 7, 85573–85580 (2019) 7. Zhao, H.H., Rosin, P.L., Lai, Y.K.: Automatic semantic style transfer using deep convolutional neural networks and soft masks. Vis. Comput. 36, 1307–1324 (2017) 8. Zhang, S.X., Zhu, X., Hou, J.B., et al.: Deep relational reasoning graph network for arbitrary shape text detection. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR (2020) 9. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-31924574-4_28 10. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016) 11. Fogel, S., Averbuch-Elor, H., Cohen, S., et al.: ScrabbleGAN: semi-supervised varying length handwritten text generation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR. IEEE (2020)
Epidemic Real-Time Monitor Based on Spark Streaming Real-Time Computing Algorithm Jiaxin Yang, Yaqi Sun(B) , Xiaoman Lian, and Xiaoyang He College of Computer Science and Technology, Hengyang Normal University, Hengyang, China [email protected]
Abstract. Real-time data processing refers to the process by which the computer collects and processes field data at the actual time when it occurs. At present, the traditional real-time data processing models have many drawbacks: for example, developing a real-time processing model requires developers with high technical skills, and model deployment and task monitoring are very inconvenient. Spark Streaming is currently one of the most popular real-time computing frameworks; it has good scalability, high throughput, and a fault tolerance mechanism. According to the characteristics of epidemic diffusion, this paper designs an epidemic real-time monitoring model based on the Spark Streaming algorithm and develops a visual and interactive real-time epidemic monitoring system for the novel coronavirus pneumonia (COVID-19) epidemic in a timely and effective manner. Finally, an epidemic diffusion system is developed, and the COVID-19 epidemic diffusion can be presented in a graphic interface. Keywords: Spark Streaming · Real-time monitoring · Epidemic
1 Introduction

On March 2, 2020, Chairman Jinping Xi pointed out, in his speech at a symposium with responsible comrades, experts and scholars of relevant departments on scientific research for epidemic prevention and control, that we should be "using new technologies such as big data and artificial intelligence to carry out epidemiological and traceability investigations, improving accuracy and efficiency" [1]. At this stage, the use of big data and artificial intelligence technology to carry out epidemic prevention work has very important practical significance. Based on multi-source geographic spatio-temporal big data, Zhang Liu proposed a dynamic estimation model of the multi-level spatial distribution of inter-regional migrants, which was used to estimate the number and distribution characteristics of people flowing from Wuhan to all parts of Hubei Province before New Year's Eve 2020 (January 24, 2020) [2]. Di Xu used spatio-temporal big data as an important tool for this type of public opinion research and judgment, provided government departments with a targeted, qualitative and quantitative scientific research and judgment system, and provided ideas and references for public opinion crisis management and control of public health events in China [3]. Johns Hopkins University (JHU) used global epidemic data, combined with maps and charts, to
display the epidemic situation in countries and cities around the world and to log the daily epidemic trends in China and other regions. The visual results show the distribution of the global epidemic. However, merely reporting the daily increase in numbers can greatly increase people's sense of panic and cannot show the dynamic trend of the development of the epidemic. The establishment of a unified, efficient, real-time and accurate epidemic monitoring system has therefore become an indispensable part of the epidemic prevention and control period. An epidemic real-time monitor based on the Spark Streaming real-time computing algorithm can display the real-time spatial distribution and dynamic development of the epidemic, achieving the purpose of epidemic data processing, analysis, monitoring and early warning.
2 Related Work

2.1 Introduction to Spark

Spark is a general-purpose parallel computing framework similar to MapReduce, open-sourced by UC Berkeley AMP Lab [4]. It combines the characteristics of the distributed parallel computing model with memory-based computing. The biggest advantage of Spark over MapReduce is that the intermediate results of a job do not need to be flushed to external storage such as HDFS as in MapReduce, but are kept in memory, so there is no need to read from and write to external storage repeatedly, which can greatly improve performance.

2.2 Introduction to Spark Streaming

In Internet applications, website traffic statistics, as a common application pattern, require statistics over different data at different granularities; this not only requires real-time performance, but also involves more complex statistical requirements such as aggregation, de-duplication and joins. Traditionally, the Hadoop MapReduce framework can easily implement the more complex statistical requirements, but real-time performance cannot be guaranteed; conversely, a streaming framework like Storm can guarantee real-time performance, but implementing the requirements is complicated. Spark Streaming finds a balance between the two and can easily and accurately implement more complex statistical requirements in real time (Fig. 1).
Fig. 1. Principles of Spark Streaming Framework.
Spark Streaming splits the received real-time streaming data according to a certain time interval and delivers it to the Spark Engine, which produces batches of results. Spark Streaming provides a high-level abstraction called DStream, a discretized stream that represents a continuous stream of data.
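A minimal PySpark sketch of this batching model is shown below: incoming case reports are sliced into 5-second batches, and each batch is reduced per (province, status) key as an ordinary RDD job. The socket source, record format and batch interval are illustrative assumptions, not the system's actual data pipeline.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="EpidemicMonitor")
ssc = StreamingContext(sc, 5)   # slice the stream into 5-second batches

# hypothetical source: one report per line, e.g. "Hubei,confirmed,12"
reports = ssc.socketTextStream("localhost", 9999)

counts = (reports
          .map(lambda line: line.split(","))
          .map(lambda f: ((f[0], f[1]), int(f[2])))   # ((province, status), count)
          .reduceByKey(lambda a, b: a + b))           # per-batch totals

counts.pprint()          # each batch of the DStream is processed as an RDD
ssc.start()
ssc.awaitTermination()
```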
2.3 Comparison of the Advantages and Disadvantages of Spark Streaming and Storm

Spark Streaming is not a true record-at-a-time stream processing framework; it processes a batch of data at a time. This coarse-grained real-time processing framework reads a batch of data and then processes it, and its computation can be based on large memory, so it has high throughput. However, it cannot avoid the corresponding computation latency, so Spark Streaming is more suitable for real-time computing systems with second-level response [5]. Storm is a pure streaming real-time computing framework, used in scenarios that cannot tolerate a delay of more than 1 s, such as real-time financial systems that require purely real-time transactions and analysis. If delayed batch processing and interactive queries of the data are also needed in the program, they are simpler and faster to implement with Spark Streaming than with Storm. The Spark Streaming framework also provides good scalability and fault tolerance [6]. In terms of the data processing model, Spark Streaming processes event streams within a time window, so its latency is relatively high compared with computing engines such as Storm that process independent events [7].
3 Method

3.1 Algorithm Introduction

Real-time computing, also known as instant computing, is the study, in computer science, of computer hardware and software systems subject to "real-time constraints". A real-time constraint is, for example, the maximum time allowed from the occurrence of an event to the system response. Real-time programs must guarantee a response within strict time limits, usually measured in milliseconds and sometimes in microseconds. In contrast, a non-real-time system is one that cannot guarantee that the response time meets the real-time constraints under all conditions: in most cases it may meet the constraints, or even respond faster, but there is no guarantee that the constraints are met under every condition.

3.2 Algorithm Implementation

The Spark Streaming real-time computing algorithm divides the data stream into discrete data units by time, treats each batch of data as an RDD (Resilient Distributed Dataset), processes it with RDD operators, and returns the final result in RDD units. The whole stream computation can accumulate the intermediate results according to the needs of the business, or store them in an external device. In other words, Spark Streaming divides the data by time and then computes it in the traditional offline manner; compared with offline processing, the additional steps are data collection and time-based slicing (Fig. 2).
Fig. 2. Driver and executor
The Spark computing platform has two important roles, Driver and Executor. Whether in Standalone mode or YARN mode, the Driver acts as the master of the application, responsible for generating the task execution plan and for task distribution and scheduling; the Executor acts as the worker and is responsible for actually executing the tasks, returning the computed results to the Driver.
Fig. 3. The execution process of Spark Streaming real-time computing.
We can see from Fig. 3 that real-time computing has the same main components as offline computing, namely the Driver and the Executors; the difference is the additional data collection and data slicing steps. Data collection relies on an external data source, represented here by a message queue. Data slicing relies on an internal clock: the data is sliced regularly according to the batch interval, and the data of each batch interval is submitted for processing. The Executor obtains data from the message queue and hands it to the Block Manager for management, then returns the metadata (the BlockID) to the driver's Receiver Tracker. The driver's Job Generator generates a JobSet for each batch of data and finally passes the job execution plan to the Executor for processing.
4 Experiments

4.1 Test Results

The effect of real-time monitoring of the epidemic situation based on the Spark Streaming real-time computing algorithm is as follows (Fig. 4).
Fig. 4. Real-time outbreaks across the country.
The system counts confirmed cases, suspected cases, cured cases and death cases nationwide (Fig. 5).
Fig. 5. Replay in time series.
Based on the time-series analysis, the epidemic data of each region is rendered according to its severity level, which clearly and intuitively displays the new-case rate, confirmed rate and death rate at the national, provincial, municipal, county and even community level, together with the scope of influence and the severity level (Fig. 6).
Fig. 6. Real-time epidemics in various provinces (take Hubei Province as an example).
The real-time broadcast text includes the release time, content and data sources, and allows the real-time epidemic broadcast status to be viewed. The color depth of the map indicates the severity of the epidemic, which facilitates management and the application of different control measures.
5 Conclusion

Compared with other real-time computing frameworks, Spark Streaming is more practical in real applications because it is part of the Spark ecosystem. This article proposes real-time monitoring of the epidemic based on the Spark Streaming real-time computing algorithm. Through the real-time monitoring system of the
epidemic, the location and number of confirmed COVID-19 cases, the number of deaths and the recovery status of all affected areas are displayed. Real-time data is fed back by geographic region, which visually shows the spatial distribution and dynamic development of the epidemic. The system can monitor the movement of the population related to the epidemic, report epidemic inspections with one click, generate epidemic data reports in real time and publish them to the public promptly as needed, and assess the epidemic situation to enhance the public's awareness of prevention and control. For detecting outbreaks or clusters, it provides a powerful early-warning function, offers technical support for establishing a unified, efficient, fast and accurate epidemic monitoring system, directly promotes the development of epidemic monitoring, and helps guarantee the sustainable development of the national economy and social stability.

Acknowledgments. This work was supported by National Natural Science Foundation of China (61772179), Hunan Provincial Natural Science Foundation of China (2020JJ4152, 2019JJ40005), the Science and Technology Plan Project of Hunan Province (2016TP1020), Double First-Class University Project of Hunan Province (Xiangjiaotong [2018]469), Postgraduate Scientific Research Innovation Project of Hunan Province (CX20190998), Degree & Postgraduate Education Reform Project of Hunan Province (2019JGYB266, 2020JGZD072), Industry University Research Innovation Foundation of Ministry of Education Science and Technology Development Center (2020QT09), Hengyang Technology Innovation Guidance Projects (Hengcaijiaozhi [2020]-67), Postgraduate Teaching Platform Project of Hunan Province (Xiangjiaotong [2019]370-321).
References
1. Xi, J.: Provide strong scientific and technological support to win the battle against epidemic prevention and control. HongQi WenGao (06), 2 (2020)
2. Liu, Z., Qian, J., Yunyan, D., et al.: Multi-level spatial distribution estimation model of interregional migration population based on multi-source spatio-temporal big data-taking the population migrants from Wuhan during the COVID-19 epidemic as an example. J. Geo-Inf. Sci. 22(02), 147–160 (2020)
3. Xu, D.: Research on the internet public opinion research and judgment system of major epidemic emergencies based on spatio-temporal big data. Mod. Inf. 40(04), 23–30 (2020)
4. Dai, M., Gao, S.: Performance evaluation of large-scale data analysis based on Hadoop, Spark and Flink. J. China Acad. Electron. 13(02), 149–155 (2018)
5. Pei, G.: Streaming computing and its application in telecom real-time marketing. Inf. Commun. (03), 239–241 (2018)
6. Zhou, Z., Chen, F.: Overview of big data real-time computing platform technology. China New Telecommun. 19(04), 47 (2017)
7. Song, L.: Comparative analysis of Flink and Spark streaming flow computing models. Commun. Technol. 53(01), 59–62 (2020)
Design and Implementation of Fruit and Vegetable Vending Machine Based on Deep Vision
Chengjun Yang and Yong Xu(B)
Hechi University, Yizhou 546300, China
[email protected]
Abstract. This paper aims to speed up the implementation of intelligent farms and to address unmanned, automated sales of fruits and vegetables. The study designs an automated fruit and vegetable sales scheme based on the STM32F103 and the Raspberry Pi 3B. The fruit and vegetable vending machine can automatically recognize a variety of fruits and vegetables such as apples, pears, bananas, carrots, lettuce, and water spinach, and the price is calculated from the current unit price and the weight of the weighed fruits and vegetables. Users can scan the QR code shown on the display to pay, with both WeChat and Alipay supported, and the sales of fruits and vegetables are automatically pushed through the GSM module. The administrator can enter a password on the TFT-LCD touch screen to modify the unit prices of the various fruits and vegetables, and the machine has a power-failure protection function. Keywords: STM32F103 · Raspberry Pi 3B · GSM module · Image identification
1 Introduction

At present, the world is in a period of rapid development of informatization, and production and management are gradually moving towards automation and intelligence. Intelligent automation equipment has become a development trend and occupies an essential position in intelligent automation. To save time, human resources, cost, and other overheads, especially when buying fruits, vegetables and other daily necessities, people hope to save time and experience a different purchase process. With the development of Internet technology and breakthroughs in image recognition research, the realization of fruit and vegetable vending machines is of great significance. Traditional farms sell fruits and vegetables mainly through manual sales, online sales, and similar channels; this requires a lot of time and energy from the merchant and is not easy to manage. To realize a smart farm, fruit and vegetable vending machines are widely adopted. Fruit and vegetable vending machines [1] have three modes. In the automated vending mode, customers only need to put the fruits and vegetables they wish to buy on the fruit and vegetable vending machine [2]; the vending machine automatically identifies the corresponding types of fruits and vegetables and calculates the price according to their unit price and weighed weight.
The fruit and vegetable vending machine proposed in this paper is connected to Baidu's intelligent cloud server through the Raspberry Pi 3B to realize intelligent identification of fruits and vegetables. The system compresses and encodes the photographed pictures, uploads them to the Baidu intelligent cloud platform for image classification and recognition, and returns the recognition result to the system recognition terminal, the Raspberry Pi 3B. The Raspberry Pi 3B then sends the result to the STM32F103 main controller through the serial port for processing, and the recognition result is displayed on the LCD according to the received data. The customer clicks the "Pay" button to enter the payment interface according to the result displayed on the LCD screen; after the payment succeeds, pressing the "Confirm" button saves the data in the database. In the manual sales mode, the fruit and vegetable vending machine can be used as an ordinary electronic scale with simple, convenient operation and high precision. In administrator mode, the administrator can modify the unit price of each fruit and vegetable type after entering the management password on the touch screen. The fruit and vegetable vending machine retains the traditional selling method while realizing the automated sale of fruits and vegetables. The system is suitable for intelligent sales of fruits and vegetables on unmanned intelligent farms. It reduces the enormous human and material resources required in fruit and vegetable sales, saves sales time, and comprehensively improves the economic benefits of fruit and vegetable sales.
2 System Hardware Design

The system circuit is mainly composed of six parts: the power supply circuit, weighing circuit, LCD display circuit, download and debugging circuit, power-down storage protection circuit, and GSM short message sending circuit. The overall system can be installed outdoors or indoors. As a whole, the fruit and vegetable vending machine can be divided into three parts: the image processing part, the control part, and the actuator, as shown in Fig. 1.
Fig. 1. System hardware block diagram
The image processing part is mainly composed of a Raspberry Pi 3B and a driver-free USB camera, and it is connected to the Internet through the Raspberry Pi 3B's WiFi, which provides networking for the entire system.
The control part is mainly composed of the power supply circuit, the main control circuit and the download circuit. The power circuit supplies power to the entire system to ensure that the system operates reliably and stably. The main control circuit is the core of system control and sends different control signals according to the information collected by the various modules. The download and debugging circuit handles program downloading and system debugging, loading the compiled program into the system's memory to run. The actuator part mainly includes the weighing circuit, LCD circuit, power-down storage protection circuit, and SIM800A short message sending circuit. The weighing circuit weighs the fruits and vegetables and displays their weight. The LCD display circuit displays the system identification result, the unit price, weight and total price of the fruits and vegetables, and some prompt information. The power failure protection circuit stores the unit prices of fruits and vegetables across power failures so that they are not lost when power is cut. The SIM800A short message sending circuit and the control part realize remote short message sending through UART communication.
3 System Programming

System programming is mainly divided into the program design of the system control terminal and that of the system recognition terminal. The overall block diagram of the system programming is shown in Fig. 2.
Fig. 2. Overall block diagram of system programming
The system control terminal sends the photo-taking command "start" to the system recognition terminal through uart1; after receiving the instruction, the recognition terminal takes a photo, performs recognition, and returns the result through uart1. The control terminal also sends data packets of fruit and vegetable sales records to the recognition terminal through uart1. After the recognition terminal receives a data packet containing a sales record, it parses the packet and stores the record in the database once parsing is complete.
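A rough sketch of the recognition-terminal side of this uart1 handshake is given below, using pyserial; the port name, baud rate, line-based framing, and the `recognize_current_item` helper are illustrative assumptions rather than the paper's actual protocol details.

```python
# Sketch of the serial handshake on the recognition terminal (Raspberry Pi).
import serial

def recognize_current_item():
    """Hypothetical stand-in for the camera capture + cloud recognition step."""
    return 0

ser = serial.Serial("/dev/serial0", baudrate=115200, timeout=1)

while True:
    command = ser.readline().decode("utf-8", errors="ignore").strip()
    if command == "start":
        result_index = recognize_current_item()
        ser.write(f"{result_index}\n".encode("utf-8"))   # index of the recognized item
```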
3.1 System Control Terminal Design

The program design of the system control terminal mainly realizes the initialization of each hardware module, the drivers for the LCD screen and the AT24C02 storage module, the serial communication program, the GSM short message sending program, and the electronic weighing program. The program design flow chart of the system control terminal is shown in Fig. 3.
Fig. 3. Program design flow chart of the system control end
After power-on, the system control terminal program first completes hardware initialization. After initialization, the system enters the automated fruit and vegetable sales mode by default. In this mode, when the fruits and vegetables to be identified are placed on the vending machine, the system determines that there are objects to identify and waits for the customer to press the "Identify" button. After the "Identify" button is pressed, the system sends the instruction to start identifying fruits and vegetables to the recognition terminal through serial communication. The control terminal then waits for the recognition terminal to return the recognition result and, according to the returned result, displays either the corresponding list of fruit and vegetable type information [3] or a message that identification failed. In the automated sales mode, if the customer presses the "Pay" button, the display jumps to the "WeChat Pay" interface. Customers can confirm the payment, return to the automated selling mode interface, or press the "Alipay" button to jump to the Alipay payment interface [4], which has "Back", "WeChat" and "Confirm" buttons whose functions are similar to those of the WeChat payment interface. If the customer presses the "Next Page" button, the display jumps to the "Manual Sales" interface, where a numeric keyboard allows the unit price of fruits and vegetables to be entered manually so that the machine can be used as an ordinary electronic scale.
Pressing the "Previous Page" button returns to the previous interface. When the "Next Page" button is pressed on this page, the system enters the "Administrator" interface, and the administrator can enter management mode by entering the numeric password on the interface's numeric keyboard. In administrator mode, the prices of the various fruits and vegetables can be modified through the interface's numeric keyboard, and the administrator's password can be changed by clicking the "Modify" button.

3.2 System Recognition Terminal Program Design

The system recognition terminal program mainly drives the camera to take and save pictures, implements the serial communication program, uploads images to Baidu Smart Cloud [5] for image classification and recognition, and handles the storage and reading of the MySQL database, whose information is displayed on a web page. The program design flowchart of the system recognition terminal is shown in Fig. 4.
Fig. 4. Program design flow chart of the system recognition end
After the Raspberry Pi 3B system is powered on and started, it waits for a WiFi connection. The program first imports the required modules and, after initializing each module, executes the main program. The system waits for the control terminal to send the start-identification signal over the serial port. After receiving the signal, it executes the camera photographing program, saving the captured photograph to a specified path and overwriting the photograph saved from the previous capture. Once the camera program has completed, the system executes the recognition program: it calls the Baidu Cloud API, uploads the photo converted into base64 encoding to Baidu Smart Cloud through a POST request, and
receives Baidu Smart Cloud's image classification and recognition results. The program compares the returned classification results with the pre-stored fruit and vegetable types to determine whether the recognition is valid. If it is, the corresponding serial number is sent to the system control terminal through the serial port; if it is not, identification error information is sent to the control terminal instead, and the system waits for the next identification signal. If the system receives notice of a successful payment from the customer, it executes the MySQL database program to extract the sales information sent by the control terminal over the serial port; the data is classified at the same time, and the classified data is stored in the MySQL database. On the host computer, the data stored in the system's MySQL database can be read out in the form of a dictionary; a local server is set up [6], data is read from the MySQL database, and it is then displayed on a web page in the form of a table [7].
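The following sketch illustrates the recognition step just described: base64-encode the photo, POST it to Baidu Smart Cloud, and map the returned label to a pre-stored index. The endpoint, request fields, response layout, credentials and item list are placeholders based on my understanding of Baidu's general image recognition API and should be checked against the current documentation; they are not taken from the paper.

```python
# Sketch of the cloud recognition call on the recognition terminal.
import base64
import requests

ACCESS_TOKEN = "<obtained from Baidu AI Cloud>"   # placeholder credential
URL = "https://aip.baidubce.com/rest/2.0/image-classify/v2/advanced_general"

KNOWN_ITEMS = ["apple", "banana", "carrot", "lettuce"]   # pre-stored types (illustrative)

def classify(photo_path):
    with open(photo_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode("utf-8")
    resp = requests.post(URL, params={"access_token": ACCESS_TOKEN},
                         data={"image": img_b64},
                         headers={"Content-Type": "application/x-www-form-urlencoded"})
    results = resp.json().get("result", [])
    top = results[0]["keyword"].lower() if results else ""
    # Compare with the pre-stored fruit/vegetable types; return the index on a
    # match, or -1 to signal a recognition error to the control terminal.
    return KNOWN_ITEMS.index(top) if top in KNOWN_ITEMS else -1
```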
4 Overall System Debugging

After completing the software and hardware design of the fruit and vegetable vending machine, functional tests of recognition accuracy and communication reliability are necessary. We conducted one hundred test runs for each of ten kinds of fruit and vegetable products; all of them could communicate and trade normally, but the recognition accuracy for some fruits is still insufficient. To test the fruit and vegetable type recognition module, the fruit or vegetable to be recognized is placed on the vending machine, the "Start Recognition" button is pressed, and the system waits for the recognition result to return. After identification is completed, the fruit or vegetable category, unit price, total price, and payment QR code are displayed. The precision test results of fruit and vegetable recognition are shown in Table 1.

Table 1. Fruit and vegetable identification results

Types of fruits and vegetables recognized | Recognition times | Correct times | Number of errors
Apple         | 100 | 100 | 0
Sydney pear   | 100 | 100 | 0
Banana        | 100 | 100 | 0
Crispy pear   | 100 | 100 | 0
Water spinach | 100 |  99 | 1
Carrot        | 100 | 100 | 0
Lettuce       | 100 |  98 | 2
Celery        | 100 | 100 | 0
Mango         | 100 | 100 | 0
Bitter gourd  | 100 |  99 | 1
To test the administrator mode, the system is switched into that mode. When the administrator enters the correct password, the interface for modifying fruit and vegetable prices opens, and the administrator can input the new unit prices through the numeric keyboard on the interface. The fruit and vegetable price modification interface is shown in Fig. 5.
Fig. 5. Fruit and vegetable price modification interface
5 Summary and Outlook

The fruit and vegetable vending machine is designed for the current development needs of smart farms and realizes the basic automated vending functions. However, some shortcomings remain: the accuracy of identifying individual types of fruits and vegetables is low; the anti-theft and anti-tampering functions of the system are not yet implemented; and the system uses a single process, which does not fully utilize the hardware. In the future, we will continue to improve the accuracy of fruit and vegetable recognition and use machine vision technology to monitor and protect equipment safety. In addition, we will further optimize the performance of the vending machine in the software design, thereby reducing hardware costs.

Funding Statement. The authors are highly thankful to the project to improve the basic scientific research ability of young and middle-aged teachers in colleges and universities in Guangxi Autonomous Region (NO. 2021KY0617).
References
1. Zhang, J., Wang, Z., Xing, X.: Design of automatic fruit and vegetable vending system based on machine vision. Instrum. Technol. 2020(04), 7–10+16 (2020)
2. Cui, P.: Vending machine based on single chip microcomputer. China New Telecommun. 21(23), 81 (2019)
3. Li, X.: Research on Development Platform of Human-Computer Interaction Interface for Embedded System. Yantai University (2012)
4. Wen, G.: Design and implementation of vending machine IoT platform based on mobile phone QR code payment. Central South University (2013)
5. Chang, C., He, L., Wu, X., et al.: Baidu Smart Cloud ABC fully empowers smart cities to the future. Artif. Intell. 000(006), 76–87 (2019)
6. Zhang, X.: How to use a local computer to build a web site server. Fujian Comput. 2014(01), 184–185 (2014)
7. Yan, Y.: Web design skills-form use skills. Comput. Dev. Appl. 2006(4), 58–60 (2006)
Design and Implementation of License Plate Recognition System Based on Android
Chengjun Yang and Ling Zhou(B)
Hechi University, Yizhou 546300, China
[email protected]
Abstract. This study aims to overcome the immobility and inflexible image acquisition of traditional license plate recognition systems. The paper proposes an Android-based license plate recognition algorithm system that can recognize license plates on smartphones. The original image of the license plate is captured by the mobile phone camera and converted from the RGB color model; the image then undergoes grayscale processing, global-threshold binarization, Canny edge detection, and morphological processing. Next, projection positioning of the license plate area in the image, license plate correction, license plate color recognition, vertical-projection character segmentation, and character recognition are performed. Finally, the license plate characters, color, and recognition time are obtained. The system can send the characters, color, and recognition time of the license plate to the recorder's mobile phone via SMS. Simulation test results show that the license plate recognition algorithm system achieves high recognition accuracy. Keywords: Android · License plate recognition algorithm system · Character recognition
1 Introduction

Because traditional license plate recognition systems have limited flexibility, convenient and portable license plate recognition equipment has become an important development direction. However, mobile license plate recognition equipment is often based on FPGA, DSP, and other platforms, which depend on specific hardware support and incur higher costs. For example, in 2012 Xu Wei [1] successfully ported an ETC system to the Android platform, but the system had many shortcomings, such as a low positioning and recognition rate; in 2013, Chen Wei et al. [2] adopted a template matching method based on the Euler number to improve the character recognition rate; in 2017, Z. Selmi et al. [3] proposed a method to create a license plate recognition database, but this method does not apply to Chinese license plates; in 2018, Yiwen Luo et al. [4] proposed a positioning method based on character edges, which can locate a motor vehicle license plate in an image with a complex background. This paper combines the advantages and disadvantages of previous research on license plate recognition and proposes a system that can perform license plate recognition
on smartphones. The system first preprocesses the image to obtain the outline of the license plate, then performs projection positioning to determine the plate's position, and finally segments the license plate characters and recognizes them by template matching, thereby realizing recognition of the license plate information.
2 The Overall Algorithm Flow of the System

The process of identifying license plates in this paper is shown in Fig. 1. License plate recognition can be divided into four major steps: first, perform image preprocessing on the original license plate image; second, locate the license plate area in the processed image, correct the plate, and then recognize the color of the corrected plate; third, perform character segmentation on the corrected license plate image; fourth, match the segmented characters against the templates in the character library. The results are the license plate characters and color.
Fig. 1. The overall algorithm flow chart of the license plate recognition system.
3 Image Preprocessing

3.1 Image Grayscale Algorithm Selection

This paper adopts the weighted average method for grayscale conversion of the image. Compared with the mean-value method and the maximum-value method, this algorithm is more in line with human visual characteristics, and the processed image looks more natural [6]. The effect is shown in Figs. 2 and 3.
Fig. 2. Original image
Fig. 3. Grayscale processed image
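A minimal sketch of the weighted-average grayscale conversion follows; the 0.299/0.587/0.114 weights are the commonly used luminance coefficients and are assumed here, since the paper does not state its exact values.

```python
# Weighted-average grayscale conversion.
import cv2
import numpy as np

img = cv2.imread("plate.jpg")                      # BGR image (path is illustrative)
b, g, r = cv2.split(img.astype(np.float32))
gray = (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)
# cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) applies the same kind of weighting.
```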
3.2 Image Binarization Algorithm Selection

Image binarization [7] converts the grayscale image into a black-and-white image by setting each pixel intensity either to the minimum value 0 or to the maximum value 255. In this paper, binarization further reduces the amount of image data, highlights the required target contour, and improves the efficiency of image processing. The global threshold method is used to binarize the grayscale image. After comparing global and local threshold methods, the local threshold method proved more complicated and tedious to program, while the global threshold algorithm is simple, easy to implement, and gives a fairly ideal processing result. The effect is shown in Figs. 4 and 5.
Fig. 4. Grayscale image
Fig. 5. Binarized image
3.3 Edge Detection Algorithm Selection

In this paper, the Canny edge detection algorithm [8] is used for image edge detection. Thirty license plate images were used for edge detection tests. The greatest interference occurs when the background of the picture is complex, resulting in a lack
of contrast between the license plate and the complex background. The Sobel operator is fast and simple, but the detected edges of the target are not detailed enough and appear blurry. With the Canny operator, the edge of the license plate is outlined better, which makes up for the deficiencies of the Sobel algorithm. The effect is shown in Figs. 6 and 7.
Fig. 6. Sobel operator effect diagram
Fig. 7. Effect diagram of Canny operator
3.4 Morphological Processing

Digital morphology image processing [9] can fill in the target area and find the target region using its geometric characteristics. In this paper, the dilation (expansion) and erosion (corrosion) operations of morphological processing are applied. The dilation operation "enlarges" the target in the image; its purpose is to connect the target area into a whole. The erosion operation "shrinks" the target in the image; its purpose is to remove noise points without deforming the shape of the target area. In this paper, the two operations are combined to achieve smoothing and denoising of the image.

(1) Erosion (corrosion) operation, expression (1):

$$S = X \ominus B = \{(x, y) \mid B_{xy} \subseteq X\} \tag{1}$$

Here X is the unprocessed binary original image and B is the structuring element. If the translated element $B_{xy}$ is contained in X, the point (x, y) is recorded; the set of all such points is the result of X being eroded by the structuring element B (Fig. 8). The eroded effect shown on the right is obtained from the convolution-like calculation of X and B.
Fig. 8. Corrosion diagram
(2) Dilation (expansion) operation, expression (2):

$$S = X \oplus B = \{(x, y) \mid B_{xy} \cap X \neq \emptyset\} \tag{2}$$
Again X is the unprocessed binary original image and B is the structuring element. If the translated element $B_{xy}$ intersects X, the point (x, y) is retained; the set of all such points is the result of X being dilated by the structuring element B. As shown in Fig. 9, the dilated target is slightly larger than in the original image.
Fig. 9. Schematic diagram of expansion
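The remaining preprocessing steps of Sects. 3.2-3.4 can be sketched with OpenCV as follows; the threshold value, Canny limits, kernel size, and iteration counts are illustrative choices, not parameters reported in the paper.

```python
# Global threshold binarization, Canny edge detection, then dilation and erosion.
import cv2
import numpy as np

gray = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)   # global threshold
edges = cv2.Canny(binary, 100, 200)                            # Canny edge map

kernel = np.ones((3, 3), np.uint8)
dilated = cv2.dilate(edges, kernel, iterations=2)   # connect the target region
cleaned = cv2.erode(dilated, kernel, iterations=1)  # remove isolated noise points
```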
4 License Plate Positioning

This paper uses the projection positioning method [10] to locate the license plate and the Hough transform [11] to correct it; both the positioning and the correction work well. After the license plate area is located, the color of the plate is recognized. The positioning and correction effects are shown in Figs. 10, 11 and 12. The license plate positioning test used 102 license plate image samples. Positioning works well on images where the tilt angle is not very large, but the effect is poor for images with a complicated background or an excessively large tilt angle; therefore, excessive shooting angles should be avoided during image acquisition. The test results are shown in Table 1.
Fig. 10. License plate location
Fig. 11. Inclined license plate
Fig. 12. The corrected license plate
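As a hedged sketch of the Hough-transform correction mentioned above, the snippet below estimates the dominant line angle on the plate's edge map and rotates the region accordingly; the thresholds and file path are illustrative.

```python
# Hough-based tilt estimation and rotation of the located plate region.
import cv2
import numpy as np

plate = cv2.imread("plate_region.jpg")              # located plate region (illustrative path)
edges = cv2.Canny(cv2.cvtColor(plate, cv2.COLOR_BGR2GRAY), 100, 200)
lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=80)

if lines is not None:
    # OpenCV's theta is the angle of the line's normal from the x-axis; a
    # near-horizontal plate edge gives theta around 90 degrees, so theta - 90
    # approximates the tilt (the sign may need flipping for a given convention).
    angles = [np.degrees(theta) - 90.0 for rho, theta in lines[:, 0]]
    angle = float(np.median(angles))
    h, w = plate.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    corrected = cv2.warpAffine(plate, M, (w, h))
```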
Table 1. Different color license plate test results

License plate color | Number of license plate images | Number of successful positioning | Positioning success rate
Blue   | 70 | 65 | 92.8%
Yellow | 20 | 18 | 90.0%
Green  | 12 | 11 | 91.7%
5 Character Segmentation Test and Analysis

The segmentation method used in this paper is the vertical projection segmentation method [12]. After the license plate is located, its characters are segmented and then recognized; the purpose of character segmentation is to improve the recognition rate of the license plate information. The test results are shown in Figs. 13, 14 and 15.
Fig. 13. Binarization of license plates
Fig. 14. Vertical projection
Fig. 15. Segmentation effect diagram
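A minimal sketch of the vertical-projection segmentation follows: white pixels are summed per column of the binarized plate and cuts are made where the projection drops near zero; the threshold values are illustrative.

```python
# Vertical-projection character segmentation on a binarized plate image.
import cv2
import numpy as np

binary = cv2.imread("plate_binary.png", cv2.IMREAD_GRAYSCALE)
projection = (binary > 0).sum(axis=0)              # white-pixel count per column

chars, start = [], None
for x, count in enumerate(projection):
    if count > 2 and start is None:                # entering a character region
        start = x
    elif count <= 2 and start is not None:         # leaving a character region
        if x - start > 3:                          # ignore very narrow runs (noise)
            chars.append(binary[:, start:x])
        start = None
if start is not None:
    chars.append(binary[:, start:])
```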
In this segmentation test, 80 license plate pictures were segmented, and the effect was good. The cases of imprecise segmentation are caused by certain special province
characters. For example, special characters such as "川" cause segmentation errors, because a "川" character projects three discontinuous peaks and troughs during segmentation, which results in inaccurate splits. A more effective segmentation method needs to be adopted later to solve such problems. The test results are shown in Table 2.

Table 2. Character segmentation test results

License plate color | Number of test images | Number successfully segmented | Segmentation success rate
Blue   | 50 | 47 | 94.0%
Yellow | 18 | 17 | 94.4%
Green  | 12 | 11 | 91.7%
6 Application Design

The design in this paper mainly targets smartphones. On the Android platform, the effect of the software design [13] can be previewed in real time during development, which makes it convenient for the designer to adjust. In use, the camera is aimed at the vehicle license plate to be recognized, and the OK button is clicked after shooting; the license plate recognition program is then called, and the recognition result is displayed in the "Content" box. To enable the SMS-sending function, the application must also request the phone's SMS permission. The user enters the recipient's mobile phone number and clicks the "Send SMS" button, and the recognition result is sent to the designated recipient's phone as a text message. The effect is shown in Figs. 16 and 17. The stability and recognition accuracy of the application were tested in a sunny daytime environment. The test performed license plate detection on 130 vehicles in real scenes, including 80 blue plates, 20 yellow plates and 30 green plates. The results show that the success rate is highest when detecting private cars with blue license plates, and the success rate for large engineering vehicles with yellow plates is also relatively high; the factor that lowers their accuracy is the soil and dirt on yellow plates, which causes inaccurate character recognition. When recognizing green license plates, the success rate is lower, because the color space of green plates is more complicated, being a gradient mix of green and white; therefore the plate color sometimes cannot be recognized for green plates, although the character recognition is not affected. The recognition results are shown in Table 3.
Fig. 16. Page after shooting
Fig. 17. Results display
Table 3. Application program test results

Type of test | Number of test images | Number of successful recognitions | Success rate
Blue license plate   | 80 | 76 | 95.0%
Yellow license plate | 20 | 18 | 90.0%
Green license plate  | 30 | 25 | 83.3%
7 Summary and Outlook

This paper studies the algorithms of license plate recognition, including image preprocessing, license plate location, character segmentation, and character recognition, and uses the Android platform to implement the recognition processing and the application design. The common blue, yellow, and green license plates can be recognized both in color and in characters. Testing shows that the license plate recognition algorithm system in this paper achieves a high recognition rate. However, there are still cases the system cannot handle, such as vehicles with other plate shapes; the detection of license plates on tricycles or motorcycles, for instance, needs further improvement.
Funding Statement. The authors are highly thankful to the project to improve the basic scientific research ability of young and middle-aged teachers in colleges and universities in Guangxi Autonomous Region (NO. 2021KY0617).
References
1. Xu, W.: Research and Implementation of License Plate Location and Segmentation Algorithm Based on Android Mobile Phone Platform. Huazhong University of Science and Technology (2012)
2. Chen, W., Cao, Z., Li, J.: Application of improved template matching method in license plate recognition. Comput. Eng. Des. 34(05), 1808–1811 (2013)
3. Selmi, Z., Halima, M.B., Alimi, A.M.: Deep learning system for automatic license plate detection and recognition. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). IEEE (2017)
4. Luo, Y., Li, Y., Huang, S., et al.: Multiple Chinese vehicle license plate localization in complex scenes. In: Vision and Computing (ICIVC). IEEE (2018)
5. Whitwam, R.: Google's Android one program will make cheap phones less terrible, starting today. ExtremeTech.com 9, 6–7 (2014)
6. Zou, M., Lu, D.: Implementation of license plate character recognition algorithm based on improved template matching. Foreign Electron. Meas. Technol. (1), 59–61, 80 (2010)
7. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679–698 (1986)
8. Xiao, D.: Analysis of the application of mathematical morphology in image processing. Sci. Mosaic 000(005), 10–19 (2013)
9. Yang, G., Lin, N., Ji, X., Ye, Q.: License plate location method based on mathematical morphology and comprehensive color characteristics. J. Gansu Sci. 114–117 (2007)
10. Zhuo, J., Hu, Y.: Research on license plate location algorithm based on edge detection and projection method. Sci. Technol. Bull. 2010(3), 438–441 (2010)
11. Qu, Y., Yang, L.: Hough transform OCR image tilt correction method. J. Image Graph. Ser. A (2001)
12. Ran, L.: Segmentation method of license plate characters based on vertical projection. Commun. Technol. 2012(04), 89–91 (2012)
13. Zhang, Y.: Android4.X development is completely hands-on: build a complete AndroidApp example by hand. Tsinghua University Press, Beijing (2014)
Pseudo-block Diagonally Dominant Matrix Based on Bipartite Non-singular Block Eigenvalues
Fangbo Hou(B)
Jilin Agricultural Science and Technology University, Jilin 132101, Jilin, China
Abstract. With the development and popularization of computer technology, block techniques have been widely used in the study of matrix theory. Special matrices under non-singular block eigenvalues based on a bipartite partition, especially block diagonally dominant matrices, play a very important role in practical problems in computation and engineering. The quasi-block diagonally dominant matrix class has attracted the attention of many researchers in numerical algebra, economics, cybernetics, and matrix theory itself, and it also has many applications in engineering and technology, so exploring the properties of these matrix classes has both theoretical value and practical significance. The purpose of this paper is to study the pseudo-block diagonally dominant matrix based on the bipartite non-singular block eigenvalues. Aiming at the problem that existing matrix factorization recommendation algorithms consider only rating information and therefore lack interpretability and scalability, this article incorporates item review information and applies relevant natural language processing methods to these reviews. Following a multi-modal, multi-view learning idea, topics and emotions are extracted separately and mapped to the item feature matrix and the user preference matrix, and simulation experiments are conducted. The experiments show that the model reaches the performance of the current mainstream review-aware matrix factorization recommendation algorithms while also making the item features and user preferences semantically interpretable. In particular, the proposed pseudo-block diagonally dominant matrix based on the bipartite partition of non-singular block eigenvalues achieves the accuracy of current review-aware matrix factorization algorithms and is better in terms of model interpretability, especially the interpretability of user preferences. Keywords: Bipartite partition · Non-singular block eigenvalues · Diagonal dominance · Matrix theory
1 Introduction

In the research of matrices and related theories, the convenience and effectiveness brought by block techniques have prompted people to study the block matrix itself [1, 2]. In matrix
analysis and numerical algebra, diagonally dominant matrices are one of the important research topics. The results obtained have been applied well in computational mathematics, control theory, mechanics, electricity, information science, management science and engineering [3, 4]. The pseudo-block diagonally dominant matrix, as a special kind of matrix, has a wide range of application backgrounds. Using the diagonal dominance of matrices in cybernetics, the design of a multivariable system can be transformed into the design of univariate systems, which opened up a new path for multivariable system design and brought about the revival of the frequency-domain method [5, 6]. If the diagonal elements of a symmetric matrix are all positive and the matrix is generalized diagonally dominant, then it is a symmetric positive definite matrix [7, 8]. Many scholars have studied pseudo-block diagonally dominant matrices based on bipartite non-singular block eigenvalues and achieved good results. For example, Tan Z uses a numerical inequality to obtain a family of discs such that any disc region contains all the eigenvalues of the matrix, and obtains some other upper bounds of the matrix spread through the estimation and localization of the matrix eigenvalues [9]. F. Merlevede uses the correspondence between polynomials and companion matrices, the Frobenius norm of the matrix, and related tools to estimate and locate the roots of real-coefficient polynomials, and uses discs to locate and analyze the roots of complex-coefficient polynomials [10]. This article mainly improves the results in the literature, gives theoretical proofs, and provides numerical examples for comparison. The result is then used to estimate the upper bound of the infinity norm of the inverse of a diagonally dominant matrix, and numerical examples illustrate its effectiveness [11].
2 Quasi-block Diagonally Dominant Matrix Based on Bipartite Non-singular Block Eigenvalues

2.1 Quasi-block Diagonally Dominant Matrix Based on Bipartite Non-singular Block Eigenvalues

The approach proposed in this paper, based on the bipartite partition of non-singular block eigenvalues, can extract emotional information from comments and map it to user preference features through a conversion function. This user preference feature can be used to initialize the user preference features of the interpretable matrix factorization model LSTMF.

Analysis of Information Contained in Item Reviews. Traditional matrix factorization considers only the rating part of the user's historical behavior, while ignoring the comment content, which is rich in semantic and emotional information. In recent years, with the advent of the Web 2.0 era, user-generated content (UGC) has become more and more abundant. E-commerce websites contain user comments in addition to user ratings. Generally, when users decide whether to purchase or rate an item, a considerable portion of them consult the existing reviews of the item, which also reflects the value of review information for recommendation. A user's review usually contains two aspects of content. On the one hand, the user's emotion is expressed in the review through words with emotional polarity,
reflecting the user's preferences. On the other hand, the content of the review also shows the user's attention to certain attributes of the item, that is, the hidden information of the item in traditional matrix factorization. This hidden information can be expressed in the item review, such as color, size, appearance, and so on. Text is usually difficult to process due to its unstructured nature, while the user preference features and item features of traditional matrix factorization are both extracted from the rating matrix, so fusing review information raises two problems: first, how to extract structured features from unstructured text, and second, how to combine these features with traditional matrix factorization. For the first problem, this article proposes two ways to extract item features from the review text. The first draws on the RLDA topic model. RLDA is a Bayesian hierarchical model and a kind of probabilistic generative model, widely used in both academia and industry, that can extract topics from documents; this article uses RLDA to extract topics from review documents and uses the resulting K-dimensional topics as item features. The second way, the KAE algorithm, first uses keyword extraction to obtain keyword information from comments. Keyword extraction was originally applied in search technology; this article introduces it to the setting of item reviews. Keyword extraction generates an item content vector, and on this basis the deep feature-encoding capability of deep learning is used: the item content vector is processed by a multi-layer auto-encoder to generate a K-dimensional keyword vector that represents the item. This keyword vector also carries semantics, and the final K-dimensional keyword vector is likewise used as an item feature. Like the first approach, its purpose is to obtain a K-dimensional item representation vector from the reviews, but the features extracted by the KAE algorithm are deep semantic features, so their expressive ability is stronger. In addition, to obtain user preference characteristics, we observe that user emotions are usually presented in comments through emotionally polarized words; since user preferences are real values in matrix factorization, it is necessary to calculate the user's emotional strength rather than simply classify it as positive or negative.

Item Feature Extraction Algorithm. Unlike the latent factor model, which extracts hidden item features from the ratings, those hidden features are usually not interpretable. The nonlinear interpretable sentiment topic matrix factorization (Logistic Sentiment Topic Matrix Factorization, LSTMF) model proposed in this article first extracts topic features or keyword features with semantic information from the item reviews as the item characteristics.

Shallow Semantic Feature Extraction Algorithm RLDA. In the field of item feature extraction based on reviews, existing work mainly combines the topic model with the training process of the matrix factorization model.
This simple combination of the two training processes makes the item features lack semantic interpretability.
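As a small illustration of this topic-based item representation, the sketch below merges each item's reviews into one document and extracts a K-dimensional topic vector; standard LDA from scikit-learn stands in for the paper's RLDA variant, and K and the toy corpus are illustrative.

```python
# K-dimensional topic features per item from merged review documents.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

item_docs = {
    "item_1": "great battery life screen is sharp battery lasts long",
    "item_2": "sound quality is poor bass weak but cheap price",
}

K = 5
counts = CountVectorizer(stop_words="english").fit_transform(item_docs.values())
lda = LatentDirichletAllocation(n_components=K, random_state=0).fit(counts)
item_features = lda.transform(counts)    # one K-dimensional topic vector per item
```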
When traditional LDA is applied to item reviews there is an obvious problem: the review text is usually too short, and LDA's topic extraction on short texts is not very good. In order to apply LDA to item reviews and extract hidden item features from the comments, this article constructs a review document to describe each item. When users comment on items, in most cases they consider multiple aspects, such as price, size and color. These aspects are usually limited in number but extremely valuable, so extracting the key considerations of users from lengthy reviews becomes extremely important. On the other hand, the content of these comments is usually unlabeled, which makes using the RLDA algorithm to extract the topics that users care about from the comment content simple and feasible. The massive volume of user reviews also places performance requirements on the topic extraction algorithm; the RLDA algorithm is efficient and can be parallelized when trained with Gibbs sampling, so it can easily handle such massive data.

Deep Semantic Feature Extraction Algorithm KAE. In related work on recommendation systems that consider reviews, most research has focused on combining topic models with recommendation systems, but topic models usually extract only shallow features from reviews, and this shallow representation does not describe the items very well. In view of this, in order to extract a deep representation of an item from its reviews, this article combines keyword extraction technology with an autoencoder and proposes the KAE (Keywords Auto Encoder) algorithm for extracting deep semantic features from reviews. Similar in purpose to the RLDA algorithm above, the KAE algorithm extracts a K-dimensional item feature from item reviews while taking the characteristics of the review corpus into account. Two problems are encountered in this process. First, the length of a single review is very limited, so it is not meaningful to extract keywords from it with statistical methods such as TF-IDF. This problem can be solved by constructing item review documents: instead of treating each review as a document, all reviews corresponding to an item are merged into one document, which can be regarded as a description of the item. In addition, different users' evaluations of the same item may differ greatly or even conflict, so using a single review as a document may introduce contradictory input information. Merging an item's multiple reviews into one document and applying keyword extraction avoids such contradictory information to a certain extent, extracts the mainstream user opinions, maps the item's comments into the item's latent feature space, and also enhances the semantic information of the item feature vectors in the matrix factorization model. Therefore, all the comments corresponding to each item are merged here. Second, the output dimension of TF-IDF is relatively high, whereas matrix factorization needs a K-dimensional feature; this problem is solved by the autoencoder.

Extraction Algorithm of User Preference Features. There is currently no relevant research on how to extract user sentiment information from comments and map it to user preferences. In view of this, this article combines a dictionary-based sentiment
analysis method and proposes an algorithm, TSTUP (Topic Sentiment to User Preference), that extracts user sentiments from comments and converts them into user preference features. The user's preference is often conveyed in the comments through emotionally polarized words: emotional words such as "favorite", "very well", "good" and "nice" appear in the comments, expressing the user's attitude towards the item. When the sentiment is positive, that is, the user likes the item, the user will correspondingly also prefer the hidden category to which the item belongs. Based on this observation, the comment content is processed first: a dictionary-based sentiment analysis method extracts the user's sentiment from the comment, which is then mapped to the user preference feature vector through a conversion function.

User Sentiment Analysis. Before extracting user preference features, we first analyze a problem in existing review-aware matrix factorization models. The more frequently users discuss a certain attribute in comments, the higher their assumed preference for that attribute; but this hypothesis is flawed, because users may discuss attributes with either positive or negative emotions. When users discuss an attribute frequently with positive emotions, their preference for the attribute is indeed higher, but when they discuss it with negative emotions, their preference for the attribute is lower.

Dimension Conversion. The interpretable matrix factorization algorithm proposed in this paper requires a K-dimensional user preference feature and a K-dimensional item feature vector. However, sentiment analysis only yields a single emotional-intensity value per review record, not a K-dimensional user preference vector. It is therefore necessary to design a conversion function that maps the emotional intensity the user expresses towards an item into the K-dimensional latent space. Denote the purchase set of a user as B = {i1, …, in}; then the corresponding emotion set of the user is S = {s1, …, sn}, and to obtain the user's emotions towards the K topics it is natural to use a conversion function that computes them.
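A hedged sketch of this dimension conversion is given below; the sentiment-weighted average over the items' topic vectors is an assumed form of the conversion function, since the paper does not spell out its exact definition.

```python
# Map per-item sentiment strengths to a K-dimensional user preference vector.
import numpy as np

def user_preference(sentiments, item_topics):
    """sentiments: shape (n,) signed strengths; item_topics: shape (n, K)."""
    s = np.asarray(sentiments, dtype=float)[:, None]   # (n, 1)
    theta = np.asarray(item_topics, dtype=float)       # (n, K)
    weights = theta.sum(axis=0) + 1e-9                 # how much each topic was discussed
    return (s * theta).sum(axis=0) / weights           # sentiment-weighted topic average

prefs = user_preference([0.8, -0.3, 0.5],
                        [[0.6, 0.2, 0.2], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
```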
2.2 Calculation Formula

Estimating the lower bound of the matrix rank yields a disk region containing all the eigenvalues of the matrix, as shown in formula (1):

$$\left|\lambda - \frac{\operatorname{tr}A}{n}\right| \le \left[\frac{n-1}{n}\left(\|A\|_F^2 - \frac{|\operatorname{tr}A|^2}{n}\right)\right]^{\frac{1}{2}} \tag{1}$$
Each such disk contains the eigenvalues of the matrix. For a matrix A, its eigenvalues are located in the following family of disks, as shown in formula (2):

$$|\lambda_k - m_\lambda| \le \gamma_p, \qquad \gamma_p = \left[\frac{(n-1)^{2p}}{(n-1)^{2p} + (n-1)} \sum_{i=1}^{n} (\lambda_i - m_\lambda)^{2p}\right]^{\frac{1}{2p}} \tag{2}$$
Using a numerical inequality, a family of discs is obtained such that each disc region contains all the eigenvalues of the matrix; applying the properties of convex functions gives another family of discs, as shown in formula (3):

$$(\lambda_i - \bar{\lambda})^{2r} \le \frac{n(n-1)^{2r-1}}{(n-1)^{2r-1} + 1}\, m_{2r} \tag{3}$$
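As a quick numerical sanity check of the disk bound in formula (1), the sketch below verifies it on a random symmetric matrix (so the eigenvalues are real); this is an illustration, not part of the paper's proofs.

```python
# Verify that every eigenvalue lies in the disk of formula (1).
import numpy as np

n = 6
A = np.random.randn(n, n)
A = (A + A.T) / 2                                  # symmetric, so eigenvalues are real

m = np.trace(A) / n
radius = np.sqrt((n - 1) / n * (np.linalg.norm(A, "fro") ** 2 - np.trace(A) ** 2 / n))
eigvals = np.linalg.eigvalsh(A)

assert np.all(np.abs(eigvals - m) <= radius + 1e-10)
```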
3 Experiments Based on the Study of the Pseudo-block Diagonally Dominant Matrix Under the Bipartite Non-singular Block Eigenvalues

Similar to other work on matrix factorization recommendation systems that consider reviews, this article selects the Amazon data set as the data source for the experiments. The data set comes from the real Amazon shopping website and was collected and published by McAuley et al. Each sub-data set corresponds to a kind of item on sale on Amazon.com, such as art, cars and clothing; these categories greatly help in evaluating the accuracy of the recommendation system in different scenarios. The original data contains ten fields of information, such as item ID, item name, price, user ID, rating and comment. Since only the user, item, rating and comment fields are used in this article, these four fields are extracted.
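A minimal sketch of this field extraction is shown below; the field names follow the public release of the McAuley Amazon review data and, together with the file path, should be treated as assumptions to check against the actual files.

```python
# Keep only the user, item, rating and review fields from one Amazon sub-dataset.
import pandas as pd

df = pd.read_json("reviews_Software.json.gz", lines=True, compression="gzip")
data = df[["reviewerID", "asin", "overall", "reviewText"]].rename(
    columns={"reviewerID": "user", "asin": "item",
             "overall": "rating", "reviewText": "review"})
```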
4 Experimental Analysis Based on the Study of the Pseudo-block Diagonally Dominant Matrix Under the Bipartite Non-singular Block Eigenvalues

4.1 Average Ratings on Amazon's Various Sub-Data Sets

This article counts the ratings of each data set; the average ratings are shown in Fig. 1. The rating distribution of each sub-data set was also computed, and a number of results are displayed in graphical form. From the figure it can easily be seen that the Software data set has a relatively uniform rating distribution, while the other data sets skew towards high ratings. These descriptive statistics are convenient for selecting algorithm parameters and analyzing the results.
Fig. 1. Scoring for each data set (average rating per category; y-axis: AVERAGE RATING, x-axis: CATEGORY).
4.2 PSNR Performance Comparison of Block Eigenvalue Inclusion Domain Complex Matrix Algorithm

In this paper, the peak signal-to-noise ratio (PSNR) is used as the objective evaluation criterion in an experimental analysis of the pseudo-block diagonally dominant matrix based on the two-part delineated non-singular block eigenvalues. PSNR mainly depends on a comparison between the original data and the calculated data; after the target is removed, a direct comparison with the original is of little significance, so the image with the target removed is mainly judged by people's visual senses. The PSNR performance comparison of the pseudo-block diagonally dominant matrix under the eigenvalues of the two-part non-singular block is shown in Table 1:

Table 1. PSNR performance comparison of repair algorithms.
Algorithm               Picture 1   Picture 2   Picture 3   Picture 4
CRIMINISI algorithm     29.5        31.6        30.2        27.5
FROBENIUS algorithm     30.2        32.5        31.4        28.2
CAYLEY algorithm        29.3        32.7        32.6        28.5
Experimental algorithm  30.4        33.3        33.2        31.2
It can be seen from Fig. 2 that the PSNR of the pseudo-block diagonally dominant matrix under the eigenvalues of the two-part non-singular block is larger than the PSNR values obtained by the other algorithms, indicating that the data calculated by this algorithm is closest to the original data and that the calculation effect is better. The average PSNR of the algorithm in this paper is above 30 dB, while those of the other algorithms are all less than 30 dB.
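Since PSNR is the evaluation criterion used throughout this comparison, a short sketch of how such a value can be computed is given below. It is a generic illustration (assuming 8-bit images, i.e. a peak value of 255), not the exact evaluation script used for Table 1.

```python
import numpy as np

def psnr(original, repaired, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    original = np.asarray(original, dtype=float)
    repaired = np.asarray(repaired, dtype=float)
    mse = np.mean((original - repaired) ** 2)
    if mse == 0:
        return float('inf')           # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
noisy = img + rng.normal(0, 5, size=img.shape)   # simulated repair result
print(round(psnr(img, noisy), 1))                # roughly 34 dB for sigma = 5
```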
Fig. 2. PSNR performance comparison of repair algorithms (PSNR/dB for Picture 1–Picture 4, grouped by algorithm).
5 Conclusions

With the development of information technology, massive information brings convenience to people's lives but also the problem of information overload. To help people quickly locate the content of interest within large amounts of information, recommendation systems came into being and have attracted the attention of many researchers since their inception. Aiming at the lack of interpretability of the matrix factorization recommendation model, and based on the above features with semantic information, this paper proposes an extensible and interpretable matrix factorization recommendation model, LSTMF, which uses semantic item features extracted from reviews together with user preference features to make the model semantically interpretable. The scalability of the proposed model is then discussed at the information layer, feature layer and model layer, which shows that the model is highly scalable. A scalable and interpretable matrix factorization recommendation system is implemented.

Acknowledgements. This research has been financed by the Jilin Province Education Department's "Thirteenth Five-Year" Science and Technology Project: Research on Block Eigenvalue Inclusion Domain of Block Composite Matrix under Two Parts (JJKH20180731KJ).
Research on Assistant Application of Artificial Intelligence Robot Coach in University Sports Courses

Hongtao Pan(B)

Jingdezhen University, Jingdezhen 333000, China
Abstract. With the development of China's economy, science and technology advance with each passing day, and the use of computers is becoming more and more common in various industries. The cloud platform is built on virtualization technology, and people can use the data stored in computers without being restricted by space and time. The multimedia network teaching platform plays an important role in college teaching, and the multimedia network teaching platform for sports has also received more and more attention. This paper analyzes the application of the cloud-platform multimedia network teaching platform in college physical education, and proposes the use of artificial intelligence robots to improve the teaching quality of physical education courses, which is conducive to the development and promotion of scientific and technological achievements. The use of cloud platforms allows resources to be fully shared and makes teaching methods more vivid and enriched. In addition, this is conducive to improving the level of teaching.

Keywords: Genetic algorithm · College physical education · Cloud platform · Multimedia network teaching platform
1 Introduction

Computers have become very popular in our country, and colleges and universities have ushered in a new era combining science and technology with teaching. Multimedia network teaching can make physical education knowledge more explicit and vivid, and show students the study of sports more intuitively [1]. The popularization of multimedia teaching in colleges and universities has broken the traditional teaching mode of physical education. Multimedia teaching can improve students' understanding of sports knowledge; with this model, physical education is no longer limited to the 45 min of class time, and students can study at any time [2]. However, the application of multimedia has not completely replaced the traditional PE teaching mode; rather, it is complementary to traditional teaching. These applications allow students to learn what they want even without a teacher's timely guidance. In addition, it is very convenient for college students to obtain teaching resources, and the popularization of multimedia technology also alleviates the shortage of teaching resources [3]. In a word,
the application of the multimedia network platform in colleges and universities has brought great convenience to both students and teachers. Based on the cloud platform, maximum sharing of these resources can be realized and benefits can be maximized within limited funds. The data of these teaching resources can be concentrated on the cloud platform, and resource sharing among universities can also be realized [4]. The cloud platform has considerable advantages: for example, teaching resources are fully utilized, and physical education courses can be searched easily and quickly [5]. This paper applies a multimedia network teaching platform based on the cloud platform to college physical education teaching, which can optimize China's physical education system and benefit education reform [6].
2 State of the Art

The genetic algorithm was first proposed in the United States in 1975. It is based on biological research and mimics the evolutionary characteristics of biology. Since being proposed, the genetic algorithm has often been used to solve problems in intelligent technology. It has many advantages: it can search globally, it simulates the randomness of natural biology, it is highly adaptable, and the steps for describing a problem are simplified. Therefore, it is widely recognized in the scientific field (Zhao N et al. 2015) [7]. The genetic algorithm brings great convenience to users, and its ideas, running procedures and implementation steps are easy to learn. Its basic idea is to simulate the process of biological evolution, eliminating inferior results to obtain the best results. At the same time, it has good adaptability and can be applied to different types of functions. Currently, it is widely used in many fields such as automation, computer science and social management (Li X et al. 2017) [8]. In China, research on the genetic algorithm started late and the understanding of it is not yet mature, but the level of development is not backward (Hu H et al. 2016) [9]. Although it has become a hot spot in the field of intelligence, its research and development are still insufficient and many aspects remain unexplored. This paper studies the application of a multimedia network teaching platform combining the genetic algorithm and the cloud platform in college physical education, connecting theory with practice. Additionally, it makes full use of the advantages of all parties, innovates the teaching mode and improves the quality of physical education in China (Zeng L et al. 2017) [10].
3 Methodology

3.1 The Calculation Procedure and Overall Planning Process of the Genetic Algorithm

The genetic algorithm was first developed in the United States. The genetic algorithm model is put forward for convenience of practical operation, and the overall function of the genetic algorithm is operated and improved on the basic control data of the model. The genetic algorithm can greatly improve the AI operation
model. After analyzing all the computational results of the genetic algorithm, the best results and the most complete data need to be selected. Considering the possible errors of each calculation result is helpful for building a multimedia network teaching platform based on the genetic algorithm. With the help of this system model, the traditional biological genetic data type can be built up quickly, and the actual test process can then be considered according to the transformation and combination of biological genetic factors. After the basic genetic algorithm system model has been built, the problem of how to use the model to perform the basic operations needs to be solved. In building the system model, the main consideration is the different effects of different operations of the system model. In the data entry interface, the data transformation that can be performed is stored as the overall data generated by the model. After storage, further calculation and simulation of the genetic algorithm can be carried out. Using different data types for different databases can reduce the computing gap. Once the storage space has been computed, the data building process extracts the information stored in the database. Once the extraction is completed, the different pieces of information are processed separately according to the signal simulation method, and the subsequent overall operation process of the system is carried out according to the result of this processing. After the data processing is completed, the model operators gradually optimize the calculation of the genetic algorithm multimedia network teaching platform (Fig. 1).
Fig. 1. Architecture of the system.
To build a complete multimedia network teaching platform based on the genetic algorithm, and to study the multimedia network teaching platform in sports teaching practice, we need to use Internet technology to set up the login and management interfaces. After the login and management interfaces are set up, all the information we have collected needs to be processed and summarized, and the collected and processed information is stored in a database. In the process of building the system model based on the genetic algorithm,
we need to pay attention to the main login interface and to information feedback processing. Based on the genetic algorithm, we can design a multimedia physical education platform, and then, according to the teaching functions of the platform, the actual login interface should be chosen reasonably. The login interface mainly consists of the course selection system, the course classification, and opinions and suggestions on sports teaching. All of this information can give us actual feedback through the system model. Once the feedback is completed, we plan and analyze the overall calculation data, and as soon as the analysis is completed, we input the data of the system model we have designed into the model library. The model is shown below (Fig. 2).
Fig. 2. Function modules of the system.
We should consider the influence of many factors when constructing a multimedia network teaching platform based on the genetic algorithm in colleges and universities. In particular, since we use the genetic algorithm to build models, we need a deeper understanding of the genetic algorithm to ensure that the building process is not flawed. After analyzing the input results, we find that students have their own opinions and suggestions on the teaching methods used in physical education courses. From the questionnaire and the feedback information of students, we obtain the course selection plan and the courses that most students like, which requires further optimization of the model to meet the students' requirements. Based on the information in the database and the student information obtained from the questionnaire, we can improve and optimize the login interface. In accordance with the above reference scheme, we now have a preliminary understanding of the construction of a multimedia network teaching platform based on the genetic algorithm. Secondly, what we need to do is control the overall data impact of the genetic factors, which is very important in the optimization and testing of the genetic algorithm. Following the process of optimizing database information, the storage and processing of the multimedia database are carried out. This is one of the main solutions to the problem of building an Internet teaching platform based on the genetic algorithm. In the system simulation test, we need to continuously integrate and filter the data as well as control and select the genetic
factors suitable for the genetic algorithm. All of this has helped us build the Internet multimedia database.

3.2 A Preliminary Study on Constructing a Multimedia Network Teaching Platform Based on the Genetic Algorithm

In the foregoing we gained a detailed understanding of the computational steps of the genetic algorithm. Next, we build the Internet multimedia network teaching platform on the basis of the genetic algorithm. During the construction of the platform, the genetic algorithm is gradually optimized to make it more convenient to operate and to complete the computing target. Then, the actual model is constructed according to the selection of genetic factors in the genetic algorithm; the constructed model is shown below (Fig. 3).
Fig. 3. Structure diagram of neural network
After the construction of the operation model is completed, we can make a reasonable comparison and analysis of its overall computing performance. Before the comparison and analysis begin, the genetic algorithm calculation procedure is described, so that we can have a general understanding of its computational process. Variation in nature can produce individuals with higher fitness, but it may also produce individuals with lower fitness; in a word, variation is not necessarily beneficial, and the same holds for mutation in the genetic algorithm. Mutation is an auxiliary step in the algorithm, carried out on the premise of crossover, and it is incorporated into the algorithm in order to remain consistent with nature. The mutation formula is set up as follows: let $X_1^{t+1}$ denote the new value of the individual $X_1^{t} = (x_1, x_2, \cdots, x_n)$, an n-dimensional vector, after mutation at generation t. A variable element is selected uniformly at random and mutated into a uniform random number r:

$$x_i' = \begin{cases} r, & i = j \\ x_i, & i \neq j \end{cases}, \quad j \in \{1, 2, \cdots, n\}\ \text{selected uniformly at random} \tag{1}$$
To calculate the fitness, we refer to the fitness function as:

$$f_i = fitness(pop_i(t)) \tag{2}$$
In this paper, the Monte Carlo method is used as the selection operator. This method is also called roulette wheel selection, and its selection probability formula is as follows:

$$P_i = \frac{f_i}{\sum_{i=1}^{N} f_i} \tag{3}$$
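The mutation and selection operators in Eqs. (1)–(3) can be sketched directly in code. The snippet below is only an illustrative toy implementation of these two operators (the fitness function and parameter values are made up for the example), not the configuration used for the platform in this paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def roulette_select(population, fitness):
    """Fitness-proportional (roulette wheel) selection, Eq. (3)."""
    p = fitness / fitness.sum()
    idx = rng.choice(len(population), size=len(population), p=p)
    return population[idx]

def uniform_mutate(individual, low=0.0, high=1.0, pm=0.35):
    """Uniform mutation of one randomly chosen gene, Eq. (1)."""
    child = individual.copy()
    if rng.random() < pm:
        j = rng.integers(len(child))      # position chosen uniformly at random
        child[j] = rng.uniform(low, high) # replaced by a uniform random number r
    return child

# Toy run: maximize a made-up fitness on 4-dimensional individuals.
pop = rng.uniform(0.0, 1.0, size=(30, 4))
for _ in range(100):
    fit = 1.0 / (1.0 + ((pop - 0.5) ** 2).sum(axis=1))   # illustrative fitness
    pop = np.array([uniform_mutate(ind) for ind in roulette_select(pop, fit)])
print(pop.mean(axis=0))   # individuals drift towards the optimum at 0.5
```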
Then the approximate steepest descent method is used to update the offset value and weight:

$$W^m(k+1) = W^m(k) - \alpha s^m \left(a^{m-1}\right)^T \tag{4}$$

$$b^m(k+1) = b^m(k) - \alpha s^m \tag{5}$$
After k rounds of network training, $W^m$ is the weight matrix of the m-th layer and $b^m$ is the offset value of the m-th layer. Besides, $a^{m-1}$ is the output of the (m−1)-th layer, and $s^m$ is the error index of the m-th layer output, also known as the sensitivity. Using the geometric nonlinearity principle and the constitutive nonlinearity principle, the stiffness matrix can be expressed as follows, where $K_e$ is the stiffness matrix, $[B]$ is the strain matrix, and $[D]$ represents the constitutive matrix:

$$K_e = \int [B]^T [D][B]\, dv \tag{6}$$
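As a concrete illustration of the update rule in Eqs. (4)–(5), the small sketch below performs one gradient-style update of a layer's weights and biases; the learning rate, layer sizes and sensitivity values are arbitrary example numbers, not values taken from the paper.

```python
import numpy as np

def steepest_descent_step(W, b, s, a_prev, lr=0.1):
    """One update of layer-m weights/biases, following Eqs. (4)-(5):
    W <- W - lr * s @ a_prev.T,  b <- b - lr * s."""
    W_new = W - lr * np.outer(s, a_prev)
    b_new = b - lr * s
    return W_new, b_new

# Example with a layer of 3 neurons fed by 2 inputs (illustrative numbers).
W = np.zeros((3, 2))
b = np.zeros(3)
s = np.array([0.2, -0.1, 0.05])    # sensitivity of the m-th layer
a_prev = np.array([1.0, 0.5])      # output of the (m-1)-th layer
W, b = steepest_descent_step(W, b, s, a_prev)
print(W)
print(b)
```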
4 Result Analysis and Discussion

After completing the construction of the multimedia Internet operation platform based on the genetic algorithm, we need to test the platform model built above, and actual optimization measures are put forward according to the test results. In the data design, improvements are made based on the overall data model, and during the operation and testing of the model its optimization is gradually completed. The convergence of the algorithm is tested, which is very important for a data mining algorithm. In the practical use of the genetic algorithm operating platform model there will be many operational difficulties, all of which require us to identify them during testing and propose reasonable solutions for the data model test. Once the calculation is completed, the final optimization measures of the model are obtained by comparing the results. After testing, we need to further process the data model. The steps are shown below (Table 1). Table 1 gives the test evaluation of the three kinds of data model tests of the Internet network teaching platform designed with the genetic algorithm. By comparison, we can
Table 1. Automatic evaluation and manual correction of comparative data.

Contrast data                    Accuracy rate   Objectivity   Feedback ability   Promoting effect
Manual correction                0.92            0.82          0.91               0.82
Automatic correction             0.93            0.92          0.92               0.97
Semi manual and semi-automatic   0.95            0.97          0.98               0.96
find that the semi-manual and semi-automatic data model we have designed is the most stable. Its overall accuracy and operational level are over 95%, and it has great practical potential. Data of this quality completely satisfies our actual needs and can greatly save the time we spend running the genetic algorithm (Table 2).

Table 2. Selection of calculation methods and parameters.

Genetic process             Specific steps                           Parameter settings
Initialize the population   Initial population size                  100
                            Encoding                                 Floating point code
                            Feature selection initial probability    0.33
Genetic operation           Mutation probability                     0.35
                            Crossover probability                    0.95
End condition               Maximum number of iterations             200
After the above comparison of models is completed, the next question we face is whether the calculation can meet our actual needs when the genetic algorithm builds the model. This requires a lot of experiments. First, we set an initial computing rate of 33%. Guided by this initial rate, we ran the data operation 100 times. The results show that the reliability of the genetic-algorithm-based system model of the college network teaching platform is 95%. The higher the reliability during system model design, the better the whole system model is. The 95% obtained in our test indicates that the model we developed is very complete and practically operable (Fig. 4).
Fig. 4. The design of the integrated algorithm calculation model diagram.
calculation of this paper, which can not only calculate the individual with 100% fitness, but also carry out large-scale calculation and analysis (Table 3) Table 3. Comparison of three algorithm test results table. Function
Function: Carrel
Algorithm                                   Average final solution   Optimal solution
Traditional genetic algorithm               3                        5
Others improved genetic algorithm           4                        1
This paper improves the genetic algorithm   3                        3
Due to the computational steps of the genetic algorithm, the computation time necessarily changes with the number of iterations and the population size, which cannot be avoided. However, this article finds the individual with 100% fitness in the smallest number of iterations, which correspondingly reduces the calculation time of the genetic algorithm and saves a large amount of computing time. In addition, a series of comparative experiments is conducted to show that our algorithm is better. From the above test results we can conclude that the optimized algorithm is far superior to the traditional algorithm in both calculation accuracy and computation time. The maximum difference is 5 points, which is a strong affirmation of the optimized genetic algorithm.
5 Conclusion

With the growing emphasis on national health, the state has issued a number of documents concerning sports, which has drawn society's attention to physical health and to the importance of physical education in colleges and universities. The teaching of physical education in Chinese colleges and universities has been carried out with innovative teaching. Using cloud platform technology, educators can improve their teaching level by learning from each other. It also allows students to study sports more actively and easily, and to master physical skills more conveniently and intuitively. The combination of network technology and traditional teaching can make full use of the advantages of both and can effectively improve teaching quality and effect. The application of network technology in sports is still at a primary stage: not many universities link physical education with cloud platform and multimedia technology, and poor areas in particular still need the help of the state and society. However, the application of cloud technology can effectively alleviate the shortage of teachers in China and make teaching resources more open and shared, which can promote the development of education. This paper applies the genetic algorithm to a multimedia network teaching platform for college physical education based on cloud platform technology. It uses large-scale data analysis to optimize education in our country and thereby promotes the development of physical education in China.
References 1. Zhou, B.: smart classroom and multimedia network teaching platform application in college physical education teaching. Int. J. Smart Home 10(10), 145–156 (2016) 2. Hu, C.: Application of E-learning assessment based on AHP-BP algorithm in the cloud computing teaching platform. Int. J. Emerg. Technol. Learn. 11(8), 27 (2016) 3. Cai, X., Cai, X., Cai, X.: Optimization of foreign language learning teaching model and multidimensional evaluation based on online cloud platform. Boletin Tecnico/Tech. Bull. 55(7), 569–575 (2017) 4. Yin, Y.: The influence of MOOC in the cultivation of the college students of electrical engineering based on the cloud platform and moodle. Open Electr. Electron. Eng. J. 9(1), 534–539 (2015) 5. Chang, D.M., Hsu, T.C., Lee, H.J.: Prototype development approach based on application of Program Design in cloud platform. Front. Artif. Intell. Appl. 274(7–8), 1875–1884 (2015) 6. Chen, Y., Chen, Y., Cao, Q., Yang, X.: PacketCloud: a cloudlet-based open platform for in-network services. IEEE Trans. Parallel Distrib. Syst. 27(4), 1146–1159 (2016) 7. Zhao, N., Xia, M., Xu, Z., Mi, W., Shen, Y.: A cloud computing-based college-enterprise classroom training method. World Trans. Eng. Technol. Educ. 13(1), 116–120 (2015) 8. Li, X.: Study on the influence of network multimedia on college aerobics and its countermeasures. Boletin Tecnico/Tech. Bull. 55(4), 243–251 (2017) 9. Hu, H., Zheng, J.: Application of teaching quality assessment based on parallel genetic support vector algorithm in the cloud computing teaching system. Int. J. Emerg. Technol. Learn. 11(8), 16 (2016) 10. Zeng, L., Sun, Y., Ye, Q., Qi, B., Li, B.: A centralized demand response control strategy for domestic electric water heater group based on appliance cloud platform. IEEJ Trans. Electr. Electron. 12(52), S16–S22 (2017)
Research on the Construction of English Teachers' Classroom Teaching Ability System Based on Artificial Intelligence

Qin Miao1(B) and Jun Yang2

1 Foreign Language School, Aba Teachers University, Wenchuan 623002, China
2 Chongqing Vocational College of Transportation, Chongqing 402247, China
Abstract. In the construction of English teachers’ classroom competence system, traditional evaluation methods are incomplete in data collection, inaccurate in classification and insufficient in decision-making basis. Therefore, this paper introduced the K-means clustering algorithm in artificial intelligence algorithm to evaluate the English teaching ability. Based on the research of the K-means clustering algorithm and the implementation method, a fuzzy clustering algorithm combined with big data and information analysis is established to cluster various indicators in the English teaching ability system, which is the basis for improving teaching plan and evaluating teaching ability. It is proved by the simulation experiment that the evaluation of the English teaching ability is more accurate, and the scientific nature of the construction of the teaching ability system is effectively improved. Keywords: Artificial intelligence · English · Classroom · Ability system · Research
1 Introduction

In today's Internet information age, using data mining technology to process and evaluate English classroom teaching information, to make quantitative plans for teaching management, and to construct a teaching ability system has positive significance. Traditional evaluation methods, such as questionnaires and conversation collection, inevitably suffer from incomplete data collection, inaccurate classification and an insufficient decision-making basis [1]. Therefore, introducing artificial intelligence data mining technology is an effective management model for quantitative analysis. Data mining, as a decision support system, is mainly based on machine learning and artificial intelligence models; supported by statistics, data science and visualization technology, it helps people obtain the data support needed for scientific decision making and analyze all kinds of data automatically and more efficiently [2]. Data mining is a way of searching for knowledge in a database; it helps people discover knowledge hidden behind data through mathematical algorithm models [3]. Figuratively, the original knowledge in the database is like mineral resources, and the mathematical algorithm is
like a mining tool, helping people develop and mine those minerals effectively [4]. Data mining technology is widely used in many fields, which gives it a better foundation for wider development [5]. At present, there are many types of data mining technology, such as anomaly analysis, cluster analysis, classification analysis and evolution analysis. Among them, cluster analysis is one of the most widely used techniques, so this paper builds the evaluation system of English teaching ability based on a clustering algorithm.
2 Related Work

The principle of cluster analysis is to assign data to different clusters or classes according to their attributes, so as to realize the process of data collection and analysis [6]. Classification groups physical or abstract objects into classes with similar attribute characteristics and then conducts directional analysis of the different classes [7]. Cluster analysis is applied as a basis for research in many disciplines, such as computer science, mathematics, statistics, biology and economics [8]. As a research branch of statistics, cluster analysis mainly focuses on distance-based clustering. Clustering is a process of searching for clusters in an unsupervised learning setting, where the clusters themselves are hidden. The main difference between clustering and regular classification is that clustering does not need pre-designed labeled classes or data but automatically forms and marks categories; it is a form of observational learning rather than a classification method [9]. In general, clustering divides objects into several clearly defined subclasses, with each object belonging to a unique category. But in the clustering of teaching indicators, because of the non-stationary signal characteristics, a fuzzy clustering algorithm model is introduced to achieve effective cluster analysis. Fuzzy clustering can divide objects into different categories, but its membership is different from that of crisp clustering [10]. In order to handle the influence brought by changes of ability evaluation, and to avoid too few subsets and overly close clustering centers, time and space constraints are often used to adjust this kind of clustering.
3 Methodology

3.1 Clustering Algorithm

The K-means clustering algorithm is the most typical clustering method based on distance partitioning. Its principle is that the closer the distance between two data objects, the more similar they are, so distance is the index used to evaluate the similarity of data objects. The algorithm treats a cluster as a set of objects that are close to each other; when the objects within each cluster are close together, the goal of the classification is achieved. The distance from the selected data points to the center of the class is used to optimize the objective function; the iterative adjustment method is obtained by calculating the extreme value
of the function. The basic idea of the K-means clustering algorithm is a center-seeking method: K points in the space are taken as cluster centers, the points near each center are grouped with it, the center position of each cluster is recalculated from the locations of the points belonging to it after each iteration, and the points are then reassigned to the center at the shortest distance until final convergence is achieved. Figure 1 is a flow chart of the clustering algorithm. First, the data set containing the data objects is divided into the required number of clusters; then data points are randomly selected from the data set as the initial centers. Iteration proceeds under the principle that attributes within the same cluster are similar while the characteristics of different clusters are dissimilar and non-overlapping; during the iterations the accuracy of the clusters is constantly improved, and the required classification results are finally obtained.
Fig. 1. Flow chart of clustering algorithm.
The calculation formula of the cluster center is shown in formula (1). The sum of squared errors is used as the test of convergence, as shown in formula (2), where K represents the number of clusters and X̄ is the center of cluster C_l, i.e. its average value. The time complexity is O(tKmn), in which t is the number of iterations, K is the number of clusters, m is the number of records, and n is the dimension. The space complexity is O((m + k)n), where m is the number of records, k is the number of clusters, and n is the dimension.

$$C_j = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{1}$$

$$E = \sum_{l=1}^{K}\sum_{X \in C_l} \left\| X - \bar{X} \right\|^{2} \tag{2}$$

$$d(i, j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{ip} - x_{jp})^2} \tag{3}$$
In similarity computation, the Euclidean distance is generally used as the standard: starting from the initial clustering center vector V = (v1, v2, · · ·, vk)T, the optimal partition is found and the minimum of the evaluation index Jc can be obtained. The Euclidean distance formula, shown in formula (3), represents the distance between data objects i and j. The degree of similarity between target data in the K-means clustering algorithm is expressed by distance, so the algorithm is mainly applicable to continuous data objects and is not suitable for discrete data. To calculate the degree of similarity between data objects, the Euclidean distance, the Manhattan distance or the Minkowski distance is mainly used. For performance evaluation, a suitable criterion function should be chosen, and the clustering result is generally judged by the sum of squared errors. The algorithm implements the clustering process through repeated iterations, and in each iteration the objective function becomes smaller. The final goal of clustering is to minimize the objective function and achieve the optimal clustering. The selection of the initial clustering centers has a prominent influence on the algorithm. Figure 2 is a schematic diagram of centroid selection in the K-means clustering algorithm.
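The iterative procedure described above can be summarized in a short sketch. This is only a generic K-means illustration following formulas (1)–(3) (the random data, number of clusters and iteration count are arbitrary example values), not the exact teaching-ability evaluation code of this paper.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means: assign points to the nearest center (Eq. (3)),
    then recompute each center as the cluster mean (Eq. (1))."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance of every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([X[labels == k].mean(axis=0)
                                if np.any(labels == k) else centers[k]
                                for k in range(K)])
        if np.allclose(new_centers, centers):
            break                      # Jc no longer changes: converged
        centers = new_centers
    sse = ((X - centers[labels]) ** 2).sum()   # criterion function E, Eq. (2)
    return labels, centers, sse

X = np.vstack([np.random.default_rng(1).normal(m, 0.3, size=(50, 2))
               for m in (0.0, 3.0, 6.0)])
labels, centers, sse = kmeans(X, K=3)
print(centers.round(2), round(sse, 2))
```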
Fig. 2. A schematic diagram of centroid selection by K mean clustering algorithm.
Early in the algorithm, each initial center point represents a cluster, and randomly chosen points serve as the initial clustering centers. Each remaining data point in an iteration is reassigned
according to its distance from the randomly selected cluster centers. When all the remaining data have been examined, one iteration is completed, and the new cluster centers and the value Jc of the criterion function are obtained. If Jc does not change before and after an iteration, the algorithm has converged; the criterion function is thus the basis for terminating the algorithm. In the iterative process, the criterion function is gradually reduced to its minimum. Figure 3 is a diagram of the iterative process of the K-means clustering algorithm; the time complexity required by the algorithm is O(thn), where t is the number of algorithm cycles.

Here 'A001 -> 2' means that data of subject A01 is transferred to A02, for short. Compared with the counterparts, our algorithms perform best in almost all cases except A02 -> 4, A03 -> 2 and A03 -> 4. For these three, the best results come from ARRLS, which is also an algorithm applying joint distribution adaptation. AT-GM-b performs a little worse than ARRLS on the whole. It is worth noting that the counterparts, including TCA [15], 2SW-MDA [16], ARRLS [17] and AT-GM-b [18], are representative and outstanding algorithms that are frequently taken as comparison objects. TCA is a classic feature-extraction algorithm for domain adaptation; it improves MMD and takes differences of the marginal distribution into account, so comparing with TCA can be regarded as comparing with improved MMD. ARRLS is very similar to ours since it takes both the marginal and the conditional distribution into account. AT-GM-b is also an outstanding algorithm; it transforms data of different subjects into a common space, which aims to make data from different subjects closer, and is similar to ours. Although ours does not always run the best, the power of ARRLS indicates the effectiveness of joint distribution adaptation for motor imagery data. On the whole, CDDSVM runs best in 5 out of 9 conditions, while ARRLS and AT-GM-b win the remaining four. The mean and variance of the classification accuracy of CDDSVM are 74.6% and 5.9% across all the transfer conditions, which demonstrates that the proposed algorithm can play a robust and effective role in zero-training motor imagery BCIs. On average, CDDSVM outperforms the counterparts by 7.3%, 5.3%, −0.8% and 0.7%, respectively. The results also demonstrate that our algorithms are effective for real recorded data, which gives us more confidence; further study will proceed towards online use.

Table 1(a). Comparison results for online test.
Algorithms   A001 -> 2   A001 -> 3   A001 -> 4   A002 -> 1   A002 -> 3
TCA+SVM      66.1%       72.3%       75.2%       64.4%       66.7%
2SW_MDA      67.0%       73.9%       76.6%       65.1%       66.1%
ARRLS        70.9%       75.8%       78.9%       67.0%       69.9%
AT-GM-b      72.1%       76.1%       78.7%       67.3%       69.5%
CDDSVM       72.2%       76.4%       81.2%       66.4%       70.6%
Table 1(b). Comparison results for online test.

Algorithms   A002 -> 4   A003 -> 1   A003 -> 2   A003 -> 4
TCA+SVM      76.9%       70.3%       67.1%       67.7%
2SW_MDA      78.4%       71.6%       69.3%       70.4%
ARRLS        85.7%       75.9%       78.6%       75.9%
AT-GM-b      84.8%       75.6%       78.3%       75.3%
CDDSVM       86.1%       74.5%       73.2%       72.2%
4 Conclusions and Discussions

To realize a zero-training motor imagery BCI, the classifier trained with source data must be transferred to the target subjects' data, which leads to domain adaptation problems. Two domain adaptation algorithms and a parameter-choosing method are proposed. Experimental results with MNIST data, BCI competition data and data obtained by our laboratory in practical application prove the effectiveness of our algorithms. They perform well under most, but not all, conditions, which demonstrates the diversity and complexity of the domain adaptation problem in motor imagery data. Knowledge transferred from the data of one person to another may yield very different results; it also shows that people not only have differences but also have things in common. The study will be furthered to explore this using big data and deep neural networks, which are also a promising way for BCI and to which more and more researchers, including us, are paying attention [19, 20]. The proposed distance measures will be applied to deep neural networks in our next step.

Acknowledgement. This work is partially supported by the National Key R&D Program of China (No. 2018YFB1702200), the Ningbo Science and Technology Innovation 2025 Major Project (2019B10116), and the National Natural Science Foundation of China (Grant No. 61633019); partially by the National Natural Science Foundation of China under Grants 61571247 and 31702393, the International Cooperation Projects of Zhejiang Province under Grant No. 2013C24027, the Ningbo Science and Technology Project (Grant No. 2018C80004), and the Ningbo Public Welfare Project (No. 2019C10098); and in part by the Open Project Funding of the State Key Laboratory of Industrial Control Technology at Zhejiang University (ICT20004).
References 1. Zhang, Y., et al.: Sparse Bayesian classification of eeg for brain-computer interface. IEEE Trans. Neural Netw. Learn. Syst. 27(11), 2256–2268 (2016) 2. Shi, T., Ren, L., Cui, W.: Feature extraction of brain-computer interface electroencephalogram based on motor imagery. IEEE Sens. J. 5(99), 1–10 (2019) 3. Krusienski, D.J., Wolpaw, J.R.: Brain-computer interface research at the wadsworth center developments in noninvasive communication and control. Int. Rev. Neurobiol. 86, 147–157 (2009)
4. Tu, W., Sun, S.: Semi-supervised feature extraction for EEG classification. Pattern Anal. Appl. 16(2), 213–222 (2013) 5. Panicker, R.C., Puthusserypady, S., Sun, Y.: Adaptation in P300 brain–computer interfaces: a two-classifier cotraining approach. IEEE Trans. Biomed. Eng. 57(12), 2927–2935 (2010) 6. Meng, J., et al.: Improved semisupervised adaptation for a small training dataset in the brain– computer interface. IEEE J. Biomed. Health Inf. 18(4), 1461–1472 (2014) 7. Blankertz, B., et al.: Optimizing spatial filters for robust EEG single-trial analysis. IEEE Sig. Process. Mag. 25(1), 41–56 (2008) 8. Haiping, L., et al.: Regularized common spatial pattern with aggregation for EEG classification in small-sample setting. IEEE Trans. Biomed. Eng. 12(57), 2936–2946 (2010) 9. Lotte, F., Guan, C.: Regularizing common spatial patterns to improve BCI designs: unified theory and new algorithms. IEEE Trans. Biomed. Eng. 58(2), 355–362 (2011) 10. Shao, L., Zhu, F., Li, X.: Transfer learning for visual categorization: a survey. IEEE Trans. Neural Netw. Learn. Syst. 26(5), 1019–1034 (2015) 11. Atyabi, A., Luerssen, M.H., Powers, D.M.W.: PSO-based dimension reduction of EEG recordings: implications for subject transfer in BCI. Neurocomputing 119, 319–331 (2013) 12. Tu, W., Sun, S.: A subject transfer framework for EEG classification. Neurocomputing 82, 109–116 (2012) 13. Borgwardt, K.M., et al.: Integrating structured biological data by Kernel Maximum Mean Discrepancy. Bioinformatics 22(14), 49–57 (2006) 14. Song, L.: Kernel embeddings of conditional distributions. IEEE Sig. Process. Mag. 30(4), 98–111 (2013) 15. Pan, S.J., et al.: Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 22(2), 199–210 (2011) 16. Sun, Q., et al.: A two-stage weighting framework for multi-source domain adaptation. In: Advances in Neural Information Processing Systems, pp. 505–513 (2011) 17. Long, M., et al.: Adaptation regularization: A general framework for transfer learning. IEEE Trans. Knowl. Data Eng. 26(5), 1076–1089 (2014) 18. Zanini, P., et al.: Transfer learning: a Riemannian geometry framework with applications to brain-computer interfaces. IEEE Trans. Biomed. Eng. 65, 1–11 (2017)
Internal Quality Classification of Apples Based on Near Infrared Spectroscopy and Evidence Theory

Xue Li, Liyao Ma(B), Shuhui Bi, and Tao Shen

School of Electrical Engineering, University of Jinan, Jinan 250022, China
[email protected]
Abstract. Apple classification is of great significance for enhancing the market competitiveness of the Chinese apple industry in the world. In this paper, the collected apple near infrared spectra are taken as the data samples. Firstly, the collected samples were processed: the Principal Component Analysis – Mahalanobis Distance method was used to eliminate abnormal samples, convolution smoothing filtering was used to remove the noise in the original spectra, Multiple Scattering Correction and Standard Normal Variate were used to calibrate the baseline of the apple spectra, and a Genetic Algorithm was used to eliminate invalid wavelength information; prediction models based on the Extreme Learning Machine and Partial Least Squares were then established. In order to solve the problem of decreasing classification accuracy caused by hard segmentation, uncertainty was introduced, that is, an apple classification fusion algorithm based on evidence theory. By assigning different discount factors to the quality functions of the different prediction models, new basic probability functions are generated; these are then fused using the evidence combination rule, which improves the accuracy of apple classification.

Keywords: Apple classification · Extreme learning machine · Partial least squares · Evidence theory
1 Introduction

China is a major fruit-planting country; the planting area and output of fruit show a steady growth trend [1]. The industrial value of fruit is second only to that of vegetables and grains [2]. Compared with other fruits, apples are rich in vitamin C and pectin, which can protect the heart and blood vessels, have anti-cancer properties and can reduce the risk of heart disease [3]. On average, people around the world eat nearly 400 billion apples a year, which is enough to show how much people love them [4].
According to industry analysis data from the Ministry of Agriculture, China accounted for 65.7% of the world's total in 2018. In the same period, Chinese consumption accounted for 64% of the world's total, reaching nearly 41.02 million tons and ranking first in the world [5]. However, China is far from being a powerful country in the apple export trade. In 2018, China's total export volume was only 1.32 million tons, accounting for only 2.5% of the total output value, far lower than the world average apple export rate of 8% [6]. The main reasons for this are that the quality of apples fails to meet reasonable export demand because of low inspection standards, unqualified quality control, low sorting efficiency and a low realization rate. As a result, there are many problems such as rejection, detention and breach of contract in the export trade [7]. Therefore, under the current situation, the primary task of the Chinese apple industry should shift from improving output and variety to quality control, improving export quality while maintaining output. With the progress of the times, requirements for quality of life are gradually rising and apple consumption has shifted from "quantitative" to "quality-oriented", so high-quality apples not only sell well with high economic benefits but also have strong international competitiveness. It is therefore vital to classify the internal quality of apples to promote the industrialization of the apple industry [8]. Near-infrared spectroscopy nondestructive testing is nondestructive, simple, fast and low cost, and has gradually become a new technology in recent years. It has been applied to the detection of the damage degree of cherries [9], the detection of blackheart and soluble solids content (SSC) of pears [10], and the storage of fruits [11]. Lu et al. used near-infrared spectroscopy to predict the sugar content of several varieties of apples with and without peel [12]. Mao Shasha et al. established predictions of sugar content, acid content and vitamin content using the Partial Least Squares (PLS) algorithm based on near-infrared diffuse reflectance spectroscopy [13]. The internal quality of apples is usually detected with a single modeling method; however, due to the inaccuracy of the prediction model and the error of hard segmentation, there are differences in the prediction results. For apple classification, image fusion and multi-angle, multi-region feature fusion were studied in the literature [4], and the fusion of the quality functions of prediction models was involved in the literature [7]. In this paper, uncertainty is introduced, and basic probability functions are generated using the evidence discounting method. Apple classification is then realized based on evidence theory, which effectively avoids the design defects of the traditional classification model.
2 Data Processing and Feature Extraction

439 Fuji apples were selected and kept in the laboratory for 12 h, and data were acquired using an integrating sphere diffuse reflection module. Spectra were collected three times for each apple at a sampling interval of 120°, and the average of the three spectra was taken as the original spectrum (see Fig. 1).
Fig. 1. Original spectrum
In the modeling process, a reasonable preprocessing method can effectively filter the noise in the near infrared spectroscopy (NIR) data and retain the effective information, thus reducing the complexity of the NIR quantitative model and improving its adaptability and robustness. Therefore, spectral pretreatment is often essential. The data preprocessing process is shown in Fig. 2.
Fig. 2. Data preprocessing process (PCA-MD outlier elimination, convolution smoothing filtering, MSC and SNV).
The accuracy of the sample data directly determines the effectiveness of the modeling, and abnormal parameters will lead to a decrease in the stability and accuracy of the
sample model. Therefore, the PCA-MD method was used to eliminate abnormal samples in the original spectra and keep the reasonable ones. Convolution smoothing filtering is a filtering method based on local polynomial least-squares fitting in the time domain. Its biggest feature is that it keeps the shape and width of the signal unchanged while filtering out noise, so it is widely used in signal and image processing [14]. Whether for solid or liquid samples, it is difficult to achieve ideal uniformity. The inhomogeneity of the sample causes light to be scattered as it passes through or is reflected back from the sample, and this scattering causes errors in the sample spectrum. SNV can be used to correct the spectral error caused by scattering. MSC enables the pretreated spectrum not only to eliminate the impact of baseline drift but also to effectively eliminate the impact of scattering, thus improving the spectral signal-to-noise ratio [15]. The MSC method assumes that each spectrum is linearly related to an ideal spectrum, but the real ideal spectrum is not available. We also need to select the effective information from the whole wavelength range; therefore, a Genetic Algorithm (GA) is used to screen the information variables of the whole spectrum and obtain the best characteristic wavelengths, which will be used as input to establish the PLS and Extreme Learning Machine (ELM) prediction models. The original spectra after preprocessing are shown in Fig. 3.
Fig. 3. Image of original spectrum after preprocessing
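Two of the preprocessing steps named above, convolution (Savitzky–Golay) smoothing and SNV, can be sketched as follows; this is a generic illustration with made-up window and polynomial-order settings, not the exact pipeline parameters used for the 439 apple spectra.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess(spectra, window=11, polyorder=2):
    """spectra: (n_samples, n_wavelengths) raw NIR spectra.
    Returns SG-smoothed and SNV-corrected spectra."""
    # Convolution (Savitzky-Golay) smoothing along the wavelength axis.
    smoothed = savgol_filter(spectra, window_length=window,
                             polyorder=polyorder, axis=1)
    # Standard Normal Variate: center and scale each spectrum individually.
    mean = smoothed.mean(axis=1, keepdims=True)
    std = smoothed.std(axis=1, keepdims=True)
    return (smoothed - mean) / std

rng = np.random.default_rng(0)
raw = np.sin(np.linspace(0, 6, 200)) + 0.05 * rng.standard_normal((10, 200))
print(preprocess(raw).shape)   # (10, 200)
```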
3 Establishment of Prediction Model

We have obtained the near-infrared spectral information of the apples and, in order to obtain the characteristic wavelengths, screened it after preprocessing. In this part, the characteristic wavelengths are taken as input to establish two prediction models, ELM and PLS.

3.1 Partial Least Squares Model

The PLS algorithm is essentially modeling among multiple variables. It turns the simple single-variable regression model into one based on new comprehensive variables, formed by combining the independent variables according to their information interpretation and information-quantity variation, and uses these as the parameters to construct the model. Compared with other methods, the characteristic quantities are more accurate, the computation is smaller, and the prediction ability is better. When PLS is used to analyze SSC, the collected near-infrared data and the SSC content are defined as the main components, with the absorbance and content matrices taken as independent variables. In this case, there is no matrix inversion problem, the calculation process is simple and the results are accurate. The advantage of the PLS method is that it does not correlate every variable with the final result or establish a direct regression relationship. Instead, it synthesizes the external relationships of the spectral matrix X and the content matrix W to obtain the internal relationship between them, and decomposes X and W into the following form, where T and V are the score matrices of X and W, P and Q are their loading matrices, and E and F are their residual matrices:

$$X = TP + E \tag{1}$$

$$W = VQ + F \tag{2}$$

$$V = TB \tag{3}$$

$$B = (T^T T)^{-1} T^T W \tag{4}$$
Finally, according to the above equations, the composition matrix B is obtained, and B is used to predict the results of the sample under test. The data of the test samples are collected and decomposed according to Eq. (1) and Eq. (4) to obtain the sample concentration W, as shown in Eq. (5):

$$W = tB \tag{5}$$
Where W and t are respectively the concentration value and spectral decomposition score of the sample to be tested.
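A minimal sketch of how such a PLS prediction model can be fitted is given below, using scikit-learn's PLSRegression as a stand-in for the decomposition in Eqs. (1)–(5); the number of components and the synthetic data are illustrative assumptions, not the settings or spectra used in this paper.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
# Synthetic stand-in data: 100 "spectra" with 50 selected wavelengths,
# and an SSC value that depends linearly on a few of them plus noise.
X = rng.normal(size=(100, 50))
y = 11.5 + X[:, :5] @ np.array([0.3, 0.2, 0.1, 0.2, 0.1]) + 0.1 * rng.normal(size=100)

pls = PLSRegression(n_components=5)      # illustrative number of components
pls.fit(X[:80], y[:80])                  # calibration set
y_pred = pls.predict(X[80:]).ravel()     # prediction set
rmse = np.sqrt(np.mean((y_pred - y[80:]) ** 2))
print(round(rmse, 3))
```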
The implementation process of apple SSC detection based on PLS is divided into four steps:

· Divide the training set and the test set
· Establish the PLS model
· Use the training set to train the model
· Use the test set to test the model

3.2 Extreme Learning Machine Model

ELM is a training algorithm based on a feed-forward neural network; it has the characteristics of fast learning speed and strong generalization performance and hardly falls into local extrema, so it is widely used [17]. Suppose there is a single-hidden-layer neural network whose number of training samples is N and whose number of hidden layer nodes is L; then the following equation can be used for this neural network:

$$\sum_{i=1}^{L} \beta_i G(W_i \cdot X_j + b_i) = O_j, \quad j = 1, 2, \ldots, N \tag{6}$$
where G(x) is the excitation function, Wi = [wi1, wi2, · · ·, win]T is the input weight vector, βi is the output weight, bi is the offset of the i-th hidden node, and Wi · Xj denotes the inner product. In order to minimize the output error of the single-hidden-layer neural network, it should satisfy

$$\sum_{j=1}^{N} \left\| O_j - t_j \right\| = 0 \tag{7}$$

which indicates that

$$\sum_{i=1}^{L} \beta_i G(W_i \cdot X_j + b_i) = t_j, \quad j = 1, 2, \ldots, N \tag{8}$$

Its matrix form is

$$H\beta = T \tag{9}$$

where H is the output matrix of the hidden layer nodes, β is the output weight matrix, and T is the desired output matrix.
The implementation process of Apple SSC detection based on ELM is divided into two steps:
Step 1: ELM network creation and training
· Call the elmtrain function to create the network
· Check the parameters
· Randomly generate the weight matrix
· Calculate the outputs of the activation function
· Calculate the output weights
· Return the trained network values
Step 2: simulation test
· Call the trained network and its transfer function
· Check the parameters
· Calculate the hidden-layer output matrix
· Determine the mode of the activation function
· Calculate the output-layer output
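A compact NumPy sketch of the ELM just described (random hidden-layer parameters, output weights from the pseudo-inverse of H as in Eq. (9)) follows; the hidden-layer size and the placeholder data are assumptions, and the two functions only roughly correspond to the elmtrain-style routines mentioned in the steps above.

```python
import numpy as np

def elm_train(X, T, L=64, seed=0):
    """Basic ELM: random input weights/biases, analytic output weights beta = H^+ T."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], L))   # random input weights W_i
    b = rng.normal(size=L)                 # random hidden-node biases b_i
    H = np.tanh(X @ W + b)                 # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ T           # output weights (Eq. (9))
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# placeholder spectra (characteristic wavelengths) and SSC values, for illustration only
X = np.random.rand(120, 30)
y = np.random.uniform(8, 16, size=120)
W, b, beta = elm_train(X[:90], y[:90], L=64)
rmse = np.sqrt(np.mean((elm_predict(X[90:], W, b, beta) - y[90:]) ** 2))
```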
4 Apple Classification Fusion Based on Evidence Theory
The traditional apple classification model is a rigid segmentation method with very strict requirements on the data. However, the results of SSC segmentation according to a threshold value are biased to some extent: apples near the threshold tend to be divided almost randomly into two categories. In this paper, uncertainty is introduced to reduce the result differences caused by this rigid classification.
4.1 Evidence Theory
Evidence theory [1, 2] is a mathematical tool developed in the 1970s. It does not require a priori information and describes uncertain information by means of "ranges", which is close to the way uncertainty is represented in practice. It distinguishes between "I don't know" and "not sure", accurately reflects the aggregated evidence, and shows a lot of flexibility. It is therefore widely used in areas such as multi-source information fusion, target recognition and decision analysis. Θ = {ω1, ω2, · · · , ωn} is defined as the set containing all incompatible theoretical values, which is called the identification framework of the problem; it is also known as the hypothesis space [14]. The basic credibility assignment (mass) function m: 2^Θ → [0, 1] satisfies m(∅) = 0 and Σ_{A⊆Θ} m(A) = 1. The mass function m(A) reflects the recognizer's exact trust in the proposition A and is an effective support for the parameters. The PLS and ELM prediction models have been established in this paper. Assuming that the mass functions corresponding to the two prediction models are m1 and m2, Dempster's combination rule for the synthesis of the two mass functions is

K1 = Σ_{Ai∩Bj=∅} m1(Ai) m2(Bj)   (10)

m(C) = Σ_{Ai∩Bj=C} m1(Ai) m2(Bj) / (1 − K1),  for ∅ ≠ C ⊆ Θ;  m(C) = 0,  for C = ∅   (11)
Since the credibility of each group of evidence is different, a discount factor should be allocated to each mass function before fusion, and the discounted evidence is as follows:

m_α(Θ) = (1 − α) m(Θ) + α   (12)

m_α(A) = (1 − α) m(A),  ∀A ⊂ Θ, A ≠ ∅   (13)
4.2 Realization of Apple Classification Based on Evidence Theory
Apples were classified according to the level of SSC: an apple is a third-grade fruit when its SSC lies in the interval [8, 11), a second-class fruit in [11, 13], and a first-class fruit in (13, 16]. The traditional hard segmentation model can only treat the data deterministically; for example, when the SSC is 12.99, the apple is considered a second-class fruit. However, the accuracy of the prediction model cannot reach 100%, so when the predicted SSC is 12.99, no one can be sure that the apple is really second-class. Uncertainty is therefore introduced into the classification problem; it is expressed, for example, as "first-class or second-class fruit" and used in the evidence-based classification model to solve the problem caused by hard segmentation. According to the classification results, the basic probability functions of the different categories are calculated respectively. Let I denote first-class fruit, II second-class fruit and III third-class fruit. When we make a prediction, the identification framework is Θ = {I, II, III, I∪II, I∪III, II∪III, I∪II∪III}. According to Eq. (14), when y lies in [8, 11) the apple is graded III and the mass function is m(III) = 1; when y lies in [11, 13] the apple is graded II and m(II) = 1; when y lies in (13, 16] the apple is graded I and m(I) = 1.

C = I, if 13 < y ≤ 16;  C = II, if 11 ≤ y ≤ 13;  C = III, if 8 ≤ y < 11   (14)

m_αi(C) = α mi(C),  i = 1, 2   (15)

m_αi(Θ) = 1 − m_αi(C),  i = 1, 2   (16)

where y is the predicted SSC value, C is the classification grade, and α is the discount factor, determined according to the degree of trust in the prediction model.
5 Experimental Verification
Taking one set of prediction results as an example, the actually measured SSC was 11.2, the SSC predicted by the ELM method was 10.83, and the SSC predicted by the PLS method was 11.69. The mass functions obtained from the ELM and PLS prediction models were discounted and then fused with evidence theory. The discount factors assigned to the mass functions of the ELM and PLS prediction models were 0.6 and 0.99, respectively. With y1 = 10.83 and y2 = 11.69, the following is obtained from Eq. (14), Eq. (15) and Eq. (16):

C1 = III, C2 = II, m_α1(III) = 0.6, m_α1(I∪II∪III) = 0.4, m_α2(II) = 0.99, m_α2(I∪II∪III) = 0.01

According to Eq. (10) and Eq. (11), it is concluded that:

m_1⊕2(II) = 0.98, m_1⊕2(III) = 0.01, m_1⊕2(I∪II∪III) = 0.01

It can be seen from the above data that, after the evidence-theory fusion, the basic probability assigned to second-class fruit was 0.98, the basic probability assigned to third-class fruit was 0.01, and the basic probability assigned to "first-, second- or third-class fruit" was 0.01. In the end, the apple was designated as a second-class fruit. Comparing the actual value with the predicted values of the prediction models, PLS predicted second-class fruit, ELM predicted third-class fruit, and the actual grade was second-class. The final fused result is therefore more reliable than that of a single model.
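The discounting and combination steps can be reproduced with a few lines of Python; the set representation and the rounding are implementation choices, and the numbers follow the worked example above.

```python
from itertools import product

THETA = frozenset({'I', 'II', 'III'})

def grade(y):
    """Eq. (14): map a predicted SSC value to a grade."""
    if 13 < y <= 16:
        return frozenset({'I'})
    if 11 <= y <= 13:
        return frozenset({'II'})
    return frozenset({'III'})                  # 8 <= y < 11

def discounted_mass(y, alpha):
    """Eqs. (15)-(16): mass alpha on the predicted grade, 1-alpha on total ignorance."""
    return {grade(y): alpha, THETA: 1.0 - alpha}

def dempster(m1, m2):
    """Dempster's combination rule with normalisation (Eqs. (10)-(11))."""
    combined, K = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        if not C:
            K += a * b                         # conflicting mass
        else:
            combined[C] = combined.get(C, 0.0) + a * b
    return {C: v / (1.0 - K) for C, v in combined.items()}

# worked example from Sect. 5: ELM predicts 10.83 (alpha = 0.6), PLS predicts 11.69 (alpha = 0.99)
m = dempster(discounted_mass(10.83, 0.6), discounted_mass(11.69, 0.99))
for C, v in m.items():
    print(sorted(C), round(v, 2))   # {'II'}: 0.98, {'III'}: 0.01, Theta: 0.01
```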
6 Conclusion
In this paper, NIR nondestructive testing of apple SSC was studied; the basic probability functions of the two prediction models are generated by the evidence discounting method, and the data fusion is carried out with Dempster's evidence combination rule. First, PCA-MD was carried out to eliminate abnormal samples in the original spectrum, and SNV and MSC effectively calibrated the data. Based on PLS and ELM, the apple SSC models were established using the characteristic wavelengths selected by the genetic algorithm. Finally, the uncertainty of classification was analyzed based on evidence theory, which effectively avoids the design defects of the traditional classification model and makes the detection results more accurate. In future work, more methods of generating basic probability functions and the fusion of more prediction models will be adopted. Acknowledgment. This work is supported by Key R&D Development Program of Shandong Province under grant 2017GGX10116, Shandong Agricultural Machinery Equipment R&D Innovation Plan Project Under Grant 2018YF011, and Shandong Provincial Natural Science Foundation ZR2018PF009.
References 1. Zhang, J., Kong, F., Wu, J., Zhou, X.: Analysis of China’s vegetable market operation in 2018, prospect and countermeasures in 2019. Chin. Vegetables 2019(01), 7–12 (2019)
2. Wang, M., Mu, Y.: Research on farmers’ behavior from the perspective of industrial integration: a case study of 668 vegetable growers. Chin. Agric. Sci. Bull. 35(06), 158–164 (2019) 3. Dong, S.: Nutritional value and comprehensive utilization of apple pomace. Chin. Fruits Vegetables 37(02), 15–18 (2017) 4. Liu, Y.: Research on apple classification method based on feature fusion. Fujian Normal University (2019) 5. Meng, X., Zhang, Z., Li, Y., Ren, L., Song, Y.: Research status and progress of apple grading. Deciduous Fruits 51(06), 24–27 (2019) 6. Yao, X., Yang, J.: Current situation and future trend of China’s apple export. Chin. Fruits 2019(03), 110–112 (2019) 7. Yan, X., Ma, L., Shen, T.: Application of DS evidence theory in apple’s internal quality classification. In: Proceedings of the 10th International Conference on Computer Engineering and Networks (CENet2020), pp. 582–590 (2020) 8. Quan, P.: Research and development of apple internal multi-quality parameter integration portable detection device based on visible/NIR spectroscopy. Northwest A&F University (2019) 9. Shao, Y., Xuan, G., Hu, Z., Gao, Z., Liu, Lei.: Determination of the bruise degree for cherry using Vis-NIR reflection spectroscopy coupled with multivariate analysis. PloS one 14(9), e0222633 (2019) 10. Zhang, C., Tang, X., Guan, R., Qin, W., Nong, K.: Preliminary study on application of near infrared ray online monitoring quality of sugarcane in sugar mills. Sugarcane Sugar Industry 2019(05), 29–36 (2019) 11. Tian, X., Huang, X., Bai, J., Lu, R., Sun, Z.: Detection of anthocyanins in purple sweet posstato during storage based on near infrared spectroscopy. Trans. Chin. Soc. Agr. Mach. 50(02), 350–355 (2019) 12. Lu, R.: Prediction of apple fruit firmness by near-infrared multispectral scattering. J. Texture Stud. 5(3), 263–276 (2004) 13. Mao, S., Zeng, M., He, S., Zheng, Y., Yi, S., Wang, L., Zhao, X.: Nondestructive detection of internal quality of hamlin sweet orange fruits by Visible-Near-Infrared diffuse reflectance spectroscopy. Food Sci. 31(14), 258–263 (2010) 14. Xiaoyue, C., Zhao Longzhang, H., Qiong, S.J.: Real-time semantic segmentation based on expansion convolution smoothing and lightweight up-sampling. Laser Optoelectron. Prog. 57(02), 185–192 (2020) 15. Jingling, X., Xicun, Z., Gao Huaguang, Y., Ruiyang, W.X.: Hyperspectral estimation of soil moisture content of RAMS based on MSC and SVM in the great wall. Acta Pedol. Sin. 55(06), 1336–1344 (2018)
Multi-modal Speech Emotion Recognition Based on TCN and Attention Yifan Ye and Jing Chen(B) School of Physics and Optoelectronic Engineering, Guangdong University of Technology, Guangzhou, China [email protected]
Abstract. Emotion recognition is an important research direction in the field of deep learning and how to accurately recognize human emotions is a difficult and hot research topic. The previous work mainly focused on the two steps of feature extraction and regression. These two steps are usually independent of each other. At the same time, previous work usually only considers single-modal information such as voice, text, or image. This ignores the influence of modalities on emotion recognition, and therefore greatly limits the accuracy of emotion recognition. This paper adopts multi-modal input and simultaneously includes three modal information of speech, image and text to recognize emotions. This paper introduces the multi-modal network of TCN based on the attention mechanism to recognize user emotions. By introducing the TCN based on the attention mechanism, the results are greatly improved compared with the single-modal results. The effect of multi-modal network is also improved. Keywords: Emotion Recognition · Multimode · Attention · TCN
1 Introduction
Due to the development of the information era, people nowadays have an intimate relationship with the Internet and, at the same time, pay more attention to emotional interaction technology [12]. Naturally, affective computing has received wide attention, and emotion recognition, which is the basis of affective computing, is an important part of human-computer interaction. Emotion plays a role in our thoughts and actions and is an integral part of the way we communicate [11]. It is a psycho-physiological process that can be triggered by conscious and/or unconscious cognition of objects and situations, and is associated with a multitude of factors such as mood, temperament, personality, disposition, and motivation [13]. Until now, emotion recognition has been studied in many ways. It is common to use speech, facial cues, or text individually to analyze emotion. However, considering the complexity of emotion, using a single modality to detect emotion is insufficient. In recent years, detection methods combining verbal and non-verbal information have been proposed and have achieved satisfactory accuracy. We use the open-source multimodal IEMOCAP dataset [14], which includes facial expression, speech and text information. Previous works [9, 11, 12] show that using a multi-modal network can accurately recognize emotion. Based on previous works [9, 15], we combine new deep learning
based architectures, which can achieve high accuracy on each of the different tasks, to obtain higher overall accuracy. We use attention-based Long Short-Term Memory (LSTM) networks and an improved Temporal Convolutional Network (TCN) with dense connections to process the speech information. As in a normal NLP task, we use Gated Recurrent Units (GRU) to process the text information. A combination of convolutional neural networks and LSTM structures is used for the video information, which consumes less memory than 3D convolutions and, compared with previous work [15], is more accurate. The features of the three modalities are merged before the Softmax layer. We note that our preprocessing and training methods are largely based on previous work [9], and our model will be open-sourced for others to extend our study.
2 Related Work
Emotion recognition has been an interdisciplinary research field for a long time [1]. Preliminary research in this field mainly involved visual and auditory processing. With the research of Alm et al. [2], the role of text in sentiment analysis has become more and more obvious. Current research in this field mainly analyzes the influence of different modalities on the recognition results from a multi-modal perspective in order to obtain better recognition results. Han et al. used deep neural networks to model emotions at the discourse level [3]. Mirsamadi et al. used an attention-mechanism-based CNN to extract frame-level features to achieve speech emotion recognition [4]. The IEMOCAP dataset has given a lot of motivation to multi-modal emotion recognition, and more and more work is focused on multi-modal emotional speech recognition. The current state-of-the-art classification on IEMOCAP is provided by [5], which builds on the prior work [6]. They use a 3D-CNN for visual feature extraction, a text-CNN for textual feature extraction [7] and openSMILE for audio feature extraction [8].
3 Dataset Preparation
The IEMOCAP database is an acted, multimodal and multi-speaker database [9], collected at the SAIL lab of the University of Southern California. It contains approximately 12 h of audiovisual data, including video, speech, facial motion capture, and text transcriptions. It consists of two stages, in which participants perform improvised or scripted scenes specifically chosen to elicit emotional expressions. The IEMOCAP database is annotated by multiple annotators into categorical labels, such as anger, happiness, sadness and neutrality, as well as dimensional labels, such as valence, activation and dominance. The detailed motion-capture information, the interactive settings that stimulate real emotions, and the size of the database make this corpus an important supplement to the existing databases in the community, helping to study and model multimodal and expressive human communication.
Before using data from speech, text, and motions captured from face expressions from IEMOCAP, we extracted a 40-dimensional acoustic feature vector as the input of the acoustic network, including 13-dimensional MFCC features, 20-dimensional Fbank features and 7-dimensional other features (zero-crossing rate, spectral centroid, etc.). The length of the speech frame is 200 ms, and the frame shift is 100 ms. For the input of the text part, we used Glove to extract a text feature vector with a length of 500 * 300. For face, hand, head rotation data we concatenate all into an array.
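As an illustration of this feature extraction, the sketch below uses librosa with a 200 ms window and 100 ms hop; the sampling rate and the particular extra descriptors shown (only three of the seven are included) are assumptions, not the exact configuration used in the paper.

```python
import numpy as np
import librosa

def acoustic_features(wav_path, sr=16000):
    """13 MFCC + 20 log-Mel filterbank coefficients + a few frame-level descriptors."""
    y, sr = librosa.load(wav_path, sr=sr)
    n_fft = int(0.200 * sr)          # 200 ms frame length
    hop = int(0.100 * sr)            # 100 ms frame shift
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=n_fft, hop_length=hop)
    fbank = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=20, n_fft=n_fft, hop_length=hop))
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=n_fft, hop_length=hop)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr, n_fft=n_fft, hop_length=hop)
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr, n_fft=n_fft, hop_length=hop)
    # remaining descriptors of the 7-dimensional block are omitted in this sketch
    return np.vstack([mfcc, fbank, zcr, centroid, rolloff]).T   # (frames, features)
```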
4 Model Introduction
4.1 Speech Based Emotion Detection
We first established a single-modal sentiment analysis model for speech. The input of the speech sentiment analysis model is 100 * 34. Firstly, speech_model1 uses an LSTM structure consisting of 128 nodes, followed by two dense layers: a 128-node layer with ReLU activation and a 4-node output layer with Softmax. Figure 1 shows how the LSTM works. The model is trained using cross-entropy loss with Adam as the optimizer. Secondly, speech_model2 is the same as speech_model1, but the LSTM layer is improved by an attention implementation. Thirdly, in order to obtain higher accuracy, we propose our final model, speech_model3, in which a TCN layer is added after the improved LSTM.
Fig. 1. LSTM
The details of our speech_model3 are shown in Fig. 2. The TCN layer we added is largely based on the original work, but we change the output layer of the original TCN to two stacked Softmax layers, and causal convolutions are no longer used in our TCN structure.
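The following tf.keras sketch mirrors the general structure described here: an LSTM front-end, a simplified non-causal TCN block built from stacked dilated 1-D convolutions (standing in for the modified TCN layer), and an additive attention pooling over frames. Layer sizes, dilation rates and the exact placement of the attention block are assumptions, not the precise configuration of speech_model3.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

T, F = 100, 34          # frames x acoustic features, matching the stated input size

def attention_pool(x):
    # additive attention over time: score each frame, softmax, weighted sum
    score = layers.Dense(1, activation='tanh')(x)        # (batch, T, 1)
    weights = layers.Softmax(axis=1)(score)              # attention weights over frames
    return layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])

inp = layers.Input(shape=(T, F))
x = layers.LSTM(128, return_sequences=True)(inp)
# simplified (non-causal) TCN block: stacked dilated 1-D convolutions
for d in (1, 2, 4, 8):
    x = layers.Conv1D(64, kernel_size=3, dilation_rate=d, padding='same', activation='relu')(x)
x = attention_pool(x)
x = layers.Dense(128, activation='relu')(x)
out = layers.Dense(4, activation='softmax')(x)

speech_model = models.Model(inp, out)
speech_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```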
Fig. 2. Speech model
Table 1. Speech Model

Model           Accuracy
Speech_model1   0.5071
Speech_model2   0.5251
Speech_model3   0.5864
Previous [15]   0.5663
The performance of our three speech models on the IEMOCAP dataset is shown in Table 1. Apparently, the more complex the model structure (leaving aside memory cost and FLOPS), the higher the accuracy our model achieves. Surprisingly, our speech_model2 has the same structure as the previous work [15], but its accuracy is 0.04 lower. On further study, we found that the computing resources we have and the training techniques we used are weaker than those of the previous work; by enlarging the batch size or improving the preprocessing, our speech_model3 may reach an even higher accuracy.
4.2 Text Based Emotion Detection
We then build a pure text emotion recognition network. As in a normal NLP task, simply using GRU layers can give a satisfying result and successfully complete the detection. The text input is first mapped through an Embedding layer, based on the Keras Embedding function, while information beyond 500 dimensions is discarded. The result of the mapping is sent to a 256-node GRU layer, then through a 128-node GRU layer, and finally through another 128-node GRU layer; after a Dense layer, it passes through a 4-node Dense layer. The specific network structure is as follows (Fig. 3).
Fig. 3. Text modal structure
4.3 Video Based Detection
Considering the large memory cost of 3D convolutional neural networks and their relatively slow training speed, we decided to use a combination of convolution and LSTM structure, which Keras already packages as ConvLSTM. Although the Transformer and LSTM structures were originally used to solve NLP problems, nowadays encoder-decoder models are also widely used in the computer-vision area. Many works have shown that attention-based encoder-decoder structures are suitable for classification, recognition, segmentation and nearly all CV tasks. Previous work also shows that simply using a classic network such as DenseNet requires a great number of training tricks. Our initial work was to preprocess the video data. To extract the features more efficiently, we use the preprocessing method of previous work [15], which vectorizes the information from start to end and merges the information of both hands and the facial cues together as the input. The details of our model are as follows. First, the input is sent to a 32-node ConvLSTM2D layer; the size of the convolution kernel is 2 * 2, and the activation function is ReLU. The output goes through a dropout layer. Next, this structure is repeated, with the number of nodes in the ConvLSTM2D layers set to 64, 64, 128, and 128, respectively. Then the data is converted into one-dimensional data through a Flatten layer, and finally passes through a 256-node Dense layer and then a 4-node Dense layer.
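A minimal tf.keras sketch of the ConvLSTM pipeline described above is given below; the input shape (number of time steps and the spatial size of the Mocap "frames") and the dropout rate are assumptions.

```python
from tensorflow.keras import layers, models

# assumed input: sequences of T frames of H x W motion-capture maps with C channels
T, H, W, C = 20, 32, 32, 1   # illustrative shape, not taken from the paper

mocap_model = models.Sequential()
mocap_model.add(layers.Input(shape=(T, H, W, C)))
for filters in (32, 64, 64, 128):
    mocap_model.add(layers.ConvLSTM2D(filters, kernel_size=(2, 2), activation='relu',
                                      padding='same', return_sequences=True))
    mocap_model.add(layers.Dropout(0.3))
mocap_model.add(layers.ConvLSTM2D(128, kernel_size=(2, 2), activation='relu',
                                  padding='same', return_sequences=False))
mocap_model.add(layers.Dropout(0.3))
mocap_model.add(layers.Flatten())
mocap_model.add(layers.Dense(256, activation='relu'))
mocap_model.add(layers.Dense(4, activation='softmax'))
```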
4.4 Multimode Structure
The emotion recognition performance of the three single-modal models is as follows. When speech alone is used as input, the accuracy rate is 58.8%. When text alone is used, the accuracy rate is 65.7%. When Mocap data alone is used, the accuracy rate is 53.8%. It can be seen that when only a single modality is used for emotion recognition, the performance of each model is difficult to bring to a satisfactory level. Therefore, we integrate the voice, text and Mocap data for emotion recognition. The fusion method is to remove the final 4-node Dense layer of each of the three models, concatenate their outputs to form a fusion matrix, and finally pass it through a 256-node Dense layer and then a 4-node Dense layer to obtain the final classification result. The specific structure is shown in the figure below (Fig. 4). In order to compare the influence of the different modules on the results, we selected the following four structures for experiments: the first removes the TCN and Attention layers from the speech emotion model; the second removes the TCN layer and retains the Attention layer; the third removes the Attention layer and retains only the TCN layer; and the last retains both the TCN and Attention layers. The final performance is shown in Table 3.
Fig. 4. Mocap emotion model (the Speech Model, Text Model and Mocap Model outputs are merged and passed through Dense(256) and Dense(4) layers)
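A sketch of the fusion just described, assuming the three single-modal networks of Sects. 4.1-4.3 with their final 4-node Dense layers removed; here tiny placeholder branches stand in for them so the snippet stays self-contained.

```python
from tensorflow.keras import layers, models

def tiny_branch(shape, name):
    # stand-in for a single-modal network ending before its 4-node output layer
    inp = layers.Input(shape=shape, name=name)
    x = layers.Flatten()(inp)
    x = layers.Dense(32, activation='relu')(x)
    return models.Model(inp, x)

speech = tiny_branch((100, 34), 'speech')
text = tiny_branch((500,), 'text')        # token ids would normally pass through Embedding/GRU
mocap = tiny_branch((20, 32), 'mocap')    # illustrative Mocap feature shape

merged = layers.Concatenate()([speech.output, text.output, mocap.output])
x = layers.Dense(256, activation='relu')(merged)
out = layers.Dense(4, activation='softmax')(x)
fusion = models.Model([speech.input, text.input, mocap.input], out)
fusion.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```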
Table 2. Single-mode model performance

Model type              Accuracy
Speech emotion model    58.8%
Text emotion model      65.7%
Mocap emotion model     53.8%

Table 3. Multi-mode model performance

Model type                  Accuracy
Without TCN and Attention   67.5%
Without TCN                 71.2%
Without Attention           69.4%
With TCN and Attention      72.3%
5 Conclusion In this article, we introduced speech, text and Mocap data to perform emotion recognition on speech. At the same time, we compared the performance of single-modal and multimodal models. It can be seen that the performance of multi-modal models is much higher than that of single-modal models. Looking at the impact of TCN and Attention on the performance of the entire multi-modal model, it can be seen that the model recognition rate is the highest when both TCN and Attention are available, and the model recognition rate reaches 72.3%.
References
1. Chen, T., Chen, Z., Yuan, X., et al.: Emotion recognition method based on instantaneous energy of electroencephalography. Computer Engineering (2019)
2. Alm, C.O., Dan, R., Sproat, R.: Emotions from text: machine learning for text-based emotion prediction. In: HLT/EMNLP 2005, Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 6-8 October 2005, Vancouver, British Columbia, Canada. DBLP (2005)
3. Han, K., Dong, Y., Tashev, I.: Speech emotion recognition using deep neural network and extreme learning machine. In: Interspeech (2014)
4. Matsumoto, D., Leroux, J., Wilson-Cohn, C., et al.: Emotion Recogn. 24(3), 179-209 (2019)
5. Poria, S., Majumder, N., Hazarika, D., Cambria, E., Gelbukh, A., Hussain, A.: Multimodal sentiment analysis: addressing key issues and setting up the baselines. IEEE Intell. Syst. 33(6), 17-25 (2018)
6. Poria, S., Cambria, E., Hazarika, D., Majumder, N., Zadeh, A., Morency, L.P.: Context-dependent sentiment analysis in user-generated videos. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 873-883 (2017)
7. Kamnitsas, K., Ledig, C., Newcombe, V., et al.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, 61 (2016)
8. Eyben, F., Schuller, B., München, T.U.: Music classification with the munich opensmile toolkit (MIREX 2010 Submission) (2013)
9. Tripathi, S., Beigi, H.: Multi-modal emotion recognition on IEMOCAP dataset using deep learning (2018)
10. Lea, C., Flynn, M.D., Vidal, R., et al.: Temporal convolutional networks for action segmentation and detection. IEEE Comput. Soc. (2016)
11. Choi, W.Y., Song, K.Y., Lee, C.W.: Convolutional attention networks for multimodal emotion recognition from speech and text data. In: Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML), pp. 28-34, Melbourne, Australia, July 2018. Association for Computational Linguistics (2018). https://doi.org/10.18653/v1/W18-3304. https://www.aclweb.org/anthology/W18-3304
12. Soleymani, M., Lichtenauer, J., Pun, T., Pantic, M.: A multimodal database for affect recognition and implicit tagging. IEEE Trans. Affect. Comput. 3(1), 42-55 (2012)
13. Busso, C., et al.: IEMOCAP: interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42(4), 335 (2008)
14. Xi, Z., Guo, J., Bie, R.: Deep learning based affective model for speech emotion recognition. IEEE (2017)
Edge Perception Strategy Based on Data Fusion and Recurrent Neural Network Yize Tang1 , Xinjia Wang1 , Junxiao Shi1 , Yushuai Duan1 , and Qinghang Zhang2(B) 1 Information and Communication Branch of State Grid, Zhejiang Electric Power Co., Ltd,
Hangzhou, China 2 Beijing University of Posts and Telecommunications, Beijing, China
[email protected]
Abstract. With the development of the power grid Internet of Things (IoT), more and more sensor devices access the power grid network, putting a lot of pressure on cloud servers. As an effective solution, edge computing can effectively reduce the possibility of network congestion and reduce the delay of sensor perception. However, the storage capacity and coverage area of edge nodes are limited. In this paper, an edge perception strategy based on data fusion and a recurrent neural network is proposed. Considering the features of sensor devices in the edge area, we first propose a data preprocessing scheme. In order to fuse congeneric data in a region, we then propose a data fusion method based on a self-adaptive average weighting algorithm. Furthermore, a perceptual prediction method based on a recurrent neural network (RNN) is proposed. Simulation results show that our proposed strategy can effectively complete sensor data fusion and perception in the edge area, and extend the perceptive scope of the edge node. Keywords: Edge perception strategy · Power grid Internet of things · Data fusion · Recurrent neural network
1 Introduction With the development of the power Internet of Things, there are more and more IoT terminals used to collect information at the end of automation, metering, and smart power distribution rooms [1]. The number and variety of IoT terminals are huge, which brings great pressure to cloud computing. As a new type of computing framework, edge computing can effectively reduce the occurrence of network congestion, reduce the delay of sensing services, and thereby improve the safety and reliability of the power system [2]. Collecting a large amount of ubiquitous power IoT terminal perception data through edge nodes has key functions such as shielding the underlying differences, unifying the data format, and carrying ubiquitous access [3]. However, the edge node still has some shortcomings in the perception of multi-source information collected by the sensor terminal. Firstly, the sensory data fusion model is lacking: various sensors currently used at the end of the ubiquitous power Internet of
Things have different protocols and data formats. It is necessary to build a unified edge sensor data fusion model to support the protocol compatibility and data fusion of the end edge agent device. Secondly, the collection range of the edge agent device is insufficient: the edge agent device has a small coverage area and limited computing and storage resources. There is a problem of insufficient data in collecting data through edge proxy devices. It is necessary to predict and perceive other data in the space-time dimension based on the collected data, to obtain the information of the electric power Internet more effectively. At present, edge area sensing based on multi-sensor data aggregation is widely used in power, environment monitoring and intelligent home furnishing, etc. Reference [4] proposed a data fusion algorithm based on the time prediction model to achieve effective monitoring and extend the network lifetime. Reference [5] proposed a wearable sensorbased system which using Recurrent Neural Network to predict the activities. Reference [6] proposed a random forest approach to predict air quality to cluster the multiple sensor data which used to group the data and partition the data. Reference [7] proposed a datafusion approach based on long short-term memory RNN to predict the fuel performance, and improves the prognostics accuracy effectively. Reference [8] proposed a multi-sensor data mining prediction model of combining information fusion technology and phasespace reconstruction which can more accurately predict the future. In a word, these works do not fuse the edge area data well and the combination with edge nodes is not close enough. This paper proposes an edge domain sensing strategy based on data fusion and recurrent neural network. Firstly, based on the edge domain perception, fusion, analysis and application of the ubiquitous power Internet of Things, a multi-source data fusion model in the edge domain is established to preprocess data. Secondly, a data fusion method based on adaptive average weighting is proposed. Finally, a perceptual prediction method based on recurrent neural network (RNN) is proposed to further process the data. Simulation results show that the strategy can effectively complete sensor data fusion and perception in edge area, and extend the perceptive scope of the edge node. The strategy proposed in this paper can be directly applied to expand the function of edge agent device, improve the edge perception ability, and support the construction of edge computing in power IoT.
2 System Framework
The framework of edge perception based on data fusion and recurrent neural network used in this paper is shown in Fig. 1; it mainly includes three layers: the cloud computing layer, the edge layer and the sensor layer. Compared with a cloud-only perception framework, the functions of the cloud server are reduced in this framework. The edge layer, close to the sensor layer, can efficiently complete sensor data acquisition and analysis [9]. The sensor layer is the bottom layer and is responsible for collecting environmental parameters, such as temperature, humidity, smoke, etc. The coverage area of an edge node is relatively limited, which means that covering all areas requires a lot of operation and maintenance overhead; a method to extend the edge perception area is therefore needed. What is more, because of the relatively limited computing capacity and
storage capacity, a reasonable data preprocessing scheme and fusion method are needed to reduce the pressure on edge nodes. Therefore, our strategy is divided into three steps: data preprocessing, data fusion, and an edge perception prediction method.

Fig. 1. The framework of edge perception based on data fusion and recurrent neural network (cloud layer with the cloud server, edge layer with edge nodes, and sensor layer with sensors)
3 Cloud Edge Shared Resource Allocation Algorithm Based on Service Partition In this section, we introduce the edge perception strategy based on data fusion and recurrent neural network. 3.1 Data Preprocessing For multi-sensor data, due to the different models and types of sensors, the output data format, data type and data unit are different. And there are many kinds of data in power system, large amount of data, fuzzy correlation information, high redundancy of initial data and high noise content. If there are abnormal data in the data collected by the sensor, one of these abnormal data is caused by hidden dangers in the surrounding environment, which is called effective abnormal data; the other is the invalid abnormal data collected due to abnormal factors such as node failure, which will affect the data fusion results, so it is necessary to preprocess the data measured by the sensor before data fusion. The data preprocessing methods are as follows: Step 1. Handle outliers and missing values: The data collected by sensors will inevitably have outliers. Outliers are deleted directly, and missing values are interpolated by means. Since the distance of the sample attribute is measurable, the average of the valid values of the attribute is used to interpolate the missing values. Interpolation is only to supplement the unknown value with subjective estimation, which is not necessarily in line with the objective facts. Step 2. Feature selection: When the data preprocessing is completed, it is necessary to select meaningful feature input data for training. The feature selection is based on
two aspects: whether the feature is divergent, and whether it is correlated with the target. If a feature is not divergent, for example its variance is close to 0, the samples show no difference on this feature and it has little influence on sample differentiation. In terms of the correlation between the feature and the target, features with high correlation with the target should be selected. In this paper, a filtering method is used: each feature is scored according to its divergence or correlation, and a threshold is set for selection. The correlation between qualitative independent variables and qualitative dependent variables is tested by the chi-square test:

χ² = Σ (A − E)² / E   (1)

In formula 1, A is the actual value and E is the theoretical value.
Step 3. Data normalization: Normalized data with a unified specification and representation are more beneficial to the subsequent data fusion work. In this paper, physical values with fluctuating behaviour are changed into relative values with a certain relative relation, so as to narrow the gap between values. The main purpose is to eliminate the differences between attribute values by normalizing the statistical attributes and the cumulative density function of the feature vector. This paper uses the standard-score normalization method, which standardizes the original data with its mean value μ and standard deviation σ, so that the processed data conform to the standard normal distribution, that is, the mean value is 0 and the standard deviation is 1. Of course, the mean and standard deviation can be optimized with parameters to make them more universal. The conversion function is as follows:

X' = (X − μ) / σ   (2)
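A small Python sketch of these three preprocessing steps (mean imputation, chi-square-based feature selection as in Eq. (1), and z-score normalization as in Eq. (2)) is given below; the number of retained features k, the use of a discrete status label y, and the placeholder data are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import StandardScaler

def preprocess(X, y, k=4):
    # Step 1: outlier handling is domain specific; here missing values (NaN) get the column mean
    col_mean = np.nanmean(X, axis=0)
    X = np.where(np.isnan(X), col_mean, X)
    # Step 2: keep the k features most related to the target via the chi-square score (Eq. (1));
    # chi2 needs non-negative inputs, so the data are shifted first
    selector = SelectKBest(chi2, k=k).fit(X - X.min(axis=0), y)
    X = X[:, selector.get_support()]
    # Step 3: z-score normalization (Eq. (2)): zero mean, unit standard deviation
    return StandardScaler().fit_transform(X)

# placeholder sensor table: 200 readings, 8 features, a few missing values, binary status label
X = np.random.rand(200, 8)
X[np.random.rand(200, 8) < 0.02] = np.nan
y = np.random.randint(0, 2, size=200)
X_clean = preprocess(X, y)
```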
3.2 Data Fusion Method Based on Self-adaptive Average Weighting Algorithm
Because the measurement accuracy of the sensors in each area is different, directly fusing the data received by all sensors in each area involves a relatively large amount of calculation. Therefore, the data fusion method first uses an adaptive weighted fusion algorithm to process the data of multiple sensors of the same kind in the same domain. Based on the minimum total mean square error and the data measured by each sensor in real time, each sensor is dynamically assigned a corresponding optimal weighting factor ωi (i = 1, 2, . . .), where the greater the variance of a sensor, the smaller the weight assigned to it, so that the final estimated value X̂ is close to the true value X. We adaptively find the corresponding weights by minimizing the mean square error, multiply the preprocessed data by the weights and add them to obtain the fused value of the same type of sensors in each area. Suppose the measurements of the n sensors are X1, X2, ..., Xn, their variances are σ1², σ2², ..., σn², they are independent of each other, and X is the true value to be estimated; ωi is the corresponding weight. Then the fused estimate X̂ and the weighting factors satisfy the following formulas:

X̂ = Σ_{i=1}^{n} ωi Xi   (3)

Σ_{i=1}^{n} ωi = 1   (4)

The overall mean square error σ² is:

σ² = E[Σ_{i=1}^{n} ωi² (X − Xi)²] = Σ_{i=1}^{n} ωi² σi²   (5)

The adaptive weighted fusion algorithm proposed in this paper finds the corresponding optimal weighted value of each sensor by means of adaptive search, and obtains the optimal fusion result under the premise of minimum total mean square error. According to the theory of extrema of multivariate functions, the minimum overall mean square error σ²_min is:

σ²_min = 1 / (Σ_{i=1}^{n} 1/σi²)   (6)

The corresponding weighting factor ωi of each sensor is:

ωi = 1 / (σi² · Σ_{k=1}^{n} 1/σk²),  i = 1, 2, ..., n   (7)

From formula 7 we obtain the weights ωi of the multiple sensors and hence the final fusion result X̂.
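The adaptive weighted fusion of Eqs. (3)-(7) reduces to a few NumPy lines; here the per-sensor variances are estimated from the readings themselves, which is an implementation choice, and the example data are illustrative.

```python
import numpy as np

def adaptive_weighted_fusion(readings):
    """
    readings: (n_sensors, n_samples) array of congeneric measurements.
    Weights follow Eq. (7): w_i = (1/sigma_i^2) / sum_k(1/sigma_k^2).
    """
    var = readings.var(axis=1)        # per-sensor variance estimate sigma_i^2
    inv = 1.0 / var
    w = inv / inv.sum()               # optimal weighting factors, summing to 1 (Eq. (4))
    fused = w @ readings              # Eq. (3): weighted fused estimate per sample
    mse_min = 1.0 / inv.sum()         # minimum overall mean square error (Eq. (6))
    return fused, w, mse_min

# three temperature sensors observing the same area (illustrative data)
r = np.vstack([20 + 0.2 * np.random.randn(100),
               20 + 0.5 * np.random.randn(100),
               20 + 1.0 * np.random.randn(100)])
fused, w, mse = adaptive_weighted_fusion(r)
```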
344
Y. Tang et al.
Coverage area Prediction area
x1 x2
x3
x4 x5
x6
y Target prediction domain
Edge node
Fig. 2. Selection method of regional prediction sequence in edge domain y + a3 a4 a5 a6 a1 a2 W W W W W W h5 h6 h4 h3 h1 h2 h0 U
x1
U
x2
U
x3
U
x4
U
x5
U
x6
Fig. 3. Recurrent neural network based on position weight
In this paper, the recurrent neural network based on position weight is proposed. Based on the recurrent neural network, the term of weight coefficient is added to measure the role of each data segment in recognition, so that the network can adaptively find the separable region. In this network, for location xi , there are: di = b + Whi−1 + Uxi
(8)
hi = tanh(di )
(9)
In formula 8, b is the bias vector of the neural network, U ∈ R^{m×p} is the weight matrix between the input data and the hidden layer, W ∈ R^{m×m} is the weight matrix between the hidden layers at different times, p is the dimension of the input data, and m is the dimension of the hidden layer. It should be noted that the output of the current model is different from that of the original recurrent neural network: the current model obtains a final hidden vector by a weighted summation of the hidden layers at each time, and then uses this vector for prediction. The hidden vector c is determined by the hidden layer at each time. The hidden layer hi of each location contains the information of the current time and the previous times, and mainly focuses on the current time. The hidden vector c is the weighted sum of the hidden layers at each time:

c = Σ_{j=1}^{6} aj hj   (10)

The weight coefficient aj of each position is:

aj = exp(ej) / Σ_{i=1}^{6} exp(ei)   (11)

ei = U2^T tanh(W2 hi)   (12)
In formula 12, W2 ∈ Rl×m , U2 ∈ Rl×l , M is the dimension of hidden layer, l is the dimension of matrix U2 . The coefficient a is obtained by normalizing it with softmax function. ei corresponds to the energy of the hidden layer at the i-th moment, which is mainly determined by the hidden layer at the current moment. Because the hidden layer at the current time mainly corresponds to the input data at the current time, the weight coefficient is data related. The attention mechanism of the model is also determined by parameter aj . By training parameters U2 and W2 , the model can automatically give hidden layer feature hj at different times with different weight coefficient aj . It makes the model more focused on the part that plays a role in recognition. In this paper, a forward network is used to solve the weight coefficient aj . In the network, the parameters can be updated by calculating the partial derivatives of the weight coefficients through the objective function. The output of the neural network is as follows: y = softmax(Vc)
(13)
In formula 13, the matrix V is the weight matrix from the hidden layer to the output vector. The coefficient aj is related to the input data, so even if the input data has a certain translation sensitivity, the model can still find the separable region to identify. Before the model is put into application, real data need to be used for pre-training, and the learning rate of the RNN needs to be adjusted continuously in the training process. The final learning rate of this paper is 0.01, the number of iterations of the final pre-training model is 3000, and the loss function in the training process is the mean square error function. Let sj be the actual value; then the loss function L is:

L = (1/n) Σ_{j=1}^{n} (yj − sj)²   (14)
The gradient function of RNN neural network back propagation is as follows: ∇hi L = W T (∇hi+1 L)diag(1 − h2i+1 ) + V T (∇c+Vhi L)
(15)
After the completion of the pre-training, the neural network can be used in the actual prediction, and then the RNN neural network will be updated with the newly collected real sensor data.
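A tf.keras sketch of the position-weight recurrent network described by Eqs. (8)-(12) follows: a SimpleRNN over the six regional inputs, a softmax-normalized position weight for each hidden state, and the weighted sum feeding the output layer. The input dimension per region and the use of a linear regression output with MSE loss (rather than the softmax of Eq. (13)) are assumptions of this sketch.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

P = 1   # dimension of each regional input (e.g. a fused temperature value); an assumption

inp = layers.Input(shape=(6, P))                     # x1..x6: six neighbouring regions
h = layers.SimpleRNN(64, activation='tanh', return_sequences=True)(inp)   # Eqs. (8)-(9)
# position weights a_j (Eqs. (11)-(12)): an energy e_j per hidden state, softmax over positions
e = layers.Dense(1)(layers.Dense(64, activation='tanh')(h))
a = layers.Softmax(axis=1)(e)
c = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, a])   # Eq. (10)
out = layers.Dense(1)(c)                             # predicted parameter of the target region

model = models.Model(inp, out)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss='mse')  # Eq. (14)
```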
3.4 Perception Strategy Formulation
Fig. 4. The whole flowchart of edge perception strategy (input sensor data → data preprocessing → data fusion based on self-adaptive average weighting algorithm → perceptual prediction based on RNN)
Figure 4 depicts the whole flowchart of the edge perception strategy. Many sensors in the area are connected to the edge node device. Firstly, the edge node preprocesses the data, then uses the adaptive average weighting algorithm to fuse the data, uses the RNN to predict the data in the uncovered area, and finally obtains the edge domain sensing results. The specific steps are as follows:
Step 1. Input the real sensor data from the cloud and pre-train the recurrent neural network deployed in the edge service agent device.
Step 2. The edge node collects the sensor data in its coverage area and preprocesses the data.
Step 3. The edge node uses the adaptive average weighting algorithm to fuse the sensor data and extracts the key features of the sensor data for perception and prediction.
Step 4. The edge node inputs the fused data into the trained recurrent neural network based on position weight, and the recurrent neural network predicts the data of the uncovered edge area.
Step 5. The edge node synthesizes the data and features obtained in Step 3 and Step 4, sends the key data to the cloud, and extracts some real sensor data to further train the recurrent neural network.
4 Simulation Experiment In this paper, we used PyCharm software, using python, to conduct simulation experiments for this part. The data collected by multiple sensors at different times and locations, including temperature and humidity, used in this paper, first need to be pre-processed and fused. Figure 5 shows the temperature data before and after the data fusion process, and Fig. 6 shows the humidity data before and after the data fusion process. The blue plot line is the data before processing, and the black highlight is the data after processing.
In the processing process, we remove unique attributes, handle outliers and missing values, and unify the data, etc. It can be seen that the curve after data fusion is smoother and the normalized data is more convenient for subsequent processing.
Fig. 5. Temperature data processing results
Fig. 6. Humidity data processing results
In the RNN prediction stage, this paper uses TensorFlow to build the RNN, in which the number of neurons in the input layer is 6, the input data are the environmental parameters of 6 consecutive regions, the model uses one hidden layer with 64 neurons, and the output data are the sensor monitoring values of the target region. The learning rate used in the simulation is 0.01, and the number of iterations of the final pre-trained model is 3000. As a simulation example, Fig. 7 compares the performance of the training method proposed in this paper with that of a CNN on this training set. The error of both training methods decreases as the number of iterations increases, and the prediction accuracy improves. The error rate of the improved RNN-based training method proposed in this paper is lower than that of the CNN for the same number of iterations.
Fig. 7. Training results of RNN
5 Conclusion Considering the limitation of edge node’s coverage and capacity, this paper proposes an edge perception strategy based on data fusion and recurrent neural network. Firstly,
considering the features of sensor devices in edge area, we propose a data preprocessing scheme. Secondly, we propose a data fusion method based on self-adaptive average weighting algorithm to fuse congeneric data in a region. Thirdly, a perceptual prediction method based on RNN is proposed. Simulation results show that the strategy can effectively complete sensor data fusion and perception in edge area. As a future work, we plan to set up more experiments to evaluate our strategy. Meanwhile, our edge perception strategy will be extended to other application scenarios. Acknowledgment. The work is supported by the Science and Technology Project of State Grid Zhejiang Electric Power Co., Ltd (5211XT19006F).
References 1. Wang, C., Li, X., Liu, Y., Wang, H.: The research on development direction and points in IoT in China power grid. In: 2014 International Conference on Information Science, Electronics and Electrical Engineering, Sapporo, pp. 245–248 (2014). https://doi.org/10.1109/InfoSEEE. 2014.6948106 2. Shi, W., Cao, J., Zhang, Q., Li, Y., Xu, L.: Edge computing: vision and challenges. IEEE Internet Things J. 3(5), 637–646 (2016) 3. Chen, S., et al.: Internet of Things based smart grids supported by intelligent edge computing. IEEE Access 7, 74089–74102 (2019). https://doi.org/10.1109/ACCESS.2019.2920488 4. Yang, M.: Data aggregation algorithm for wireless sensor network based on time prediction. In: 2017 IEEE 3rd Information Technology and Mechatronics Engineering Conference (ITOEC), pp. 863–867 (2017) 5. Uddin, M.Z.: A wearable sensor-based activity prediction system to facilitate edge computing in smart healthcare system. J. Parallel Distributed Comput. 123, 46–53 (2019) 6. Brumancia, E., et al.: Air pollution detection and prediction using multi sensor data fusion. In: 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 844–849 (2020) 7. Rui, M., et al.: Data-fusion prognostics of proton exchange membrane fuel cell degradation. IEEE Trans. Ind. Appl. 55, 4321–4331 (2019) 8. Zhang, B.: Fusion prediction of mine multi-sensor chaotic time series data. J. Comput. Appl. (2012) 9. Zhang, Y., Liang, K., Zhang, S., He, Y.: Applications of edge computing in PIoT. In: 2017 IEEE Conference on Energy Internet and Energy System Integration (EI2), Beijing, pp. 1–4 (2017). https://doi.org/10.1109/EI2.2017.8245749 10. Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent neural networks. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, 2013, pp. 6645–6649 (2013). https://doi.org/10.1109/ICASSP.2013.6638947
DQN-Based Edge Computing Node Deployment Algorithm in Power Distribution Internet of Things Shen Guo(B) , Peng Wang, Jichuan Zhang, Jiaying Lin, Chuanyu Tan, and Sijun Qin China Electric Power Research Institute Co., LTD., Beijing 100192, China [email protected]
Abstract. In the environment of power distribution Internet of Things, the large network energy consumption and network traffic of edge computing nodes is an urgent problem to be solved. To solve this problem, this paper proposes a DQNbased edge computing node deployment algorithm in the power distribution Internet of Things. According to the business logic of the power distribution Internet of Things, a network model of the edge computing network architecture is proposed. The network model includes computing request nodes, network nodes, network links, and edge computing nodes. The energy consumption model and network traffic model are constructed, and the objective function is constructed with the goal of minimizing network traffic and network energy consumption under the premise of completing the calculation tasks of smart nodes. By defining the states, actions and rewards in DQN theory, a DQN algorithm is proposed to solve the optimal deployment strategy of edge computing nodes. In the experimental part, the performance of the algorithm is analyzed from the two dimensions of the network scale and the number of calculation requests, and it is verified that the algorithm in this paper can reduce network energy consumption and network traffic. Keywords: Power distribution Internet of Things · Edge computing · Edge computing nodes · Network energy consumption · DQN
1 Introduction With the rapid development of power distribution Internet of Things, the number of smart nodes in the power distribution Internet of things is increasing rapidly, and the number of computing task requests for each smart node is rapidly increasing. The network architecture that only provides computing services with cloud servers cannot meet the computing needs of smart nodes for delay-sensitive tasks [1, 2]. To solve this problem, edge computing technology is integrated into the network architecture of the power distribution Internet of Things to provide fast computing services for smart nodes. Edge computing technology adopts the distributed deployment strategy of edge nodes to deploy edge computing nodes closer to smart nodes, thereby reducing the computing time of smart node computing tasks, and reducing computing tasks' requirements for network traffic and network energy consumption [2]. However, the number of edge computing nodes
is limited. How to deploy the optimal edge computing nodes to quickly respond to the computing request tasks of the delay-sensitive smart nodes, thereby reducing the network energy consumption and network traffic of the smart node computing tasks, is an important issue research content. Literature [3] uses dynamic migration technology to propose a heuristic dynamic migration strategy to improve edge computing resource utilization. Literature [4] adopts the design idea of distributed multi-center, and proposes the resource deployment strategy of edge nodes in different regions, which improves the utilization of network resources. Literature [5] proposes a resource deployment strategy in which cloud computing platforms and edge nodes cooperate with each other to reduce network energy consumption. Literature [6] uses machine learning algorithms to propose edge node resource deployment strategies with self-learning capabilities, which improves the utilization of edge node resources. Through the analysis of the existing research, it can be known that the existing research has achieved more research results in the deployment strategy of edge nodes. However, existing research mainly solves the problem of edge node deployment when the network scale is fixed. When the edge node network environment changes, how to solve the optimal edge node deployment strategy is an urgent problem to be solved. In order to solve this problem, this paper models the edge node deployment problem as a DQN network, and proposes a DQN-based edge computing node deployment algorithm, which better solves the optimal edge computing node deployment problem. It is verified through experiments that the algorithm in this paper reduces the network traffic and network energy consumed by the computing tasks of smart nodes.
2 Problem Description In the power distribution Internet of Things environment, in order to reduce the traffic consumption and energy consumption of the computing tasks of smart nodes in the network environment, this paper proposes a network model of the edge computing network architecture. The network model includes computing request nodes, network nodes, network links, and edge computing nodes. The computing request node refers to the intelligent node of the power distribution IoT environment under the coverage of the edge network. Each computing request node can make multiple computing service requests to the edge computing node. Use q ∈ Qr to represent computing service requests. The attributes of each computing service request q include the amount of computing tasks and the amount of communication bandwidth. Use (oq , cq ) to represent the attributes of computing service request q. Among them, oq represents the amount of computing tasks proposed by computing service request q to the edge node, and cq represents the amount of communication that needs to be consumed when computing service request q submits computing service tasks to the edge node. The network node and the network link refer to the network resources through which the computing request node uploads the computing service request to the edge computing node. Each network node and network link have a certain network bandwidth. With the development of 5G network and optical transmission technology, network bandwidth is getting larger and larger. This article ignores the limitation of network bandwidth, and
mainly considers the energy consumed by the computing requesting node to upload the computing task to the edge node. Use ptr,link to indicate the amount of energy consumed when each unit of computing task passes through each network link, and use ptr,node to indicate the amount of energy consumed when each unit of computing task passes through each network node. The task of the edge computing node is to complete the task of the computing request node. This paper mainly studies the optimal deployment strategy of edge computing nodes, so as to effectively reduce energy consumption and network traffic on the premise of completing the computing tasks of computing request nodes. Assuming there are N edge computing nodes, each edge computing node is represented by siE ∈ SiE, and the computing capability threshold of each edge computing node is represented by os. Since edge computing nodes generally use virtualization technology, each edge computing node will contain multiple virtual machines. Use pactive to represent the energy consumption per unit of computing task executed on the virtual machine, and use pstatic to represent the basic operating energy consumed when the edge computing node does not execute the computing task. The energy consumption when computing tasks are executed on virtual machines includes computing energy and transmission energy. Computing energy consumption is the energy consumption of edge computing nodes when performing computing tasks plus the basic energy consumption of edge computing nodes when they are running. Among them, the energy consumption of edge computing nodes performing computing tasks is calculated using formula (1), which represents the dynamic energy consumption E^q_active of performing computing tasks. The basic energy consumption when the edge computing node is running is calculated using formula (2), which represents the static energy consumption when the edge computing node is not performing computing tasks. Among them, mq is the energy consumption of static virtual machines when edge computing nodes do not perform computing tasks.

E^q_active = oq · pactive   (1)

E^q_static = mq · pactive   (2)
Transmission energy consumption is the energy consumption of network nodes and network links consumed in the process of computing tasks from smart nodes to edge computing nodes. The transmission energy consumption of computing task data is represented by Etr , and calculated using formula (3), which includes link transmission energy consumption ptr,link and node transmission energy consumption ptr,node . dq represents the average number of hops from computing task request q to edge computing node. Etr = cq · [ptr,link · dq + ptr,node · (dq + 1)]
(3)
The network traffic generated when the computing task request is uploaded from the smart node to the edge computing node is represented by Tq and is calculated using formula (4), which gives the network traffic of computing task request q ∈ Qr. Tq = cq · dq
(4)
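To make the per-request cost model concrete, the following minimal Python sketch evaluates formulas (1)–(4) for a single computing service request; the function and parameter names, and the numeric values in the example call, are illustrative assumptions rather than anything specified in the paper.

```python
from dataclasses import dataclass

@dataclass
class Request:
    o_q: float  # amount of computing tasks submitted to the edge node
    c_q: float  # communication bandwidth consumed when submitting the tasks
    m_q: float  # static virtual-machine term of the hosting edge node, see formula (2)
    d_q: float  # average number of hops from the request to the edge computing node

def request_costs(req, p_active, p_static, p_tr_link, p_tr_node):
    """Return (E_active, E_static, E_tr, T_q) for one computing service request."""
    e_active = req.o_q * p_active                        # formula (1): dynamic computing energy
    e_static = req.m_q * p_static                        # formula (2): static energy of the edge node
    # formula (3): d_q links and d_q + 1 network nodes are traversed
    e_tr = req.c_q * (p_tr_link * req.d_q + p_tr_node * (req.d_q + 1))
    t_q = req.c_q * req.d_q                              # formula (4): network traffic of the request
    return e_active, e_static, e_tr, t_q

# illustrative values only
print(request_costs(Request(o_q=10.0, c_q=2.0, m_q=1.0, d_q=3),
                    p_active=0.5, p_static=0.1, p_tr_link=0.05, p_tr_node=0.02))
```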
The goal of this article is to minimize network traffic and network energy consumption on the premise of completing the calculation tasks of smart nodes. Considering that this paper uses a Deep Q-Learning Network (DQN) to solve this problem, it is necessary to define a reward to be maximized, so this paper adopts the definition of the objective function as the maximization objective of formula (5). Among them, ϕ and γ represent the balance coefficients between energy consumption and network traffic. E^q_total represents the total energy consumption of the calculation tasks, calculated using formula (7), and T^q_total represents the total network traffic, calculated using formula (8). Constraint (6) indicates that the sum of the computing tasks of the computing request nodes carried on the edge node s cannot exceed its computing capability threshold os. o^q_s represents the computing task that request node q offloads on the edge node s.

max F = ϕ · (1 / E^q_total) + γ · (1 / T^q_total) (5)

s.t. Σ_q o^q_s ≤ os (6)

E^q_total = Σ_{q∈Q} (E^q_static(mq) + E^q_active + Etr(mq)) (7)

T^q_total = Σ_{q∈Q} Tq = Σ_{q∈Q} cq · dq (8)
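Building on the sketch above, the objective (5) and capacity constraint (6) can be evaluated for a candidate deployment as follows; the interface (an `assignment` map from requests to edge nodes and a `capacity` map of thresholds os) is an assumption made only for illustration and reuses the `request_costs` helper from the previous sketch.

```python
def objective(requests, assignment, capacity, phi, gamma,
              p_active, p_static, p_tr_link, p_tr_node):
    """Evaluate formula (5) for one deployment; return None if constraint (6) is violated.
    `assignment` maps each request index to the edge node s that serves it."""
    # constraint (6): per-edge-node load must stay below its capability threshold o_s
    load = {}
    for idx, req in enumerate(requests):
        s = assignment[idx]
        load[s] = load.get(s, 0.0) + req.o_q
    if any(load[s] > capacity[s] for s in load):
        return None

    e_total, t_total = 0.0, 0.0
    for req in requests:
        e_active, e_static, e_tr, t_q = request_costs(
            req, p_active, p_static, p_tr_link, p_tr_node)
        e_total += e_static + e_active + e_tr        # formula (7)
        t_total += t_q                               # formula (8)
    return phi * (1.0 / e_total) + gamma * (1.0 / t_total)   # formula (5)
```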
3 Algorithm The deep Q-learning network (DQN) algorithm is an extension of the Q-Learning reinforcement learning algorithm, with self-learning capabilities for complex problems. In the DQN algorithm, a neural network is used to replace the Q table of the Q-Learning algorithm, which solves the problems of large storage space and high computational complexity in large continuous state spaces. The following first models the deployment of edge computing nodes as a problem that can be solved by the DQN algorithm, and then proposes an edge computing node deployment algorithm based on DQN. 3.1 DQN Model In order to use the DQN algorithm to solve the deployment problem of edge computing nodes, it is necessary to define the state, action, and reward first, and then use the DQN algorithm to solve for the optimal deployment strategy of edge computing nodes. The DQN model is described below.
In terms of state, the optimal computing node deployment strategy is defined as the state. Take all feasible deployment strategies DPi as the state space S = {DP1, DP2, ..., DP_{2^n−1}}, a total of 2^n − 1 strategies, where n represents the number of nodes that can deploy computing nodes. In terms of action, the adjustment factor of the deployment strategy of computing nodes is defined as the action. That is, from the node set of deployable computing nodes, a node is arbitrarily selected, and its deployment status is reversed. The action value is a ∈ [6, 5, 4, 3, 2, 1]. In terms of reward, the sum of the reciprocal of energy consumption and the reciprocal of network traffic of each deployed edge computing node is defined as the reward of the current edge computing node, which is calculated by formula (9). At this time, the total revenue of all network nodes deployed as edge computing nodes is calculated using formula (10). Among them, K represents the number of network nodes deployed as edge computing nodes. ψ and ζ represent adjustment factors for energy consumption and network traffic.

reward_k = ψ · (1 / E^q_total) + ζ · (1 / T^q_total) (9)

reward_all = Σ_{k=1}^{K} reward_k = Σ_{k=1}^{K} [ψ · (1 / E^q_total,k) + ζ · (1 / T^q_total,k)] (10)
Reward is the reward obtained after taking an action in the current state. The deployment strategy of edge computing nodes will affect the performance of computing tasks in multiple time slices. Therefore, a value function needs to be defined to evaluate the long-term return of the edge computing node deployment strategy. The value function is defined as the sum of the current return and the maximum future return, calculated using formula (11). γ represents the discount factor of future returns, which is used to reduce the impact of future returns on the execution results of the current action. Q(st, at) = rt + γ · max_{a'} Q(s', a')
(11)
The state environment of the edge node deployment strategy solved in this paper is related to the network scale. When the network scale increases, the state environment will increase rapidly. The traditional Q-Learning algorithm is only suitable for scenarios with few discrete state environments. If the Q-Learning algorithm is used to solve the problem in this article, the search process of the state space will be very complicated. Therefore, this paper uses the DQN algorithm based on neural network theory to solve the problem. In order to obtain the optimal Q value, the neural network is optimized using the loss function L of formula (12).

L = (1/2) · [r + max_{a'} Q(s', a') − Q(s, a)]^2 (12)
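As a small illustration of formulas (11) and (12), the sketch below computes the temporal-difference target and the squared loss for one stored transition; `q_net` stands in for whatever network architecture is actually used and is purely an assumption.

```python
import torch

def dqn_loss(q_net, s, a, r, s_next, gamma):
    """Squared TD error of formula (12) for one transition (s, a, r, s_next)."""
    q_sa = q_net(s)[a]                                  # Q(s, a)
    with torch.no_grad():                               # the target is treated as a constant
        target = r + gamma * q_net(s_next).max()        # formula (11): r + gamma * max_a' Q(s', a')
    return 0.5 * (target - q_sa) ** 2
```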
3.2 DQN-Based Edge Computing Node Deployment Algorithm The DQN-based edge computing node deployment algorithm (ECNDAoDQN) proposed in this article is shown in Table 1. The algorithm includes three steps: initializing the
key parameters of training the DQN network, training the DQN network, and using the DQN network to solve the optimal deployment strategy. In the step of initializing the key parameters of training the DQN network, the key parameters include memory pool size D, training pool size d , DQN network weight, greedy algorithm probability ε, and state space DPi . In the steps of training the DQN network, the steps include using a greedy algorithm to select the action with the maximum Q value, perform the action to obtain a reward, obtain a new state, store the state transition result in the memory pool, and train the DQN network. In the step of using the DQN network to solve the optimal deployment strategy, based on the trained DQN network, the action that can get the maximum Q value in the current state is solved, and the state is adjusted. The new state obtained is the optimal edge computing node deployment strategy DPi . Table 1. DQN-based edge computing node deployment algorithm
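The following hedged sketch illustrates the three steps just listed (initialization, ε-greedy training with a memory pool, and reading off the final strategy); the environment object `env`, its reward computed per formulas (9)–(10), the network size and all hyper-parameters are assumptions for illustration and not the authors' implementation in Table 1.

```python
import random
from collections import deque
import torch
import torch.nn as nn

def train_ecnda_dqn(env, n_nodes, episodes=200, memory_size=10000,
                    batch_size=64, gamma=0.9, epsilon=0.1, lr=1e-3):
    """Hypothetical sketch: env.reset() returns a 0/1 deployment vector (the state),
    env.step(a) flips node a's deployment status and returns (next_state, reward),
    with the reward computed per formulas (9)-(10)."""
    q_net = nn.Sequential(nn.Linear(n_nodes, 64), nn.ReLU(), nn.Linear(64, n_nodes))
    opt = torch.optim.Adam(q_net.parameters(), lr=lr)
    memory = deque(maxlen=memory_size)                 # the memory pool of size D

    for _ in range(episodes):
        state = torch.tensor(env.reset(), dtype=torch.float32)
        for _ in range(n_nodes):                       # a fixed number of adjustments per episode
            if random.random() < epsilon:              # epsilon-greedy action selection
                action = random.randrange(n_nodes)
            else:
                action = int(q_net(state).argmax())
            next_state, reward = env.step(action)
            next_state = torch.tensor(next_state, dtype=torch.float32)
            memory.append((state, action, reward, next_state))
            state = next_state

            if len(memory) >= batch_size:              # train on a random mini-batch (memory replay)
                batch = random.sample(memory, batch_size)
                loss = 0.0
                for s, a, r, s_next in batch:
                    q_sa = q_net(s)[a]
                    with torch.no_grad():
                        target = r + gamma * q_net(s_next).max()
                    loss = loss + 0.5 * (target - q_sa) ** 2   # loss of formula (12)
                opt.zero_grad()
                (loss / batch_size).backward()
                opt.step()

    # final step: greedily follow the trained network to read off a deployment strategy
    state = torch.tensor(env.reset(), dtype=torch.float32)
    for _ in range(n_nodes):
        state, _ = env.step(int(q_net(state).argmax()))
        state = torch.tensor(state, dtype=torch.float32)
    return state                                       # the resulting deployment vector DP_i
```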
4 Performance Analysis In order to analyze the performance of the algorithm, the GT-ITM tool is used to generate the network topology environment [7]. In terms of node deployment, edge network nodes are used as computing request nodes, and non-edge network nodes are used as candidate nodes for edge nodes. In order to verify the performance of the algorithm under different network scales, the number of network nodes was increased from 50 to 100. In order to analyze the performance of the algorithm under different calculation requests, the amount of calculation tasks and communication requests requested by each calculation request node is increased by 20%. In terms of algorithm performance evaluation indicators, the edge network benefits of the algorithm are analyzed from the network scale and the number of calculation requests. Compare the algorithm ECNDAoDQN in this paper with Random deployment algorithm for edge computing nodes (RDAoECN). The algorithm RDAoECN randomly deploys edge nodes according to the number of tasks requested by the calculation to meet the tasks of the calculation request.
Fig. 1. Comparison of network revenue under different network scales
Figure 1 shows the comparison result of edge network revenue under different network scales. The X-axis in the figure represents the increase in network size from 50 to 100, and the Y-axis represents the revenue of the edge network. It can be seen from the figure that as the scale of the network gradually increases, the energy consumption and network traffic of the network gradually increase, and the revenue of the edge network gradually decreases. In terms of performance comparison of the two algorithms, the network revenue of the algorithm in this paper is higher than that of the comparison algorithm. It shows that the edge computing nodes deployed by the algorithm in this paper are located better, which reduces network energy consumption and network traffic consumption.
Fig. 2. Comparison of network revenue under different requested resources
Figure 2 shows the comparison result of the edge network's revenue under different computing requests. The X axis in the figure represents the amount of computing tasks and communication requested by each computing request node, starting from 100 and increasing in increments of 20%. The Y axis represents the revenue of the edge network. It can be seen from the figure that as the number and scale of computing requests increase, the energy consumption and network traffic of the edge network gradually increase, and the revenue of the edge network decreases. This is because as the amount of computing requests increases, more network resources need to be consumed, so the revenue of the edge network gradually becomes smaller. From the comparison of the results of the two algorithms, it can be seen that the network revenue under the algorithm in this paper is higher. It shows that the edge network node deployment strategy of the algorithm in this paper is better than that of the comparison algorithm.
5 Conclusion With the gradual expansion of the power distribution Internet of Things, the network architecture based on edge computing nodes has gradually become the main technology for processing smart node tasks. In this context, reducing network energy consumption and network traffic by deploying optimal edge computing node locations is an urgent problem to be solved. To solve this problem, this paper models the deployment problem of edge computing nodes, and adopts DQN theory to propose an optimal deployment strategy algorithm for edge computing nodes. It is verified by experiments that the algorithm in this paper can reduce network energy consumption and network traffic. Due to the limited computing power of edge computing nodes, when the amount of computing tasks of smart nodes is large, how to use the cooperation of cloud computing platforms and edge computing nodes to complete computing tasks is an important issue. In the next step, based on the research results of this article, the collaborative algorithm of cloud computing platform and edge computing node will be studied to further improve
the performance of task processing algorithm of intelligent node of power distribution Internet of Things. Acknowledgments. This work is supported by State Grid Corporation of China Research Program “Research and Application of Power Distribution Internet of Things (PD-IoT) based on Edge Computing and Software Defined Terminals” (SGJSDK00DWJS1900205).
References 1. Luo, X., Zhang, S., Litvinov, E.: Practical design and implementation of cloud computing for power system planning studies. IEEE Trans. Smart Grid 10(2), 2301–2311 (2019) 2. Negash, B., et al.: Leveraging fog computing for healthcare IoT. In: Rahmani, A., Liljeberg, P., Preden, J.S., Jantsch, A. (eds) Fog Computing in the Internet of Things. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-57639-8_8 3. Moghaddam, M.H.Y., Leon-Garcia, A.: A fog-based internet of energy architecture for transactive energy management systems. IEEE Internet Things J. 5(2), 1055–1069 (2018) 4. Ouyang, T., Zhou, Z., Chen, X.: Follow me at the edge: mobility-aware dynamic service placement for mobile edge computing. IEEE J. Sel. Areas Commun. 36(10), 2333–2345 (2018) 5. Skarlat, O., Nardelli, M., Schulte, S., et al.: Towards QOS-aware fog service placement. In: 2017 IEEE 1st International Conference on Fog and Edge Computing (ICFEC), pp. 89–96. IEEE (2017) 6. Xu, J., Chen, L., Zhou, P.: Joint service caching and task offloading for mobile edge computing in dense networks. In: IEEE INFOCOM 2018-IEEE Conference on Computer Communications, pp. 207–215. IEEE (2018) 7. Müller, S., Atan, O., van der Schaar, M., et al.: Context-aware proactive content caching with service differentiation in wireless networks. IEEE Trans. Wireless Commun. 16(2), 1024–1036 (2016) 8. Zegura, E.W., Calvert, K.L., Bhattacharjee, S.: How to model an internetwork. In: Proceedings of IEEE INFOCOM 1996. Conference on Computer Communications, vol. 2, pp. 594–602. IEEE (1996)
Research on Power-Stealing Behaviors of Large Users Based on Naive Bayes and K-means Algorithm Liming Chen1(B) , Xuzhu Dong1 , Baoren Chen2 , Xiaoping Qiu3 , Zhengrong Wu1 , Zhiwen Liu1 , and Qunying Lei3 1 Department of Smart Grid, Electrical Power Research Institute of CSG, Guangzhou 510663,
Guangdong, China [email protected] 2 Dispatch Center, China Southern Power Grid (CSG), Guangzhou 510663, Guangdong, China 3 Chongqing Xiaomu Technology Co., Ltd., Chongqing 401121, China
Abstract. Firstly, naive Bayes is used to mine the event data from the power user electric energy data acquire system (DAS) and establish an anomaly model of the electricity consumption characteristics. Then, integrating the anomaly model, the data from DAS, the industry characteristic data of the user profile, the weather data, the geographical location, the holiday data and other data from multiple heterogeneous source systems, K-means is used to mine and reveal risk users of power-stealing. The above algorithms are used to calculate the outlier degree of each user, thereby locating the user's degree of power-stealing suspicion and the method of stealing electricity, so that precise inspection methods can be applied to targeted high-risk users to obtain evidence of power-stealing. By analyzing high-risk users from multiple dimensions and multiple angles, this method narrows the inspection scope down and provides more accurate inspection objectives and methods, so as to improve electricity anti-stealing efficiency. Finally, using the user profile and DAS data of about 5100 big users from 2017–2018, combined with the geographical location of the users and the weather and holidays of the year, six high-risk users are obtained through the calculation. The on-site inspection confirmed that the results are in line with expectations. Keywords: Electricity anti-stealing · Big data · Naive Bayes · K-means
1 Introduction With the development of the power industry all over the world, since 2000, smart grid construction has gradually rised around the world. Among them, smart meters and power consumption information management are important components of smart grid. Europe, the United States, Japan, and China have successively carried out the process of digitizing smart meters and user information. In 2001, Italy realized the online measurement of more than 30 million smart meters; in 2008, France began to implement the smart meter replacement program; in 2008, the United States built the first smart grid city in Corolla; © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 358–367, 2022. https://doi.org/10.1007/978-981-16-6554-7_41
in 2010, Japan began the meter the smart meter replacement plan; Beginning in 2008 by 2018, State Grid Corporation of China (SGCC) and China Southern Power Grid Company Limited (CSG), the two world’s number one and second grid companies, have basically built smart meters covering the entire network, and corresponding meter data acquisition system. At present, SGCC operates more than 400 million smart meters on the network, CSG operates about 70 million ones. Based on the smart meter and its acquisition system, SGCC and CSG have also established a digital marketing platform. The main source of power marketing data is the marketing business application platform and power user electric energy data acquire system platform. The data mainly includes the user profile, power user electric energy historical data, and the user’s current data. Related studies have shown that [9], the amount of data generated globally has roughly doubled every two years. The growth of data volume in China’s power industry has also shown a similar situation. The marketing SG186 system and DAS established by SGCC have collected power consumption information of hundreds of millions of users and generated massive power data. These data have typical big data 4V (Volume, Variety, Velocity, and Value Density) features. According to the preliminary analysis of China power grid corporation, there are power-stealing behaviors in power users. Some units and individuals use power-stealing as a means of profit, and adopt various methods to discard or reduce the amount of electricity in order to achieve the purpose of not paying or paying less. These powerstealing behaviors not only make the line loss high, resulting in a large loss of power, economic losses, and even threaten the safe operation of the power grid. Especially in recent years, with the development of social economy, power-stealing has become more technical and group-oriented, the means are more sophisticated, and the methods are more concealed. The existing methods of electricity anti-stealing can not find and adapt in time, and the difficulty of electricity anti-stealing is getting bigger. Based on power Big Data, combined with the industry characteristic data of the user profile, the weather data, the geographical location, various event data and user data from DAS, using Data mining algorithms to establish anti-stealing monitoring and analysis models, timely positioning risk-user, narrowing anti-stealing work down is an effective means to improve the efficiency of anti-stealing work and improve the level of anti-stealing management.
2 Big Data Anti-stealing Analysis Method 2.1 Common Power-Stealing Means Analysis and Common Ani-stealing Means Analysis The common technical power-stealing methods are shown in Fig. 1, which mainly includes two types of inside-meter stealing method and outside-meter stealing method. Inside-meter stealing method is to reduce the amount of electricity used by changing the measuring components inside the meter to make measurement errors. Outside-meter stealing method usually bypasses the normal metering of the meter, which makes the metering of the meter unable to calculate the amount of electricity used. Inside-meter stealing method mainly includes three types of inside-meter voltage division method by increasing divided resistor in the meter, inside-meter expansion
method by changing the CT ratio or changing sampling circuit, inside-meter shunting method by remote controlling shunting or adding shunting wire or shielding inductive shunt. Outside-meter stealing method mainly includes four types of outside-meter voltage division method by loosening the voltage contiguous piece, outside-meter shunting method adding hidden shunting wire or adding short-circuit ring, high magnetic failure method, outside-meter changing wire by inversing connection of in-wire and outwire\zero-breaking\zero-borrow grounding, changing the three-phase wire or hanging wire in front of the meter.
Fig. 1. Common power-stealing means
Among the data from DAS, there is a large amount of event information, which can initially determine whether the power user behavior is abnormal, and whether there is suspicion of power-stealing behavior. In DAS, the following abnormal events can usually be acquired [2]: metering three-phase voltage imbalance event, metering no-voltage event, voltage reverse phase sequence event, metering three-phase current imbalance event, metering no-current event, metering current reverse phase sequence event, metering current reverse event, meter programming event, metering phase-breaking event, meter cover-opened event, power-cutting event, etc. Through the above event data, the naive Bayes classification algorithm can be used to establish a basic abnormal power consumption model, but this model cannot be used
as the exact criterion for power-stealing behavior, can analyze the users with power anomalies. The model can be used as input data for further analysing. The naive Bayes classification algorithm provides a classification method for power-stealing by the event data, and then uses the K-means clustering algorithm to implement semi-supervised learning mode to obtain risk-users. 2.2 Abnormal Power Consumption Model Analysis Method Based on the Event Data from DAS In the data from DAS by the power grid company, there are two types of event data sets, one is the manually confirmed data set A, and the other is the unconfirmed data set B, using the naive Bayes classification method, to count the events in set A, and obtain the conditional probability of each event, and then applying the probability to set B, and obtaining all abnormal power usage events. The naive Bayesian algorithm [8] is a very simple classification algorithm, which is realized by the extension of the number constructed by the Bayesian network model [7]. It is one of the best classifiers at present. For a given item to be classified, the conditional probability of each event occurrence under each category is solved, and the given item belongs to the category that has the maximum probability factor. The naive Bayes classification is defined as follows: (1) Suppose the item x to be classified has m attributes, where am is the m’th attribute: x = {a1 , a2 , . . . , am } (2) The classification set C has n classifications, where yn is the n’th classification: C = {y1 , y2 , . . . , yn } (3) Calculate the probability values that x belongs to each category: P(y1 |x), P(y2 |x), . . . , P(yn |x) (4) Comparing the probability value of item x belonging to each category, where the probability value is the largest, then the item x belongs to this category: P(yk |x) = max{P(y1 |x), P(y2 |x), . . . , P(yn |x)} then x ∈ yk The method of realizing the abnormal power consumption model by using naive Bayes classification algorithm is as follows: (1) Establish a training sample set: Find a collection of known classifications that are implemented by manual classification. (2) Count the conditional probability of each category: P(a1 |y1 ), P(a2 |y1 ), . . . , P(am |y1 ); P(a1 |y2 ), P(a2 |y2 ), . . . , P(am |y2 ); . . . .; P(a1 |yn ), P(a2 |yn ), . . . , P(am |yn )
(3) If each feature attribute is conditionally independent, it is derived from Bayes' theorem as follows:

P(yi|x) = P(x|yi)P(yi) / P(x)

The denominator of the above formula is constant for all categories, so only the numerator needs to be maximized, and because each feature attribute is conditionally independent, there is the formula as follows:

P(x|yi)P(yi) = P(a1|yi)P(a2|yi) · · · P(am|yi)P(yi) = P(yi) · Π_{j=1}^{m} P(aj|yi)
The implementation process of the above algorithm is as follows: (1) Data set A is a data set of known classifications, and this data set is used as a training sample. Each data item x in the data set A has 12 properties: metering three-phase voltage imbalance event, metering no-voltage event, voltage reverse phase sequence event, metering three-phase current imbalance event, metering nocurrent event, metering current reverse phase sequence event, metering current reverse event, meter programming event, metering phase-breaking event, meter cover-opened event, power-cutting event. The value of the attribute is 0 or 1, and 0 means the event has occurred, 1 means the event hasn’t occurred. The category is y = {0, 1}, and the number of categorie is 2, where 0 is abnormal category and 1 is normal category. (2) Using the manually confirmed data set A, the conditional probability for each category are calculated and implemented as follows:
based on data set A, using naive Bayes to obtain Anomaly model. (3) Using the data in the unconfirmed data set B to calculate the conditional probability value of the item x to be classified as each category value, and confirm its category, thereby classify data in data set B as naive events and normal event items by naive Bayes, and providing abnormal event attribute of item for the following clustering of suspected power-stealing users.
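A minimal sketch of this two-stage use of naive Bayes, assuming scikit-learn and the 0/1 event attributes described above (0 meaning the event occurred); the small arrays are placeholders, not data from the paper.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Data set A: manually confirmed items, one row per user, one 0/1 column per collected event
# (0 = the event occurred, 1 = it did not, following the convention above).
X_a = np.array([[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 1, 1], [1, 1, 0, 1]])
y_a = np.array([0, 1, 0, 1])          # 0 = abnormal power consumption, 1 = normal

model = BernoulliNB()                  # estimates P(y_i) and the conditionals P(a_j | y_i)
model.fit(X_a, y_a)

# Data set B: unconfirmed items; pick the class with the largest posterior P(y_k | x).
X_b = np.array([[0, 1, 0, 1], [1, 1, 1, 1]])
labels = model.predict(X_b)
print(labels)                          # abnormal-event flags passed on to the clustering step
```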
2.3 Suspected Power-Stealing User Clustering Method Based on Multi-source Data After the data classification set is obtained by the Naive Bayes, K-means cluster analysis is performed in combination with weather data, geographic location data, and user profile industry feature data to obtain a set of suspected power-stealing users. The K-means clustering algorithm divides the data objects with higher similarity into the same cluster according to the principle of similarity, and divides the data objects with higher dissimilarity into different clusters. Steinhaus in 1955, Lloyd in 1957, Ball & Hall in 1965, and McQueen in 1967 independently proposed the K-means clustering algorithm in their respective different scientific research fields [4]. Although the K-means clustering algorithm has been proposed for more than 60 years, it is still one of the most widely used clustering algorithms [5]. The K-means algorithm takes distance as the measure of similarity between data objects. Usually, Euclidean distance is used to calculate the distance between data objects. The formula for calculating the Euclidean distance is as follows:

dist(xi, xj) = sqrt(Σ_{d=1}^{D} (xi,d − xj,d)^2)

where D is the number of attributes of the data object. Define the cluster center of the kth cluster as Center_k; the cluster center update method is as follows:

Center_k = (1 / |Ck|) · Σ_{xi∈Ck} xi

In the K-means algorithm clustering process, the cluster center of each iteration needs to recalculate the mean of all data objects in the updated cluster, and the updated value is taken as the new cluster center. Ck represents the kth cluster, |Ck| indicates the number of data objects in the kth cluster, Center_k is the average of the corresponding attribute values of all data objects in the k-th cluster, Center_k has D attributes, and its j'th attribute formula is as follows:

Center_k,j = (1 / |Ck|) · Σ_{xi∈Ck} xi,j

The sum of squared deviations of clusters is used to determine the pros and cons of clustering results, the formula is as follows:

J = Σ_{k=1}^{K} Σ_{xi∈Ck} dist(xi, Center_k)
The implementation process of the algorithm is described as follows:
K-means is relatively scalable and efficient for large data sets, but the disadvantage of K-means is that the number k of clusters must be given in advance. Inaccurate k values can lead to a decrease in cluster quality, and for more complex Clustering structure, clustering results are susceptible to the initial clustering center, resulting in unstable clustering results [10]. For the precise location of users suspected of power-stealing, it is necessary to classify the data into two categories, that is, suspected power-stealing users and non-suspicion power-stealing users, that is, K = 2, and the number of clusters is determined. The K-means algorithm is generally used for unsupervised learning, and the classification of data is uncertain. In the data set studied in this paper, some of the data were manually identified. Therefore, this paper proposes a semi-supervised learning method using prior knowledge to be applied to the K-means classification algorithm. The initial values are set as follows: (1) Cluster number K = 2. (2) The data of the manual confirmation classification in the data set is extracted, and is divided into the power-stealing user data set A and the non power-stealing user B, and a is randomly selected in the data set A, b is randomly selected in the data set B, take (a, b) as the initial clustering center of the K-means. Because the number of clusters and the value of the initial cluster center are completely consistent with the requirements, the clustering results obtained by applying K-means that calibrate the initial cluster center value are not easy to enter the local optimal value, and it is easier to reach the global optimal value. 2.4 Implementation Method of Large User Anti Power-Stealing Data Analysis According to the abnormal power consumption model analysis of the collected events, and the clustering method of suspected power-stealing users based on multi-source data, the specific implementation process is shown in Fig. 2.
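The semi-supervised initialization described above can be sketched as follows, assuming scikit-learn: K = 2 and the initial cluster centers are fixed to the feature rows of one manually confirmed power-stealing user a and one confirmed normal user b. The feature matrix is assumed to be the standardized multi-source attribute matrix; this is an illustration only, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_suspects(features, center_a, center_b):
    """features: standardized attribute matrix (events, consumption statistics, weather, ...).
    center_a / center_b: feature rows of a confirmed power-stealing and a confirmed normal user."""
    init_centers = np.vstack([center_a, center_b])     # calibrated initial cluster centers
    km = KMeans(n_clusters=2, init=init_centers, n_init=1)
    labels = km.fit_predict(features)                  # cluster 0 = suspected power-stealing group
    return labels, km.cluster_centers_
```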
Fig. 2. Overall implementation process
First, the collected event samples are confirmed according to Naive Bayes, and the result is input as an attribute of the data item; then the user's power consumption, electricity bill, quarterly average power consumption, quarterly average electricity bill, and annual average power consumption, annual average electricity bill, average power consumption in the last three months, average electricity bill in the last three months, quarterly power consumption variance, quarterly electricity bill variance, annual power consumption variance, annual electricity bill variance, three consecutive months of power consumption variance, three consecutive months of electricity bill variance, the data of the region, weather, holidays, etc. are also input as the attributes of the data item, the data is standardized, and then clustered using the K-means, thereby obtaining the suspected power-stealing users with priority. (1) The event data in manual validated data set A is used as training sample, and the anomaly feature model is obtained by Naive Bayes. The anomaly feature model and Naive Bayes are applied to the unacknowledged data set B to obtain anomaly event data. (2) Take abnormal event, the user's power consumption, electricity bill, quarterly average power consumption, quarterly average electricity bill, and annual average power consumption, annual average electricity bill, average power consumption in the last three months, average electricity bill in the last three months, quarterly power consumption variance, quarterly electricity bill variance, annual power consumption variance, annual electricity bill variance, three consecutive months of power consumption variance, three consecutive months of electricity bill variance, the data of the region, weather, holidays as attributes of item.
(3) Randomly selecting a manually determined power-stealing user and a manually confirmed non-power-stealing user as the initialization cluster centers of the K-means algorithm, and then performing K-means clustering; (4) K-means clusters the suspected power-stealing user group, and the Euclidean distance between an object and the cluster center is used as the sorting basis of the priority sequence of the suspected power-stealing user group: the closer an object is to the cluster center, the higher its power-stealing suspicion.
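The priority ordering of step (4) can then be obtained by sorting the members of the suspected cluster by their Euclidean distance to that cluster's center, as in the following sketch (all names are illustrative).

```python
import numpy as np

def rank_suspects(features, labels, centers, suspect_cluster=0):
    """Return indices of suspected users, closest to the cluster center first."""
    idx = np.where(labels == suspect_cluster)[0]
    dist = np.linalg.norm(features[idx] - centers[suspect_cluster], axis=1)
    return idx[np.argsort(dist)]       # smaller distance = higher power-stealing suspicion
```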
3 Experimental Result According to the profile data of 5,100 large users of 2017–2018 provided by a power company and data from DAS, the user's house number is obtained from the user profile data; then the geographical location information is obtained from the Baidu map interface, the local daily weather information is obtained from the weather network, and national holiday information is obtained from a public website as an external data source. After standardizing the above data, it is saved as a txt file, and the sequence of the suspected power-stealing users is obtained by calculation, as shown in Table 1. According to the calculation results, after the on-site inspection by the power company, it was found that the above-mentioned users did have power-stealing behaviors. This method can narrow inspection down, provide preliminary criteria for inspection, and improve the anti-stealing technology level and work efficiency of the power company.

Table 1. The sequence of the suspected power-stealing user

No | Household name | Household number | Meter number | Serial number | Suspected power-stealing means
1 | San Zhang | 3011037053 | 333001000100046740765 | 1 | Hanging wire in front of the meter
2 | Qian Li | 3011037000 | 333001000100046740335 | 2 | Hanging wire in front of the meter
3 | Daming Zhang | 3011037043 | 333001000100046748799 | 3 | Inside-meter shunting
4 | Si Li | 3011037567 | 333001000100046747565 | 4 | Inside-meter shunting
5 | Er Wang | 3011037236 | 333001000100046748021 | 5 | Inside-meter voltage division
6 | Hu Zhao | 3011037052 | 333001000100046748028 | 6 | Inside-meter voltage division
4 Conclusion The techniques of power-stealing are constantly improving, and the difficulty of inspection is increasing. The use of big data technology for anti-stealing analysis is of great significance for improving the technical inspection methods of power companies and improving work efficiency. This paper uses Bayesian classification algorithm to generate anomalous event feature model, and then uses K-means clustering algorithm to combine user profile data, use electrical information data, geographic location data, weather data, and holiday data to locate suspected power-stealing users, Analysis from multiple dimensions and multiple angles avoids the isolation and non-coordination of a single analytical model. The above algorithm is used to calculate the outlier degree of the user, thereby locating the user’s suspicion degree of power-stealing and the method of power-stealing, so as to targeted high-risk users use accurate inspection methods to obtain evidence of power-stealing. To narrow inspection down and provide more accurate inspection objectives and methods, so as to improve electricity anti-stealing efficiency through this method analyzing high-risk users from multiple dimensions and multiple angles. The next step will be to combine the primary topology model of the grid and the secondary scheduling data, from the overall operation of the substation and the wire, through the grid model and theoretical line loss analysis, more accurately locate the power-stealing user method, combined with specific defense Stealing on-site inspection equipment to achieve fast and efficient on-site inspection.
References 1. Zhou, S.: Talking about the application of big data in power user information collection system in customer behavior research. East China Sci. Technol. (8), 271–327 (2014) 2. Zhang, J., Liu, X., Zhang, S.: Analysis of electricity abnormal events statistics and powerstealing characteristics based on marketing big data. Distrib. Utillization 35(6), 77–82 (2018) 3. Cai, J., Wang, S., Wu, G.: The user’s electric power prediction and electricity inspection plan based on machine learning research on auxiliary arrangement. Electron. Test (2), 108–109 (2018) 4. Wang, Q., Wang, C.: Review of K-means clustering algorithm. Electron. Des. Eng. 20(7), 21–24 (2012) 5. Jain, A.K.: Data clustering: 50 years beyond K-means. Pattern Recognit. Lett. 31(8), 651–666 (2010) 6. Lu, H., Lin, J., Zeng, X.: Research and application of improved naive Bayesian classification algorithm. J. Hunan University (Nat. Sci.) 39(12), 56–61 (2012) 7. Pearl, J.: Fusion, propagation, and structuring in Belied networks. Artif. Intell. 29(3), 241–288 (1986) 8. Mitchell, T.M.: Machine learning, pp. 167–175. McGraw-Hill, New York (1997) 9. Fan, J., Chen, X., Zhou, Y.: An intelligent analytical method of abnormal metering device based on power consumption information collection system. Electr. Measur. Instrum. 50(11), 4–9 (2013) 10. Zhou, S., Xu, Z., Tang, X.: Method for determining optimal number of clusters in K-means clustering algorithm. J. Comput. Appl. (8), 1995–1998 (2010)
Interference Control Mechanism Based on Deep Reinforcement Learning in Narrow Bandwidth Wireless Network Environment Hao Li1 , Jianli Guo2(B) , Xu Li2 , Xiujuan Shi2 , and Peng Yu1 1 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts
and Telecommunications, Beijing 100876, China 2 Science and Technology on Communication Networks Laboratory, The 54th Research
Institute of China Electronics Technology Group Corporation, Shijiazhuang 050081, China
Abstract. With the expansion of application scenarios of communication network and the complexity of communication network structure, in specific practical application scenarios, communication network often needs to consider all available communication network resources while providing services for business needs. The abstract modeling of network communication resources, the standardized description of network communication resources and the basis and mechanism of virtual network communication resource pool construction play an important role in the unified management and scheduling of network resources. At the same time, according to the business requirements in specific special scenarios, how to quickly build the virtual network, ensure the demand interconnection among network members and meet the differentiated transmission needs of the business in specific application scenarios. The network communication resources are scheduled in a unified and coordinated manner on demand. The communication network demands of different businesses are taken as the optimization objective, and the optimal scheduling and configuration of the network communication resources are realized under the condition that the business needs are met. This paper aims at the communication network under special application scenarios, based on the virtualization modeling of communication network resources, and realizes the optimal scheduling and configuration of network communication resources through deep reinforcement learning optimization algorithm. Keywords: Virtual resource modeling · Resource allocation optimization
1 Introduction Nowadays, the application scenarios of communication network are expanding, and at the same time, the network structure of communication network is becoming more and more complex. Research on the general communication network has been relatively complete, but in the specific special communication network, the communication network resources that can be scheduled at the current moment should also be considered when providing services for business needs. This requires us to carry out abstract modeling © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 368–377, 2022. https://doi.org/10.1007/978-981-16-6554-7_42
of network communication resources in advance, and then we can coordinate and unify the reasonable and efficient allocation of network communication resources according to different business requirements. At present, the research on network communication resource modeling and network communication resource scheduling configuration optimization mainly includes: 1) throughput maximum allocation. For the network system, in addition to providing resource services that can meet the requirements of certain quality of service, the throughput should be maximized, to obtain the maximum economic benefits [1]. 2) Fair distribution. Although the throughput maximum allocation method mentioned above improves the network resources to the greatest extent, it cannot guarantee the fairness of resource allocation. Fairness ensures that the malicious behavior of one data stream does not adversely affect other data streams. Fairness includes rate maximization fairness, proportion fairness, maximum and minimum fairness, etc. 3) Minimum power distribution. Many nodes in the wireless network are mobile nodes, which carry limited energy. Besides their own communication needs, they often need to provide relay services for other users, which requires that the transmission power of the wireless nodes should not be too large. If the transmission power can be minimized under the premise of meeting the communication demand, the energy of nodes can be effectively saved, the working life of the whole network will be extended, and the throughput and service quality of the network will be further improved. 4) Resource allocation under multi-objective and multi-constraint conditions. Different services have different QoS requirements, including throughput, delay, packet loss rate and jitter, etc. Resource allocation under multiple constraints can be reasonably allocated under the premise of satisfying multiple constraints, including bandwidth and time slots, to improve the quality of service and save resources at the same time. At present, there are many different research ideas on multi-constraint resource allocation, such as dynamic programming, which can dynamically adjust the allocation strategy according to the changes of network conditions and maximize the overall utility of the network [2]. The paper focuses on the goal to drive, the interference, bandwidth, etc. as constraint conditions, and not considering the village wireless access position of interference control routing planning reliability more than business differentiation characteristics of different network application scenarios, such as adopt unified resource scheduling scheme, it is hard to meet specific scenarios of high security, weak connection quickly recover and other special requirements. At the same time, the consideration of network resources is relatively isolated, and there is no unified resource perception and scheduling scheme. The scheduling algorithm is mainly static, does not consider the dynamic change process of the network, difficult to adapt to the highly elastic change process of the network. In this paper, the above deficiencies will be studied. The main contributions include: 1) to build a unified resource platform based on different networks for the special environment of the business. 2) According to the strong disturbance scenarios faced by the business with special requirements, the corresponding resource allocation model is proposed with the schedulable resources in the resource platform as the core. 
3) The implementation of the adaptive scheduling algorithm for the special requirements of business scenarios can be realized, and the algorithm can be adjusted autonomously for the dynamic network architecture and multi-dimensional target requirements, to meet the requirements of the transmission process with high reliability of the special requirements of services.
In the second part, the construction mechanism of virtual network communication resource pool is introduced, and the abstract modeling of the main network communication resources is completed. In the third part, the intelligent resource scheduling model and algorithm for strong interference connection network are introduced. The fourth part mainly introduces the simulation results and analysis and evaluates the resource optimization algorithm numerically.
2 Construction of Network Abstract Resource Model Based on the network resource modeling method, the corresponding communication network resource pool model construction method is further proposed for the hierarchical network structure and the business and application requirements. The model is shown in the figure below. Specifically, the resource pool model is divided into three layers from bottom to top: the interrelated basic bearer network model, the network resource model of the joint service and the business resource model. The basic bearer network model includes all kinds of available network models of heterogeneous edge networks and is further divided into two layers. The first layer mainly maps to the data link network and virtual network, while the data link network can also be mapped to the virtual network. Furthermore, the data link network and virtual network are further carried by various underlying implementation networks, including but not limited to Internet, optical wired access network, wireless access network, satellite access network, optical transmission network and other networks. The next step is to introduce the generic model of hierarchical resource pools [3] (Fig. 1).
Fig. 1. Network resource pool model construction.
3 Resource Scheduling Technology for Different Network Scenarios 3.1 Resource Scheduling Technology for Strong Interference Scenarios Because of the special application scenarios for a particular wireless network structure is complex, the network is often faced with internal or external interference of various factors, some of the scenes formed under strong interference conditions will bring a lot of the process of link quality and communication services, therefore, how to face the strong interference scene, build reasonable resource allocation optimization model, is one of the important problems need to be solved. In the wireless network resource scheduling scenario oriented to strong interference, this paper first analyzes various generated interference sources and characteristics, then, considering strong interference, proposes the corresponding global optimization resource model, and finally, proposes the corresponding optimization method based on deep reinforcement learning. The specific implementation process is shown in the figure below: (Fig. 2) 3.2 An Anti-jamming Optimization Model for Maximizing Global Throughput in Strong Jamming Scenarios Special wireless communication network, the common jamming types are suppression jamming, including aiming jamming, half aiming jamming and blocking jamming. When there are external interference sources in the network randomly appearing in the communication frequency band, the number of interference sources appearing in the communication frequency band is a Poisson process with time, the duration of external interference sources in the communication frequency band is in accordance with the Gaussian distribution, and the duration of each time in the communication frequency band is independent. Therefore, the behavior of external interference source is very similar to that of communication terminal in wireless communication network [4]. Against outside interference sources, the first discriminant effect and the size of its power, that is to identify the ongoing wireless communication link signal interference to noise ratio (SINR, signal–to–interference–and–noise-ratio), if the link distance outside interference sources distance far away, at this time of the wireless link signal interference noise is still greater than a certain threshold, can choose to increase the transmission power, or reduce the emission rate. If the distance between the link and the external interference source is relatively close, and the transmitting power of the external interference source is relatively large, then the signal interference to noise ratio of the wireless link is less than a certain threshold value, in this case, the current communication can only be terminated [5]. When the signal-to-noise ratio is still higher than the specified threshold, to suppress and identify these disturbances, we can increase the transmission power of the link to maximize the QoS requirements of various services in the network while suppressing the disturbances. Assume that the number of links in the wireless network is N and the number of inter-node communication links is L, then the sum Ii of the interference of the ith link in the wireless network by other wireless nodes in the network can be expressed
Fig. 2. Algorithm process.
as:

Ii = Σ_{n=1, n≠i}^{N} εn · Gi,n · pn, n = 1, 2, ..., L, (1)
where εn ∈ 0, 1 indicates whether the nth link is communicating. Gi,n represents the gain of the transmitting node on the nth link relative to the ith link, and Pn is the transmitting power on the nth link. Let the current interference generated by other strong interference sources relative to link i be IiO , then the SINR on link i can
be expressed as follows:

SINRi = (Gi,i · Pi) / (Ii + Ii^O + σ^2) (2)
In the above equation, σ^2 is the background thermal noise. Further, the transmission rate Ri on link i can be obtained as follows:

Ri = Bi · (ti / T) · log2(1 + SINRi) (3)
where Bi is the spectrum bandwidth allocated by the system for link i, and ti is the communication duration allocated to link i on the communication slot length T. To suppress interference, within the controllable range, resources such as spectrum bandwidth, power and time slot in the network should be taken as the optimization object to construct the optimization model for the maximum rate of the whole system as follows:

max_{ {Bi}, {Pi}, {ti} } Σ_{i=1}^{L} Ri
s.t. Σ_{i=1}^{L} ti ≤ T, Σ_{i=1}^{L} Bi ≤ BA, (4)
Rmin ≤ Ri ≤ Rmax , 0 ≤ Pi ≤ Pmax , SINRi > μ. In the above optimization model, to guarantee the quality of service of various services and the model solvable, necessary constraints need to be satisfied. The first constraint guarantees that the sum of all slot allocations cannot be higher than the entire slot length T . The second constraint guarantees that the sum of the spectrum allocated to all users is within the available bandwidth BA [6]. The third constraint is to ensure that the user rate is within the specified range [Rmin , Rmax ], the fourth constraint is the requirement of the link transmission power, which cannot be higher than the allowable maximum Pmax , and the last constraint is to ensure that the SNR is above the specified threshold μ. The above constraint is a complex multi-dimensional nonlinear constraint problem, which is difficult to be solved by general methods. To ensure the adaptability of the algorithm, this project plans to select deep reinforcement learning method to complete the allocation of corresponding resources. (2) Adaptive resource allocation algorithm based on deep reinforcement learning [7]. In this section, we propose an on-demand scheduling algorithm for special task resources based on deep reinforcement learning, which ensures that the goal of maximizing the rate of the whole network is satisfied. The deep reinforcement learning process includes two stages: offline learning network construction and online deep Q learning. Before learning, the state space, action space and reward function of learning need to be constructed. Specifically expressed as:
State space S: As a three-dimensional variable, it is composed of the value space of optimized objects {Bi }, {ti }, {Pi }; Action space A: corresponding to the four-dimensional state space, it is composed of the allocated spectrum bandwidth, the length of the communication time slot and the three-dimensional progressive variation of the transmitted power. Reward Re: To maximize the network energy efficiency while considering the service quality of the user side, the reward is defined here as the increment of the sum of the optimization goal and the service quality of special task, namely: Re(s, a) = μC + ν, s ∈ S, a ∈ A, μ + ν = 1. The definitions are as follows: = ϕi (fk ), fk ∈ F, s.t.∀i, k, ξ (fk ) ≥ , i
k
(5)
(6)
where F is the set of parameters related to the user’s quality of service, including but not limited to the user rate specified in the constraint, SINR, etc. ϕi (·) ∈ [0, 1] is the normalized efficiency value of each service quality parameter. ξi (·) represents the current value of the QoS parameter of user i and is the corresponding constraint threshold [8]. For the construction stage of offline learning network, an appropriate deep learning training model should be selected to obtain the relationship between action pair (s, a) and value function Q(s, a) = C. Due to the numerous influencing factors, it is possible to directly quantify and define the quantitative relationship between them and the parameters. Therefore, through the training model, enough value function estimation and corresponding action samples can be accumulated, and memory playback can be used to smooth the training process. For these deep learning training models, Convolutional Neural Network (CNN) or recurrent Neural Network can be selected. After obtaining enough training samples in the offline learning stage, the next step is to carry out online Q learning. The brief learning process to be adopted in this project is as follows: Step 1: randomly initialize the parameter θ of Q function, and set an initial state st ; Step 2: If the number of iterations or the change of Q value function reaches the termination condition, it is concluded. Otherwise, the action at is randomly selected in the state space at with a certain probability or the action pair corresponding to the current maximizing value function is selected at = argmaxQ(st , at ; θ ), where Q value was obtained through deep learning training in the offline stage. Step 3: Immediately reward Ret and the next state st+1 are observed in the interaction with the environment, the state transition (st , at , Ret , st+1 ) is stored, and a set of new states (st , at , Ret , st+1 ) is randomly sampled from the storage space. Step 4: Calculate the minimum expectation M (θ ) of the mean square error in the current state as follows: (7) M(θt ) = E (zt − Q(st , at ; θt ))2 . Target Zt is defined as follows: zt = Ret + γ max Q(st , a ; θt−1 ), a
(8)
where 0 < γ < 1 is the discount factor. After that, the updated value of θ is obtained based on the gradient descent method, namely: θ = α[zt − Q(st , at ; θt )]∇Q(st , at ; θt ).
(9)
Step 5: Set θt+1 = θt + θ and calculate the new Q value, returning to Step 2. In view of the above process, it is necessary to further study the selection method of different deep learning models in the offline learning stage, the transfer strategy of state space, and the value method of θ , γ and other parameters in the Q learning process, to obtain the best convergence result. At the same time, this algorithm should be compared with other optimization algorithms to analyze its advantages and disadvantages [9].
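A compact sketch of Steps 1–5, assuming a differentiable Q-network `q_net` over the (bandwidth, time slot, power) state and a simulated environment `env` that returns the reward defined above; the optimizer, sampling scheme and hyper-parameters α, γ, ε are illustrative assumptions rather than the authors' implementation.

```python
import random
from collections import deque
import torch

def online_q_learning(q_net, env, n_actions, steps=1000,
                      alpha=1e-3, gamma=0.9, epsilon=0.1):
    """env.reset() -> state tensor; env.step(a) -> (next_state, reward), where the
    reward is the throughput/QoS increment used as Re(s, a)."""
    opt = torch.optim.SGD(q_net.parameters(), lr=alpha)
    memory = deque(maxlen=5000)
    state = env.reset()                                    # Step 1: initial state s_t
    for _ in range(steps):
        if random.random() < epsilon:                      # Step 2: epsilon-greedy action choice
            action = random.randrange(n_actions)
        else:
            action = int(q_net(state).argmax())
        next_state, reward = env.step(action)              # Step 3: observe Re_t and s_{t+1}
        memory.append((state, action, reward, next_state))
        s, a, r, s_next = random.choice(memory)            # sample a stored transition (memory playback)
        q_sa = q_net(s)[a]
        with torch.no_grad():
            z = r + gamma * q_net(s_next).max()            # target z_t
        loss = 0.5 * (z - q_sa) ** 2                       # squared error whose expectation is M(theta)
        opt.zero_grad()
        loss.backward()                                    # Step 4: gradient update of theta
        opt.step()
        state = next_state                                 # Step 5: continue from the new state
    return q_net
```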
4 Fitting Results and Analysis The wireless network covers an environment of 1000 m × 1000 m, including 11 base stations and 44 users. The coverage radius of the base station is 300 m, the initial transmitting power of the base station is set to 5 W, and the upper limit of the transmitting power of the base station is 10 W. The lower limit of user transmission rate is set at 2 Mbits/s. The initial rate of each user meets the minimum rate requirement. When there is an interference source in the area covered by the wireless network, the transmission rate of some users will be affected by the interference source and decrease. At this point, the system will automatically increase the transmission power of part of the base station, to bring the user’s rate back to the minimum requirements of the service (Figs. 3 and 4).
Fig. 3. (a) The scene before adding the interference source (b) The scene after adding the interference source.
Fig. 4. Transfer rate changes for affected users.
5 Conclusion The simulation results show that for the interference source with fixed power, the users around the interference source will be affected to different degrees, which is embodied in the decrease of user transmission rate. By adjusting the power of the surrounding base stations, the rate of the disturbed users can be increased again to achieve the goal of achieving the minimum rate requirement. Acknowledgement. This work has been supported by open fund project of Science and Technology on Communication Networks Laboratory (Grant No. SXX19641X073).
References 1. Tasnim Rodoshi, R., Kim, T., Choi, W.: Deep reinforcement learning based dynamic resource allocation in cloud radio access networks. In: 2020 International Conference on Information and Communication Technology Convergence (ICTC), pp. 618–623 (2020) 2. Zhu, M., Gu, J., Zeng, X., Yan, C., Gu, P.: Delay-aware energy-saving strategies for BBU pool in C-RAN: modeling and optimization. IEEE Access 9, 63257–63266 (2021) 3. Abdallah, H.B., Sanni, A.A., Thummar, K., Halabi, T.: Online energy-efficient resource allocation in cloud computing data centers. In: 2021 24th Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN), pp. 92–99 (2021) 4. Lee, J., Rim, M., Kang, C.G.: Decentralized slot-ordered cross link interference control scheme for dynamic time division duplexing (TDD) in 5G cellular system. IEEE Access 9, 63567– 63579 (2021) 5. Kusaladharma, S., Zhu, W.P., Ajib, W., Aruma Baduge, G.A.: Achievable rate characterization of NOMA-aided cell-free massive MIMO with imperfect successive interference cancellation. IEEE Trans. Commun. 69(5), 3054–3066 (2021) 6. Yang, J., et al.: Instinctual interference-adaptive low-power receiver with combined feedforward and feedback control. IEEE Microwave and Wireless Components Letters 7. Hong, J., Cho, Y.H., Kim, S.K., Na, J.H., Kwak, J.: Spatio-temporal degree of freedom: interference management in 5G Edge SON networks. In: 2021 International Conference on Information Networking (ICOIN), pp. 491–494 (2021)
8. Dai, Y., Liu, J., Sheng, M., Cheng, N., Shen, X.: Joint optimization of BS clustering and power control for NOMA-enabled CoMP transmission in dense cellular networks. IEEE Trans. Veh. Technol. 70(2), 1924–1937 (2021) 9. Wang, C., Deng, D., Xu, L., Wang, W., Gao, F.: Joint Interference Alignment and Power Control for Dense Networks via Deep Reinforcement Learning. IEEE Wirel. Commun. Lett. 10(5), 966–970 (2021)
RLbRR: A Reliable Routing Algorithm Based on Reinforcement Learning for Self-organizing Network Liyuan Zhang1 , Lanlan Rui2 , Yang Yang2 , Yuejia Dou2 , and Min Lei1(B) 1 Cyberspace Security Academy, Beijing University of Posts and Telecommunications, Beijing
100876, China {2017212975,leimin}@bupt.edu.cn 2 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China {llrui,yyang,docin}@bupt.edu.cn
Abstract. With the rapid development of wireless communication, self-organizing networks have been widely applied in various fields, and their routing algorithms are a significant research direction. This work explores a reinforcement learning approach to packet routing in self-organizing networks. Existing routing algorithms for self-organizing networks typically use a single metric for next-hop selection and fail to consider the influence of neighboring nodes on the forwarding node; the selection of the next hop is neither comprehensive nor forward-looking, which is detrimental to routing reliability. Reinforcement learning can be used to solve this problem. In this paper, a Reinforcement Learning based Reliable Routing (RLbRR) algorithm is designed after a study of the existing methods in the literature. RLbRR evaluates the quality of candidate forwarding nodes with a reinforcement learning algorithm combined with multiple components, so that next-hop selection becomes well-rounded and a stable, reliable routing path can be established. The simulation results show that the RLbRR algorithm performs well in terms of reliability. Keywords: Routing algorithm · Reinforcement learning · Self-organizing network
1 Introduction A self-organizing network is formed by mobile nodes connected through wireless links, without the unified management of a network management center and without a fixed network topology [1, 2]. Nodes are directly interconnected through wireless links, multiple nodes together form a network, and the nodes can temporarily organize themselves to communicate with each other. Compared with traditional networks, self-organizing networks are characterized by unreliable links and an unstable topology [2, 3]; therefore, traditional routing protocols are not effective in them. Researchers have proposed many
new routing algorithms for the special requirements of self-organizing networks and have designed and implemented many routing protocols based on different strategies. The existing self-organizing network routing algorithms still have shortcomings: in short, the next-hop selection metric is limited and one-sided, it is not forward-looking, and the environment of the neighbouring nodes is not considered. Reinforcement learning can be applied to deal with the dynamic link and node information in a self-organizing network [4, 5]. The working process of a node receiving and forwarding messages can be regarded as a Markov decision process, which can be solved by a reinforcement learning algorithm. Under these circumstances, the system overhead does not increase too much, and packet sending and receiving on the node remain efficient and reasonable. The topology and link stability of the self-organizing network are also considered in the solution. The aim of this paper is to design a reliable routing algorithm for self-organizing networks to which reinforcement learning is applied. Combined with the reinforcement learning method, next-hop selection in the self-organizing network is able to consider various aspects comprehensively and choose the proper node as the next hop. Performance evaluation of the proposed routing algorithm shows that RLbRR achieves excellent reliability.
2 Related Work Many researchers have been investigating self-organizing networks. The article [6] proposes a cost model as a solution to the vehicular self-organizing network routing problem by considering network quality metrics. The article [7] provides a security analysis and a proposal for vehicular self-organizing networks. Reinforcement learning approaches can also be utilized in dynamic network environments. The article [8] proposes a routing scheme for mobile ad hoc networks based on two-hop relay selection with a multi-metric reinforcement learning algorithm. In the article [9], deep reinforcement learning is used to solve the packet routing problem in dynamic and autonomous communication networks; the approach adapts to dynamic demand and scale in large networks, utilizes network resources efficiently, and can be applied to segment routing. In the article [10], the Double Q-Learning Routing (DQLR) protocol uses reinforcement learning to achieve a reasonable balance between routing performance and cost in wireless mobile networks where nodes are sparse and end-to-end connectivity is rare.
3 RLbRR Mechanism The RLbRR algorithm is designed to promote routing reliability in self-organizing networks using reinforcement learning. A Q-learning model with multiple components is used to make next-hop selection more reasonable. The parameter symbols used in the RLbRR algorithm are shown in Table 1.
Table 1. Explanation of parameters in RLbRR algorithm.
3.1 Q-learning Model Q-learning is a typical value-based reinforcement learning algorithm. Without prior knowledge of the environment, Q-learning can obtain delayed rewards through trial and error and use them to update its action-selection strategy [10, 11]. In each iteration, the agent perceives the current environment state s_t ∈ S and selects an action to act on the environment according to the selection policy π. The environment state is then transferred from s_t to s_{t+1}, and a reinforcement signal, the instantaneous reward R(s_t, a_t), is generated. The agent updates its policy according to the instantaneous reward and the state obtained from the current transfer:

Q(s_t, a_t) = Q(s_t, a_t) + α[R_t + γ max_{a_{t+1}∈A} Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]   (1)
For the t-th learning iteration, the agent performs the following steps. First, observe the current state s_t, which corresponds to observing the current node and its neighbours. Next, observe S and R_t, which corresponds to observing the neighbour nodes and the reward based on next-hop quality. Then select action a_t using the ε-greedy policy, which corresponds to selecting a forwarding node by ε-greedy. The state s_{t+1} is then observed, which corresponds to taking the chosen forwarding node as the new current node. Finally, obtain the instantaneous reward R_t and update Q_t and the state. In order to establish a reliable and durable routing path, RLbRR considers node stability, link sustainability, and the relative change of distance between nodes as metrics. 3.2 Multi-component Model Distance Component. Because of the limited energy, the rapidly changing network topology, and the different relative distances between nodes, the link quality and link survival time differ from link to link. If the link quality is poor and the link survival time is short, the link is likely to break, which reduces reliability. In routing and forwarding, the relative distance between two nodes is therefore vital for establishing a reliable routing path.
In RLbRR, the Distance Component between nodes is calculated from the relative distance, that is, the distance between the nodes relative to the communication range. If the relative distance is small, the signal intensity is stronger and the link between the nodes has good reliability. If the relative distance is larger, the signal intensity decreases, the link quality drops, and the link survival time is likely to be short, which easily leads to link breakage. To evaluate the relative distance of neighbour nodes, RLbRR defines a Distance Component (DC) to quantify the relative distance, as in Eq. (2):

DC(c, x) = 1 − |D(c, x)|/r_c,  if |D(c, x)| < r_c;  DC(c, x) = 0, otherwise.   (2)

D(·) denotes the Euclidean distance between nodes, which can be calculated from the node coordinates carried in each received Hello message. r_c is the transmission radius of the node, c is the current forwarding node, and x is a one-hop neighbour of node c, that is, x ∈ N_C, where N_C is the set of one-hop neighbour nodes of c. Clearly DC ∈ [0, 1). For any node, the greater the Distance Component over two adjacent time slots, the better the stability. Sustainment Component. Node degree is an essential index for evaluating node quality; it is the number of a node's one-hop neighbours. If the node degree is not considered, a node with too few neighbours could be chosen as the next-hop forwarding node, and with scarce or no neighbour nodes the link is easily broken, resulting in low sustainment of the route. From the perspective of reliability, the routing algorithm therefore needs to consider the degree of the next hop. To measure the sustainment of a neighbour node, RLbRR uses its node degree and defines the Sustainment Component (SC) of neighbour nodes, calculated as Eq. (3):

SC(c, x) = NUM_x / NUM_max   (3)
NUM_x is the node degree of the neighbour node x of the current node c, and NUM_max is the maximum degree a node can have. Within reason, NUM_max can be taken as the total number of nodes, so the Sustainment Component satisfies SC ∈ [0, 1). Orientation Component. For reliable and efficient routing, we should not only choose a forwarding node that is nearer to the destination but also consider the node's orientation: a node oriented toward the destination is of better quality. To comprehensively evaluate the distance and orientation relation between the forwarding node and the destination node, RLbRR uses an Orientation Component (OC) based on the included angle and the relative distance between the destination node and the forwarding node. The Orientation Component is defined as Eq. (4):

OC(c, x) = cos θ × e^{(D(c,d) − D(x,d))/r_c},  if D(c, x) < r_c;  OC(c, x) = 0, otherwise.   (4)
Here d is the destination node and θ is the angle at the current node c between the destination node d and the neighbour node x. For a neighbour node that is farther from the destination, the calculated Orientation Component is smaller, which reduces the possibility of that node being selected as the next-hop forwarding node. At the same time, the orientation of the neighbour node relative to the destination is considered: if the angle θ is larger than a right angle, its cosine becomes negative, so the Orientation Component helps prevent such nodes from being chosen.
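To make the component definitions concrete, the sketch below evaluates Eqs. (2)-(4) for one neighbour. The coordinate tuples and the helper functions dist and angle are illustrative assumptions, not the paper's implementation.

```python
import math

def dist(p, q):
    # Euclidean distance between two (x, y) coordinates taken from Hello messages.
    return math.hypot(p[0] - q[0], p[1] - q[1])

def angle(c, d, x):
    # Angle at node c between the destination d and the neighbour x.
    a1 = math.atan2(d[1] - c[1], d[0] - c[0])
    a2 = math.atan2(x[1] - c[1], x[0] - c[0])
    return abs(a1 - a2)

def components(c, x, d, r_c, deg_x, num_max):
    # Returns (DC, SC, OC) for neighbour x of current node c toward destination d.
    d_cx = dist(c, x)
    dc = 1.0 - d_cx / r_c if d_cx < r_c else 0.0                      # Eq. (2)
    sc = deg_x / num_max                                               # Eq. (3)
    oc = (math.cos(angle(c, d, x)) * math.exp((dist(c, d) - dist(x, d)) / r_c)
          if d_cx < r_c else 0.0)                                      # Eq. (4)
    return dc, sc, oc
```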
3.3 Complete Process of RLbRR Combining the designs above, the whole working flow of the RLbRR algorithm is shown in Fig. 1.
Fig. 1. The working flow chart of RLbRR algorithm
In this paper, RLbRR has been implemented in Java. The types and descriptions of the critical variables used are shown in Table 2.
Table 2. Critical variables explanation.
The detailed RLbRR algorithm is as follows.
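The pseudo-code listing itself is not reproducible from the extracted text, so the following is a minimal, hypothetical sketch of how the ε-greedy next-hop choice and the Q update of Eq. (1) could be combined; the function names, the default α, γ and ε values, and the reward argument (assumed to be a weighted mix of the three components) are assumptions.

```python
import random
from collections import defaultdict

def rlbrr_select_next_hop(q_table, current, neighbours, eps=0.1):
    # Epsilon-greedy choice over one-hop neighbours, using Q values learned per (node, neighbour) pair.
    if not neighbours:
        return None
    if random.random() < eps:
        return random.choice(neighbours)
    return max(neighbours, key=lambda x: q_table[(current, x)])

def rlbrr_update(q_table, current, chosen, reward, next_neighbours, alpha=0.5, gamma=0.8):
    # One Q-learning step per forwarded packet, following Eq. (1); the reward is assumed to
    # combine the Distance, Sustainment and Orientation Components of the chosen neighbour.
    best_next = max((q_table[(chosen, y)] for y in next_neighbours), default=0.0)
    q_table[(current, chosen)] += alpha * (reward + gamma * best_next - q_table[(current, chosen)])

# Example usage with node identifiers:
q = defaultdict(float)
hop = rlbrr_select_next_hop(q, "n0", ["n1", "n2", "n3"])
rlbrr_update(q, "n0", hop, reward=0.7, next_neighbours=["n4", "n5"])
```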
4 Simulation Results Analysis 4.1 Simulation Experiments As for reliability, it is generally defined as the ability of the product to complete the specified functions under specified conditions and within the specified time. The role of routing protocol for self-organizing network is to monitor the change of network topology, exchange routing information, locate the position of destination node, generate,
maintain and select routing paths, and forward data along the selected route to provide network connectivity [12]. The reliability of a routing algorithm can be measured by the success rate of route discovery. In this paper, the success rate of route discovery under different conditions is tested. In brief, the simulation generates a specified number of mobile nodes within the node distribution range, randomly selects two of them as the source node and the destination node, and then uses the RLbRR algorithm to find a routing path. The reliability of the routing algorithm is defined as:

Reliability = num_success / num_total   (5)
In Eq. (5), num_success is the number of successful route discoveries and num_total is the total number of route discoveries initiated. Specifically, the coordinates of each node are generated by a random number function within the node distribution range, so that, within reason, the generated nodes are randomly and evenly distributed over the range. The energy and speed of each node are generated from normal distributions: in the simulation experiments, the motion rate of a node in the X and Y directions follows the standard normal distribution, and the energy of a node follows the normal distribution N(10000, 100). The coordinate ranges of the nodes are 100 × 100, 200 × 200, 300 × 300, 400 × 400 and 500 × 500, respectively, and the number of nodes varies from 10 to 100 in steps of 10. 4.2 Result and Analysis
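A minimal sketch of the node generation and reliability measure described above, assuming the 100 in N(10000, 100) is the variance (σ = 10); the function names are illustrative.

```python
import random

def generate_nodes(n, area):
    # Uniformly random positions; speed components follow the standard normal distribution
    # and energy follows N(10000, 100), as described in the simulation setup.
    return [{
        "pos": (random.uniform(0, area), random.uniform(0, area)),
        "speed": (random.gauss(0, 1), random.gauss(0, 1)),
        "energy": random.gauss(10000.0, 10.0),
    } for _ in range(n)]

def reliability(num_success, num_total):
    # Eq. (5): success rate of route discovery.
    return num_success / num_total
```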
Fig. 2. Reliability of RLbRR under different node numbers and different node distribution ranges.
Figure 2 reflects the relationship between the reliability of RLbRR algorithm and the number of mobile nodes in the distribution range with different sizes. The horizontal axis represents the number of nodes, and the vertical axis represents the average reliability of 1000 discovery routes. We take the number of nodes in the unit distribution as the
density of nodes. It can be seen from Fig. 2 that when the node density is small, the large distances between nodes prevent enough link connections from being established and the probability of multi-hop communication is low, so the probability of successful route discovery is also low. As the node density increases, the link connections between nodes increase and the probability of multi-hop communication rises, so the probability of successful routing also increases. When there are sufficient nodes in the distribution range, the probability of successful route discovery reaches 100%, which indicates that the reliability of the RLbRR algorithm is excellent. The simulation result for the time RLbRR takes to finish route discovery is shown in Fig. 3, in which the other conditions are the same as in the success-rate simulation.
Fig. 3. Time consuming of RLbRR under different node number and different node distribution range.
Figure 3 shows the relationship between the number of nodes and the time taken to discover a route within the distribution range of nodes of different sizes. The horizontal axis represents the number of nodes, and the vertical axis represents the time (in milliseconds) required to discover a route 1000 times. As can be seen from Fig. 3, when the number of nodes is small, the route contains fewer hops and takes less time. With the increase of the number of nodes, the number of hops required for end-to-end communication also increases, and the time spent in discovering routing increases.
5 Conclusion By investigating different types of existing self-organizing network routing protocols and analyzing their problems, it is found that they rely on a single metric for selecting forwarding nodes, and that the selection is not forward-looking and fails to consider the quality of the neighboring nodes around the next-hop forwarding node. Based on this, a reinforcement learning based reliable routing protocol, RLbRR, is designed; it uses reinforcement learning to evaluate the quality of neighboring nodes so as to ensure the stability and reliability of the link. The simulation is implemented in Python. The
location and motion information of the mobile nodes are generated from normal distributions. Simulation results show that the proposed RLbRR protocol has good stability. Acknowledgements. This work is supported by the National Key R&D Program of China (2020YFB1807802).
References 1. Kose, A., Gökcesu, H., Evirgen, N., Gökcesu, K., Médard, M.: A novel method for scheduling of wireless ad hoc networks in polynomial time. IEEE Trans. Wirel. Commun. 20(1), 468–480 (2021) 2. Conti, M., Giordano, S.: Mobile ad hoc networking: milestones, challenges, and new research directions. IEEE Commun. Mag. 52(1), 85–96 (2014) 3. Anand, A., Aggarwal, H., Rani, R.: Partially distributed dynamic model for secure and reliable routing in mobile ad hoc networks. J. Commun. Netw. 18(6), 938–947 (2016) 4. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. The MIT Press, London (2017) 5. Aitygulov, E.E.: The use of reinforcement learning in the task of moving objects with the robotic arm. In: Osipov, G.S., Panov, A.I., Yakovlev, K.S. (eds.) Artificial Intelligence. LNCS (LNAI), vol. 11866, pp. 119–126. Springer, Cham (2019). https://doi.org/10.1007/978-3-03033274-7_7 6. Gnanasekar, T.S., Samiappan, D.: Optimal routing in VANET using improved meta-heuristic approach: a variant of Jaya. IET Commun. 14(16), 2740–2748 (2020) 7. Feng, Q., He, D., Zeadally, S., Liang, K.: BPAS: blockchain-assisted privacy-preserving authentication system for vehicular ad hoc networks. IEEE Trans. Ind. Inform. 16(6), 4146– 4155 (2020) 8. Muneeswari, B., Manikandan, M.S.K.: Energy efficient clustering and secure routing using reinforcement learning for three-dimensional mobile ad hoc networks. IET Commun. 13(12), 1828–1839 (2019) 9. Ali, R.E., Erman, B., Bastug, E., Cilli, B.: Hierarchical deep double Q-routing. IEEE International Conference On Communications 2020, (IEEE ICC), Dublin, pp. 1–7. IEEE (2020) 10. Yuan, F., Wu, J., Zhou, H., Liu, L.: A double Q-learning routing in delay tolerant networks. IEEE International Conference On Communications 2019, (IEEE ICC), Shanghai, pp. 1–6. IEEE (2019) 11. Li, F., Lam, K.-Y., Sheng, Z., Zhang, X., Zhao, K., Wang, L.: Q-learning-based dynamic spectrum access in cognitive industrial internet of things. Mob. Netw. Appl. 23(6), 1636–1644 (2018). https://doi.org/10.1007/s11036-018-1109-9 12. Sasirekha, S., Swamynathan, S.: Cluster-chain mobile agent routing algorithm for efficient data aggregation in wireless sensor network. J. Commun. Netw. 19(4), 392–401 (2017)
A Computation Task Immigration Mechanism for Internet of Things Based on Deep Reinforcement Learning Yifei Xing1(B) , Chao Yang2 , Hao Zhang3 , Siya Xu1 , Sujie Shao1 , and Shi Wang1 1 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts
and Telecommunications, Beijing, China [email protected] 2 Information and Communication Branch, State Grid Liaoning Electric Power Co., Ltd., Shenyang, Liaoning, China 3 China Global Energy Interconnection Research Institute, Nanjing, China
Abstract. The rapid development of smart cities has led to a large number of IoT devices being connected to the network. Introducing mobile edge computing can relieve the growing network congestion caused by centralized cloud computing and large-scale data transmission. However, the diverse demands of IoT tasks and the mobility of users still pose challenges to network transmission and task processing, and they easily cause unbalanced load on edge servers and high network energy consumption. To solve these problems, a cloud-edge-terminal collaboration network task immigration model is established and a task immigration mechanism for the Internet of Things is proposed. In the proposed mechanism, a deep reinforcement learning algorithm is used to solve for immigration policies, and user mobility is considered so that the resource demand of tasks in the region can be met. The simulation results show that the proposed mechanism can reduce service request delay and system energy consumption and enhance the user experience. Keywords: Mobile edge computing · Deep reinforcement learning · Task immigration · Artificial intelligence · Internet of things
1 Introduction With the rapid development of 5G technology, the Internet of Things (IoT) is playing an increasingly important role in improving the quality of urban life. The number of mobile devices such as wearables and smartphones has grown rapidly, which leads to an increase in user data; cloud computing technology was therefore proposed to provide high-quality services for applications [1]. However, cloud computing cannot meet the low-latency task requests of some applications, and most mobile devices suffer from low computing power and small battery capacity, which restricts the development of the IoT. Although mobile edge computing (MEC) can
effectively solve this problem [2, 3], there are still many challenges in MEC technology, such as computation offloading and resource allocation; this paper mainly focuses on the mobility management of users [4]. User mobility increases the difficulty of resource allocation, so it is necessary to consider it reasonably and make task immigration decisions accordingly to guarantee the reliability of services. MEC and computation immigration are therefore attracting more and more attention, and the number of studies on user mobility is also increasing. Recently, some researchers have started to pay attention to computation immigration in different scenarios. Kondo et al. [5] developed a MEC platform supporting service immigration for MEC servers; the platform used an IP mobility support gateway, but it lacked consideration of the movement of the mobile user requiring service immigration. Plachy et al. [6] used communication path selection and virtual machine placement to address user mobility, designing a dynamic virtual machine placement algorithm and a communication path selection algorithm based on mobility prediction, but without considering energy consumption. Nasrin et al. [7] proposed a Shared-MEC architecture to support service immigration; it introduces a small cloud to host all the MEC servers, but the small cloud center also requires additional overhead. Considering the limitations of the existing work, this paper studies the problem of task immigration in the IoT scenario and also considers the impact of user mobility on task immigration. The contributions of this paper are as follows: • To reduce the cost of task immigration, we set up an immigration model to calculate the cost of task immigration and guide the immigration decision. • To offload tasks quickly, we design a computing task immigration mechanism based on the Proximal Policy Optimization (PPO) deep reinforcement learning algorithm to realize load balancing across mobile edge computing servers. • To reduce the impact of user mobility on task immigration, we set up a user mobility prediction model; the simulation results show that the proposed algorithm performs well. The rest of the article is organized as follows: the system model and the user mobility prediction model are introduced in the second section, the problem formulation and the algorithm are described in the third section, simulation experiments are shown in the fourth section, and conclusions are given in the fifth section.
2 System Model 2.1 Network Architecture For the practical application scenarios, this paper establishes a cloud-edge-terminal collaboration network framework, which consists of four layers, namely the terminal layer, the mobile edge access layer, the fixed edge sink layer and the cloud platform layer, as shown in Fig. 1.
Terminal Layer: It is composed of various mobile user equipment (UEs). This paper only considers the task data uploaded by UEs through the wireless network, which can be offloaded to the mobile edge access layer, the fixed edge sink layer or the cloud platform layer for processing.
Fig. 1. Cloud-edge-terminal collaborative offloading system model.
Mobile Edge Access Layer: It is composed of mobile vehicles. Each vehicle carries subordinate MEC servers (Sub-MECSs) for fast deployment. The Sub-MECSs receive and process the delay-sensitive tasks of the terminal layer and realize offloading balance and resource sharing; they are not involved in task immigration. Fixed Edge Sink Layer: It is composed of main MEC servers (M-MECSs) and their base stations, which can receive and process UE tasks from the terminal layer and can offload some delay-insensitive tasks to the cloud platform layer for processing. In addition, the fixed edge sink layer makes task immigration decisions for part of the offloaded tasks to ensure the user's service quality. Cloud Platform Layer: It is composed of cloud servers. The cloud platform layer receives the task packet-header data from the terminals, collects the required computing resources and delay requirements, and decides the task offloading strategy. It can also process the tasks sent up by the fixed edge sink layer.
2.2 Mobility Prediction Model Based on the actual movement trajectories of users, the mobility prediction model designed in this paper is mainly oriented to two-dimensional mobile scenes, as shown in Fig. 2. At each intersection, the model chooses the user's moving direction probabilistically, with probability 0.25 for each of the four directions. When the user reaches a boundary (non-corner) of the two-dimensional region, each remaining direction has probability 1/3; at a corner, each remaining direction has probability 0.5. When the user can only move forward and backward, the two-dimensional mobility prediction model degenerates into a one-dimensional one. Owing to the inertia of user movement, repeated movement in a certain direction appropriately increases the probability that the model chooses that direction next and reduces the probability of the other directions.
Fig. 2. Two-dimensional mobility prediction model.
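A possible reading of the two-dimensional random-walk model in code; the inertia factor of 1.2 is an assumption, since the paper does not give the exact amount by which repeated movement raises a direction's probability.

```python
import random

def next_direction(position, grid_size, history=None):
    # Candidate moves at the current point; fewer options remain at a boundary or corner,
    # so each remaining direction is chosen with probability 1/len(candidates).
    x, y = position
    candidates = []
    if x > 0: candidates.append((-1, 0))
    if x < grid_size - 1: candidates.append((1, 0))
    if y > 0: candidates.append((0, -1))
    if y < grid_size - 1: candidates.append((0, 1))
    weights = [1.0] * len(candidates)
    if history:
        # Movement inertia: repeating the previous direction gets a slightly higher weight.
        last = history[-1]
        weights = [1.2 if c == last else 1.0 for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]
```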
2.3 Immigration Model The system is divided into time slots in the time domain; the set of time slots is denoted by t = {t_1, t_2, ..., t_n}. Let U = {u_1, ..., u_i, ..., u_x} and M = {m_1, ..., m_k, ..., m_z} denote the sets of UEs and M-MECSs. Sub-MECSs are not considered here because they only handle delay-sensitive tasks of the terminal layer and are not involved in task immigration. This paper assumes that the UEs are randomly distributed and mobile within the network architecture, while the locations of the M-MECSs in the fixed edge sink layer remain unchanged. The nearest-server principle is adopted to decide which M-MECS a UE connects to, and the UE tasks are then offloaded to Sub-MECSs, M-MECSs or the cloud platform layer for processing according to the offloading strategy [8]. Owing to user mobility, there are cases in which the M-MECS to which a user's task was offloaded differs from the M-MECS the user is currently linked to, as shown in Fig. 3 for User2.
When the user moves away from the M-MECS to which the task was offloaded, the user's Quality of Service (QoS) decreases, seriously affecting the user experience; task immigration is therefore necessary. Let x_i(t) ∈ {0, 1} denote the task immigration decision of u_i in time slot t. When x_i(t) = 1, the offloaded tasks of u_i are immigrated to the currently linked M-MECS, and when x_i(t) = 0, the tasks of u_i are not immigrated.
Fig. 3. User mobility causes task immigration.
When the m_i to which the user's task was offloaded differs from the m_j currently linked, the cost of task processing increases, and task immigration can reduce the cost of the task to some degree, as shown below. When x_i(t) = 0, the remaining tasks continue to be processed on m_i without task immigration. The cost of the user's task is then mainly the energy consumption and delay of the long-distance wireless transmission by m_i. The wireless transmission rate r_i^u between the user and m_i is:

r_i^u = W log_2(1 + h_i P_i^r / σ²)   (1)
where P_i^r is the received power of the wireless transmission between u_i and m_i, W is the channel bandwidth, h_i is the channel gain, and σ² is the noise power [9]. Because the power loss in free space is proportional to the square of the distance between the transmitter and the receiver [10], i.e.:

P_i^r / P_i^s = K / S²   (2)
where K is the influence factor of the environment, P_i^s is the transmitting power of the wireless transmission between u_i and m_i, and S is the distance between the transmitter and the receiver. Therefore, when the transmission distance increases, the wireless transmission power and energy consumption of m_i also increase. The energy consumption of the downlink transmission of m_i is:

E_i^u(t) = P_i^s · R_i(t) / r_i^u   (3)
The transmission delay is:

t_i^u = R_i(t) / r_i^u   (4)
where R_i(t) is the amount of downlink task data. When x_i(t) = 1, task immigration is performed. In this case, the cost of the task includes not only the above transmission delay and downlink transmission energy consumption but also the immigration energy consumption and immigration delay. The energy consumption of immigration is:

E_i^m(t) = P_{i,j}^m · M_i(t) / r_{i,j}^m + P_j^s · R_i(t) / r_j^u   (5)
(5)
The immigration delay is: tim =
Mi (t) Ri (t) m + ru ri,j j
(6)
m and P m are respectively the power and rate of wireless transmission between where ri,j i,j mi and mj , and Mi (t) is the size of the task of immigration.
3 Proposed Algorithms 3.1 Problem Formulation When the task offloading server of user is different from the current linked server, to simplify the model, the task cost of ui is described by different weights of energy consumption and delay, i.e.: Ci (t) = VEi (t) + (1 − V )ti where parameter V is the weight coefficient. Therefore, the task cost of task immigration is: VEiu (t) + (1 − V )tiu xi (t) = 0 Ci (t) = VEim (t) + (1 − V )tim xi (t) = 1
(7)
(8)
The optimization objectives of this paper are as follows: 1 E{ Ci (t)} T →∞ T
max lim xi (t)
i
s.t.(a) : xi (t) ≤ 1 (b) : ti ≤ t
(9)
where constraint (a) represents the feasibility of the task immigration policy and constraint (b) represents the delay requirement of each task.
3.2 PTIPO Algorithms Based on the classic PPO reinforcement learning algorithm [11], this paper designs an algorithm named proximal task immigration policy optimization (PTIPO) to solve for the optimal immigration strategy. The whole scenario consists of three parts: environment, agent and action. The agent interacts with the environment, starting from a state, choosing actions according to its own policy distribution, and receiving rewards. The environment consists of the physical devices and base stations, which provide the agent with environmental state information. The agent takes different actions according to the state, applies them to the environment, and the corresponding reward is calculated and fed back to the agent; the immigration operation is then performed. PTIPO runs as follows.
S represents a finite state space; in this paper, the state represents the position coordinates and the current task data size of u_i in each time slot. A represents a finite action space and is defined as the immigration decision x_i. R represents the change of the objective function value after the state-action pair (S, A) of the current time slot is applied, i.e.:

R(S_t, A_t) = C_i^{before}(t) − C_i^{after}(t)   (10)
With the accumulation of iterations, the system converges to the optimal state, in which all C_i(t) values no longer change and remain at their minimum.
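PPO itself (clipped policy-gradient updates over a neural policy) is not reproduced here; the sketch below only shows the per-slot reward of Eq. (10) and a greedy baseline decision built on the weighted cost of Eqs. (7)-(8), with purely illustrative numbers.

```python
def task_cost(energy, delay, v=0.5):
    # Eq. (7): weighted combination of energy consumption and delay (V = 0.5 is an assumption).
    return v * energy + (1 - v) * delay

def migration_reward(cost_before, cost_after):
    # Eq. (10): the reward is the reduction in task cost achieved by the chosen action.
    return cost_before - cost_after

def greedy_decision(cost_if_stay, cost_if_migrate):
    # Baseline rule the learned policy should outperform: migrate only when it lowers the cost.
    return 1 if cost_if_migrate < cost_if_stay else 0

# Example: a user whose serving M-MECS is no longer the nearest one.
stay = task_cost(energy=3.2, delay=0.8)
move = task_cost(energy=2.1, delay=0.5)
print(greedy_decision(stay, move), migration_reward(stay, move))
```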
4 Simulation Results 4.1 Parameter Setting In this paper, we assume that the square area of the patrol scene is 100 km × 100 km, and that 50 UAVs with data acquisition and transmission functions are randomly distributed in it. According to the IoT scenario, each M-MECS can serve 10 UEs. The channel bandwidth is 20 MHz. The channel gain follows a distribution with mean g0·(1/100)^4, where g0 = −30 dB is the path-loss constant at 1 m. The noise power is assumed to be σ² = 10^−10 W/Hz. 4.2 Analysis of Simulation Results To verify the performance of the proposed PTIPO, it is compared with the simulated annealing algorithm (SAA) [12] and the Deep Q-learning algorithm (DQN) [13]; the results are as follows.
Fig. 4. System average reward.
Figure 4 shows the convergence trend of the various immigration algorithms. Compared with the DQN and SAA algorithms, the PTIPO algorithm proposed in this paper converges faster and is better suited to practical IoT scenarios with low delay requirements. At the same time, the PTIPO mechanism achieves lower energy consumption and system delay than the other algorithms. This is because it takes user mobility into consideration, which leads to more rapid convergence and more stable task immigration. To show that PTIPO makes better immigration decisions, we also compare the algorithms using the delay metric. As shown in Fig. 5, the time consumption of PTIPO is lower than that of DQN and SAA, and the gap becomes more apparent as the number of UEs increases.
Fig. 5. System average delay
5 Conclusion To better fulfill the diversified requirements and reasonably allocate the communication and computing resource of the base station, this paper proposes a computing task immigration mechanism PTIPO. This algorithm takes user mobility into consideration, so it can accomplish task immigration and resource allocation more quickly. The simulation results show that compared with DQN and SAA algorithms, PTIPO mechanism can reduce the immigration delay by 27% and 60% respectively, which can complete the task immigration faster and maintain the stability of immigration. In the future work, the traffic prediction model will be introduced to further improve the service efficiency. Acknowledgment. This work is supported by the Science and Technology Project of State Grid Corporation of China: Research and Application of Key Technologies in Virtual Operation of Information and Communication Resources. The corresponding author is Yifei Xing with e-mail address [email protected].
References 1. Zhang, Y., Qiu, M., Tsai, C., et al.: Health-CPS: healthcare cyber-physical system assisted by cloud and big data. IEEE Syst. J. 11(1), 88–95 (2017) 2. Li, H., Shou, G., Hu, Y., et al.: Mobile edge computing: progress and challenges. In: 2016 4th IEEE International Conference on Mobile Cloud Computing, Services and Engineering. IEEE (2016) 3. Abbas, N., Zhang, Y., Taherkordi, A., et al.: Mobile edge computing: a survey. IEEE Internet Things 5(1), 450–465 (2018) 4. Mach, P., Becvar, Z.: Mobile edge computing: a survey on architecture and computation offloading. IEEE Commun. Surv. Tutor. 19(3), 1628–1656 (2017)
5. Kondo, T., Isawaki, K., Maeda, K.: Development and evaluation of the MEC platform supporting the edge instance mobility. In: 2018 IEEE 42nd Annual Computer Software and Applications Conference, Tokyo, vol. 2, pp. 193–198 (2018) 6. Plachy, J., Becvar, Z., Strinati, E.C.: Dynamic resource allocation exploiting mobility prediction in mobile edge computing. In: 2016 IEEE 27th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications. IEEE (2016) 7. Nasrin, W., Xie, J.: SharedMEC: sharing clouds to support user mobility in mobile edge computing. In: 2018 IEEE International Conference on Communications, Kansas City, pp. 1–6 (2018) 8. Xu, S., et al.: Deep reinforcement learning based task allocation mechanism for intelligent inspection services in energy internet. J. Commun. 42, 191–204 (2021) 9. Wang, S., Zhang, X., Yan, Z., et al.: Cooperative edge computing with sleep control under nonuniform traffic in mobile edge networks. IEEE Internet Things J. 6(3), 4295–4306 (2019) 10. Shaw, J.A.: Radiometry and the Friis transmission equation. Am. J. Phys. 81(1), 33–37 (2013) 11. Schulman, J., Wolski, F., Dhariwal, P., et al.: Proximal Policy Optimization Algorithms (2017) 12. Hu, Y.J.: Research on Task Offloading and Resource Allocation Algorithm in Mobile Edge Computing. Chongqing University of Posts and Telecommunications (2017) 13. Zhang, C., Zheng, Z.: Task migration for mobile edge computing using deep reinforcement learning. Future Gener. Comput. Syst. 96, 111–118 (2019)
Action Recognition Model Based on Feature Interaction Dengtai Tan(B) , Changpeng He, and Yiqun Wang School of Public Security and Technology, Gansu University of Political Science and Law, Lanzhou 730070, Gansu, China
Abstract. To address incomplete information expression and the large amount of computation in action recognition, this paper proposes an action recognition model based on feature interaction. First, high-frequency and low-frequency components are extracted from the video, and the high-low frequency fusion algorithm proposed in this paper is used to compress the original video, solving the incomplete expression of action information. Second, a feature interaction network is designed to extract features from the odd and even frame sequences; the two branches interact at each level to extract the hidden information in the video sequence, and the two-way features are fused by a fusion network to extract spatio-temporal features. Finally, the effectiveness of the model is verified, achieving recognition accuracies of 96.15% and 78.51% on the UCF101 and HMDB51 datasets, respectively. Keywords: Action recognition · Feature interaction network · Feature fusion · High frequency feature · Low frequency feature
1 Introduction Action recognition is widely used in intelligent monitoring, human-computer interaction, intelligent robot, network video retrieval, virtual reality and other fields. According to the different methods of feature extraction, action recognition can be divided into traditional methods based on artificial design features and deep learning methods based on large data sets training. The quality of extracting features is the decisive factor affecting the identifying performance. Among the traditional methods based on artificial design features, the most representative ones are DT [1] and IDT [2], which all use four characteristic description methods: the histogram of flow (HOF), the histogram of gradient (HOG), motion boundary histograms (MBH), and trajectory. The traditional methods based on artificial design characteristics are based on the experience and prior knowledge of researchers to extract features, which is time-consuming and laborious, with poor generalization degree and does not fully conform to the distribution law of action characteristics. With the development of deep learning technology, traditional methods based on artificial design characteristics have been replaced by deep learning methods in recent years. According to the network structure, action recognition can be divided into
four categories: two stream network, three-dimensional convolutional neural network, Long Short-Term Memory network and fusion network structure. On the study of two-stream network, at the beginning of the rise of deep learning, Li et al. [3] learned and classified through the overlay input of video frames into the convolutional neural network, and experimentally found that this method was worse than manually extracting features. The authors in [4] proposed a two-stream convolutional neural network, which greatly improved the accuracy of action recognition. The TSN [5] is constructed based on the two-stream network, which solves the problem of inaccurate modeling of action categories with relatively long intervals in the two stream network. Aiming at the problem that 2D convolutional neural network is difficult to extract video timing information, Baccouche et al. [7] proposed 3D convolution to extract spacetime characteristics, which achieved good results on small data sets. However, due to the limitation of network structure, 3D convolution did not extract better spatial and temporal features. Tran et al. [8] proposed a C3D network that can model both appearance and motion at the same time, with a convolution kernel size of 3 × 3 × 3, characterized by fast inference and very high computational efficiency, which greatly improves the accuracy of action recognition. For extraction features, the researchers improved the convolution kernel [9]. From the perspective of recognition accuracy, the C3D method is less accurate than the two-stream method, but C3D is still the focus of research, it has the characteristics of universal, efficient and compact. On the other hand, LSTM network is widely used to deal with timing information. In order to extract the time sequence information in the video, Donahue et al. [11] proposed the LRCN network, which extracted the spatial characteristics information of the frame through the convolutional neural network, modeled the spatial information through the LSTM network, and finally completed the behavior classification. Since the CNN network selected was the Alexnet network with relatively shallow layers, not learning enough about the spatial characteristics of the lower levels. The authors in [12] innovatively introduced the attention mechanism, focusing on the areas strongly related to the behavior category in the video, which improved the accuracy of LRCN network structure. In order to improve the accuracy of action recognition, multiple models are fused to improve the effectiveness of action recognition. Using a hybrid depth model of 3D CNN and LSTM to extract the sequential characteristics of video clips, which can effectively share information among multiple action categories [13]. With the rise of the graph neural network, the graph convolution neural network models the relationship between different proposals and provides a novel method for action recognition by studying powerful representations of motion classification and positioning [14]. Inspired by group convolution, Tran et al. [15] designed a structural-channel separation convolution network (CSN), which breaks down 3D convolution by separating channel interactions from space-time interactions, while providing a positive form for 3D channel-separated convolution, which is simple, efficient, and accurate. Inspired by retinal cells in the primate vision system in biology, Feichtenhofer et al. 
[16] proposed SlowFast network for video classification, the highest recognition accuracy was achieved on the Kinetics data set without using any pre-training.
As video contains a huge amount of hidden information, the amount of information stored in it is far greater than traditional media types such as pictures and texts. Meanwhile, video has the characteristics of time-sequence correlation, event integrity and content diversity, which brings great challenges to the study of action recognition. Through the summary of the above literature, it is found that taking a fixed-length video clip as the input of the network will lead to incomplete information expression, and the advantages and disadvantages of extracting features play a decisive role in the recognition results, so action recognition model based on two-way feature interaction is proposed. The original video is preprocessed by high-low-frequency information fusion module, so that the video contains complete behavior information, and then through the two-way feature interactive networks efficiently extract video features. The main contributions to this article: 1. Inspired by the theory of spatial scaling, not only signals and images have high and low frequency components, but also video has high and low frequency components. The low frequency components in the video determine the basic structure of the image, while the high frequency components determine the edges and details of the image. Obviously, the low frequency components are redundant. In order to solve the problem of incomplete expression of action information caused by using fixed-length video clips as input to the network, a fusion algorithm of high and low frequency components is proposed to improve the expression and analysis ability of the video, so that it could fully express behavior information, and provide data input for the subsequent feature interactive network. 2. Traditional two-stream networks do not interact with features earlier in the processing process, but simply extract space-time features in their respective networks and connect to the feature output layer, with weak character capabilities. In order to enable the network model to better extract the depth characteristics in the video, a deep two-way feature interaction network is designed, which is mainly composed of Vgg3D_Net1 and Vgg3D_Net2 networks in parallel. The two networks extract deep-seated features at each layer through different interaction methods and different connection directions. In addition, the input of the deep two-way feature interactive network all uses the RGB image sequence, discarding the optical flow data in the traditional two-stream network, and improving the execution speed of the network.
2 The Network Structure The action recognition model based on feature interaction proposed in this paper consists mainly of high and low frequency fusion modules, feature interaction networks, and fusion networks, as shown in Fig. 1. The high-low-frequency information fusion module reduces the redundant information contained in the video through the high-lowfrequency fusion algorithm, which enables it to fully express the behavior information and provides input for the characteristic interactive network. Feature interactive networks consist of Vgg3D_Net1 and Vgg3D_Net2. Vgg3D_Net1 extract the detail features of even frame sequences, Vgg3D_Net2 extracts the whole characteristics of odd frame sequences, and interacts at each level to extract space-time fusion features. The fusion
network fuses the two features to further extract the features. The design process for each module is described in detail below.
Fig. 1. The whole structure
3 High-Low-Frequency Fusion Module The fusion process of high- and low-frequency information is shown in Fig. 1. First, two time points are randomly selected in the training sample, and a k-frame and a 2k-frame video clip are sampled from them. The 2k-frame clip is then processed by the high-low-frequency fusion module. Finally, each training sample can be represented as two samples: the original video and the high-low-frequency fused video. In this way, the problem of incomplete expression of behavior information caused by using fixed-length video clips as the network input is solved: the k fused frames produced by the high-low-frequency fusion module contain the information of 2k frames of the original video. The video's high-frequency information refers to the original video without Gaussian filtering, and its low-frequency information refers to the video obtained by Gaussian filtering. The two-dimensional Gaussian filter is defined in Eq. (1), where δ is the scale parameter related to smoothness: the larger δ is, the stronger the smoothing. Assuming f_k(x, y) represents a two-dimensional image, where k ∈ {0, 1, 2, ..., n}, the low-frequency image L_k(x, y) is the convolution of f_k(x, y) with G(u, v, δ), as shown in Eq. (2).

G(u, v, δ) = (1 / (2πδ²)) e^{−(u² + v²)/(2δ²)}   (1)

L_k(x, y) = G(u, v, δ) ∗ f_k(x, y)   (2)
High-low-frequency fusion frame diagram is shown in Fig. 2. fk−1 (x, y), fk (x, y) and fk+1 (x, y) donate three consecutive frames of images. Convolute the current frame fk (x, y) and Gaussian kernel G(u, v, δ) to obtain the low-frequency information Lk (x, y), and then reconstruct the low-frequency and high-frequency information fk+1 (x, y) into
a fusion map, as shown in (3), where L(x, y) is the low-frequency component, fk+1 (x, y) is the high-frequency component of the image, F(x, y) the fusion image, and the λ1 and λ2 are the weights. F(x, y) = λ1 L(x, y) + λ2 fk+1 (x, y)
(3)
Fig. 2. High-low frequency fusion algorithm
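A small sketch of the fusion of Eqs. (1)-(3) using OpenCV's Gaussian blur; the values of σ, λ1 and λ2 and the pairing of consecutive frames are assumptions, as the paper does not state them.

```python
import cv2

def fuse_high_low(frame_k, frame_k1, sigma=1.5, lam1=0.5, lam2=0.5):
    # Eqs. (1)-(3): low-frequency part of frame k via Gaussian filtering, fused with the
    # high-frequency-carrying next frame k+1 by a weighted sum.
    low_k = cv2.GaussianBlur(frame_k, (0, 0), sigma)           # L_k(x, y) = G * f_k(x, y)
    return cv2.addWeighted(low_k, lam1, frame_k1, lam2, 0.0)   # F = λ1·L + λ2·f_{k+1}

def compress_clip(frames, sigma=1.5):
    # Compresses 2k frames into k fused frames by pairing consecutive frames.
    return [fuse_high_low(frames[i], frames[i + 1], sigma) for i in range(0, len(frames) - 1, 2)]
```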
4 Feature Interactive Network 4.1 Base Two-Way Network The feature interaction network consists of Vgg3D_Net1 and Vgg3D_Net2, as shown in Fig. 3. Vgg3D_Net1 and Vgg3D_Net2 are fully symmetrical network structures, each consisting mainly of 6 convolution layers (Conv3D), 4 pooling layers, 4 Dropout layers and 4 Batch Normalization layers. All convolution kernels are 3 × 3 × 3. The first pooling layer has size 1 × 2 × 2 with stride 1 × 2 × 2, and the remaining pooling layers have size 2 × 2 × 2 with stride 2 × 2 × 2. The activation function is LeakyReLU, and the interaction layers use the Sigmoid activation function.
Fig. 3. Feature interaction network
The difference between the Vgg3D_Net1 and Vgg3D_Net2 is the video input layer and the pooling layer. The input layer divides the video into odd and even frames,
and Vgg3D_Net1 takes the even frames as input and uses maximum pooling to extract texture features, while Vgg3D_Net2 takes the odd frames as input and uses average pooling; the main purpose of Vgg3D_Net2 is to extract the overall characteristics of the video. 4.2 Feature Interaction The two-way feature interaction strategy, shown in Fig. 4, combines interactions of different types and connection directions: the multiplication symbol represents multiplication interaction and ⊕ represents addition interaction.
Fig. 4. Feature interaction process
The fusion layer takes the fusion features of Vgg3D_Net1 and Vgg3D_Net2 networks as inputs, and the fusion process is shown in (4). F = f1 (xl ; Wl ) • f2 (xl ; Wl )
(4)
x_l is the output of the l-th layer of the network, W_l is the weight of the l-th layer's convolution kernels, and f_1(x_l; W_l) and f_2(x_l; W_l) are the feature outputs of Vgg3D_Net1 and Vgg3D_Net2, respectively. • represents element-wise multiplication between the feature matrices. The main function of the fusion network is to fuse the detailed and overall features of the two-way network and further extract information related to target, scene and action. The fusion network is composed of 2 convolution layers (Conv3D), 1 pooling layer, 1 Dropout layer, 1 global average pooling layer, and 1 Batch Normalization layer; all convolution kernels are 3 × 3 × 3, the pooling layer size is 2 × 2 × 2 with stride 2 × 2 × 2, and the activation function is LeakyReLU.
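In Keras terms (the framework named in Sect. 5), the per-level interaction and the fusion of Eq. (4) could be expressed as below; the helper names and the choice of Multiply/Add layers are an illustrative reading of Fig. 4, not the authors' released code.

```python
from tensorflow.keras import layers

def interaction_block(feat1, feat2, mode="add"):
    # Per-level interaction between the two branches (Fig. 4): 'mul' corresponds to the
    # multiplication interaction and 'add' to the ⊕ interaction evaluated in Table 3.
    if mode == "mul":
        return layers.Multiply()([feat1, feat2])
    return layers.Add()([feat1, feat2])

def fusion_input(feat1, feat2):
    # Eq. (4): element-wise product of the two branch outputs fed to the fusion network.
    return layers.Multiply()([feat1, feat2])
```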
5 Analysis of Experimental Details and Results To verify the efficiency and superiority of the algorithm proposed in this paper, the challenging data sets UCF101 and HMDB51 are used for testing and analysis. The experimental hardware environment is Intel Core i7–8700 k CPU, NVIDIA GTX2080Ti graphics card, 64 G internal storage. The software environment uses CUDA’s accelerated Tensorflow deep learning framework, with Keras version 2.1.6, program language Python 3.6.5, and development tool PyCharm.
5.1 Network and Training Parameter Settings
Table 1. Network structure parameter
Layer name | UCF101 parameter | HMDB51 parameter
Conv_1 | 3 × 3 × 3, 32 | 3 × 3 × 3, 32
Conv_2 | 3 × 3 × 3, 32 | 3 × 3 × 3, 32
Conv_3a | 3 × 3 × 3, 64 | 3 × 3 × 3, 64
Conv_3b | 3 × 3 × 3, 64 | 3 × 3 × 3, 64
Conv_4a | 3 × 3 × 3, 128 | 3 × 3 × 3, 128
Conv_4b | 3 × 3 × 3, 128 | 3 × 3 × 3, 128
Conv_5a | 3 × 3 × 3, 256 | 3 × 3 × 3, 128
Conv_5b | 3 × 3 × 3, 256 | 3 × 3 × 3, 256
The original video is clipped to a resolution of 112 × 112, and long videos are compressed into short videos by the fusion algorithm, which randomly extracts non-repeating video clips from the original data. Finally, both the original dataset and the fused data are fed to the network as clips of 16 frames. At the input of the feature interaction network, the odd and even frames of the data are separated and fed into the two branches separately. The network parameters for UCF101 and HMDB51 are shown in Table 1.
Table 2. Parameter settings
Name | UCF101 / HMDB51
Learning rate | Lr = 0.01
Learning rate decline factor | α = 0.5
Optimization | Optimizer = "SGD"
Loss | Loss = "categorical_crossentropy"
Epoch | Epoch = 100
Batch_size | Batch_size = 16
Weight_decay | Weight_decay = 0.006
The experiment is trained with small batches of data, once every 16 batches, and the number of iterations epoch was 100. The accuracy of the model is improved by automatically adjusting the learning rate, and the linear attenuation method is adopted in the adjustment process. The super-parameters involved in the algorithm are shown in Table 2.
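A hedged sketch of the training configuration of Table 2 in Keras; model, x_train and y_train are assumed to exist elsewhere, and the linear learning-rate schedule is one possible reading of the "linear attenuation" remark rather than the authors' exact schedule.

```python
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.regularizers import l2

EPOCHS, BATCH_SIZE, LR0, DECAY_FACTOR = 100, 16, 0.01, 0.5   # values from Table 2

def linear_decay(epoch):
    # Linearly attenuated learning rate, halved over the course of training (assumed schedule).
    return LR0 * (1.0 - DECAY_FACTOR * epoch / EPOCHS)

optimizer = SGD(LR0)
weight_decay = l2(0.006)  # applied as a kernel regularizer on the Conv3D layers
# model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS,
#           callbacks=[LearningRateScheduler(linear_decay)])
```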
5.2 Interactive Testing and Analysis During the experiment, the UCF101 and HMDB51 datasets are each divided into 3 groups, and the average accuracy over the 3 test sets is used as the indicator to evaluate the effectiveness of the model. By comparing feature interaction networks with different interaction strategies, it can be found that the one-way connection from Net1 to Net2 gives the highest recognition accuracy. In Table 3, '—' means no interaction, '→' means a one-way connection from Vgg3D_Net1 to Vgg3D_Net2, and '←' means a one-way connection from Vgg3D_Net2 to Vgg3D_Net1.
Table 3. Comparison of recognition accuracy of different fusion methods
Interact method | Interaction direction | UCF101 | HMDB51
— (none) | Net1—Net2 | 93.98% | 74.36%
Multiplication | Net1→Net2 | 95.26% | 76.62%
Multiplication | Net1←Net2 | 94.17% | 75.53%
⊕ (addition) | Net1→Net2 | 96.15% | 78.51%
⊕ (addition) | Net1←Net2 | 95.90% | 77.32%
From the different interactions in Table 3, it can be concluded that the recognition rates on both UCF101 and HMDB51 are low when there is no interaction between feature interaction networks, due to the independent extraction of features between the Vgg3D_Net1 and Vgg3D_Net2 networks, the lack of fusion of information, and only the fusion layer for information fusion. When one-way interactions are used, it can be found that the accuracy of addition interaction is higher than multiplication interaction. When the multiplication connection and addition interaction are adopted, it can be found that the recognition effect of ‘ → ’ connection mode is better, that is, the interaction mode from VGG3D_Net1 to VGG3D_Net2 has a high accuracy, because the detailed features extracted from VGG3D_Net1 flow to the overall features extracted from VGG3D_Net2. The integration of detailed features into overall features can better represent human actions and behaviors. 5.3 Compared with the Current Mainstream Model To demonstrate the effectiveness of this algorithm, the identification accuracy of the feature interaction network presented in this paper on UCF101 and HMDB51 datasets is listed in Table 4. The algorithm presented in this paper achieves 96.15% recognition accuracy on UCF101 data sets and 78.51% recognition accuracy on HMDB51 data sets, which is better than most of the latest behavior recognition models. However, lower than the D3D, I3D Two-Stream [9] and Hidden Two-Stream models [19]. However, these methods use optical flow information as input for such high accuracy, and the optical flow represents the video Motion information can complement the RGB information of
the video image, thereby improving the effect of the dual-stream model. Although the optical flow can represent the motion information of the target, computing it for a whole video is expensive, while the video RGB information already contains all the information about the action. Therefore, the feature interaction network proposed in this paper uses only video RGB information, abandons the optical flow, and extracts the original video information through the high- and low-frequency information fusion algorithm in the time domain, so that the complete behavior information can be expressed. Finally, the fusion network fuses the two kinds of features to gather the target, scene and action information of the video. The experimental results show that the feature interaction network can effectively improve the accuracy of behavior recognition using RGB information alone and fully extract the latent information of different behavior categories.

Table 4. Comparison with current typical models
Model                                   | UCF101 | HMDB51
iDT [2]                                 | 86.40% | 57.2%
Spatiotemporal ResNets [17] + IDT       | 94.6%  | 70.3%
Two-Stream [18]                         | 88.00% | 59.4%
Two-Stream Fusion [6]                   | 92.5%  | 65.4%
Hidden Two-Stream [19]                  | 97.1%  | 78.7%
I3D Two-Stream [9]                      | 98.0%  | 80.7%
C3D [8]                                 | 82.3%  | 56.8%
P3D [10]                                | 88.6%  | —
I3D [9]                                 | 95.6%  | 74.8%
Long-term ConvNets [20]                 | 91.7%  | 64.8%
Spatiotemporal Multiplier Networks [21] | 94.9%  | 72.2%
D3D (Kinetics-600 pretrain)             | 97.1%  | 79.3%
In this article, Net1→Net2              | 95.26% | 76.62%
In this article, Net1→Net2 ⊕            | 96.15% | 78.51%
6 Conclusion
The feature interaction action recognition model proposed in this paper adopts video RGB information as the network input. Since the RGB information in the video contains all the information about the action, the optical flow information is discarded, improving the execution speed of the network. A feature interaction network is then designed in which the detail features extracted from even frames and the overall features extracted from odd frames interact at each level, and spatio-temporal fusion features are extracted. Moreover, the fusion network further extracts the target-, scene- and
action-related information of the video, so as to improve action recognition. Finally, the recognition accuracy of the feature interaction network under different interaction directions and modes is verified. The classification accuracy of the model on the UCF101 and HMDB51 datasets reaches 96.15% and 78.51%, respectively, which effectively improves the accuracy of action recognition.

Acknowledgment. Scientific Research Foundation of Gansu Institute of Political Science and Law (GZF2018XZDLW17, No.GZFXQNLW03, No.jdzxyb2018–06, No.jdzxyb2018–09), The Educational Commission of Gansu Province of China (No. 2019B-119, No. 2020B-164).
References 1. Wang, H., Klaser, A., Schmid, C., Cheng-Lin, L.: Action recognition by dense trajectories. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3169–3176 (2011) 2. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: The IEEE International Conference on Computer Vision (ICCV), 3551–3558 (2013) 3. SravyaPranati, B., Suma, D., ManjuLatha, C., Putheti, S.: Large-scale video classification with convolutional neural networks. In: Senjyu, T., Mahalle, P.N., Perumal, T., Joshi, A. (eds.) ICTIS 2020. SIST, vol. 196, pp. 689–695. Springer, Singapore (2021). https://doi.org/ 10.1007/978-981-15-7062-9_69 4. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Advances in Neural Information Processing Systems (NeurIPS), 568–576 (2014) 5. Wang, L., et al.: Temporal segment networks: towards good practices for deep action recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 20–36. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_2 6. Feichtenhofer, C., Pinz, A., Zisserman, A.: Convolutional two-stream network fusion for video action recognition. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1933–1941 (2016) 7. Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., Baskurt, A.: Sequential deep learning for human action recognition. In: the Second International Conference on Human Behavior Understanding, 29–39 (2011). https://doi.org/10.1007/978-3-642-25446-8_4 8. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3d convolutional networks. In: The IEEE International Conference on Computer Vision (ICCV), 4489–4497(2015) 9. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017) 10. Qiu, Z., Yao, T., Mei, T.: Learning spatio-temporal representation with pseudo-3D residual networks. In: The IEEE International Conference on Computer Vision (ICCV), 5534–5542 (2017) 11. Donahue, J., et al.: Long-term recurrent convolutional networks for visual recognition and description. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 677–691 (2015) 12. Sharma, S., Kiros, R., Salakhutdinov, R.: Action Recognition using Visual Attention. arXiv: Learning (2015) 13. Ouyang, X., Xu, S., Zhang, C., Zhou, P., Li, X.A.: 3d-CNN and LSTM based multi-task learning architecture for action recognition. IEEE Access 7(99), 40757–40770 (2019)
14. Zeng, R., Huang, W., Gan, C., Tan, M., Huang, J.: Graph convolutional networks for temporal action localization. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE (2019) 15. Tran, D., Wang, H., Feiszli, M., Torresani, L.: Video classification with channel-separated convolutional networks. In: International Conference on Computer Vision (ICCV), 5552– 5561 (2019) 16. Feichtenhofer, C., Fan, H., Malik, J., He, K.: SlowFast networks for video recognition. In: The IEEE International Conference on Computer Vision (ICCV) (2019) 17. Feichtenhofer, C., Pinz, A., Wildes, R.P.: Spatiotemporal residual networks for video action recognition. In: Advances in Neural Information Processing Systems (NeurIPS), 3468–3476 (2016) 18. Zhu, Y., Lan, Z., Newsam, S., Hauptmann, A.: Hidden two-stream convolutional networks for action recognition. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11363, pp. 363–378. Springer, Cham (2019). https://doi.org/10.1007/978-3-03020893-6_23 19. Varol, G., Laptev, I., Schmid, C.: Long-term temporal convolutions for action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 40(6), 1510–1517 (2018) 20. Feichtenhofer, C., Pinz, A., Wildes, R.P.: Spatiotemporal multiplier networks for video action recognition. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 7445–7454 (2017) 21. Stroud, J.C., Ross, D.A., Sun, C., Deng, J., Sukthankar, R.: D3D: distilled 3D networks for video action recognition. In: The IEEE Winter Conference on Applications of Computer Vision (WACV) (2020)
Semantic Segmentation of 3-D SAR Point Clouds by Graph Method Based on PointNet Zerui Yu and Kefei Liao(B) School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China [email protected]
Abstract. Three-dimensional (3-D) synthetic aperture radar (SAR) has undergone continuous development, and the high resolution of SAR images has produced a large number of sample data, which makes semantic segmentation based on point clouds possible. This paper proposes a new end-to-end deep learning network for 3-D SAR point cloud semantic segmentation based on PointNet and Graph convolutional networks. This method extracts point clouds from 3-D SAR images, and then constructs a neighborhood graph that reflects the most relevant nodes. The image features of the SAR point cloud can be put into the improved PointNet MLP block to obtain the semantic segmentation of each part of the aircraft. The effectiveness of this method is verified by 3-D SAR simulations of aircraft targets and several data sets. Keywords: 3-D SAR · Graph signal processing · Semantic segmentation
1 Introduction With the rapid development and increasing popularity of high-resolution SAR images, three-dimensional (3-D) synthetic aperture radar (SAR) application technology has become a hot spot in the SAR field. From the past to the present, different 3-D SAR technologies have been proposed and achieved good results [1]. These methods have been widely used in the 3D reconstruction of urban buildings [2] and have played an important role in topographic mapping. However, research is mainly focused on imaging algorithms rather than image interpretation, segmentation and application. In recent years, with the development of deep learning algorithms for 3D data and the open source of 3D databases [3], a series of results have been achieved in the field of 3D point clouds, such as various classification methods and scene segmentation methods [4]. The first deep learning framework for point cloud classification and segmentation is PointNet [5]. It independently learns point-by-point features through multiple MLP layers, and extracts global features through the maximum pooling layer, which preserves the spatial features of the point cloud to the greatest extent, and has excellent performance in classification and segmentation. Previous work has applied PointNet to the field of 3-D aircraft recognition in radar point cloud data [6]. However, PointNet cannot obtain the local features of the point cloud, which makes it difficult to process 3D radar imaging. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 408–418, 2022. https://doi.org/10.1007/978-981-16-6554-7_46
In recent years, with the development of graph signal processing and the open sourcing of 3D databases [7], using graph models to extract point cloud features has become a new direction for processing point cloud models and has achieved better accuracy. This article is divided into four parts. The second section briefly introduces the 3D imaging model and then the principle of the graph-based PointNet. The third section proposes the radar point cloud preprocessing and the entire SAR semantic segmentation model system. The fourth section shows the results of 3-D SAR imaging and data preprocessing, which are then fed into the network to obtain the semantic segmentation results. In addition, the accuracies of the graph-based PointNet and the original PointNet are reported; comparing the segmentation results of the same aircraft and the same datasets demonstrates the superiority of the graph-based PointNet.
2 Basic Model and Theory
2.1 Three-Dimensional SAR Imaging Model
The geometric model of 3-D SAR is shown in Fig. 1. The linear array antenna is placed along the Y-axis, the number of antenna elements is N, and the airplane flies along the X-axis with speed v. The SAR system works in down-looking mode [8].
Fig. 1. SAR 3D imaging geometry
3-D SAR uses the motion of the linear array antenna to synthesize a large virtual aperture along the X-axis. Denote the position of each antenna element as:
P(n) = (x_n, y_n, z_n), 1 ≤ n ≤ N_A
(1)
N_A denotes the total number of antenna elements. Assume the scene consists of grid cells; then the position of each grid cell P_s(m) is expressed as:
P_s(m) = (x_m, y_m, z_m), 1 ≤ m ≤ M_s
(2)
Suppose that the SAR platform transmits a chirp signal; the m-th scatterer receives a de-chirped echo expressed as:

s(t, n, m) = \sigma_s(m) \exp(-j 2\pi f_c \tau_{nm}) \exp\!\left( j \pi f_{ch} (t - \tau_{nm})^2 \right)    (3)

Then the 3-D image is expressed by discretizing (3) as:

s(n, m, P_s(m)) = \sum_m \sigma_m \, \chi_R(r_z) \, \chi_A(r_x) \, \chi_C(r_y) \exp(-j 2 k r)    (4)
2.2 Graph Based PointNet Network Framework
For the point cloud semantic segmentation task on radar 3-D data, the entire network is based on an improved PointNet architecture. Figure 2 shows the architecture of the graph-based PointNet. The model uses the vanilla version of PointNet to process the [N * 3] point cloud input, with two improved graph-based MLP (64, 64) layers before the MaxPool layer, so that different receptive fields of the point cloud are obtained and more shape and edge information is captured. The MaxPool layer outputs the global feature tensor [1 * 2048], which is repeated and combined with the segmentation one-hot tensor to determine the object class (airplane, chair, …) of each point cloud. The fused global and local feature vector [N * 2112] is then combined with the outputs of the graph-based MLPs, which perceive the point cloud at different scales. Finally, through the graph-based MLP (512, 256, 128, Seg), the model outputs the score and segmentation result tensor [N * (3 + Seg)].
Fig. 2. Graph based PointNet architecture framework.
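As an illustration of the tensor flow described above, the following shape-level sketch (PyTorch) mirrors the segmentation path of Fig. 2. The layer sizes and the exact fusion dimension are assumptions made for illustration; this is not the authors' implementation.

```python
import torch
from torch import nn

class SegHead(nn.Module):
    """Per-point features -> max-pooled global feature -> repeat and concatenate
    with per-point features and the object-category one-hot -> per-point part scores."""
    def __init__(self, point_dim=64, global_dim=2048, n_cat=16, n_seg=50):
        super().__init__()
        self.to_global = nn.Linear(point_dim, global_dim)
        self.mlp = nn.Sequential(
            nn.Linear(point_dim + global_dim + n_cat, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_seg),
        )

    def forward(self, point_feat, one_hot):
        # point_feat: [N, 64] per-point features; one_hot: [n_cat] object category
        g = self.to_global(point_feat).max(dim=0).values      # global feature [2048]
        n = point_feat.shape[0]
        fused = torch.cat([point_feat, g.expand(n, -1), one_hot.expand(n, -1)], dim=1)
        return self.mlp(fused)                                # [N, n_seg] part scores

scores = SegHead()(torch.randn(1024, 64), torch.eye(16)[0])
print(scores.shape)   # torch.Size([1024, 50])
```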
In this paper, we make some improvements to the MLP unit of the PointNet network; the details of the graph method are introduced as follows.
2.3 Multi-layer Perceptron with Graph Method
A multi-layer perceptron has one or more inputs, biases, activation functions, and a single output. In the vanilla version of PointNet, the MLP is a feature extraction and data dimension
conversion unit. With input x, weight vectors ω1, ω2 and offsets b1, b2, the MLP output y is:
y = g(ω2 f(ω1 x + b1) + b2)
(5)
where f and g are the max function and the activation function, respectively. However, PointNet does not perform well on the segmentation task, because it only computes per-point feature information and does not collect neighboring information. In recent years, graph signal processing (GSP) has shone on non-Euclidean spatial data. Therefore, the model adds a graph method to the MLP to collect neighboring information. The vertices of the graph are the points of the D-dimensional SAR point cloud. Through the KNN algorithm, each point is connected with its K nearest neighbors to construct edges. The vertices are expressed as:
V = {v1, · · · , vn} ∈ R^D
(6)
where D is the third dimension (the feature dimension) of the MLP input data. The graph G = (V, E) is then built as follows (Fig. 3): the model constructs the local neighborhood of each point, uses the neighboring information to characterize the point itself, and aggregates these edge features with a special function to obtain a new characterization of each point.
Fig. 3. Point cloud graph construct and calculated edge feature by graph
The model follows Wang’s approach [9] to define the edge feature eij to complete the description of V , eij was defined as: eij = h (xi , xj ) h : RD × RD = RD
(7)
where h_Θ is a non-linear mapping, like g() in the MLP above, with a set of learnable parameters that describe the relationship between x_i and x_j. Considering the sparsity and clustering characteristics of the 3-D SAR point cloud, the model defines h_Θ(x_i, x_j) as:
h_Θ(x_i, x_j) = h_Θ(x_j − x_i, α̂_{i,j})
(8)
where x_j − x_i carries the local neighboring information, and α̂_{i,j} is the softmax of α_{i,j}, i.e., the normalized similarity:

\hat{\alpha}_{i,j} = \frac{\exp(\alpha_{i,j})}{\sum_j \exp(\alpha_{i,j})}    (9)
α_{i,j} draws on the idea of the attention mechanism to compute an attention weight:

\alpha_{i,j} = \frac{q^i \cdot q^j}{\sqrt{D}}, \quad q^i = W x^i    (10)

where q^i is the embedding vector of the input point x^i, the matrix W is the parameter to be learned in the MLP, and D is the dimension of q^i and q^j. The edge feature e_ij is finally expressed as:

e_{ij}^m = \mathrm{ReLU}\!\left(\theta_m \cdot (x_j - x_i) + \frac{\exp\!\left((W_m x^i \cdot W_m x^j)/\sqrt{D}\right)}{\sum_j \exp\!\left((W_m x^i \cdot W_m x^j)/\sqrt{D}\right)}\right)    (11)

The edge features are then aggregated to x_i using h_Θ(x_i, x_j) with a max aggregation operation:

x_i = \max_{j:(i,j)\in\varepsilon} e_{ij}^m    (12)
The essence of the MLP training process is the continuous optimization of the parameters θ_m and W_m, so the graph MLP and normal MLP block architectures are shown in Figs. 4 and 5.
Fig. 4. Graph MLP block. It takes a tensor [N * D] as input, computes an edge feature for each of the N points by applying an MLP with the number of neurons defined as a_n, and outputs the aggregated tensor.
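A compact sketch of one graph MLP layer following Eqs. (7)–(12) is given below (PyTorch). The neighbourhood size, feature dimensions and random weights are placeholders chosen for illustration; in practice θ_m and W_m would be learned, and this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def graph_edge_aggregate(x, theta, W, k=8):
    """One graph MLP layer: build a kNN graph, form attention-weighted edge features
    e_ij = ReLU(theta.(x_j - x_i) + softmax_j((Wx_i . Wx_j)/sqrt(D))),
    then max-aggregate the edges back to each point (Eqs. 7-12)."""
    idx = torch.cdist(x, x).topk(k + 1, largest=False).indices[:, 1:]   # [N, k] neighbours
    xj, xi = x[idx], x.unsqueeze(1)                  # [N, k, D], [N, 1, D]

    q = x @ W.t()                                    # embeddings q_i = W x_i
    alpha = (q.unsqueeze(1) * q[idx]).sum(-1) / q.shape[-1] ** 0.5      # Eq. (10)
    alpha_hat = F.softmax(alpha, dim=1)              # normalised similarity, Eq. (9)

    e = F.relu((xj - xi) @ theta.t() + alpha_hat.unsqueeze(-1))         # Eq. (11)
    return e.max(dim=1).values                       # max aggregation, Eq. (12)

pts = torch.randn(128, 3)                            # filtered 3-D SAR point cloud
out = graph_edge_aggregate(pts, torch.randn(64, 3), torch.randn(64, 3))
print(out.shape)                                     # torch.Size([128, 64])
```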
3 SAR Imaging Data Pre-processing and Graph-Based 3-D SAR Segmentation Model
PointNet accepts input with tensor shape [N * 3] in its original form, which perfectly matches the form of a 3-D SAR point cloud, so the preprocessed SAR point cloud data can be fed directly to the network to obtain the segmentation scores.
3.1 Point Cloud Pre-processing Model
In 3-D SAR imaging, each object point has side lobes, which interfere with the segmentation results. The SAR data must therefore be preprocessed to eliminate side lobes and improve accuracy as much as possible. In this article, the model uses the voxel grid filtering method in the preprocessing block. Assume the BP three-dimensional imaging result is the point set P = {p_1, p_2, · · · , p_N} with side-lobe noise N = {n_1, n_2, · · · , n_N} for each point; the maximum and minimum boundary points are P_max and P_min:

P_{max} = [x_{max}, y_{max}, z_{max}],  P_{min} = [x_{min}, y_{min}, z_{min}]    (13)
Then determine the voxel grid size r to filter the side-lobe noise [10]. Testing found that the voxel grid size should be larger than the 2-norm of the distance between an original point and its side-lobe noise point, i.e., r ≥ 0.05. The dimensions of the voxel grid are then computed as follows:

D_x = \frac{P_{max}(1) - P_{min}(1)}{r},  h_x = \frac{x_i - P_{min}(1)}{r}
D_y = \frac{P_{max}(2) - P_{min}(2)}{r},  h_y = \frac{y_i - P_{min}(2)}{r}    (14)
D_z = \frac{P_{max}(3) - P_{min}(3)}{r},  h_z = \frac{z_i - P_{min}(3)}{r}

Then compute the voxel index of each point to obtain the filtering index:

h = h_x + h_y \cdot D_x + h_z \cdot D_x D_y    (15)
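A minimal NumPy sketch of this voxel-grid filter, following Eqs. (13)–(15), is shown below. It keeps one representative point per occupied voxel; the random test cloud is only a stand-in for the BP imaging output, and the sketch is not the authors' code.

```python
import numpy as np

def voxel_filter(points, r=0.05):
    """Voxel-grid side-lobe filter: compute each point's voxel index h (Eqs. 13-15),
    sort by h, and keep one point per voxel."""
    p_min = points.min(axis=0)                              # [xmin, ymin, zmin]
    p_max = points.max(axis=0)                              # [xmax, ymax, zmax]
    dims = np.floor((p_max - p_min) / r).astype(int) + 1    # Dx, Dy, Dz
    hxyz = np.floor((points - p_min) / r).astype(int)       # hx, hy, hz per point
    h = hxyz[:, 0] + hxyz[:, 1] * dims[0] + hxyz[:, 2] * dims[0] * dims[1]
    order = np.argsort(h)                                   # sort points by voxel index
    _, first = np.unique(h[order], return_index=True)       # first point of each voxel
    return points[order[first]]

cloud = np.random.rand(5000, 3)        # stand-in for the BP imaging point cloud
print(voxel_filter(cloud).shape)
```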
After sorting the points according to the index h, the pre-processing model outputs the filtered point cloud dataset of the 3-D SAR imaging.
3.2 Graph-Based 3-D SAR Segmentation Model
After the 3-D SAR imaging and the point cloud preprocessing model, the filtered 3-D point cloud data are obtained. To train the graph PointNet, the model uses the ModelNet40 and ShapeNet datasets to learn the parameters and the global feature aggregation. The architecture of the graph-based 3-D SAR segmentation model is shown in Fig. 5. As shown in Fig. 5, the model can directly receive the echo signal and convert it into a form recognizable by the graph-based PointNet.
Fig. 5. Framework flowchart of SAR 3D point cloud semantic segmentation system.
4 Experimental Results
This section simulates the 3-D SAR imaging experiment and obtains the result for the target aircraft, then extracts the point cloud through the preprocessing block and sends it to the segmentation network. Finally, the segmentation results and accuracy are evaluated. Comparing the graph-based PointNet with the original PointNet, the graph PointNet model has a higher accuracy rate.
4.1 SAR 3-D Imaging and Point Cloud Pre-processing Results
The simulation uses the 3-D BP imaging algorithm to image the airplane object; the simulation platform parameters are shown in Table 1.

Table 1. SAR imaging simulation parameters.

Parameters            | Value
Center frequency      | 37.5 GHz
Number of array       | 51
Bandwidth             | 0.2 GHz
Incidence angle       | 0°
SAR platform altitude | 300 m
Figure 6 shows the imaging result of the 3-D SAR BP algorithm with an imaging grid size of 101 * 101 * 101 and the voxel-filtered point cloud. There are many side-lobe noise points in the imaging result. The 3-D imaging result is passed through the pre-processing unit to filter the side-lobe noise; testing showed that the best-performing voxel grid size is r_best = 0.05, which filters out the side-lobe points as much as possible. It can be seen that after voxel grid filtering, the point cloud data are much more identifiable to the naked eye, with lower side-lobe noise. In this case, the voxel filter not only minimizes the side-lobe points of the airplane point cloud, but also preserves the local and global features, which is helpful for the graph PointNet to learn the semantic information.
Fig. 6. (a) 3-D SAR BP imaging of airplane with grid size 101 × 101 × 101 (b) Point cloud voxel filter result with voxel grid size r = 0.05
4.2 Graph Based PointNet Training and Segmentation Results
In order to compare the performance of the graph-based PointNet and PointNet, the ShapeNet part dataset was used to train the models. The ShapeNet dataset was proposed and open sourced in 2016; it contains 16,881 3D shapes from 16 object categories annotated with 50 parts in total. The official airplane subset with three segmentation labels was used to train the two models, with the training parameters listed in Table 2.

Table 2. Model training parameters.

Parameters       | PointNet         | Graph based PointNet
Training dataset | Shapenet         | Shapenet
Epoch            | 3500             | 1000
Learning rate    | 0.005 (Momentum) | 0.01 (Momentum)
Batch size       | 64               | 32
Test batch size  | 32               | 8
As shown in Fig. 7, the PointNet model reached convergence at the 1500th epoch with a training accuracy of 0.92, while the graph-based PointNet model reached convergence at the 800th epoch with a training accuracy of 0.97. The graph-based PointNet model is more expensive to train and the training time of a single epoch is longer, because it must first select the K neighboring points and then aggregate the surrounding features, which makes the optimization harder; however, in terms of the verification results, the graph-based PointNet shows a higher accuracy.
Fig. 7. Training accuracy and loss curve of (a) PointNet (b) Graph based PointNet
In order to verify the performance of the graph-based network, the same SAR point cloud data were input to both networks to compare their segmentation results, as shown in Fig. 8.
Fig. 8. SAR point cloud segmentation result using (a) PointNet (b) Graph based PointNet
The segmentation results show that the graph-based PointNet has better accuracy and performance in semantic segmentation. Compared with PointNet, the graph-based PointNet has an advantage in feature aggregation, which suggests that local geometric features are important for 3D segmentation tasks. PointNet does not capture the local structure induced by the metric space of the points, which limits its ability to recognize fine-grained patterns. The experiment shows that the graph method can better learn local and global features to complete 3D segmentation tasks.
In order to make the results more comprehensive, we also test the two networks on several object segmentation validation datasets; the evaluation results are shown in Table 3.

Table 3. Segmentation results of different datasets.

Network                           | Airplane | Bag  | Chair | Guitar | Laptop | Rocket | Table
PointNet (accuracy %)             | 83.4     | 78.7 | 89.6  | 91.5   | 95.3   | 57.9   | 80.6
Graph based PointNet (accuracy %) | 86.1     | 83.4 | 90.6  | 92.3   | 95.7   | 63.5   | 82.6
Through the experiments, it can be seen that the graph based PointNet using graph aggregation features is more accurate than the PointNet in multiple object segmentation tasks. It also proves that local geometric features are more valuable than individual point features.
5 Conclusion
In this article, we propose an improved graph-based PointNet method to learn local and global edge features and use it to complete the 3-D SAR point cloud segmentation task. The solution first uses the point cloud preprocessing block to filter the original 3-D SAR imaging result and generate point cloud data. The designed graph-convolution MLP module is used to aggregate feature information between points, so as to fully consider the point cloud features under different receptive fields. After 1000 epochs of training on the training set, the effectiveness of the network is verified with 3-D SAR simulation data of sampling grids of different sizes. The results show that the segmentation task can be completed on low-resolution 3-D SAR point cloud data, with a higher segmentation accuracy than the PointNet network.

Acknowledgements. This work was supported in part by the National Natural Science Foundation of China (61701128), in part by the Guangxi science and technology project (AD18281061) and by the postgraduate innovation program project (2021YCXS036).
References 1. Wei, S.J.: Research on sparse imaging technology of linear array 3-D synthetic aperture radar. University of Electronic Science and Technology of China (2013) 2. Ge, N., Gonzalez, F.R., Wang, Y.: Spaceborne staring spotlight SAR tomography—a first demonstration with TerraSAR-X. arXiv preprint. arXiv:1807.06826v1 (2018) 3. Goodfellow, I., Bengio, Y.: Deep Learning (2016). Book in Preparation for MIT Press. http:// www.deeplearningbook.org
4. Schumann, O., Hahn, M., Dickmann, J., Wöhler, C.: Semanticsegmentation on radar point clouds. In: 2018 21st International Conference on Information Fusion (FUSION), July 2018, pp. 2179–2186 (2018) 5. Qi, C.R., Su, H., Mo, K.: PointNet: deep learning on point sets for 3D classification and segmentation. In: Proceedings Computer Vision and Pattern Recognition (CVPR). IEEE (2017) 6. Mou, W.: Object recognition of three-dimensional SAR based on PointNet. University of Electronic Science and Technology of China (2019) 7. Bruna, J., Zaremba, W., Szlam, A., Yann, L.: Spectral networks and locally connected networks on graphs. arXiv:1312.6203 (2013) 8. Ze, K.J., Chi, B.D., Xiao, L.Q., Liang, J.Z.: Urban 3D imaging using airborne TomoSAR: contextual information-based approach in the statistical way. ISPRS J. Photogramm. Remote Sens. 170, 127–141 (2020) 9. Yue, W., Yong, B.S., Zi, W.L., Sarma, S.E.: Dynamic graph CNN for learning on point clouds. arXiv preprint. arXiv:1801.07829v2 (2018) 10. Li, M., Sun, C.: Refinement of LiDAR point clouds using a super voxel based approach. ISPRS J. Photogramm. Remote Sens. 143, 213–221 (2018)
MEIS Medical Engineering and Information Systems
Study on Monitoring and Early Warning Technology of Tick-Borne Zoonosis in Western Liaoning Province Shuyu Hu(B) and Xiaogang Liu Jinzhou Medical University, No. 40, Songpo Section 3, Linghe District, Jinzhou, Liaoning, China
Abstract. In this paper, we collected ticks in western Liaoning and studied their epidemiological patterns. Through modern molecular biology analysis, the pathogen-carrying results were obtained as the sample detection data. Influence factors such as vector factors, environmental factors, social and economic factors, disease management and public opinion information are combined as monitoring and warning parameters, and a Bayesian network is constructed as the core algorithm of the early warning technology. Monitoring and early warning of tick-borne zoonosis can effectively prevent large-scale outbreaks of zoonosis. By monitoring the zoonoses carried by ticks, the establishment of an early warning system has become an urgent need and is of great significance for controlling zoonosis and protecting human and animal health. Keywords: Tick · Zoonosis · Monitoring · Early warning · Bayesian method
1 Environmental Background
In recent years, with the rapid development of animal husbandry and tourism, the harm caused by ticks, which carry a variety of zoonotic infectious diseases, to animal and human health has become more and more obvious. The number of cases of people infected with tick-borne diseases is rising, and tick-borne diseases occur frequently all over the world. Tick-borne diseases of domestic animals caused by tick bites have brought huge economic losses to the livestock industry. Because the number of tick-borne diseases increases year by year, they pose a serious threat to people's lives and the development of animal husbandry, which has aroused widespread concern. The tick is an ectoparasitic arthropod, commonly known as the grass crawler or dog bean. According to the classification system of Barker, it belongs to the phylum Arthropoda, class Arachnida, subclass Acari, order Parasitiformes, and the tick superfamily of the animal kingdom. As the world's second largest pathogen transmission vector, ticks live actively in mountain forests and grasslands, and forest grasslands are also the optimal areas for grazing cattle and sheep. Livestock, wild animals and people in close contact are vulnerable to infection, and the variety of zoonotic infectious diseases carried by ticks has become a more and more prominent hazard to animal and human health. Ticks need to feed on blood © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 421–429, 2022. https://doi.org/10.1007/978-981-16-6554-7_47
at each stage of development and growth. After being bitten by a tick, an animal will suffer from anemia, emaciation and slow growth; in severe cases, tick toxin will cause paralysis and death of the animal. At the same time, ticks absorb more blood and obtain more energy than other arthropods, which increases their potential to transmit zoonoses. As an important vector of many zoonoses, ticks are an important ectoparasitic cause of zoonotic disease.
2 Tick Investigation
2.1 Domestic Survey Overview
According to statistical data, there are nearly 100 known zoonotic diseases transmitted by ticks, and ticks in China can carry some 60 kinds of pathogens. Among them, the most common that are seriously harmful to human beings are forest encephalitis, spotted fever, Q fever, Xinjiang haemorrhagic fever, babesiosis, etc.; ticks can also carry 14 kinds of bacteria, including the agents of brucellosis and plague. In recent years, there have been serious incidents of tick bites in many provinces of China. In particular, in September 2010, fatal cases of tick bites occurred in Shandong, Henan and other provinces; these were confirmed to be infections with the tick-borne new bunyavirus (NBV), which drew more attention to tick-borne zoonosis. African Swine Fever (ASF) was first reported in Shenyang, Liaoning Province, China, on August 1, 2018, followed by reports in Zhengzhou, Henan Province, Lianyungang, Jiangsu Province, and Yueqing, Zhejiang Province. ASF causes a large number of pig deaths and huge economic losses, and there is no effective vaccine for prevention, so it is listed as a key class A infectious disease by the OIE. Its main tick vectors are Ornithodoros moubata (African region) and Ornithodoros erraticus (European region), both of which belong to the genus Ornithodoros.
2.2 Investigation and Result Analysis in Western Liaoning
The western Liaoning district lies west of the Liaohe River, bordering Inner Mongolia and Hebei Province. It includes five prefecture-level cities: Jinzhou, Fuxin, Chaoyang, Panjin and Huludao. It adjoins the Liaodong economic zone to the east, Hebei to the west, Bohai Bay to the south, and the vast, resource-rich hinterland of Inner Mongolia to the north. The climate is semi-humid to semi-arid, the terrain is dominated by low hills, and vegetation coverage is high, providing a suitable natural environment. The agricultural and animal husbandry ecological environment here is good, and the climatic and environmental conditions in this area are favorable for the growth of various ticks; it is an important epidemic area of tick-borne diseases in Liaoning Province. The base year of this survey is 2019, and a combination of classified sampling survey and field monitoring was adopted. The survey covered mountainous forest, grassland, livestock and poultry farms and other areas in western Liaoning, and the livestock in the qualifying areas were comprehensively investigated by category. A total of 3842 fresh ticks were collected from 5 investigation sites. The species of ticks were identified by
morphology, and the infestation intensity was counted to determine the dominant species. After data analysis, Haemaphysalis longicornis and Dermacentor silvarum (the forest tick) were found to be the dominant tick species on cattle, sheep and dogs in western Liaoning. The collected Haemaphysalis longicornis and Dermacentor silvarum ticks were then tested by the polymerase chain reaction (PCR) method to determine the extent to which they carry zoonotic pathogens. The results showed that animals in the western Liaoning region carry two tick-borne zoonoses: spotted fever rickettsiosis and Q fever. These two infectious diseases present with severe headache, fever, chills, collapse, muscle pain and dry cough, and are characterized by interstitial pneumonia. Because these symptoms resemble influenza, they easily lead to misdiagnosis. Spotted fever rickettsiosis can also cause neurological symptoms, including headache, irritability, insomnia, delirium, coma and encephalitis; in severe cases there are drops in blood pressure, tissue necrosis, circulatory failure, and cardiac and cerebral sequelae, and in fulminant cases death may even result from sudden cardiac arrest.
3 Status and Significance of the Tick-Borne Zoonosis Early Warning System
Abroad, the monitoring and early warning of ticks mainly relies on spatial information technologies such as remote sensing and geographic information systems, combined with laboratory testing, to identify high-risk areas where ticks and tick-borne diseases may occur and spread, with the aim of enabling prevention as early as possible. In China, some research has been done on the detection and identification of tick species and tick-borne diseases, but there are few studies on the monitoring and early warning of ticks and tick-borne zoonosis. The main problems are as follows:
3.1 Traditional Monitoring Methods, Technologies and Means are Backward
Current tick monitoring in China adopts the traditional methods of manual capture, laboratory testing and viral gene monitoring, without combining them with modern information technologies such as big data and artificial intelligence. In the future, tick monitoring using the Beidou system may be an effective direction of application development. Moreover, because the professional level of laboratory technicians varies, the detection results are often affected by human factors and the accuracy is not high.
3.2 The Early Warning System Lacks Integrity
Relevant domestic studies are still mainly limited to single passive monitoring of tick-borne disease epidemic indicators, or to targeted monitoring or early-warning analysis of individual indicators, without a complete early-warning indicator system. In addition, the information feedback mechanism is imperfect and data analysis capacity is limited. At present, the surveillance and early warning of tick-borne disease in China has not formed a complete system and cannot respond to the epidemic situation in a timely and effective manner.
Since, at present, there is no effective vaccine or prevention method for most tick-borne viruses, it is extremely necessary to integrate the factors affecting the outbreak of tick-borne diseases, such as the complex ecological environment, human behavior, the social economy and disease management during the epidemic process. Strengthening the monitoring of the factors influencing all kinds of infectious diseases, detecting abnormal changes in the epidemic level, issuing early warnings, and formulating effective prevention and control strategies as soon as possible are important measures for preventing and controlling infectious diseases, especially tick-borne infectious diseases. It is necessary to establish an efficient, rapidly responding, sustainable and feasible multi-factor, multi-level monitoring and warning system to meet the needs of surveillance and early warning of tick-borne zoonoses in China. This is of great significance for minimizing the harm to the lives and property of the public caused by the spread of epidemics, reducing economic losses, providing technical support for improving the monitoring and early warning level of tick-borne disease in China, and finally achieving the goal of China's long-term plan for the control of tick-borne zoonoses.
4 Tick-Borne Zoonosis Early Warning System
4.1 Warning Parameter Analysis
Vector Factors. Vector factors are both warning information sources and warning signs. Vector organisms carry rickettsiae, bacteria, viruses, etc., and can keep pathogenic organisms present in nature over a long period of time. A certain number of hosts and vectors are the necessary conditions for insect-borne and zoonotic diseases to circulate in nature. Monitoring the population distribution of vectors and the changes in their density and quantity can therefore serve as an important early warning indicator of disease. Ticks can carry 14 kinds of deadly viruses and bacteria, such as rickettsiae, and are important sources of infection and transmission vectors of human diseases. Temperature, moisture, soil, flora and fauna, distribution and seasonal changes can all affect tick activity and its seasonal fluctuation. Generally, tick activity peaks from May to September. It is therefore particularly important to carry out comprehensive monitoring for tick-borne zoonoses.
Environmental Factors. Environmental factors are also warning information sources and warning signs. Vegetation coverage, water and land conditions, environmental quality, temperature, humidity, natural disasters and climate change directly or indirectly affect human health as well as the number and distribution of ticks. For example, a warm winter allows ticks to survive the winter safely and is conducive to their reproduction, which enlarges the distribution and risk of tick-borne diseases, expands their prevalence and scope, and affects people's health.
Socio-Economic Factors. Social and economic factors, such as population density, population mobility, regional economic development, convenient transportation and the living environment, are all important factors related to the spread and prevalence of zoonotic diseases and will increase the risk of rapid spread of infectious diseases.
Network Public Opinion. Online public opinion is also a warning information source and warning sign, and the management of zoonotic diseases should encourage the participation of multiple parties to form a pattern of social co-governance. Infectious disease events attract high attention from the public and social media, and the two are highly correlated. Attention should be paid to collecting public opinion data as a key risk factor for early warning, extracting the useful data, and then guiding and intervening effectively. This study used web crawler technology to collect data on tick bites and tick-borne zoonotic events over the past 10 years as basic data for early warning. In addition, based on the classification of the rickettsiae, bacteria, viruses and other pathogens carried by ticks, the number of patients is used as the early warning parameter for the disease management component, which constitutes the warning information.
4.2 Design of the Early Warning Algorithm
The early warning of tick-borne zoonosis is related to many interrelated factors, so independent monitoring and early warning over time and space can hardly meet the real-time effectiveness requirements of early warning. The Bayesian model is a very effective algorithmic model for analyzing the laws of development and change from complex and interrelated warning sources and warning signs. A Bayesian network, also called a belief network or directed acyclic graph model, is a probabilistic graphical model that represents a set of variables and their conditional dependencies by a directed acyclic graph (DAG). Bayesian networks are well suited for taking events that have already occurred and predicting the likelihood that any of several possible known causes are contributing factors; for example, they can represent probabilistic relationships between diseases and symptoms, and if a symptom is known, the network can be used to calculate the probability of occurrence of various diseases. Here, the Bayesian network method is used to preprocess the original data, make them stable, and then perform inference; it belongs to the class of spatiotemporal aggregation models and handles uncertainty and incompleteness. It therefore has important practical significance in the early warning of tick-borne zoonosis.
Construction of a Bayesian Network for Tick-Borne Zoonosis. The tick-borne zoonosis Bayesian network is constructed by drawing a directed graph according to the dependence and independence conditions of the variables involved in the system to describe the conditional dependence relationships among the random variables, with circles representing random variables and arrows representing conditional dependencies, as shown in Fig. 1:
Fig. 1. Bayesian network for tick-borne zoonosis
In practical work, more random factors can be considered to improve the Bayesian network, and within each factor a Bayesian sub-network of its own can be built.
Calculation Method. According to Bayes' theorem:
P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}    (1)
Here the conditional probability P(A|B) denotes the probability of A given B. For the variables X1, X2, X3, …, X17 of the Bayesian network for tick-borne zoonosis in Fig. 1, the joint distribution is:

P(X1, X2, X3, …, X17) = P(X1) P(X2) P(X3) P(X10|X1, X2, X3) P(X4) P(X5) P(X6) P(X11|X4, X5, X6) P(X7) P(X8) P(X9) P(X16|X7, X8, X9) P(X12|X10, X11) P(X13|X12) P(X14) P(X15|X13, X14) P(X17|X15, X16)    (2)

For any set of random variables, the joint probability can be obtained by multiplying their respective local conditional probability distributions.
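The factorisation of Eq. (2) can be expressed compactly in code. The sketch below is purely illustrative: the parent structure follows Fig. 1, while `local_prob` is a placeholder for the real conditional probability tables.

```python
# Parent sets of the non-root nodes, following the structure of Fig. 1.
parents = {
    "X10": ["X1", "X2", "X3"], "X11": ["X4", "X5", "X6"], "X16": ["X7", "X8", "X9"],
    "X12": ["X10", "X11"], "X13": ["X12"], "X15": ["X13", "X14"], "X17": ["X15", "X16"],
}

def joint_probability(assignment, local_prob):
    """Eq. (2): P(X1..X17) = product over nodes of P(node | parents(node))."""
    p = 1.0
    for node, value in assignment.items():
        pa = tuple(assignment[v] for v in parents.get(node, []))
        p *= local_prob(node, value, pa)
    return p

# Dummy check with uniform CPTs: every local probability is 0.5.
demo = joint_probability({f"X{i}": 0 for i in range(1, 18)}, lambda n, v, pa: 0.5)
print(demo)   # 0.5 ** 17
```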
For example, to calculate the probability of tick density: because tick density is affected only by the vector factors and the environmental factors, and the influence of the factors above them is not considered, the formula is:
P(X10, X11, X12) = P(X10) P(X11) P(X12|X10, X11)
(3)
After simplification, we can get:
\sum_{x_{12}} P(X10, X11, X12) = \sum_{x_{12}} P(X10) P(X11) P(X12|X10, X11) \;\Rightarrow\; P(X10, X11) = P(X10) \cdot P(X11)    (4)
Then, according to the probability data obtained in Table 1, the values are substituted into the formula to calculate the corresponding probability and make a prediction.

Table 1. Probability table

X10 | X11 | X12 = 0 | X12 = 1
0   | 0   | 0.3     | 0.7
0   | 1   | 0.6     | 0.4
1   | 0   | 0.2     | 0.8
1   | 1   | 0.7     | 0.3
More complex links can be handled by step-by-step derivation and machine learning. Finally, according to the requirements, the results are presented as web pages, text files, graphics, tables and other forms for early warning.
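As a minimal illustration of this inference step, the sketch below combines assumed priors for X10 and X11 with the conditional probability table of Table 1 to obtain the marginal probability of high tick density (X12 = 1). The priors are placeholders, not values estimated in this study.

```python
p_x10 = {0: 0.6, 1: 0.4}      # assumed prior P(X10) - placeholder values
p_x11 = {0: 0.7, 1: 0.3}      # assumed prior P(X11) - placeholder values
cpt_x12 = {                   # P(X12 = 1 | X10, X11) from Table 1
    (0, 0): 0.7, (0, 1): 0.4, (1, 0): 0.8, (1, 1): 0.3,
}

# P(X12 = 1) = sum over x10, x11 of P(x10) * P(x11) * P(X12 = 1 | x10, x11)
p_high_density = sum(
    p_x10[a] * p_x11[b] * cpt_x12[(a, b)] for a in p_x10 for b in p_x11
)
print(f"P(X12 = 1) = {p_high_density:.3f}")
```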
5 Conclusion
The surveillance and early warning system for tick-borne zoonotic diseases is a complex piece of system engineering. Its index system includes epidemic discovery and detection, information processing, epidemic early warning and epidemic response, among others. Because the epidemic features of zoonosis are complicated, the variables in the model have different warning signs and warning information sources in different regions, for different diseases, at different severities and at different times. These different variables have an important effect on early warning, so constant adjustment is necessary during implementation. The tick-borne zoonosis risk warning system adopts the Bayesian algorithm to combine a variety of factors and realize early recognition of anomalies in common tick-borne zoonotic disease outbreaks, completing the basic task of early-warning data analysis. It meets the requirements of timeliness and effectiveness of early warning, and the coincidence rate is high. The algorithm builds a tick-borne zoonosis risk early warning platform for the western Liaoning region that is flexible to use, easy to operate, and safe and reliable in performance,
and will realize overall risk early warning of tick-borne zoonosis in the western Liaoning region. It can issue warnings 24 hours before an outbreak, provide optimal strategies for risk prevention, and is highly practical. This helps to effectively control the transmission of tick-borne diseases between humans and animals, develop comprehensive prevention and control measures, and reduce the threat of tick-borne diseases to public safety, the breeding industry and animal husbandry in the related regions. With the detection of the zoonoses carried by ticks, the establishment of an early warning system has become an urgent need for protecting animal and human health and is of great significance for controlling zoonosis and keeping humans and animals safe. The establishment of a tick-borne zoonotic disease monitoring and early warning platform in western Liaoning is of great significance for comprehensively improving the level of teaching, scientific research, and prevention and control of zoonotic parasitic diseases in China, and for ensuring the health of the Chinese people and the sustainable development of the economy.

Source of Funding. 1. Natural Science Foundation of Liaoning Province Project: Research on surveillance and early warning technology of tick-borne zoonosis in western Liaoning Province (2019-ZD-0604). 2. Local Service Project of Liaoning Province: Development and Construction of Artificial Intelligence Optimized Big Data Platform for Monitoring, Early Warning and Prevention of Public Health Emergencies (JYTFW2020009).
References 1. Wang, Z.: Tick-borne zoonosis in Western Liaoning province. Jinzhou Medical University, China (2016) 2. Ye, N., Pei, X., Sun, S.: Ticks and ticks in Heihe port area, China. Chin. J. Zoonoses (08), 761–767 (2018) 3. Zhu, L.: Research advances on detection methods of tick-borne diseases and pathogens. Chin. J. Front. Health Quar. (06), 441–412 (2017) 4. Lu, Y., et al.: J. Parasitol. Med. Entomol. (03), 181–192 (2018) 5. Gu, D., Zhang, M., Chen, H., Chen, X., Ma, Z., Gui, G.: Investigation on the distribution of ticks and pathogens in hilly scenic spots in Suzhou. Shanghai Prev. Med. (08), 652–655 (2018) 6. Zhao, Q., Han, H., Chen, M., Sun, A., Tan, G., Ma, F.: Epidemiological investigation of dermatomycosis in pet dogs and cats in Minhangdistrict, Shanghai. Shanghai Anim. Husb. Vet. Bull. (01), 52–53 (2019) 7. Xiang, R., Yu, B., Shan, X., Deng, F., Xu, X., Liu, Z.: Observation on the effect of sentinel mouse method in monitoring schistosomiasis infection in key waters of Hanchuan city. Chin. J. Parasitol. Parasit. Dis. (01), 46–51 (2016) 8. Li, Z., Yuan, H., Shi, R., Zhao, C., Tian, Y.: The role of sentry animals in the prevention and control of zoonoses. China Anim. Husb. (18), 40–41 (2013) 9. Zheng, H., et al.: Regional warning of schistosomiasis risk based on surveillance of sentry mice in infectious water bodies in 2012. Chin. J. Parasitol. Parasit. Dis. (06), 428–432 (2013) 10. Gan, X.: Study on the epidemic assessment and prediction of schistosomiasis. Huazhong Univ. Sci. Technol. (10) (2011) 11. Sun, Y.: Early warning system and indicators of human zoonotic infectious and parasitic diseases. Mod. Prev. Med. (06), 627–628 (2005) 12. Wang, Q., et al.: Design and implementation of an emergency command system for the outbreak of parasitic diseases. Chin. J. Parasitol. Parasit. Insectic. (05), 393–397 (2014)
13. Yang, K., Li, S.: Research on the application of big data mining technology in surveillance and early warning of schistosomiasis. Chin. J. Parasitol. Parasitol. (06), 461–465 (2015) 14. Cheng, T., Feng, L.: Research on China’s food safety risk early warning factors under the background of big data. Sci. Technol. Manag. Res. (17), 175–181 (2018) 15. Xu, Z., et al.: J. Anhui Agric. Sci. (13), 258–260 (2019) 16. He, K.: Design and analysis of high-speed railway monitoring and early warning and emergency platform based on resource sharing. Electron. Des. Eng. (16), 44–47, 52 (2019) 17. Wuting, Qiu, H., Pretty, Y., Xu, Y., Zhang, L.Y.: The role of animal disease monitoring and early warning in the disease prevention and control system. China Livest. Poult. Breed. Ind. (9), 54–55 (2019)
Research on Energy Cost of Human Body Exercise at Different Running Speed Lingyan Zhao, Qin Sun(B) , Baoping Wang, and Xiaojun Wang Shandong Provincial Engineering Lab of Traffic Construction Equipment and Intelligent Control, Shandong Jiao Tong University, Jinan 250357, Shandong, China [email protected]
Abstract. The study of human energy cost during exercise has scientific significance and application value in the research fields of human motion biomechanics, sports science and rehabilitation medicine. In order to establish a research method for human energy consumption based on the energy method, a simplified twelve-rigid-body model of the human body in the sagittal plane was established according to its physiological structure. On this basis, a biomechanical model of human motion based on the integration of body acceleration is established, and a method for detecting and studying human motion energy based on 12 accelerometers is proposed. For experimental verification, a human motion energy consumption detection system based on MEMS accelerometers was developed to measure human energy cost at different running speeds. The experimental results show that as the running speed increases, the alternating support period of the left and right feet becomes shorter and the amplitude increases. Therefore, the energy consumption of human exercise changes with running speed. This conclusion provides a reference for research on the energy consumption of human exercise and for sports health management. Keywords: Energy cost · Acceleration · Kinetic model
1 Introduction Human health is closely related to physical activity, and there is no standard human energy measurement method that can effectively monitor and manage human health [1]. The acceleration research on human motion plays an important role in the fields of gesture recognition, sports and health, rehabilitation medicine, and physical exercise [2]. Measuring the energy cost of walking is an important method to evaluate the walking function of the human body. The energy consumption of human exercise is positively related to its intensity. The intensity of human activity is affected by its speed, that is, the energy consumption of the human body will increase with the speed of walking. Therefore, the movement speed is the main factor of energy cost when the human body walks and runs. Human walking can be simulated as an inverted pendulum. In this model, the body’s center of mass, potential energy, and kinetic energy are continuously exchanged to minimize mechanical work and energy production [3, 4]. Bouten et al. [5] used a three-axis © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 Q. Liu et al. (Eds.): Proceedings of the 11th International Conference on Computer Engineering and Networks, LNEE 808, pp. 430–436, 2022. https://doi.org/10.1007/978-981-16-6554-7_48
acceleration sensor and its dedicated data processing system to measure the amount of human movement. It was found across multiple experiments that the sensitivity and offset in each measurement direction are the same, and that there is a good linear relationship between human energy expenditure and the output of the three-axis accelerometer. At the same time, Bouten et al. [6] also concluded that the consistency between human energy consumption and the output of a three-axis accelerometer is greater than that of a single-axis accelerometer in an experiment with constantly changing exercise intensity. Dowd et al. [7] compared the results of the GT3X, ActivPAL and Cosmed K4B2 devices in experiments on 15 adult women performing daily activities, and found that the actual energy consumption of the human body is highly correlated with the energy consumption predicted by ActivPAL. Albinali et al. [8] created a body sensor network that can not only collect acceleration information from the upper arms, thighs and hips but also measure oxygen consumption during exercise; through the analysis of these data, the energy consumption of daily exercise is estimated. However, because data from multiple sensors must be processed and analyzed, there are problems such as a large amount of computation and insufficient real-time performance. Daniel et al. [9] quantified the impact of five different outdoor terrains on the metabolism of adults and obtained the difference in energy consumption under different terrain conditions. This paper derives the energy cost during exercise from the acceleration of human motion and provides a reference for research on the energy consumption of human motion and sports health management.
2 Human Motion Model Based on Acceleration Method According to the laws of Newton’s mechanics, the speed of motion can be obtained by integrating the acceleration of the motion of an object. In this paper, multiple accelerometers are used to collect the real-time acceleration of the limbs when the human body moves, and then the acceleration data of each group is integrated to calculate the instantaneous speed of the human body movement for subsequent calculations.
Fig. 1. Human body motion rigid body structure model and unilateral limb dynamics model
The kinematic modeling of the human body in this paper solves for the kinetic energy of motion from the relative rotation angle of each limb and the acceleration of its center of mass. The main reference datums in kinematics research are three planes: the sagittal plane, the coronal plane and the horizontal plane. In daily activities, swinging in the sagittal plane is the main form of movement of each limb, so this paper focuses only on the movement model of the human limbs in the sagittal plane. The human body is simplified into a twelve-rigid-body model consisting of the head and torso plus the shoulder joints, upper arms, elbow joints, forearms, hip joints, thighs, knee joints, calves, ankle joints and feet on both sides, as shown in Fig. 1. In Fig. 1, m represents the mass of each segment, L the length of each segment, C the coordinates of the centroid of each segment, l the distance of the centroid from the joint, a the acceleration of the center of mass during walking and running, v the linear velocity of the centroid, ω the angular velocity of each joint, and θ the joint angle in the relative coordinate system. It is worth noting that, because torso bending changes only slightly during running, the torso is set in the same direction as the coronal axis in this system. The following definitions are made: the shoulder rotation angle θs is the angle between the upper arm and the torso; the elbow joint rotation angle θq is the angle between the forearm and the extension line of the upper arm; the hip rotation angle θd is the angle between the thigh and the extension of the trunk; the knee joint rotation angle θx is the angle between the lower leg and the extension line of the thigh; and the ankle angle θz is the angle between the foot and the extension line of the calf. The acceleration is integrated to obtain the relative velocity equation of each centroid, the angular velocity of each centroid is obtained by transforming the velocity, and the angular velocity is then integrated to obtain the angle equation of each centroid. The velocity vectors are then summed to obtain the absolute velocity equation of each centroid, from which the translational kinetic energy of each limb's centroid is obtained. In addition, the rotational kinetic energy of each limb is obtained from the moment of inertia about its centroid, and the total kinetic energy of each limb follows. Dynamic analysis of human motion is the main method to study the complexity and regularity of human motion. In the study of human body dynamics, the Lagrangian equation method is the most practical and effective method for analyzing rigid body models. In this study, the main body joints involved in physical exercise were taken as the research object, and the human body was simplified into a multi-rigid-body dynamic model, as shown in the right panel of Fig. 1 (the unilateral limb movement model). The Lagrange equation is:

\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_j}\right) - \frac{\partial L}{\partial q_j} = Q_j, \quad (j = 1, 2, \cdots, n)    (1)

where q_j represents the generalized coordinates of the system, n is the number of degrees of freedom of the system, and Q_j is the generalized force (virtual work term), which is a function of the joint torque M_j.
The generalized coordinate qj corresponds to the generalized force Qj . According to qj , q˙ j and q¨ j , the joint torque Mj can be calculated.
The joint power is calculated as P_i = M_i · ω_i, where M_i is the net torque of the i-th joint and ω_i = q̇_j − q̇_{j−1} (with j = i + 1) is the relative joint angular velocity between limb j and limb j − 1. The joint work is then obtained by integrating the power:

W_i = ∫ P_i dt  (i = 1, 2, ..., n)   (2)
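As a concrete illustration of the torque-power-work chain above, the following is a minimal sketch that differentiates sampled joint angles, forms the joint power, and integrates it numerically. The sampled series and sensor rates are hypothetical stand-ins, not the authors' measurements.

```python
import numpy as np

def joint_work(M_i, q_j, q_jm1, t):
    """Joint work W_i = integral of P_i dt, with P_i = M_i * omega_i and
    omega_i = dq_j/dt - dq_{j-1}/dt (relative angular velocity of adjacent limbs)."""
    omega_i = np.gradient(q_j, t) - np.gradient(q_jm1, t)  # relative joint angular velocity
    P_i = M_i * omega_i                                     # instantaneous joint power
    return np.trapz(P_i, t)                                 # numerical integration over time

# Illustrative synthetic data for one gait cycle sampled at 100 Hz.
t = np.linspace(0.0, 1.0, 100)
q_thigh = 0.4 * np.sin(2 * np.pi * t)           # thigh angle (rad), assumed
q_shank = 0.6 * np.sin(2 * np.pi * t - 0.5)     # shank angle (rad), assumed
M_knee = 20.0 * np.cos(2 * np.pi * t)           # knee net torque (N*m), assumed
print(joint_work(M_knee, q_shank, q_thigh, t))
```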
3 Experiment and Result Analysis 3.1 Human Motion Detection System Based on Acceleration
Fig. 2. Schematic diagram of sensor wearing position.
Taking into account the measurement information required by the theoretical model of human motion and the derivation of energy consumption, experiments are needed to obtain the acceleration, angular velocity and angle of each limb. Therefore, it is necessary to build an experimental platform that can accurately measure this information for each limb; the platform is composed of 12 MEMS acceleration sensors, as shown in Fig. 2. 3.2 Human Exercise Energy Consumption Experiment Under Different Running Speeds The human body consumes different amounts of energy at different running speeds. This paper uses the experimental data to establish a mathematical model of body acceleration changes during running, calculates the kinetic energy curve of the human body's center of mass, and finally calculates the energy consumption of the human body at different running speeds.
People’s running speed can be divided into slow speed, medium speed and fast speed. The slow speed mainly refers to the speed between 4 km/h and 6 km/h, the medium speed mainly refers to the speed between 7 km/h and 9 km/h, and the fast speed mainly refers to the speed above 10 km/h. This article takes the above three speeds as examples to analyze the movement characteristics of each limb when a person runs at different speeds.
Fig. 3. Kinetic energy curve of each limb when the running speed is 12 km/h.
After the acceleration curve is integrated, the kinetic energy curve of each limb as shown in Fig. 3 can be obtained. It can be observed from the figure that the kinetic energy curve of each limb is relatively smooth, and there is no sudden change point, and the curve shapes of the corresponding limbs on the left and right sides have the same regularity. There may be a little difference in amplitude, however, the kinetic energy curve of each limb can accurately reflect the kinetic energy characteristics of human running. The analysis of the kinetic energy change curve of each limb shows that the kinetic energy of the limbs on both sides of the human body changes symmetrically, and the kinetic energy of the limbs decreases during the heel landing period, and the kinetic energy of the limbs increases during the toe off the ground. Based on the kinetic energy
data curve of each limb, the total kinetic energy data curve of human motion can be calculated, and then the change trend of the total kinetic energy of the human body during the gait cycle during running can be studied. Figure 4 shows the curve of the total kinetic energy change of the human body when running on a treadmill at a jogging speed of 6 km/h, a middle running speed of 9 km/h, and a fast running speed of 12 km/h. It can be observed from the figure that the total kinetic energy of the human body has a certain law. There are two crests in one gait cycle of running, which appear in the limb swing period when the human body is running. At this time, the total kinetic energy of the human body reaches the maximum value, and the total kinetic energy of the human body reaches the minimum value when the heel touches the ground. The total kinetic energy of the human body is relatively small when running at a slow speed, but as the speed increases, the total kinetic energy of the human body will increase, especially when running at a fast speed.
Fig. 4. Curves of total human kinetic energy changes at three running speeds.
4 Conclusion The study of energy loss in human sports has important application value in the fields of sports biomechanics, sports science and rehabilitation science. Based on the development of research on human exercise energy consumption at home and abroad, this paper proposes a human exercise energy detection and research method based on 10 accelerometers. The main contributions are as follows: (1) The energy method is used to study the energy consumption of human movement and the calculation of the mechanical work of the human body is analyzed. According to the characteristics of the physiological structure of the human body, a simplified twelve-rigid-body model in the sagittal plane is established, and a biomechanical model of human motion based on the body acceleration integral algorithm is built. (2) Using treadmills, wearable motion capture and biomechanical function evaluation systems as the experimental platform, experiments on energy loss at different running speeds were carried out. On this basis, the relationship between the energy consumption value and the movement speed is described, and the change trend of the energy consumption of the human body under different movement states is analyzed.
Acknowledgement. This work is supported by PHD Startup Fund of Shandong Jiaotong University (BS201902018 and BS201901040), Major Science and Technology Innovation project of Shandong Province (2019JZZY020703), Science and Technology Support Plan for Youth Innovation in Universities of Shandong Province (2019KJB014).
Accurate Localization of Fixed Orthodontic Treatment Based on Machine Vision Xiaoli Sha(B) Pingnan Hospital, Nantong, Jiangsu, China
Abstract. The application of computer vision technology in the process of orthodontics can help doctors accurately understand the displacement state of the fixed appliance worn by the patient, reduce the dependence on doctors' experience and operation skill, enhance the treatment effect, and improve the treatment experience. However, tooth images acquired during orthodontic treatment have their own characteristics, such as small tooth targets, which limit traditional feature extraction methods to rough localization only. In this paper, matched filtering is used to accurately locate the orthodontic fixed appliance in the tooth image, and several key problems in the matched filtering process are studied in depth. The principles behind the difficult points of matched filtering are analyzed, and an improved method is put forward to effectively reduce or eliminate the related interference. At the same time, the algorithm is optimized to reduce the amount of computation and improve the running speed, so that tooth image feature extraction based on the principle of optical matched filtering can be carried out efficiently and accurately. Keywords: Orthodontic · Feature extraction · Optical matched filtering · Target recognition and location · Machine vision
1 Introduction After more than a decade of rapid economic development in China, people's income has increased and consumption habits have changed: from purely rigid consumption in the past, spending has gradually extended to more discretionary items. In recent years, cosmetic consumption has attracted a lot of attention, and within the medical cosmetology industry, orthodontics is quite popular. As we all know, human teeth are not always neatly arranged. Especially in childhood, when the deciduous teeth are replaced by the permanent teeth, the new permanent teeth may grow irregularly and have different sizes, which makes the teeth look less attractive. At the same time, irregular teeth leave corners that are difficult to clean when brushing, resulting in dental plaque, dental calculus and other pathological phenomena. Data show that more than 95% of the population have some degree of malocclusion [1]. Orthodontic treatment targets these problems in order to help people obtain satisfactory, beautiful teeth. In the process of orthodontic
treatment, it is necessary to adjust the orthodontic appliance step by step with the process of treatment to adapt to the course of treatment. Traditional medical procedures are largely dependent on the physician’s experience and proficiency. In the process of adjusting the orthodontic appliance, the doctor uses his hand to feel the strength feedback of the teeth on the appliance, so as to gradually adjust the position of the appliance. Experienced doctors have better outcomes and experience than other doctors. But the reliance on doctors’ experience, to some extent, can reduce the overall effectiveness of treatment. Especially when the treated children are younger, because the development of their gums is not fully mature, small operation intensity may also lead to a large movement of teeth and their fixed appliance, which will not only affect the treatment effect, but also bring a bad treatment experience to the treated. This topic is devoted to the introduction of computer vision technology into the process of orthodontic treatment [2]. Use technology to reduce reliance on doctors in the treatment process. The displacement measurement system based on vision technology can measure the position of teeth and orthodontic appliance in the process of orthodontic operation. When the orthodontic appliance is adjusted by the doctor, the measuring system can timely measure the position changes of the teeth and the appliance, and feedback to the doctor. By measuring the feedback information from the system, the doctor can know the position of the appliance in real time and make further adjustments as needed. This reduces the dependence on the physician’s experience and proficiency in the treatment process, thereby improving the effectiveness and experience of the treatment. This paper mainly studies the application of visual technology in the process of orthodontic treatment to provide a technical basis for further realization of real-time acquisition of tooth displacement parameters in the treatment process in the future.
2 Measurement Accuracy Analysis The main method of orthodontic treatment is to exert external forces on the teeth of the mouth through various orthodontic devices, so that the teeth move slowly in the gums. Through the gradual adjustment of the appliance over the course of treatment, the tooth is finally moved to the correct position that meets the treatment requirements. Figure 1 shows the worktable for orthodontic treatment. Because of the above characteristics of dental orthodontic treatment, dental orthodontics is a small operation in oral medicine. Common orthodontic treatment uses a common small dental operating table, as shown in the figure. Such treatment environment determines that the distance between the camera and the patient is relatively close in the process of tooth image acquisition, which is about 0.3 m–0.5 m. Shooting at close range provides good shooting conditions for obtaining high-definition and high-resolution images. At the same time, in the process of dental orthodontics, the lighting can improve the bright enough lighting conditions, which also provides conditions for high-speed shooting of camera equipment and high-speed image processing in the later stage. Under the experimental conditions described in this paper, dental image photography is mainly divided into two kinds: direct photography without markers; The image was taken with the fixed orthodontic appliance as the marker [3]. Direct photography without markers refers to the direct photography of the treated teeth, and the photos of the teeth and the
Fig. 1. The worktable for orthodontic treatment.
surrounding oral cavity are obtained. This kind of photos will be used in the following steps such as image segmentation, feature extraction and so on. Since there are no dental markers, the subsequent process can only be treated according to the characteristics of the teeth. An image shot with a fixed appliance as a marker allows the subsequent process to take advantage of features richer than those of the tooth itself.
3 Ideal Measurement Analysis Under ideal conditions, it is the number of pixels occupied by each tooth in the tooth image that ultimately affects the measurement accuracy of the displacement measurement system described in this paper. Assuming the tooth width is L and the number of pixels it occupies is N, the pixel density of the tooth image is:

λ = L / N   (1)
In addition to the width of the teeth, the number of pixels N occupied by the teeth also depends on the parameters of the camera equipment, such as the total number of pixels of the CMOS or CCD sensor, the object distance when shooting, and the focal length (or magnification) of the lens. The Sony Alpha 7 III used in this paper is paired with a Sony Vario-Tessar T* FE 24–70 mm F4 ZA OSS lens. The experimental shooting equipment has 24.2 megapixels and a 70 mm telephoto setting, which can achieve high image resolution. The images were taken with the above-mentioned Sony α7 III mirrorless camera, which has 6000 horizontal pixels and 4000 vertical pixels. Considering that the head of the treated person may move radially relative to the camera, the width of a single tooth accounts for about 1/10–1/30 of the image width in the tooth image. Therefore, a single tooth occupies about 200–600 pixels. It is generally accepted that the actual width of a single tooth is about 1 cm, so the pixel
density λ of the image is about 0.05 mm/pixel–0.017 mm/pixel. Under ideal circumstances, if the displacement measurement algorithm can identify positions in the image to within M pixels, then the measurement accuracy is:

η = Mλ = ML / N   (2)
Substituting the value of pixel density, the accuracy of the displacement measurement system described in this paper can reach 0.05 mm to 0.015 mm under ideal circumstances.
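The following is a tiny worked example of Eqs. (1)–(2). The tooth width of 1 cm, the 200–600 pixel range and M = 1 pixel follow the text; everything else is an illustrative assumption.

```python
# Worked example of Eqs. (1)-(2): pixel density and ideal measurement accuracy.
L_mm = 10.0                      # assumed actual tooth width in mm (about 1 cm)
M = 1                            # pixels the algorithm can resolve in the ideal case
for N in (200, 600):             # pixels occupied by a single tooth in the image
    lam = L_mm / N               # Eq. (1): pixel density, mm per pixel
    eta = M * lam                # Eq. (2): measurement accuracy in mm
    print(f"N={N}: lambda={lam:.3f} mm/pixel, eta={eta:.3f} mm")
```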
4 Non-ideal Measurement Analysis In the process of orthodontic treatment, the treated person is awake (the procedure does not require anesthesia), and there are no rigid fixation measures, so the treated person's head is essentially free to move. This makes the target tooth and its markers shift and rotate in the camera image instead of always facing the camera. The most significant effect on the matched filtering process is the three-dimensional rotation of the target tooth and its markers. Figure 2 is a three-dimensional schematic diagram of tooth acquisition, in which the photographed object is placed at the origin of coordinates and the camera is located on the Z axis. The photographed object then has three rotational degrees of freedom: (1) rotation about the X axis; (2) rotation about the Y axis; (3) rotation about the Z axis. Rotation about the Z axis, which coincides with the camera axis, is equivalent to in-plane rotation and is not the main content of this chapter. This chapter focuses on rotations about the X and Y axes (Fig. 2).
Fig. 2. Cartesian coordinate system
In homogeneous coordinates, a three-dimensional rotation is essentially multiplication of the original vector by a rotation matrix:

ψ = ν · T   (3)

where T is the rotation matrix. The rotation matrices for rotation by angle θ about the X, Y and Z axes are, respectively:

T(x, θ) = ⎡ 1     0       0      0 ⎤
          ⎢ 0   cos θ  −sin θ   0 ⎥
          ⎢ 0   sin θ   cos θ   0 ⎥
          ⎣ 0     0       0      1 ⎦   (4)

T(y, θ) = ⎡  cos θ   0   sin θ   0 ⎤
          ⎢    0     1     0     0 ⎥
          ⎢ −sin θ   0   cos θ   0 ⎥
          ⎣    0     0     0     1 ⎦   (5)

T(z, θ) = ⎡ cos θ  −sin θ   0   0 ⎤
          ⎢ sin θ   cos θ   0   0 ⎥
          ⎢   0       0     1   0 ⎥
          ⎣   0       0     0   1 ⎦   (6)
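A minimal sketch of Eqs. (4)–(6) in code, useful for simulating the out-of-plane head motion discussed in the text. The specific rotation angles and test point are illustrative assumptions.

```python
import numpy as np

def rot_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]])

def rot_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]])

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

# Rotate a homogeneous point 20 degrees about X and 10 degrees about Y,
# as a stand-in for the non-ideal head rotation of the photographed tooth.
p = np.array([1.0, 2.0, 3.0, 1.0])
p_rot = rot_y(np.radians(10)) @ rot_x(np.radians(20)) @ p
print(p_rot)
```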
For the measurement accuracy, the influence of image recognition algorithm is mainly reflected in the measurement error of image plane. When the error of the image plane measurement is large, the M value in Eq. (2) is large, which reduces the measurement accuracy of the system. The algorithm of matched filtering shows a good image matching precision in the experiment. In this section, matched filtering is carried out on the targets under different rotation angles to obtain the corresponding identification accuracy data. Under the condition that the rotation Angle of the target in the image to be detected is the same as that of a sub-image in the composite template, the translation operation is carried out to a certain extent, and then the translation is measured by matching filtering algorithm. When the rotation Angle of the target is the standard rotation Angle, the exact matching at the pixel level is achieved. During the experiment, X translation +200 pixels and Y translation −100 pixels were uniformly adopted. The matching results meet the requirements as described above and the matching is correct. At the same time, we can see that the error is within 5 pixels. By substituting the error M = 5 into Eq. (2), the displacement measurement system using the matched filtering algorithm can reach the magnitude of 0.1 mm under the experimental conditions of this paper (Fig. 3).
Fig. 3. The images of teeth before and after an orthodontic treatment
5 Experimental Results In this section, the matched filtering method is used to measure the position of teeth in images taken during orthodontic treatment. The principle of orthodontics is to use a fixed appliance to apply external force to the teeth, so that the teeth move within the gums to the appropriate position. During orthodontic treatment, the surgeon's main concern is the distance between two adjacent teeth. Computer vision is used to measure the distance between adjacent teeth during orthodontics, allowing doctors to adjust the fixed appliance according to the course of treatment.
References 1. Koretsi, V., Kirschbauer, C., Proff, P., Kirschneck, C.: Reliability and intra-examiner agreement of orthodontic model analysis with a digital caliper on plaster and printed dental models. Clin. Oral Invest. 23, 3387–3396 2. Kravitz, N.D., Groth, C., Jones, P.E., Graham, J.W., Redmond, W.R.: Intraoral digital scanners. J. Clin. Orthod. 48, 337–347 3. Leifert, M.F., Leifert, M.M., Efstratiadis, S.S., Cangialosi, T.J.: Comparison of space analysis evaluations with digital models and plaster dental casts. Am. J. Orthod. Dentofac. Orthop. 136, 16.e1–16.e4 4. Lippold, C., Kirschneck, C., Schreiber, K., Abukiress, S., Tahvildari, A., Moiseenko, T., et al.: Methodological accuracy of digital and manual model analysis in orthodontics - a retrospective clinical study. Comput. Biol. Med. 62, 103–109
Speech Stuttering Detection and Removal Using Deep Neural Networks Shaswat Rajput1 , Ruban Nersisson1(B) , Alex Noel Joseph Raj2 , A. Mary Mekala3 , Olga Frolova4 , and Elena Lyakso4
1 School of Electrical Engineering, Vellore Institute of Technology, Vellore, India
[email protected]
2 Key Laboratory of Digital Signal and Image Processing of Guangdong Province, Department
of Electronic Engineering, College of Engineering, Shantou University, Shantou, China 3 School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India 4 St. Petersburg State University, Saint Petersburg, Russia
Abstract. There are more than 70 million people worldwide who suffer from stuttering problems. This will affect the confidence of public speaking in people who suffer from this issue. To solve this problem many people take therapy sessions but the therapy sessions are a temporary solution, as soon as they leave therapy sessions this problem might arise again. This work aims to use state of the art machine learning algorithms that have improved over the past few years to solve this problem. We have used the dataset from UCLASS archives which provide the data for stuttered speech in.wav format with time-aligned transcriptions. We have tried different algorithms and optimized our model by hyper parameter tuning to maximize the model’s accuracy. The algorithm is tested on random speech data with low to heavy stuttering from the same dataset, and it is observed that there is significant reduction in the Word Error Rate (WER) for most of the test cases. Keywords: Speech stuttering · Classification · Deep learning · Word Error Rate · Filtering
1 Introduction Speech is considered a universal form of communication. Speech production is a complex process, including the formation and realization of a motor program for pronunciation. Violations of the motor program at the central level (Broca's area) and at the peripheral level (articulatory organs) lead to speech disorders such as stuttering. Stuttering is characterized by frequent repetition or prolongation of sounds, syllables and words, and by frequent stops or indecision in speech that break its rhythmic flow. Violation of prosody in stuttering is associated with the occurrence of additional pauses. Stuttering is considered a social barrier for those who suffer from it and cannot have a proper social life. Our work aims to use the concept of mirror neurons and machine learning algorithms to generate a feedback signal of the user's voice. There are many research works from
various research groups, but all the works are mainly focus on segregating the stammering speech from the normal speech [1]. The approach they used is pre-emphasis of audio and then extracts audio features. Few researchers have used hand movements as a mirroring technique to solve stuttering problems. 1.1 Literature Review Most of the work that has been done is mostly the classification of stuttering. The approach [1] they use is pre-emphasis of the audio and then they extract the audio features and training the classifier models. They used the median frequency, Root Mean Square RMS, standard mean, peak to peak amplitude analysis for the neural network algorithm the features used are specially extracted mean features. Most of the current system uses Artificial Neural Networks (ANN), Hidden Markov Chain (HMM) and Gaussian Models (GMM) [2–4]. Mirror neurons are a new concept that was founded accidentally by Italian neuroscientist, Giacomo Rizzolatti and his group [5]. A mirror neuron, by its activity, reflects the behaviour of another, in the same way as if the observer himself acted. We are proposing a scheme to use the concept of mirror neurons and the machine learning algorithms to detect and solve the problem of stuttering. The existing training system now is to use a professional vocal teacher. While taking sessions may help the people but some research has shown that the aid might be temporary and that the user has to continue the sessions for a longer time. Some people might not go for therapy sessions because of a lack of social confidence or for some other reasons [6]. The presented work that we made might help people to overcome their stuttering problem in the comfort of their home. 1.2 Dataset One of the challenges for this work was to get a time-aligned labeled dataset. The dataset used are for University College London archive of Stuttered Speech (UCLASS) [7]. The dataset is generated by the monologues, conversations and reading recordings from different volunteers of different age group varies from 5 years old to 47 years old, the details shown in Table 1. In total 138 recordings were considered in it 18 female speakers and 120 male speakers are taken for the work. There are two releases of audio files in the dataset, in which the release one has 16 files with time-aligned transcripts, and the release two has 4 files with time-aligned transcripts. But few of the data in both the releases do not have orthographic transcripts and/or time – aligned transcripts. So, we worked with the 16 files from release 1 to train our models. The experimental setup should include a table with a mirror, a microphone connected to a recording device – a computer or tape recorder, and a chair in front of the table. The person will sit down in front of the mirror and will read some pre-written speech. The audio files are recorded in.wav file type. The audio signals are recorded with a sampling rate of 22050 Hz. The time - aligned transcripts of those recording were saved in CHILDES CHAT format [8]. The orthographic transcripts are in plain text.
Table 1. Gender and age details of UCLASS release one dataset.

Age during recording   Male recordings   Female recordings
5 years                1                 -
6–10 years             43                9
11–14 years            51                8
15–20 years            14                1
20–30 years            7                 -
30–40 years            3                 -
Above 40 years         1                 -
The transcription was prepared by an in-house script which is converted into a CHAT file. The transcripts are assessed with a trained transcriber and 96% agreement was achieved [9].
2 Methodology The system flow is given in Fig. 1. The Audio files are segmented and send through the training model where the speech is corrected and a transcript is also made using IBM Watson’s speech-to-text.
Fig. 1. Block diagram of the system.
2.1 Data Pre-processing The stuttering and normal segments were obtained by splitting the audio files according to the start and end times given in the time-aligned transcripts, yielding a total of 12,633 segments.

Table 2. Database information for stuttering detection.

Parameters             All       Stutter   Normal
No. of segments        12633     2643      9990
Max duration (ms)      17044     17044     14499
Min duration (ms)      0         1         0
Mean duration (ms)     315.09    762.63    196.71
Median duration (ms)   192       486       168
Mode (ms)              109       201       93
From Table 2 we can see that the average duration of all the segments is 315 ms and stuttered segments are 2.5 times longer than the normal segments. The longest stutter was about 17 s long. Training on such skew data will not be useful because it is highly unlikely that we will encounter any stammer segment which is as long as 17 s. So, instead of using the segments of variable duration, we segmented the file segments further down to less than or equal to 300 ms. The time scale chosen based on the average value of the segments as given in Table 2. This segmentation generates 17,545 segments as shown in Fig. 2 which will be used for training the models.
Fig. 2. Audio segments of 300 ms duration.
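A minimal sketch of the 300 ms segmentation step described above. The transcript-derived (start, end, label) spans and the file name are hypothetical; only the 22050 Hz sampling rate and 300 ms cap come from the text.

```python
import librosa

def split_into_chunks(wav_path, segments, max_ms=300, sr=22050):
    """Cut labelled (start_ms, end_ms, label) spans into chunks of at most max_ms."""
    y, sr = librosa.load(wav_path, sr=sr)
    chunks = []
    for start_ms, end_ms, label in segments:
        s, e = int(start_ms * sr / 1000), int(end_ms * sr / 1000)
        step = int(max_ms * sr / 1000)
        for i in range(s, e, step):                   # further split long spans
            chunk = y[i:min(i + step, e)]
            if len(chunk) > 0:
                chunks.append((chunk, label))
    return chunks

# The segment list would come from the CHAT time-aligned transcript, e.g.:
# chunks = split_into_chunks("M_1017_11y8m_1.wav", [(0, 480, "NORMAL"), (480, 1240, "STUTTER")])
```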
2.2 Feature Extraction Once the audio segments were prepared, the next stage is feature extraction, the features are extracted from the segment, and once it is extracted then those values are fed to the model for classification. We used Mel-frequency cepstral coefficients MFCC features as they are very representative of human speech and the loudness of the human speech is normally represented by Root Mean Square Error (RMSE), so we have used RMSE as the second feature. The total of 40 features are extracted, that is 39 MFCC features and
RMSE feature, which constitute the total feature map, so the feature vector comprises 80 components. Overall, the final dataset used for this research contains 17,545 rows, each with 80 features. 2.3 Classification With the dataset ready, the next step was to train the classifier models. We used a Deep Neural Network, a Support Vector Classifier, Decision Trees and Gaussian Naïve Bayes. Compared to the SVC, the DNN model took very little time (2 min) to train and also produced better results, whereas the SVC took around 2.5 h to produce similar results. The DNN had 3 hidden layers with 10 neurons in each layer, a learning rate of 0.001 and 1,200 training epochs.

Table 3. Classification accuracy of models.

Classifiers                  Accuracy (%)
Decision Trees               76.98
Gaussian Naïve Bayes         75.22
Support Vector Classifiers   79.32
Deep Neural Network          85.89
2.4 Neural Network Configuration The number of inputs is the size of the feature vector. In our model, we used a learning rate of 0.001, a batch size of 100 and 1,200 training epochs. If w_i is an input signal, then for each hidden unit h_j:

h_j = σ(v_j · φ(w_i))   (1)

For the optimizer we used the Adam optimizer, because it combines the advantages of both the Adaptive Gradient Algorithm and Root Mean Square Propagation (RMSProp). Instead of adjusting the parameter learning rates based only on the average first moment (the mean) as in RMSProp, Adam also makes use of the average of the second moment of the gradient [10].
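The sketch below illustrates a feature extractor and a network matching the stated configuration (39 MFCCs plus RMS, 3 hidden layers of 10 neurons, Adam with learning rate 0.001, batch size 100, 1,200 epochs). The paper does not state how the 40 per-frame features become an 80-component vector or which framework was used; mean-plus-standard-deviation pooling and TensorFlow/Keras are assumptions for illustration only.

```python
import numpy as np
import librosa
import tensorflow as tf

def extract_features(chunk, sr=22050):
    """39 MFCCs + RMS energy; mean and std pooled over frames (assumed) -> 80 values."""
    mfcc = librosa.feature.mfcc(y=chunk, sr=sr, n_mfcc=39)   # (39, frames)
    rms = librosa.feature.rms(y=chunk)                       # (1, frames)
    feats = np.vstack([mfcc, rms])                           # (40, frames)
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])

def build_model(n_inputs=80):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_inputs,)),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(10, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),      # STUTTER vs NORMAL
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# model = build_model()
# model.fit(X_train, y_train, batch_size=100, epochs=1200, validation_split=0.1)
```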
3 Implementation The developed classifier achieved an accuracy of around 86% using a Deep Neural Network. Now, next in the pipeline is the audio correction.
3.1 Overlapping Segmentation The model is trained on audio segments ≤300 ms, so it’s obvious that the audio to be corrected needs to be segmented with the duration of 300 ms, but what’s less obvious is to detect the stutter boundaries. To improve the quality of the segments, overlapping of segments were used, which is shown in Fig. 3. In the total time scale, overlapping was 200 ms.
Fig. 3. Overlapping segmentation.
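A minimal sketch of the overlapping segmentation: 300 ms windows with a 100 ms hop (200 ms overlap), giving the 100 ms labelling granularity described above. `extract_features` and `classify` are placeholders for the feature extractor and the trained classifier.

```python
import numpy as np

def overlapping_windows(y, sr=22050, win_ms=300, hop_ms=100):
    """Return (start_sample, window) pairs for 300 ms windows with 200 ms overlap."""
    win = int(win_ms * sr / 1000)
    hop = int(hop_ms * sr / 1000)
    starts = range(0, max(len(y) - win, 1), hop)
    return [(s, y[s:s + win]) for s in starts]

# windows = overlapping_windows(audio)
# labels = [classify(extract_features(w)) for _, w in windows]
```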
With this segmentation, we could detect the stuttered and non-stuttered parts with the granularity of 100 ms. After segmentation, these segments were sent to the classifier for classification. 3.2 Reassembling the Segments The classifier gave the labels of the overlapping segments. So, the segments which were labelled as STUTTER were removed and NORMAL segments were combined. A set difference is taken to get the group of audio that contains NORMAL label. Now there are different ways to assemble the segments. Figure 4 shows the way to assemble the segments in contiguous blocks; this approach produces artificial sounding voices with sharp edges. It is due to the generation of interjections at the points of concatenations.
Fig. 4. Naive re-assembling.
So, instead of abruptly attaching the adjacent blocks, the audio samples between the end of the previous block and the beginning of the current block are interpolated which is given in Fig. 5.
Fig. 5. Smoothed re-assembling.
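The text says the samples between the end of one retained block and the start of the next are interpolated rather than butted together. A linear cross-fade is one simple realization of that idea; the fade length below is an assumption, not a value from the paper.

```python
import numpy as np

def smooth_join(blocks, fade=64):
    """Concatenate NORMAL blocks (numpy arrays), cross-fading across each boundary
    instead of abruptly attaching adjacent blocks (the 'smoothed re-assembling')."""
    out = blocks[0].astype(float)
    for b in blocks[1:]:
        b = b.astype(float)
        n = min(fade, len(out), len(b))
        ramp = np.linspace(0.0, 1.0, n)
        out[-n:] = out[-n:] * (1.0 - ramp) + b[:n] * ramp   # linear cross-fade
        out = np.concatenate([out, b[n:]])
    return out
```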
The complete audio correction task was finished in 24 s for all the audio with duration 2 min 50 s. In that 24 s, 98% time was spent for extracting the features. Segmentation and classification were performed instantaneous. 3.3 Speech to Text Now, there are many ways to check the model capability to see if the model is successfully correcting the stuttered voice. One of the best ways is to pass the voice to speech recognition software to see if there is any reduction in Word Error Rate (WER). So, to check the WER, we passed the original and corrected audio through IBM Watson’s Speech to text. We chose IBM Watson because our dataset, UCLASS speech archives is recorded in British English and IBM Watson has a model that is already trained in British English. So, instead of programming our speech to text model, we used IBM Watson’s speech to text program [11].
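Since the evaluation relies on the Word Error Rate of the Watson transcriptions against the reference text, the standard edit-distance computation of WER is sketched below; the Watson API call itself is omitted.

```python
def word_error_rate(reference, hypothesis):
    """WER (%) = (substitutions + deletions + insertions) / reference word count."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return 100.0 * d[len(r)][len(h)] / max(len(r), 1)

# print(word_error_rate(reference_transcript, watson_output_text))
```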
4 Results As we can see from Table 4, the WER of the corrected voices is reduced significantly compared to the original ones. So, we can say that our algorithm worked successfully in removing most of the 'pause blocks' or stutter blocks from the input voice.

Table 4. Comparison of WER of the corrected and original voice.

Subject            Original (%WER)   Corrected (%WER)
M_1017_11y8m_1     54.35             46.74
M_1017_13y2m_1     61.55             43.12
M_0065_20y1m_1     92.35             76.12
M_0017_19y2m_1     73.07             73.92
As we can see in Table 4, in the first three voices, WER in the corrected one is reduced significantly but in the last voice it is rather increased slightly. The reason behind this is that we have a limited dataset available for training. So, because of this reason we were able to achieve this much accuracy only. With a larger and wide range of dataset we can reduce the WER even more significantly uniformly.
5 Discussion and Conclusion The stuttered speech from the dataset, which will act as input and our model will try to remove the stammering from the speech and the algorithm will provide feedback output of their voice into the headset. Doing this in an iterative loop may help in reducing the stammering in people. It is inferred from Table 4 that the model is performing well in removing the stutter block in the speech. Even the corrected voices made by our model were able to reduce the %WER from the original voice. There is so much which it still can be done to increase the accuracy of the model and to decrease more WER from the stuttered voice. With more modifications to the algorithm and with a considerable amount of dataset, we can achieve better results. As mention above, around 98% of the time was spent for feature extraction, maybe we can find some other feature extraction library (currently using librosa [12]) which can extract feature in less time or even look for more features that are more representative of stammering speech. 5.1 Future Scope The developed method can be used for teaching stuttering patients based on acoustic feedback. Natural speech of the patient modified through the program - perception of modified speech. We assume that multisensory audio-video interaction can contribute to the activation of mirror neurons and help reduce stuttering. The developed algorithm for evaluating pauses can be used in the analysis of emotional speech. The style and intonation of the speech are greatly influenced by the pausing and other speech characteristics. In addition to this, pausing is very important in the perception of human speech [13]. The proposed algorithm is working good in eliminating the stutter block in the speech signal. Children, who are suffering from stuttering, struggle to speak out the words even though they are very clear on the words they want to convey. The stuttered speech fluency of autistic children is disorganized and repetitive. This algorithm can be used to rectify this issue. Acknowledgement. The authors express their sincere gratitude to Vellore Institute of Technology, Vellore for encouragement provided as well as the support and resources to complete the project. We would also thank University College London archive of Stuttered Speech (UCLASS) for the dataset and the volunteers who have provided the data and the Russian Foundation for Basic Research (project 19–57-45008–IND_a) – for Russian researcher and Department of Science and Technology (DST) (INTRUSRFBR382) - for Indian researcher.
References 1. Chee, L.S., Ai, O.C., Yaacob, S.: Overview of automatic stuttering recognition system. In: Proceedings International Conference on Man-Machine Systems, pp. 1–6, Batu Ferringhi, Penang Malaysia, October (2009) 2. Hinton, G., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Sig. Process. Mag. 29(6), 82–97 (2012) 3. Kesarkar, M.P.: Feature extraction for speech recognition. Electronic systems, EE, Department, IIT Bombay (2003) 4. Szczurowska, I., Kuniszyk-Jó´zkowiak, W., Smołka, E.: The application of Kohonen and multilayer perceptron networks in the speech non fluency analysis. Arch. Acoust. 31(4(S)), 205–210 (2014) 5. Fabbri-Destro, M., Rizzolatti, G.: Mirror neurons and mirror systems in monkeys and humans. Physiology 23(3), 171–179 (2008) 6. Sanju, H.K., Choudhury, M., Kumar, V.: Effect of stuttering intervention on depression, stress and anxiety among individuals with stuttering: case study. J. Speech Pathol. Ther. 3(1), 132 (2018) 7. Howell, P., Davis, S., Bartrip, J.: The university college London archive of stuttered speech (UCLASS). J. Speech Lang. Hear. Res. 52, 556–569 (2009) 8. MacWhinney, B.: The CHILDES project part 1: the CHAT transcription format (2009) 9. Howell, P., Davis, S., Bartrip, J., Wormald, L.: Effectiveness of frequency shifted feedback at reducing disfluency for linguistically easy, and difficult, sections of speech (original audio recordings included). Stammering Res.: On-line J. Publ. Br. Stammering Assoc. 1(3), 309 (2004) 10. Zhang, Z.: Improved adam optimizer for deep neural networks. In 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), pp. 1–2. IEEE (2018) 11. Gliozzo, A., et al.: Building Cognitive Applications with IBM Watson Services: Volume 1 Getting Started. IBM Redbooks, Armonk (2017) 12. McFee, B., et al.: librosa: audio and music signal analysis in Python. In: Proceedings of the 14th Python in Science Conference, vol. 8 (2015) 13. Schröder, M.: Expressive speech synthesis: past, present, and possible futures. In: Tao, J., Tan, T. (eds.) Affective Information Processing, pp. 111–126. Springer, London (2009). https:// doi.org/10.1007/978-1-84800-306-4_7
Design of Epidemic Tracing System Based on Blockchain Technology and Domestic Cipher Algorithm Chong Leng, Lai Wei, Ziqian Liu, Zhiqiang Wang, and Tao Yang(B) Beijing Electronic Science and Technology Institute, Beijing 100070, China [email protected]
Abstract. With the development of social transportation, the risk of infectious disease transmission is increasing, and the transformation of digital epidemic prevention is imminent. Most of the traditional products serving public health and safety use database and big data technology, which have defects in security and efficiency, and can not meet the needs of privacy protection and data retrieval efficiency. To solve this problem, a design scheme of infectious disease tracking system based on blockchain technology and domestic cryptographic algorithm is proposed, and the design structure, basic implementation process and effect evaluation of the system are described. Keyword: Blockchain technology · Domestic cryptographic algorithm · Epidemic tracing
1 The Introduction With the spread of epidemic diseases in the world, it is urgent to establish a public health emergency response system. However, as information submission, data storage and processing are deployed in different systems, personal information is vulnerable to attack during transmission, processing and storage platforms, resulting in personal information disclosure, abuse and tampering [1]. The traditional security measure is to store the ciphertext generated by information encryption in the database. In this way, the access efficiency is low, and the information is always faced with the risk of being tampered and stolen. Therefore, this paper designs a safe and efficient information collection, storage, management and retrieval system. When encountering public health emergencies, we can simplify the investigation steps, narrow the scope of investigation and target the target population.
2 The Framework Design In this paper, domestic cryptographic algorithm of SM2 and SM4 hybrid encryption is used to encrypt the personal information submitted in plaintext, and a local blockchain based on hyperledger fabric is built in Linux virtual machine to provide guarantee for the security of data storage and efficient data traceability. The system operation process and overall architecture are shown in the figure below (Figs. 1 and 2).
Fig. 1. The architecture diagram of the epidemic traceability system based on blockchain technology and domestic cryptographic algorithm.
Fig. 2. Flowchart of epidemic traceability system based on blockchain technology and domestic cryptographic algorithm.
2.1 Data Storage and Traceability Principle of Blockchain Application (WEB) The data storage and traceability of blockchain mainly depends on the interaction between decentralized application layer and block chain data block [2]. The web page transmits the data to the blockchain data processing module, and the data is packaged and transmitted to the blockchain after processing, then the data storage process is completed. The tracing process of data is the reverse process of data storage process (Fig. 3).
Fig. 3. Architecture diagram of web page and blockchain data interaction. Conceptual model (left) and Actual model (right).
2.2 Encryption Mechanism Based on SM2 and SM4 Hybrid Encryption Algorithm This system uses an improved SM2/SM4 hybrid encryption scheme. On the basis of the original hybrid cryptosystem, a binary pseudo-random sequence B, generated by the prime modular multiplicative congruential method, is introduced. The plaintext P is represented in binary and compared bit by bit with an equal-length portion of the pseudo-random sequence: if Bi = 0, the i-th plaintext bit is separated out and stored in the 0-cluster array P0; if Bi = 1, it is stored in the 1-cluster array P1. Finally, the empty positions in the arrays are removed, the P0 and P1 arrays are spliced together, the spliced result is encrypted with the SM4 algorithm, and the resulting ciphertext is then encrypted with SM2. Since the key length of SM4 is 128 bits, the key space of the original algorithm is 2^128. By introducing a 128-bit binary pseudo-random sequence as a component of the SM4 key, the key space of this part of the algorithm is increased to 2^256, and the security of the algorithm improves significantly as the key material grows [4]. The overall architecture is shown in Fig. 4:
Fig. 4. Encryption and decryption architecture diagram of the improved scheme based on SM2 and SM4 hybrid domestic cryptographic algorithm
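A minimal sketch of the bit-splitting step that precedes SM4 encryption in the improved scheme. The SM4 and SM2 primitives are represented by placeholder names, since no specific cryptographic library is named in the paper.

```python
def split_by_prng(plaintext_bits, prng_bits):
    """Partition plaintext bits into a 0-cluster and a 1-cluster according to the
    pseudo-random sequence B, then splice the clusters (Sect. 2.2).
    Both inputs are equal-length strings of '0'/'1'."""
    p0 = [b for b, r in zip(plaintext_bits, prng_bits) if r == "0"]
    p1 = [b for b, r in zip(plaintext_bits, prng_bits) if r == "1"]
    return "".join(p0) + "".join(p1)

def unsplit_by_prng(spliced_bits, prng_bits):
    """Invert split_by_prng during decryption."""
    n0 = prng_bits.count("0")
    p0, p1 = iter(spliced_bits[:n0]), iter(spliced_bits[n0:])
    return "".join(next(p0) if r == "0" else next(p1) for r in prng_bits)

# spliced = split_by_prng(msg_bits, B)
# ciphertext = sm2_encrypt(pub_key, sm4_encrypt(key128, spliced))  # placeholder SM4/SM2 calls
```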
3 Experiment and Result Evaluation 3.1 System Implementation The system is divided into three modules: user, background and administrator (Fig. 5).
Fig. 5. Operation flow chart of the epidemic traceability system based on blockchain technology and domestic cryptographic algorithm.
Each module in the system has independent functions, and its goals are as follows: Client. A web page that can provide registered accounts to citizens to fill in personal identity information, such as telephone numbers, ID number, etc. It provides Android Software that can log in by scanning QR code to help citizens verify their accounts when they travel in public places and upload travel data. Background program. The identity information submitted by citizen registration is encrypted by SM2 and SM4 domestic cryptographic algorithm, then transmitted to the local database, and stored for searching; Package and upload the account information, time and place corresponding to the citizen login to the blockchain; The user-defined algorithm is implemented. When the staff input the identity information of the diagnosed patients, according to their travel conditions, they can trace the data on the blockchain to find out the people suspected to be in contact with the patients. Administrator. Input the identity information of the confirmed patients, and query the suspected contact group; Access the local database, obtain the privacy information of the crowd, and inform them to go to the hospital in time.
The implementation process and results are as follows: Build the blockchain network and the webpage interacting with the blockchain (see Fig. 6).
Fig. 6. Building blockchain network
Combine SM2 and SM4 algorithms to encrypt the data obtained from Android and the web (Fig. 7).
Fig. 7. Test results of the encryption part of the domestic cryptographic hybrid algorithm
The search platform obtains the personnel list according to the defined algorithm (Fig. 8).
Fig. 8. Query of "Zhang San" travel route and contact crowd
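The paper describes the tracing step only at a high level: travel records retrieved from the chain are matched against the confirmed patient's records by place and time. The sketch below is one plausible reading of that step; the record fields and the contact time window are assumptions.

```python
from datetime import timedelta

def suspected_contacts(records, patient_id, window_minutes=60):
    """records: list of dicts {'account', 'place', 'time'} read back from the blockchain.
    Returns accounts that visited the same place as the patient within the time window."""
    patient_visits = [r for r in records if r["account"] == patient_id]
    window = timedelta(minutes=window_minutes)
    contacts = set()
    for visit in patient_visits:
        for r in records:
            if (r["account"] != patient_id
                    and r["place"] == visit["place"]
                    and abs(r["time"] - visit["time"]) <= window):
                contacts.add(r["account"])
    return sorted(contacts)
```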
3.2 Performance Testing Test the data encryption efficiency of Android platform using SM2, SM4 hybrid algorithm encryption. The three algorithms are tested separately, recording the encryption and decryption time and memory usage. The average value is taken, and the test times are 100 times (see Table 1). Fabric platform performance test (see Table 2)
Table 1. Test results of three algorithms

Algorithm   Type         Parameter     1b         51b        111b
SM2         Encryption   Time (ms)     93.25      92.70      92.65
                         Memory (kb)   990.185    991.684    992.590
            Decryption   Time (ms)     33.65      41.15      41.90
                         Memory (kb)   993.585    991.506    992.054
SM4         Encryption   Time (ms)     93.35      93.40      93.55
                         Memory (kb)   990.558    1001.570   1015.584
            Decryption   Time (ms)     49.55      49.95      59.30
                         Memory (kb)   989.988    994.728    992.891
SM2,SM4     Encryption   Time (ms)     289.4      290.3      297.6
                         Memory (kb)   1003.571   1015.583   1028.559
            Decryption   Time (ms)     289.6      293.4      299.3
                         Memory (kb)   1003.340   1015.688   2018.324
Table 2. Fabric platform performance test

Project                                         Index
Contract-call TPS (b/s)                         356.8
Node access requires authentication             Yes
Resists network attacks such as DDoS            Supported
Number of tolerated node failures               More than N/3
4 Summary In view of the frequent tampering and leakage of privacy data, this paper proposes a design scheme to improve tampering prevention and reduce the risk of data transmission leakage. In order to achieve the above goal, this article through three aspects to achieve. First of all, when the user submits the data form, the SM2 and SM4 hybrid algorithm is used to encrypt and decrypt to reduce the possibility of data transmission leakage; Secondly, the travel information submitted by users is stored in the blockchain to ensure the security and immutability of travel information, and the subsequent data acquisition is convenient through the pre-defined data interface; Finally, the administrator directly interacts with the database through the web page, and retrieves the list of suspected patients who are in close contact with the patient. The administrator obtains the personal information in plain form by decrypting the privacy database, and accurately notify the individual according to the mobile phone number and ID number, so as to help the precise epidemic prevention. After running test, the epidemic tracing system based on blockchain technology and national secret algorithm is designed to ensure that citizens’ personal information is always encrypted in the transmission process, reduce the risk of data leakage, and ensure the data security and integrity of the storage platform. The system not only ensures that the data is not tampered, but also the controllability and security of personal privacy information. Fund project. This research was financially supported by the Key Lab of Information Network Security, Ministry of Public Security (C19614), China Postdoctoral Science Foundation (2019M650606), First-class Discipline Construction Project of Beijing Electronic Science and Technology Institute (3201012).
References 1. Aijun, Z., Sai, L.: Blockchain Path for Public Health Emergencies: Application Scenario, Ethical Risk and Balanced Approach. J. Wuhan Univ. Technol. (Soc. Sci.Ed.). 34(01), 8–14 (2021) 2. Guoying, Z., Yanqin, M.: Blockchain based decentralized data provenance method. J. Nanjing Univ. Posts and Telecommun. (Nat. Sci. Ed.). 39(02), 91–98 (2019) 3. Yi, X., Zhewei, Y.: Medical Application status and Prospect of Block chain Technology. J. Clin. Surg. 28(04), 304–307 (2020) 4. Jianlong, L.: Improvement and application of SM2 and SM4 hybrid encryption algorithm abstract. Inner Mongolia University (2015)
5. Liyuan, Y., Dong, D.: Application status and development trend of blockchain technology in medical and health field. J. Med. Inform. 41(01), 50–54 (2020) 6. Yong, Y., Feiyue, W.: Blockchain: The state of the art and future trends. Acta Automatica Sinica 42(04), 481–494 (2016)
Optimization of Gene Translation Using SD Complementary Sequences and Double Codons Dingfa Liang1,2 , Zhumian Huang1 , Liufeng Zheng1,2 , and Yuannong Ye1,2,3(B) 1 Cells and Antibody Engineering Research Center of Guizhou Province, Key Laboratory of Biology and Medical Engineering, School of Biology and Engineering, Guizhou Medical University, Guiyang 550025, China [email protected] 2 Bioinformatics and Biomedical Big Data Mining Laboratory, Department of Medical Informatics, School of Big Health, Guizhou Medical University, Guiyang 550025, China 3 Key Laboratory of Environmental Pollution Monitoring and Disease Control, Ministry of Education, Guizhou Medical University, Guiyang 550025, China
Abstract. Studies have confirmed that the growth, development, structure and function of all living organisms are determined by their genes, but the speed at which gene products are translated is affected by various internal and external factors. Studying gene translation speed is therefore extremely important for production, and improving this speed through a variety of approaches has long been an important subject. In recent years, the discovery of codon usage bias, the tRNA recycling model and SD sequence-like structures has provided another way to optimize translation efficiency. These methods are mostly based on codon usage theory and optimize the gene sequence to speed up or slow down translation so as to regulate cell expression and metabolism. This paper combines the three theories above, designs reasonable algorithms to optimize gene sequences in E. coli, and uses the codon adaptation index (CAI) to evaluate the optimization effect. Keywords: E. coli · Optimization of gene translation efficiency · Codon usage bias · tRNA-recycling model · SD sequence-like structure
1 Introduction 1.1 Background and Significance DNA is the carrier of genetic information and the key to the genetic code, which is related to "deciphering" the book of life. With the continuous deepening of genomics research and the rise of proteomics, regulating expression products by changing the gene translation speed has become an extremely important topic. Most traditional methods only change the promoter strength, use high-copy-number plasmids, or optimize the cell expression system to change the gene translation speed. In recent years, many new breakthroughs have been made in research on the factors influencing gene translation speed. Many of the findings concern the influence of codon usage on gene translation speed, which has laid a solid theoretical foundation for our work.
1.2 Research and Current Situation of Factors Affecting Gene Translation Speed by Codon Usage The phenomenon that an amino acid is encoded by more than one codon is called codon degeneracy, and codons that encode the same amino acid are called synonymous codons. In fact, synonymous codons are not used uniformly during translation, and most organisms tend to use a certain subset of them. Frequently used codons are called optimal codons, and the phenomenon of frequent use of some codons is called codon bias; the choice among synonymous codons therefore affects the gene translation speed to a certain extent [1]. In tRNA recycling, the diffusion of tRNA away from the ribosome is slower than translation and some tRNA channeling takes place at the ribosome, so tRNA that has not yet diffused can be utilized again. In the process of translation, when the same amino acid is encountered and the same tRNA is needed, the tRNA that has not yet diffused is likely to carry the corresponding amino acid back into the translation process, which is faster and more efficient than waiting for another identical tRNA. It is well known that the ribosome binds to the SD sequence on the mRNA before translation begins. The degree of complementarity between the SD sequence on the mRNA and the anti-SD sequence of rRNA, and its distance from the start codon, can significantly affect translation initiation efficiency [2]. 1.3 The Value and Significance of the Subject At present, most studies on the effect of codon usage on gene translation speed are single-factor studies, mostly about codon usage bias, and there is little research on the combined effect of the known factors, so this work is novel and necessary. In this study, we combined three theories: codon usage bias, the tRNA recycling model, and SD sequence-like structures inducing transient pausing of ribosomes. Taking E. coli as the target, we designed reasonable algorithms that integrate the three effects to optimize the gene sequence, and the codon adaptation index (CAI) was used to evaluate the expression of the optimized sequence.
2 Materials and Methods 2.1 Materials 2.1.1 Codon Usage Bias The significance of codon usage bias was not realized until heterologous gene expression was studied in the 1970s [3]. A large body of work has shown that codon usage bias differs between eukaryotes and prokaryotes and even exists within the same organism. In the human genome, not only do highly expressed genes and lowly expressed genes differ in their choice of synonymous codons, but the same phenomenon exists even in different regions of the same gene. In addition, the codon usage bias of highly expressed genes is generally larger in both unicellular and multicellular organisms, which can strongly improve the accuracy of translation.
2.1.2 TRNA Abundance and Gene Expression Related to codon degeneracy are isoaccepting tRNAs that can carry the same amino acid but can have different anticodons. It can often have major or minor tRNAs, but the difference between them are the used frequency of codon by recognition (major tRNAs are much higher than minor tRNAs) [4]. There are many tRNA pools in every organism which are called tRNA abundance [4]. The higher the tRNA Abundance is, the stronger the Codon bias will be, and the higher the codon content of a certain codon in the same gene is, the higher the tRNA abundance will also be. Gene expression is more inclined to use these codons with high tRNA abundance, which will improve gene translation speed. At the same time, researchers found that the codon usage is not only related to tRNA availability but also may be affected by the change of tRNA abundance during development in the study of codon usage bias about Drosophila genome [5]. 2.1.3 Effect of tRNA on Gene Translation Speed Evolution is inclined to the emergence of a multipurpose tRNA that can recognize more than one synonymous codon, aimed to reduce the number of tRNAs to encode 20 amino acids. These highly efficient and multipurpose tRNAs are called isoaccepting tRNAs. For example, serine has six synonymous codons, which are recognize by three isoaccepting tRNAs that each can recognize two codons under the wobble hypothesis. Due to codon usage bias, some tRNAs are used more frequently, while others are used less, which indicates that the difference in tRNAs has an impact on gene translation speed. This conclusion has been confirmed: replacing rare codon with optimal codon will strongly improve the gene translation speed of the organism [3]. 2.1.4 SD Sequence The SD sequence [6] in mRNA is a sequence used to bind prokaryotic ribosomes, which exists 7–12 nucleotides upstream of the AUG start codon in bacteria and archaea. The SD sequence can be combined with the anti-SD sequence on rRNA, and the stronger the degree of binding is, the greater the translation efficiency is. Once the binding is completed, it will point to the downstream AUG start codon and make it start the translation from the AUG start codon. 2.1.5 SD Sequence-Like Structure Inducing Transient Pausing of Ribosomes Gene translation speed is mainly influenced by mRNA and ribosome, which include the degree of combination between them and translation elongation rate. But the degree of combination is depended on the SD sequence of mRNA and the base complementarity of 16rRNA. Protein synthesis by ribosomes at variable rates can occur transient pausing of ribosomes which can affect a number of important process, including protein targeting and folding. The reason for the pause is influenced by the mRNA sequence, mainly due to the fact that the mRNA will hybridize with the 3’-end of the 16SrRNA on the translating ribosome. It was found that the more SD sequence like structures in mRNA sequence and the higher the similarity, the more times of pause of ribosomes and the longer the pause time, that is the slower the translation efficiency.
2.2 Method: Algorithm and Effect Evaluation
The algorithms in this paper are designed with E. coli as the example and fall into three kinds. The first is the combinatorial algorithm of codon usage bias and the tRNA recycling model, the second is the algorithm of SD sequence-like structures inducing transient pausing of ribosomes, and the third is a comprehensive algorithm that combines the two. The Codon Adaptation Index (CAI) was used to evaluate the algorithms [7].

2.2.1 The Combinatorial Algorithm of Codon Usage Bias and tRNA Recycling Model
The tRNA abundance of E. coli is mainly derived from the GtRNAdb database (http://gtrnadb.ucsc.edu/genomes/bacteria/EschcoliK/12MG1655/). Isotype stands for the amino acid name, and tRNA Count by Anticodon is the tRNA abundance, expressed as the usage frequency of the tRNA anticodon (Table 1).
Table 1. tRNA abundance of E. coli. For each isotype (amino acid), the table lists the tRNA count by anticodon, grouped into four-box, six-box, two-box, and two-box & other tRNA sets, together with the total count per amino acid (data from GtRNAdb).
Some anticodons in Table 1 have no count listed; this does not mean that the corresponding tRNA abundance is zero, only that the usage frequency of that anticodon is very low, so the algorithm treats such abundances as approximately zero (see Table 1). The procedure of the combinatorial algorithm of codon usage bias and the tRNA recycling model is as follows (see Fig. 1). The first step is to determine the preferred codon for each amino acid of E. coli based on the tRNA count by anticodon shown in Table 1. The second step is to optimize the whole sequence using the preferred codons identified in the first step; specifically, every codon in the sequence is converted into the preferred codon that encodes the same amino acid. For example, according to Table 1 the algorithm can convert every ATC codon of E. coli into a GTC codon, which has a relatively high tRNA abundance, and the whole sequence is optimized in the same way.
Fig. 1. The combinatorial algorithm of codon usage bias and tRNA recycling model
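As a rough illustration of this optimization step, the sketch below replaces every codon with a preferred codon looked up per amino acid. The CODON_TABLE fragment and the PREFERRED mapping are hypothetical placeholders, not the actual GtRNAdb-derived preferences used in the paper.

```python
# Minimal sketch of the preferred-codon substitution step (assumed data).
CODON_TABLE = {
    "ATT": "I", "ATC": "I", "ATA": "I",               # Ile
    "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",   # Val
    # ... remaining codons omitted for brevity
}

# Hypothetical "preferred codon per amino acid", derived from tRNA abundance.
PREFERRED = {"I": "ATC", "V": "GTG"}

def optimize_by_preferred_codon(cds: str) -> str:
    """Replace each codon with the preferred synonymous codon of its amino acid."""
    out = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3].upper()
        aa = CODON_TABLE.get(codon)
        out.append(PREFERRED.get(aa, codon))  # keep the codon if no preference is known
    return "".join(out)

print(optimize_by_preferred_codon("ATTGTAATC"))  # -> ATCGTGATC
```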
2.2.2 The Algorithm of SD Sequence-Like Structures Inducing Transient Pausing of Ribosomes
The Initial Idea of the Algorithm. The core idea is to convert each SD sequence-like structure into the sequence least similar to the SD sequence of E. coli while leaving the expressed product unchanged. The corresponding algorithm was designed accordingly (see Fig. 2). First, two adjacent codons in the E. coli coding sequence are translated into their amino acids, and all synonymous codons of these two amino acids are combined pairwise. Each pair of combinations is aligned with the SD sequence of E. coli using ClustalW [8]; the combination with the highest score is the most similar to the SD sequence, while the
one with the lowest score is the least similar. The combination with the lowest score is taken as the optimized pair of adjacent codons, and the above steps are repeated to optimize the whole sequence in turn.
Fig. 2. SD sequence-like structure inducing transient pausing of ribosomes
The Problem of the Initial Algorithm and Its Improvement. The initial algorithm seems logically sound, but it has a major defect. Its core idea is to optimize codon 1 and codon 2 as an adjacent pair after alignment with the SD sequence of E. coli, and then to optimize codon 3 and codon 4 as the next pair in the same way. However, the ribosome does not necessarily bind to codons 1 and 2 first; in practice it may bind to codons 2 and 3 first. Under the initial design, codons 2 and 3 are never optimized as an adjacent pair, so if codons 2 and 3 resemble the SD sequence of E. coli, the algorithm fails to optimize the sequence. What if all adjacent pairs were optimized (after optimizing codons 1 and 2, then optimizing codons 2 and 3 as a pair)? This also fails to give optimal results: it is not only extremely complex but also introduces a new defect. When codons 1 and 2 are optimized and then codons 2 and 3 are optimized, the optimality of codon 2 is disturbed, so the previously optimized pair of codons 1 and 2 may no longer be optimal. The initial algorithm is therefore of limited help in accelerating gene translation. Since the initial algorithm does not optimize the sequence well, it needs further modification (see Fig. 3). After aligning codons 1 and 2 with the SD sequence of E. coli, the positions of codon 1 and codon 2 are swapped and the alignment is performed again, and the scores of the two alignments are computed. In this way codon 2 is aligned with the first half of the SD sequence of E. coli as well as with the second half. Finally, the alignment with the lowest score is taken as the best result for the
pair of codons 1 and 2. Sequence alignment of codons 3 and 4, codons 5 and 6, and so on is then performed in the same way. At the same time, the alignment of codons 2 and 3 can be omitted, which solves the problem of endless alignments in the initial algorithm. With the improved algorithm, SD sequence-like structures in the sequence are reduced as much as possible, so that the number of transient ribosome pauses is minimized and gene translation is accelerated.
Fig. 3. The improved algorithm of SD sequence-like structures inducing transient pausing of ribosomes
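To make the improved pairing step concrete, the sketch below enumerates synonymous codon pairs for two adjacent amino acids and picks the pair least similar to an SD-like motif. It is only an illustration: a simple match count stands in for the ClustalW scoring used in the paper, the SD_MOTIF string and the two-amino-acid SYNONYMS table are assumptions, and taking the worse of the two orientations is one reading of the swap-and-realign step.

```python
from itertools import product

# Assumed synonymous-codon table for two example amino acids; the real algorithm
# enumerates all synonymous codons of the two adjacent residues.
SYNONYMS = {
    "K": ["AAA", "AAG"],   # Lys
    "E": ["GAA", "GAG"],   # Glu
}

SD_MOTIF = "AGGAGG"  # core SD-like motif used here as the comparison target (assumption)

def similarity(seq: str, motif: str) -> int:
    """Naive stand-in for a ClustalW score: best ungapped positional match count."""
    best = 0
    for off in range(len(seq) - len(motif) + 1):
        best = max(best, sum(a == b for a, b in zip(seq[off:off + len(motif)], motif)))
    return best

def least_sd_like_pair(aa1: str, aa2: str) -> tuple:
    """Choose the synonymous codon pair whose concatenation, in either order
    (original and swapped, as in the improved algorithm), looks least SD-like."""
    def worst_case(c1: str, c2: str) -> int:
        return max(similarity(c1 + c2, SD_MOTIF), similarity(c2 + c1, SD_MOTIF))
    return min(product(SYNONYMS[aa1], SYNONYMS[aa2]), key=lambda pair: worst_case(*pair))

print(least_sd_like_pair("K", "E"))  # -> ('AAA', 'GAA') for these toy data
```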
2.2.3 The Algorithm Combining Three Effects
The two algorithms above both optimize the sequence, but neither gives the best effect (as the CAI values below clearly show). Because the two algorithms optimize the sequence separately, when they are used together the effect of whichever algorithm is applied first is altered by the one applied afterwards, regardless of the order. For this reason, we propose a comprehensive algorithm that combines the three effects simultaneously. Its basic idea builds on the two algorithms above to achieve a better effect (see Fig. 4). First, following the SD sequence-like structure algorithm, all synonymous codon combinations of two adjacent amino acids are aligned with the SD sequence of E. coli using ClustalW; among them there is a combination with the lowest score, which is the least similar to the SD sequence. For two adjacent, unspecified amino acids there are 20 × 20 = 400 amino acid combinations, each with its own optimal codon pair. Then, the codon of each amino acid in
Fig. 4. The algorithm combining three effects
the 400 optimal codon combinations is counted. For each amino acid, the codon with the highest count is the most likely to be optimal and is taken as the optimal codon of that amino acid. In other words, the preferred codons are redefined from the SD sequence-like structures of E. coli, and these redefined preferred codons are then combined with the tRNA recycling model to convert every codon in the sequence into the redefined preferred codon of the same amino acid. That is the comprehensive algorithm combining the three effects.

2.2.4 Using CAI to Evaluate the Optimization Algorithms
The Relative Synonymous Codon Usage (RSCU) of the j-th codon of the i-th amino acid [7] is calculated as

RSCU_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}    (1-1)
where X_{ij} is the observed frequency of the j-th codon of the i-th amino acid and n_i is the number of synonymous codons encoding the i-th amino acid. The relative adaptiveness w_{ij} of the j-th codon of the i-th amino acid, used by the Codon Adaptation Index (CAI), is then defined as

w_{ij} = \frac{RSCU_{ij}}{RSCU_{i\max}} = \frac{X_{ij}}{X_{i\max}}    (1-2)
That is, w_{ij} is the ratio of the observed usage of a codon to that of the most used synonymous codon of the same amino acid. The CAI value of a gene is then calculated as

CAI = \exp\left(\frac{1}{L}\sum_{k=1}^{L} \ln W_k\right)    (1-3)
where L is the number of codons in the gene. CAI is a recognized indicator of gene expression level; its value lies between 0 and 1, and the larger the value, the stronger the codon bias. Table 2 shows the results for a GFP sequence optimized by the three algorithms above. The CAI value of each optimized sequence is higher than that of the original sequence, and the comprehensive algorithm combining the three effects outperforms the other two.

Table 2. The CAI of the GFP sequence optimized by the three algorithms
Sequence | CAI value
Original sequence | 0.560360882811
Optimized with the algorithm of SD sequence-like structures | 0.789961136133
Optimized with the combinatorial algorithm of codon usage bias and tRNA recycling model | 0.789358840046
Optimized with the comprehensive algorithm combining the three effects | 0.876665648612
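As an illustration of how a CAI value like those in Table 2 can be computed from Eqs. (1-1) to (1-3), the sketch below derives relative adaptiveness values from a small, made-up reference codon-count table and applies them to a toy sequence. In real use the reference counts would come from a set of highly expressed E. coli genes.

```python
import math
from collections import defaultdict

# Hypothetical reference codon counts (not real E. coli data).
REF_COUNTS = {
    "AAA": 300, "AAG": 100,   # Lys
    "GAA": 350, "GAG": 150,   # Glu
}
CODON_TO_AA = {"AAA": "K", "AAG": "K", "GAA": "E", "GAG": "E"}

# Relative adaptiveness w_ij = X_ij / X_i,max  (Eq. 1-2)
max_per_aa = defaultdict(int)
for codon, count in REF_COUNTS.items():
    aa = CODON_TO_AA[codon]
    max_per_aa[aa] = max(max_per_aa[aa], count)
W = {codon: count / max_per_aa[CODON_TO_AA[codon]] for codon, count in REF_COUNTS.items()}

def cai(cds: str) -> float:
    """CAI = exp( (1/L) * sum(ln W_k) ) over codons with a defined w (Eq. 1-3)."""
    ws = [W[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] in W]
    return math.exp(sum(math.log(w) for w in ws) / len(ws)) if ws else 0.0

print(round(cai("AAAGAGAAG"), 3))  # -> 0.523 with these toy counts
```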
3 Discussion

3.1 Content
This article combines three pieces of theoretical knowledge developed in recent years and uses them to design sequence-optimization algorithms for E. coli, with the aim of improving the rate of gene translation.

3.2 Limitations of This Study and Directions for Improvement
Owing to our limited knowledge of biotechnology, no experimental verification was conducted in this work, so the sequence-optimization algorithms designed here may not be practical enough. If biological verification turns out negative, the algorithms will need to be improved or even redesigned, followed by further experimental verification. The algorithms are only predictive; even if recognized indicators show excellent performance, they are not absolutely accurate. Therefore, experiments are still needed to verify whether the algorithms are practical.
Acknowledgement. This study was jointly funded by the National Natural Science Foundation of China (32160151), the Science and Technology Foundation of Guizhou Province (2018–1133, 2019–2811), the Science and Technology Fund project of Guizhou Health Commission (gzwjkj2019–1-40), and the NSFC Incubation Program of Guizhou Medical University (20NSP033).
References
1. Akashi, H.: Gene expression and molecular evolution. Current Opinion in Genetics and Development 11(6), 660–666 (2001)
2. Sprengart, M.L., Fuchs, E., Porter, A.G.: The downstream box: an efficient and independent translation initiation signal in Escherichia coli. EMBO Journal 15(3), 665 (1996)
3. Gustafsson, C., Govindarajan, S., Minshull, J.: Codon bias and heterologous protein expression. Trends in Biotechnology 22(7), 346–353 (2004)
4. Konopka, A.K.: Theory of degenerate coding and informational parameters of protein coding genes. Biochimie 67(5), 455–468 (1985)
5. Moriyama, E.N., Powell, J.R.: Codon usage bias and tRNA abundance in Drosophila. J. Mol. Evol. 45(5), 514–523 (1997)
6. Shine, J., Dalgarno, L.: The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. U S A 71(4), 1342–1346 (1974)
7. Sharp, P.M., Li, W.H.: The codon adaptation index--a measure of directional synonymous codon usage bias, and its potential applications. Nucl. Acids Res. 15(3), 1281–1295 (1987)
8. Thompson, J.D., Gibson, T.J., Higgins, D.G.: Multiple sequence alignment using ClustalW and ClustalX. Current Protocols in Bioinformatics 00(1) (2003)
Integrated Helicobacter Pylori Genome Database and Its Analysis

Liufeng Zheng1,2, Mujuan Guo1, Dingfa Liang1,2, and Yuannong Ye1,2,3(B)

1 Cells and Antibody Engineering Research Center of Guizhou Province, Key Laboratory of Biology and Medical Engineering, School of Biology and Engineering, Guizhou Medical University, Guiyang 550025, China
[email protected]
2 Bioinformatics and Biomedical Big Data Mining Laboratory, Department of Medical Informatics, School of Big Health, Guizhou Medical University, Guiyang 550025, China
3 Key Laboratory of Environmental Pollution Monitoring and Disease Control, Ministry of Education, Guizhou Medical University, Guiyang 550025, China
Abstract. The shape of Helicobacter pylori (HP) is usually helical or S-shaped, which gives it a highly divergent character. In addition, its distribution differs greatly over space and time, which is a factor leading to multiple infection; within the same infected person, multiple infection can also arise from strains of different regional origin. It is therefore of great significance to study the genomic diversity and evolution of Helicobacter pylori. In this paper, we analyze the homology of multiple sequences using ClustalX 1.83 and construct a homologous genome evolutionary tree with MEGA 7.0 to analyze its function. The purpose of this study is to compare the genomic evolutionary differences of Helicobacter pylori among populations in different regions and to discuss the significance of its infection prevalence and genomic variation with respect to geographical change, so as to support further study of the molecular mechanisms of Helicobacter pylori. Keywords: Helicobacter pylori · ClustalX · MEGA · Genome · Phylogenetic tree
1 Introduction
Helicobacter pylori, first isolated by Warren and Marshall, is a gram-negative, urease-positive bacterium whose shape is S-shaped or slightly helical. It mainly colonizes the human gastric mucosa and is the main human pathogen causing chronic gastric mucosal inflammation [1]. Infection and its progression increase the risk of diseases including gastric carcinoma, chronic active gastritis, duodenal ulcer and atrophic gastritis, and HP was classified as a class I carcinogen by the World Health Organization in 1994. Epidemiological surveys show that the HP infection rate is about 30% in Western countries and higher than 50% in many Asian countries, which also have a higher incidence of gastric carcinoma
here [2]. The unique S-shaped or slightly helical shape of HP contributes to its highly divergent character, which in turn results in its severe pathogenicity and high infection rate. Genome research on Helicobacter pylori variation has progressed rapidly. In 1997 the determination and analysis of the full-length genomic sequence of the first HP strain, 26695, was completed, and genomic DNA sequence analysis, proteome expression analysis and functional protein research have since been carried out [3]. So far, thousands of strains have been sequenced or are being sequenced. Helicobacter pylori genome sequencing has shown that its genome is highly divergent and evolves rapidly. Whole-genome sequencing of isolated HP strains has revealed great diversity not only in genomic structure and epigenetic modification but also in geographical origin [4]. Building evolutionary trees makes it possible not only to study the biological evolution of Helicobacter pylori and roughly estimate divergence times, but also to analyze the regional-origin characteristics of sequenced Helicobacter pylori strains, thereby providing a new molecular-level view of evolution and of the relationship between global population migration and the geographic origin and evolution of Helicobacter pylori [5]. This study combines genomics and bioinformatics analysis to study the geographical origin and population structure of Helicobacter pylori [6]. NCBI includes valid, highly homologous HP strains. We download the corresponding 16S rRNA sequences from the Silva database, use ClustalX 1.83 for multiple sequence alignment, and construct the evolutionary tree with MEGA 7.0. Through comparative genomics and bioinformatics methods we can analyze the characteristics of the HP genome and its population structure and thereby understand its geographical origin and evolution.
2 Materials and Methods

2.1 Data Acquisition
In this study, 299,214 samples of HP strain genetic data were downloaded from NCBI using the FastGB program. The data were searched and analyzed for sequence similarity against the BLAST database; strains with high homology were collected and classified according to information such as strain name, sequence, geographical origin and characteristics. For genes isolated from different regions and diseases, 1661 valid HP gene records with 16S rRNA sequences from different regions were integrated into the homologous gene database [7]. 16S rRNA is common in prokaryotic cells, with a length of about 1300–1500 bp and a high copy number (accounting for more than 80% of total bacterial RNA); templates are easy to obtain, and it has high functional homology and a moderate amount of genetic information, which makes it suitable for analyzing bacterial diversity. It is highly conserved yet specific. The 16S rRNA gene consists of conserved regions and variable regions: the conserved regions reflect kinship among species, while the variable regions reflect differences between species. Therefore, the 16S rRNA sequences of the corresponding strains were selected for data analysis in this study [8]. By comparison with the 16S rRNA sequences of HP collected from the Silva database, 1337 sequences with a 16S rRNA length of more than 1000 bp or with high homology were retained. The geographical origin of each sequence was queried, and 1051 sequences with unclear geographical
origin information were excluded, as were 150 sequences with high similarity from the same region, leaving 136 H. pylori strains with high homology and clear geographical origin information, all of which were included in this study.

2.2 Variable Ascertainment
The 136 HP strain records ultimately obtained were included in this study; the accuracy of the information was verified by comparison with multiple databases. After data analysis based on the corresponding GenBank files in NCBI, the geographical origin information (country and city) of each strain was completed [9]. The accession numbers of the 16S rRNA sequences in the Silva database and the continent of origin were classified and processed so as to build phylogenetic trees with distinct geographic-origin features. Among the HP sequences already deposited, sequencing has been influenced by environmental conditions, including socioeconomic status, cultural differences, and research and development, so the numbers of strains from different regions differ; data with high sequence similarity from the same region were screened out (highly similar sequences are hard to distinguish and interfere with phylogenetic analysis). The 136 HP strains in this study came from 35 countries, including the USA, Britain, Egypt, Australia, Peru and China: 11 strains from Africa, 42 from Asia, 32 from Europe, 29 from North America, 8 from Oceania and 14 from South America. The header name of each 16S rRNA sequence was modified to name the strain with its region and continent of origin, so as to improve the readability of the sequence homology alignment and of the constructed phylogenetic trees in subsequent analysis.

2.3 Statistical Analysis
ClustalX 1.83 was used to align the gene sequences of the highly homologous strains, and the multiple sequence alignment generated the matrix for phylogenetic analysis, which was then analyzed with comparative genomics and bioinformatics methods. Phylogenetic trees were constructed with MEGA 7.0 using Neighbor Joining (NJ) and Maximum Parsimony (MP), respectively [10]. Multiple runs were carried out to build phylogenetic trees with higher confidence and better visualization, so as to further understand the genomic characteristics and the geographical origin and evolution of HP [5].
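The pipeline in this section (alignment followed by NJ tree construction) was run with ClustalX 1.83 and MEGA 7.0. As a hedged illustration, the sketch below performs the same NJ step on an already-aligned 16S rRNA FASTA file using Biopython; the file name hp_16s_aligned.fasta is a hypothetical placeholder.

```python
# Minimal sketch: build a Neighbor-Joining tree from a pre-aligned 16S rRNA alignment.
# This only illustrates the ClustalX + MEGA 7.0 workflow; input file name is assumed.
from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

alignment = AlignIO.read("hp_16s_aligned.fasta", "fasta")

calculator = DistanceCalculator("identity")      # simple identity-based distances
distance_matrix = calculator.get_distance(alignment)

constructor = DistanceTreeConstructor()
nj_tree = constructor.nj(distance_matrix)        # Neighbor-Joining tree

Phylo.draw_ascii(nj_tree)                        # quick text rendering
Phylo.write(nj_tree, "hp_16s_nj.nwk", "newick")  # save for viewing elsewhere
```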
3 Results and Analysis

3.1 Multiple Sequence Alignment
Based on ClustalX 1.83 and MEGA, the 16S rRNA gene sequences of the different HP strains were aligned, and the matrix calculation showed the following: (a) In the multiple sequence alignment of the 136 sequences, the two strains Hp3 (KY463209) from Egypt and HLJHP256 (ALKA01000039) from China had no common
sites with other sequences and could not be scored, so these two sequences were removed on the basis of the multiple-sequence-alignment matrix. (b) When the 42 sequences from Asia were aligned, HLJHP256 (ALKA01000039) from China had no common sites with the other sequences, so this sequence was removed. Multiple sequence alignment provides the alignment results for constructing the different phylogenetic trees and ensures that they are built effectively. The analysis also showed that the genes of HP strains from different continents were strongly affected by their regions.
Fig. 1. Introducing the front segment of the gene sequence
Fig. 2. The posterior portion of an introduced gene sequence
Fig. 3. The results of multiple sequence alignment
3.2 Phylogenetic Tree Analysis of Helicobacter pylori from Different Continents
Multiple sequence alignment was performed for the 16S rRNA sequences of HP from six continents (Africa, Asia, Europe, North America, Oceania and South America), and MEGA 7.0 was used to construct a phylogenetic tree for each continent. The analysis showed the following: (1) The order of evolutionary change of HP, from low to high, was Africa, Oceania, South America, North America, Europe, Asia. (2) In the phylogenetic tree of the 11 African strains, strains from different African countries formed separate, non-crossing groups, indicating that mutation among African strains was the least intense. (3) AusabrJ05 from Australia and PNG84A from Papua New Guinea fell on the same branch of the tree of the eight Oceanian strains, suggesting that the geographical adjacency of Australia and Papua New Guinea corresponds to a closer kinship. (4) The tree of the 14 South American strains was composed of strains from three neighboring countries, Brazil, Colombia and Peru. It showed a regular pattern with infrequent crossover; only one Peruvian strain, PeCan18, was separated from the other Peruvian strains and genetically close to some Colombian strains. The Brazilian strain 1199–10 LPB is closer to the Colombian strain PZ5056 and may reflect exchange of genetic variation. (5) In the tree of the 29 North American strains, the strains from Mexico and Canada were close to each other, while the US strains were distributed at both ends of the tree, consistent with the relative geographic separation of these regions. (6) The 32 strains from 11 European countries, including Britain, France, Germany, Italy, Spain and Sweden, constituted the European tree; their distribution was disorderly, indicating intense genetic variation of strains in this region. (7) The Asian tree was composed of 41 strains from 11 countries including China, India, Japan, Korea and Malaysia, and showed the highest degree of variation and crossover. Nevertheless, neighboring
strains from neighboring countries, such as Helicobacter pylori 502 from Iran and SR1-GB from Pakistan, and GHA33 and IQ-1 from Iraq together with 45 from Kuwait, clustered in single clusters. These results show that the regional origin of H. pylori is constrained by socioeconomic status, cultural differences, population and other environmental conditions. Comparative analysis of the phylogenetic trees of the six continents shows that the pattern of HP genetic variation is highly consistent with human evolutionary dynamics and the great migrations of the world population, and that HP has a diversity of geographic origins.

3.3 Phylogenetic Tree Analysis of Helicobacter pylori
After multiple sequence alignment, MEGA 7.0 was used to carry out phylogenetic analysis of the 134 valid sequences in this study and to construct the overall phylogenetic tree. Although strains from Singapore, China, Japan, Korea, Malaysia and other Southeast Asian regions crossed each other, they were far from the strains from America, Africa and Europe, forming a relatively independent cluster. This reflects the conserved genetic background and high similarity of strains in Southeast Asian countries and regions, with little relation to the genetic variation of strains elsewhere. One exception is the Kuwait strain 22, which separated from the Asian group and clustered with the African strains. Strain 178 from Singapore, G272 from China, NAD1 from India, DU15 from Korea and IQ-1 from Iraq separated from the other Asian strains and lay close to the European and North American strains, with the Indian strains closer to the European strains. Two Egyptian strains, Hp7 and Hp11, are far from the other African strains; the Brazilian strain 1199–10 LPB and Puno120 from Peru in South
Fig. 4. a–f Phylogenetic trees of Helicobacter pylori from Africa, Oceania, South America, Asia, Europe and North America, respectively.
America have East Asian strain characteristics and are close to those strains in gene sequence. Four African strains from South Africa and Angola formed independent clusters, because the Helicobacter pylori population in this region still maintains highly conserved genetic characteristics [10]. The strains V225D, Cuz20 and SHI470 fall on the same branch as Singapore 178, so it can be inferred that Singapore 178 shares their characteristics. The Morocco strain from Africa and a USA strain from North America clustered together, indicating exchange of genetic variation between strains from the two regions; the geographical origin of the strains is consistent with the pattern of population migration between these regions. The European and North American strains lie closer together, and the South American strains are closer to the North American strains on the phylogenetic tree. Strains from Australia were scattered through the tree and crossed with strains from other regions, indicating the diversity of geographical origin of strain populations in Australia.
Fig. 5. Phylogenetic tree of Helicobacter pylori from six continents
4 Discussion
Helicobacter pylori plays an important role in duodenal disease; because of its special morphology and structure, its particular mode of survival, the diversity of its infection outcomes, and
its high frequency of genetic variation, it is closely related to the development of human disease, and its study has become a hot spot in gastroenterology and microbiology. HP strains show marked genetic diversity and broad population characteristics, reflecting frequent horizontal gene transfer among strains. Their genetic structure coincides closely with distinct geographic distributions and with the geographical characteristics and migratory history of the human species. The geographical origin of HP strains can therefore reflect major events in the history of human habitation, and the two can be mutually investigated and confirmed through polymorphism analysis of related genes in specific regions as well as through differences and classification of geographical origin. In recent years, several Helicobacter pylori genomes, such as 26695 and V225D, have been completely sequenced in different regions of the world, providing realistic reference data for comparative genetic studies of HP strains. In this paper, ClustalX and MEGA were used to study the regional-origin characteristics of Helicobacter pylori, and the classification of strains from different continents and regions as well as the overall phylogenetic tree were investigated on the basis of existing sequencing data. The results show that the most frequent exchange of gene mutations among HP strains occurred in Europe and Asia, while the African strains are well conserved. Some Asian Helicobacter pylori strains, such as 178, G272, NAD1, DU15 and IQ-1, are closely related to strains from Europe and North America. In terms of evolutionary relationships, the European and North American strains are closer to each other, and the South American strains are more similar to the North American strains in genetic background. These results indicate that the geographical origin and evolution of Helicobacter pylori are closely related to the origin and migration of human beings, with highly overlapping trajectories. By integrating the genomes of Helicobacter pylori strains from different regions, the significant genetic differences among strains from different sources were analyzed. Using comparative genomics and bioinformatics methods, the characteristics of the genome and population structure of Helicobacter pylori were analyzed, the differences among populations were identified, and the mechanism of Helicobacter pylori gene polymorphism and its relationship with the occurrence of clinical disease in different regions were discussed. In the regional and origin analysis of Helicobacter pylori, the results have the same value as other identification methods, but this method is more convenient, rapid and accurate, providing a new approach to the study of Helicobacter pylori. At the same time, studies have shown that differences among Helicobacter pylori strains exist in certain specific genes or gene groups and may exist in every small part of the Helicobacter pylori genome, suggesting that the evolution of Helicobacter pylori is closely related to the regional environment of the infected population.

Acknowledgement.
This study was jointly funded by the National Natural Science Foundation of China (61803112), the Science and Technology Foundation of Guizhou Province (2018–1133, 2019–2811), the Science and Technology Foundation of Guiyang (2017–30-15), the Science and Technology Fund project of Guizhou Health Commission (gzwjkj2019–1-40), and the Cell and Gene Engineering Innovative Research Groups of Guizhou Province (KY-2016–031).
Appendix
Table 1. The 136 Helicobacter pylori strain records involved in this study.
Number
Continent
Country
Strain name
Accession
Region
1
Africa
Angola
K26A1
CP011486
2
Africa
Egypt
Hp3
KY463209
3
Africa
Egypt
Hp7
KY463213
4
Africa
Egypt
Hp11
KY463217
5
Africa
Gambia
Gambia94/24
CP002332
6
Africa
Morocco
G4
MWUG01000071
Rabat
7
Africa
Morocco
HP_106
MWQM01000059
Rabat
8
Africa
South Africa
SouthAfrica7
CP002336
9
Africa
South Africa
CC33C
CP011484
10
Africa
South Africa
SouthAfrica20
CP006691
11
Africa
South Africa
SouthAfrica50
AVNI01000002
12
Asia
China
G272
CP022409
13
Asia
China
XZ274
CP003419
14
Asia
China
C333
AMFG01000008
Hangzhou
15
Asia
China
C664
AMFC01000019
Hangzhou
16
Asia
China
HLJHP253
ALKC01000015
17
Asia
China
HLJHP256
ALKA01000039
18
Asia
China
wls-5–12
AUPY01000063
19
Asia
China
Taiwan-47
JQNY01000041
20
Asia
China
Hp238
CP010013
21
Asia
India
India7
CP002331
22
Asia
India
SNT49
CP002983
23
Asia
India
L7
CP011482
24
Asia
India
NAB47
AJFA02000041
25
Asia
India
NAD1
AJGJ02000073
Delhi
26
Asia
India
BHUHPSKP207
KC525433
Varanasi
27
Asia
Iran
HP502
GU449113
Tehran
28
Asia
Iraq
IQ-1
MF067399
Zhejiang
West Bengal Bangalore
29
Asia
Iraq
GHA33
MH749351
30
Asia
Japan
F13
AP017329
31
Asia
Japan
F209
AP017332
32
Asia
Japan
F94
AP017355
33
Asia
Japan
Hp_TH2099
CP025748
34
Asia
Japan
MKF10
AP017356
35
Asia
Japan
MKM1
AP017359
36
Asia
Japan
98_10
ABSX01000008
37
Asia
Japan
CPY1962
AKNL01000002
38
Asia
Korea
51
CP000012
39
Asia
Korea
52
CP001680
40
Asia
Korea
DU15
CP011483
41
Asia
Kuwait
45
LIXG01000043
42
Asia
Kuwait
22
LIXF01000052
43
Asia
Malaysia
UM032
CP005490.3
44
Asia
Malaysia
UM037
CP005492.3
45
Asia
Malaysia
FD423
AKHM02000112
Kuala Lumpur
46
Asia
Malaysia
GC26
AKHV02000105
Kuala Lumpur
47
Asia
Malaysia
FD506
AKHO02000028
Kuala Lumpur
48
Asia
Pakistan
SR1-GB
HM596600
49
Asia
Singapore
UM299
CP005491.3
50
Asia
Singapore
132A
MJMX01000016
51
Asia
Singapore
178
MJGH01000015
52
Asia
Singapore
428
MKLV01000014
53
Asia
Viet Nam
GD63
CP031558
54
Europe
Britain
26695
AE000511
55
Europe
Britain
518
QBQA01000003
Nottingham
56
Europe
Britain
456
QBQC01000032
Nottingham
57
Europe
France
B38
FM991728
58
Europe
France
908
CP002184
59
Europe
France
2017
CP002571
60
Europe
France
2018
CP002572
61
Europe
France
N6
CAHX01000021
62
Europe
France
GC69-HL
QBQJ01000055
63
Europe
Germany
B8
FN598874
64
Europe
Germany
P12
CP001217
65
Europe
Germany
HP87tlpD
CBRK010000032
66
Europe
Germany
HP87P7tlpDRI
CBRL010000029
67
Europe
Germany
Iso6
AZBQ01000002
Berlin
68
Europe
Hungary
367/2013
KC819620
Budapest
69
Europe
Italy
G27
CP001173
70
Europe
Lithuania
Lithuania75
CP002334
71
Europe
Norwegian
35A
CP002096
72
Europe
Portugal
1198/04
JSXT01000004
73
Europe
Portugal
655/99
JSXB01000002
74
Europe
Portugal
499/02
JTDG01000004
75
Europe
Russia
A45
AMYU01000027
Moscow
76
Europe
Russia
E48
AYHQ01000056
Evenk automous region
77
Europe
Russia
H13–1
AYUH01000073
Habarovsk
78
Europe
Spain
HUP-B14
CP003486
79
Europe
Spain
JGF25
QEGJ01000025
80
Europe
Spain
B373
QDJR01000053
81
Europe
Spain
B657-A1
QDJI01000014
82
Europe
Sweden
HPAG1
CP000241
83
Europe
Sweden
60:1_single
QBPH01000005
84
Europe
Sweden
55(:)5
QBPI01000097
85
Europe
Sweden
27(:)4
QBPQ01000020
86
North America
Canada
Aklavik117
CP003483
87
North America
Canada
Aklavik86
CP003476
88
North America
Canada
R046Wa
AMOW01000005
Bordeaux
Barcelona
Aklavik village, Northwest Territories
89
North America
Canada
R056a
AMOY01000003
90
North America
Canada
UMB_G1
AOTV01000001
91
North America
El Salvador
ELS37
CP002953
92
North America
Mexico
29CaP
CP012907
93
North America
Mexico
7C
CP012905
94
North America
Mexico
MCms1055
MIKV01000016
95
North America
Mexico
MM2003–103
MIKR01000090
96
North America
Mexico
CG-IMSS-2012
AWUL01000033
97
North America
USA
26695–1
CP010435
98
North America
USA
26695-1MET
CP010436
99
North America
USA
7.13_1
CP023267
100
North America
USA
7.13_D1a
CP024015
101
North America
USA
7.13_D2c
CP024020
102
North America
USA
7.13_D3b
CP024022
103
North America
USA
7.13_R1a
CP024071
104
North America
USA
7.13_R2b
CP024075
105
North America
USA
7.13_R3c
CP024079
106
North America
USA
B128_2
CP027020
107
North America
USA
FDAARGOS_300
CP027404
Rural region
Mexico city
108
North America
USA
HPJP26
CP023448
109
North America
USA
J166
CP007603
Nashville, Tennessee
110
North America
USA
J99
CP011330
Nashville, Tennessee
111
North America
USA
v225d
CP001582
112
North America
USA
Hp A-8
AKOS01000006
Cleveland
113
North America
USA
Hp P-1
AKPI01000005
Cleveland
114
North America
USA
Hp P-28b
AKQM01000006
Cleveland
115
Oceania
Australia
BM013A
CP007604
Perth
116
Oceania
Australia
FDAARGOS_298
CP028325
117
Oceania
Australia
SS1
CP009259
118
Oceania
Australia
ausabrJ05
CP011485
119
Oceania
Australia
OND1954
MVFB01000034
Perth
120
Oceania
Australia
Sahul64
ALWV01000035
Western Australia
121
Oceania
Australia
CD4
HM243135
Sydney
122
Oceania
Papua New Guinea
PNG84A
CP011487
123
South America
Brazil
1199–10 LPB
JN595861
124
South America
Colombia
NQ4053
AKNV01000006
125
South America
Colombia
NQ4216
AKNR01000003
126
South America
Colombia
NQ4099
AKNU01000005
127
South America
Colombia
PZ5019_3A3
MTWM01000038
Tumaco
128
South America
Colombia
PZ5056
ASYU01000257
Tuquerres
129
South America
Colombia
22317
MBGQ01000001
Bogota
130
South America
Peru
Cuz20
CP002076
131
South America
Peru
PeCan18
CP003475
132
South America
Peru
Puno120
CP002980
133
South America
Peru
Sat464
CP002071
134
South America
Peru
Shi112
CP003474
135
South America
Peru
Shi470
CP001072.2
136
South America
Peru
SJM180
CP002073
References
1. Kusters, J.G., van Vliet, A.H., Kuipers, E.J.: Pathogenesis of Helicobacter pylori infection. Clin. Microbiol. Rev. 15(3), 14–20 (2003)
2. Warren, J.R., Marshall, B.: Unidentified curved bacilli on gastric epithelium in active chronic gastritis. Lancet 321(8336), 1273–1275 (1983)
3. Tomb, J.F., White, O., Kerlavage, A.R., et al.: The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388(6642), 539–547 (1997)
4. Furuta, Y.: Diversity in genome and epigenome of Helicobacter pylori. Nippon Saikingaku Zasshi 70(4), 383–389 (2015)
5. Mégraud, F., Lehours, P., Vale, F.F.: The history of Helicobacter pylori: from phylogeography to paleomicrobiology. Clin. Microbiol. Infect. 22(11), 922–927 (2016)
6. Thompson, L.J., de Reuse, H.: Genomics of Helicobacter pylori. Helicobacter 7(s1), 1–7 (2002)
7. Goodwin, C.S., McConnell, W., McCulloch, R.K., et al.: Cellular fatty acid composition of Campylobacter pylori from primates and ferrets compared with those of other campylobacters. J. Clin. Microbiol. 27(5), 938–943 (1989)
8. Matsumoto, T., Sugano, M.: 16S rRNA gene sequence analysis for bacterial identification in the clinical laboratory. Rinsho Byori. Jpn. J. Clin. Pathol. 61(12), 1107–1115 (2013)
9. Brown, G.R., Hem, V., Katz, K.S., et al.: Gene: a gene-centered information resource at NCBI. Nucleic Acids Res. 43(D1), 36–42 (2015)
10. Hall, B.G.: Building phylogenetic trees from molecular data with MEGA. Mol. Biol. Evol. 30(5), 1229–1235 (2013)
The Algorithms of Predicting Bacterial Essential Genes and NcRNAs by Machine Learning

Yuannong Ye1,2,3(B), Dingfa Liang2, and Zhu Zeng2

1 Bioinformatics and Biomedical Big Data Mining Laboratory, Department of Medical Informatics, School of Big Health, Guizhou Medical University, Guiyang 550025, China
[email protected]
2 Cells and Antibody Engineering Research Center of Guizhou Province, Key Laboratory of Biology and Medical Engineering, School of Biology and Engineering, Guizhou Medical University, Guiyang 550025, China
3 Key Laboratory of Environmental Pollution Monitoring and Disease Control, Ministry of Education, Guizhou Medical University, Guiyang 550025, China
Abstract. Essential genes are indispensable for biological survival, so identifying and studying them is of great significance. A machine learning method, K-Nearest Neighbor, is used to develop a predictor of bacterial essential genes. Homologous features of the bacterial genomes, including sequence homology and functional homology, are extracted for determining essential genes. Based on these features, the K-Nearest Neighbor algorithm is used to determine gene function, and the minimum matching parameter (K) of the essential-gene prediction model is tuned to build an optimal Escherichia coli-specific model. The corresponding optimal parameter K is then extended to prediction models for the essential genes of other bacteria. After cross validation, the highest accuracy is 0.89 with K between 5 and 7. The extracted features therefore increase the accuracy of bacterial essential gene prediction. On this basis, we found that the prediction accuracy of the K-Nearest Neighbor model did not differ significantly with the evolutionary distance between the organisms in the database and the investigated species, which means the machine learning model can be extended to more distant species and will have better predictive performance for the essential genes of distant species than the usual sequence-based methods. Keywords: Essential genes · Machine learning · KNN
1 Introduction
A gene is a DNA fragment with a hereditary effect and is the foundation of heredity. Among the many DNA elements, essential genes are the most important, being indispensable for the survival and reproduction of a species. The proteins encoded by essential genes are involved in the most basic life processes [1]. Research on bacterial essential genes is therefore highly significant. In antimicrobial target discovery, essential genes are the most promising candidates for new targets [2]. The minimum
gene set, built from essential genes, could serve as a synthetic bioengineering chassis [3], and the study of essential genes helps us understand the origin and evolution of life [4]. However, the number of species whose essential genes have been identified experimentally is extremely small, which greatly limits research on essential genes [5]. It is therefore important to develop methods for predicting essential genes. The traditional way to identify essential genes is experimental, for example single-gene inactivation [6], transposon mutagenesis [7] and genetic footprinting [8]. However, experimental methods are time-consuming and costly [8], and some cells are difficult to grow under laboratory conditions, so relying on experiments to determine the essential genes of such species is unrealistic. Owing to these limitations, the latest version of the DEG database includes experimentally determined essential-gene data for only about 40 species [5]. Theoretical prediction of essential genes has the advantages of low cost, short cycle time and reasonable reliability, so theoretical methods can be used to supplement experimental identification. At present there are several theoretical approaches to predicting bacterial essential genes: homologous alignment [9], sequence feature-based approaches [10, 11], network topology methods [12] and integrated approaches. For recognizing bacterial essential genes, we previously developed the general-purpose tool Geptop for genome-wide identification of essential genes, whose accuracy is the best reported so far [13]. However, Geptop is time-consuming, taking more than two hours to predict the essential genes of a species. The intrinsic features of sequences are the easiest to access for constructing a prediction model and can significantly increase the accuracy of essential-gene identification. In this work, a machine learning method, K-Nearest Neighbor (KNN), is used to develop a predictor of bacterial essential genes. Homologous features of the bacterial genomes, including sequence homology and functional homology, are extracted for determining essential genes. After cross validation, the highest accuracy is 0.89 with K between 5 and 7. The extracted features therefore increase the accuracy of bacterial essential gene prediction. The machine learning model can be extended to more distant species and will have better predictive performance for the essential genes of distant species than the usual sequence-based methods.
2 Materials and Methods

2.1 Reference Dataset Construction
In machine learning, the quality of the reference dataset strongly influences predictive performance. The cross-species validation results of Geptop show that not all species achieve a high AUC; the essential genes of species with lower AUC may be less accurate because of differences in experimental conditions, whereas essential genes of species with higher AUC are more reliable. Therefore, only 24 species with high AUC were selected as the reference dataset in this work. Among them, 23 were used as reference species to calculate homology information between the predicted species and the reference species, and Escherichia coli was used to verify the performance of the essential gene prediction model. The sequence and essentiality annotation information of
E. coli genes were obtained from the PEC database (http://shigen.nig.ac.jp/ecoli/pec/), yielding 699 essential genes and 3407 non-essential genes. The essentiality annotations and sequences of the remaining species were obtained from the DEG database and NCBI. According to the annotation information, the essential and non-essential gene sequences of the remaining 23 reference species were extracted, as shown in Table 1.

Table 1. Information of reference species.
Organism | No. of essential genes | No. of non-essential genes
Acinetobacter ADP1 | 499 | 3307
Bacillus subtilis 168 | 271 | 4175
Bacteroides thetaiotaomicron VPI 5482 | 325 | 4778
Burkholderia thailandensis E264 | 406 | 5632
Caulobacter crescentus NA1000 | 480 | 3885
Campylobacter jejuni NCTC 11168 ATCC 700819 | 222 | 1572
Francisella novicida U112 | 390 | 1719
Mycobacterium tuberculosis H37Rv | 611 | 3906
Mycoplasma genitalium G37 | 378 | 475
Mycoplasma pulmonis UAB CTIP | 310 | 782
Porphyromonas gingivalis ATCC 33277 | 463 | 2089
Pseudomonas aeruginosa UCBPP PA14 | 335 | 5892
Salmonella enterica serovar Typhimurium 14028S | 105 | 5315
Salmonella enterica serovar Typhimurium LT2 | 230 | 4451
Salmonella enterica serovar Typhimurium SL1344 | 353 | 4446
Salmonella enterica serovar Typhi Ty2 | 358 | 4352
Shewanella oneidensis MR 1 | 402 | 4065
Sphingomonas wittichii RW1 | 535 | 4850
Staphylococcus aureus N315 | 302 | 2582
Staphylococcus aureus NCTC 8325 | 346 | 2767
Streptococcus sanguinis SK36 | 218 | 2270
Streptococcus pneumoniae | 244 | 1735
Vibrio cholera O1 biovar El Tor N16961 | 591 | 3503
Vibrio cholera O1 biovar El Tor N16961 | 499 | 3307
2.2 Methods
In this work we use both sequence homology and functional homology features to characterize essential and non-essential genes, and these features serve as the feature variables for machine learning. The first step is the analysis of sequence homology. Because negative selection acts more strongly on essential genes, they are more conserved than non-essential genes; one manifestation is that essential genes are preserved in more species over longer periods of time, which makes conservation an important feature for distinguishing essential from non-essential genes. For a gene to be predicted, we first compare it with both the essential and the non-essential genes in the reference library using BLASTP or BLASTN with an E-value below 10^-10. After sequence alignment, the 21 sequences with the highest BLASTP or BLASTN scores are selected for function prediction, and the function of the query gene is then determined by the KNN algorithm. Next, functional homology is analyzed. Because essential genes are necessary for growth, they are more conserved than non-essential genes and very unlikely to change among species; if a gene is essential, it will be maintained in more species, and conversely, a gene that is essential in more species is more likely to be an essential gene. We therefore set a minimum matching parameter K: if the gene is annotated as essential in S species and S is greater than or equal to K, the gene is considered essential; otherwise it is considered non-essential. The algorithm flow chart is shown in Fig. 1.
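The sketch below illustrates the voting rule just described: among the top-scoring homologs of a query gene, count the distinct reference species in which the hit is annotated as essential, and call the query essential if that count reaches K. The Hit structure, the neighbor list and the value of K are placeholders; the real pipeline obtains its neighbors from BLASTP/BLASTN hits against the 23 reference species.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One BLAST hit of the query against the reference library (assumed fields)."""
    species: str
    is_essential: bool
    bit_score: float

def predict_essential(hits: list, k: int, n_neighbors: int = 21) -> bool:
    """KNN-style rule from the text: keep the top-scoring hits, count the distinct
    reference species in which the query's hit is essential, and predict
    'essential' if that count S >= K."""
    top = sorted(hits, key=lambda h: h.bit_score, reverse=True)[:n_neighbors]
    essential_species = {h.species for h in top if h.is_essential}
    return len(essential_species) >= k

# Toy usage with made-up hits (not real data):
hits = [
    Hit("B. subtilis 168", True, 410.0),
    Hit("S. aureus N315", True, 388.5),
    Hit("M. tuberculosis H37Rv", False, 120.0),
    Hit("C. crescentus NA1000", True, 300.2),
]
print(predict_essential(hits, k=3))  # True: essential in 3 distinct reference species
```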
3 Results and Discussions
To verify the performance of the algorithm, the model organism E. coli K-12 MG1655 was used to test its accuracy. Its gene names and amino acid sequences were obtained from NCBI, and the essentiality data were extracted from DEG. To avoid interference, the E. coli K-12 MG1655 genes are not included in the reference dataset. The test data contain 699 essential genes and 3407 non-essential genes. Algorithm performance is tested in two ways. Since a gene name usually reflects its function, the first way uses the gene name to predict whether a gene is essential; the other uses its amino acid sequence. We used sensitivity (SN), specificity (SP), precision and accuracy (Acc) to measure the predictive performance of the two modes, calculated as follows:

SN = \frac{TP}{TP + FN}    (1)

SP = \frac{TN}{TN + FP}    (2)

precision = \frac{TP}{TP + FP}    (3)

Acc = \frac{TP + TN}{TP + FP + TN + FN}    (4)
where TP is the number of essential genes predicted as essential, FP the number of non-essential genes predicted as essential, TN the number of non-essential genes predicted as non-essential, and FN the number of essential genes predicted as non-essential. The results of predictive performance are shown in Table 2.
Fig. 1. The flow diagram of our algorithm.
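For reference, a small helper like the one below (hypothetical, not part of the paper's code) computes the four indicators defined in Eqs. (1)-(4) from lists of true and predicted labels.

```python
def classification_metrics(y_true: list, y_pred: list) -> dict:
    """Compute SN, SP, precision and Acc as in Eqs. (1)-(4); True = essential."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return {
        "SN": tp / (tp + fn) if tp + fn else 0.0,
        "SP": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "Acc": (tp + tn) / len(y_true) if y_true else 0.0,
    }

# Toy example: 3 essential and 3 non-essential genes.
print(classification_metrics([True, True, True, False, False, False],
                             [True, False, True, False, True, False]))
```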
Table 2 shows that the recall rate (SN) decreases as K increases when predicting essential genes using either the gene name or the protein sequence, meaning that the ability to predict essential genes weakens as K grows, while the ability to predict non-essential genes strengthens. Precision increases with K. At the same K value, the precision of predicting essential genes from the gene name is higher than from the protein sequence.
Table 2. Predictive performance with different K values (left: predicting by gene name; right: predicting by protein sequence).

K-value | SN | SP | Precision | Acc | SN | SP | Precision | Acc
1 | 0.88 | 0.82 | 0.50 | 0.83 | 1.00 | 0.39 | 0.25 | 0.49
2 | 0.54 | 0.93 | 0.61 | 0.86 | 0.60 | 0.77 | 0.35 | 0.74
3 | 0.49 | 0.94 | 0.63 | 0.86 | 0.53 | 0.86 | 0.44 | 0.80
4 | 0.42 | 0.98 | 0.81 | 0.88 | 0.47 | 0.89 | 0.47 | 0.82
5 | 0.40 | 0.98 | 0.80 | 0.88 | 0.44 | 0.92 | 0.53 | 0.84
6 | 0.37 | 0.98 | 0.79 | 0.88 | 0.43 | 0.94 | 0.60 | 0.85
7 | 0.35 | 0.99 | 0.88 | 0.88 | 0.38 | 0.95 | 0.61 | 0.85
8 | 0.31 | 0.99 | 0.86 | 0.87 | 0.33 | 0.96 | 0.63 | 0.85
9 | 0.28 | 0.99 | 0.85 | 0.87 | 0.33 | 0.96 | 0.63 | 0.85
The situation is the opposite when predicting non-essential genes. Both approaches reach an accuracy of about 0.85 for K between 4 and 7, which is a satisfactory result in the field of essential gene prediction, since the number of essential genes in a genome is much smaller than that of non-essential genes. If both the name and the sequence of a gene are known, it is recommended to predict essentiality from the gene name with K between 4 and 7.
4 Conclusions
Essential genes are indispensable for biological survival, and it is of great significance to identify and study them. In this work, a machine learning method, K-Nearest Neighbor (KNN), was used to develop a predictor of bacterial essential genes. Homologous features of the bacterial genomes, including sequence homology and functional homology, were extracted for determining essential genes. After cross validation, the highest accuracy is 0.89 with K between 5 and 7. The extracted features therefore increase the accuracy of bacterial essential gene prediction. The machine learning model can be extended to more distant species and will have better predictive performance for the essential genes of distant species than the usual sequence-based methods.

Acknowledgement. This study was jointly funded by the National Natural Science Foundation of China (61803112), the Science and Technology Foundation of Guizhou Province (2018–1133, 2019–2811), the Science and Technology Foundation of Guiyang (2017–30-15), the Science and Technology Fund project of Guizhou Health Commission (gzwjkj2019–1-40), and the Cell and Gene Engineering Innovative Research Groups of Guizhou Province (KY-2016–031).
Conflicts of Interest. The authors declare that they have no conflicts of interest to report regarding the present study.
References
1. Juhas, M., Eberl, L., Glass, J.I.: Essence of life: essential genes of minimal genomes. Trends Cell Biol. 21(10), 562–568 (2011)
2. Hu, W., Sillaots, S., Lemieux, S., et al.: Essential gene identification and drug target prioritization in Aspergillus fumigatus. PLoS Pathog. 3(3), e24 (2007)
3. Wu, G., Yan, Q., Jones, J.A., et al.: Metabolic burden: cornerstones in synthetic biology and metabolic engineering applications. Trends Biotechnol. 34(8), 652–664 (2016)
4. Koonin, E.V.: Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat. Rev. Microbiol. 1(2), 127–136 (2003)
5. Luo, H., Lin, Y., Liu, T., et al.: DEG 15, an update of the database of essential genes that includes built-in analysis tools. Nucleic Acids Res. 49(D1), 677–686 (2020)
6. Rancati, G., Moffat, J., Typas, A., et al.: Emerging and evolving concepts in gene essentiality. Nat. Rev. Genet. 19(1), 34–49 (2018)
7. Salama, N.R., Shepherd, B., Falkow, S.: Global transposon mutagenesis and essential gene analysis of Helicobacter pylori. J. Bacteriol. 186(23), 7926–7935 (2004)
8. Gerdes, S.Y., Scholle, M.D., Campbell, J.W., et al.: Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J. Bacteriol. 185(19), 5673–5684 (2003)
9. Juhas, M., Stark, M., von Mering, C., et al.: High confidence prediction of essential genes in Burkholderia cenocepacia. PLoS ONE 6(7), e40064 (2012)
10. Aromolaran, O., Beder, T., Oswald, M., Oyelade, J., et al.: Essential gene prediction in Drosophila melanogaster using machine learning approaches based on sequence and functional features. Comput. Struct. Biotechnol. J. 18, 612–621 (2020)
11. Nigatu, D., Sobetzko, P., Yousef, M., Henkel, W.: Sequence-based information-theoretic features for gene essentiality prediction. BMC Bioinf. 18(1), 473 (2017)
12. Lei, X., Yang, X., Fujita, H.: Random walk based method to identify essential proteins by integrating network topology and biological characteristics. Knowl.-Based Syst. 167, 53–67 (2019)
13. Wei, W., Ning, L.W., Ye, Y.N., et al.: Geptop: a gene essentiality prediction tool for sequenced bacterial genomes based on orthology and phylogeny. PLoS ONE 8(8), e72343 (2013)
Pneumonia Recognition Based on Deep Learning

Shiting Luo, Yinglin He, Jing Wang, Yuxiao Tang, and Yong Xu(B)

School of Artificial Intelligence and Smart Manufacturing, Hechi University, Hechi, Guangxi, China
[email protected]
Abstract. The main manifestation of pneumonia on lung X-rays is ground-glass opacity, so research on pneumonia recognition focuses largely on recognizing these opacities. In this paper, according to the characteristics of the pneumonia images in the existing dataset, the authors also apply image registration, enhancement, filtering and denoising, data augmentation, binarization, normalization and other preprocessing to the dataset. Based on the VGG16 network, a batch normalization layer is added before each convolutional layer and fully connected layer of the supplementary network, giving a deep convolutional neural network, BNnet, with a more complex structure, which is used for feature extraction from pneumonia images. When training the model, transfer learning is applied to VGG16 together with the added network layers on the Kaggle dataset, and the extracted features are fed into a classifier composed of fully connected layers. Finally, the images are divided into two categories, normal and pneumonia, with an accuracy of 0.98, which effectively alleviates the over-fitting caused by imbalanced sample data. Compared with mainstream neural network models, the accuracy, sensitivity and specificity are improved, and the model shows better robustness and generalization. Keywords: Deep learning · Convolutional neural network · VGG16 · BNnet · Image identification
1 Introduction From ancient times to the present, humankind has encountered countless diseases, among which pneumonia is a significant problem facing humanity. Children belong to the group with low immunity and are susceptible to pneumonia. Internationally, pneumonia accounts for more than 15% of all mortality among children under five years of age; in 2015, 920,000 children under the age of 5 died from these diseases. The outbreak of novel coronavirus pneumonia in December 2019 was highly contagious, causing infections on a global scale, having a severe impact on the economy and society of all countries, and becoming a public health emergency of international concern. There is no doubt that the effective treatment of pneumonia is essential. Although pneumonia is prevalent in the clinic, it is a difficult task to diagnose pneumonia accurately. It requires
a well-trained expert to examine chest X-rays together with past clinical history, vital signs, and medical laboratory examinations. X-rays are the most commonly used medical clinical diagnostic images, and pneumonia usually manifests as areas of increased opacity on the X-rays. However, the diagnosis of pneumonia on X-ray films is complicated because many other diseases in the lung interfere with it, such as bleeding, pulmonary edema, volume reduction (atelectasis or collapse), lung cancer, and lung changes after radiotherapy or surgery. In addition, fluid in the pleural cavity outside the lungs (pleural effusion) also appears as increased opacity on the X-ray film. Beyond the above, many other factors can also affect the imaging and diagnosis of X-ray films; for example, the patient's positioning and depth of inspiration can change the appearance of X-ray films, further complicating the diagnosis. At present, the recognition of medical images mainly relies on the personal knowledge and experience of medical workers. Due to differences in the level of medical workers and individual differences between cases, the recognition is often subjective, and the stability of the recognition results is not sufficient; it is easy for doctors to miss subtle changes during diagnosis. Moreover, manual identification is time-consuming and laborious: clinicians need to observe a large number of images every day, which causes fatigue from long-term reading of the pictures and leads to misdiagnosis and missed diagnosis. Reliable recognition therefore depends on experienced readers, and this relatively scarce medical resource is increasingly unable to meet growing needs, so more accurate and efficient identification of pneumonia has become an urgent need. In recent years, deep learning has become one of the hottest technologies in data analysis. Because deep learning has achieved good results, it has been rated as one of the top ten breakthrough technologies in the current scientific research field. Due to the excellent performance of deep learning, the combination of deep learning technology and medical image analysis is of great significance to the development of medicine, especially in clinical medical diagnosis. On the one hand, it can reduce labor costs; on the other hand, it can free the whole recognition process from human factors, which indirectly improves recognition accuracy and simplifies the recognition process. In summary, it is of great significance to apply deep learning technology to the analysis and processing of medical images such as pneumonia X-rays.
1.1 Neural Network Structure BNnet for Pneumonia Image Classification Based on Feature Fusion
This algorithm mainly includes three parts: feature extraction, feature fusion, and classifier classification. The overall architecture of the algorithm can be seen in Fig. 1. First, the chest radiograph image is preprocessed. Then it is fed into ResNet and DenseNet respectively for feature extraction, the features extracted by the two networks are merged, and the merged features are classified by the classifier. The final output of the algorithm is the predicted probability of pneumonia, realizing the auxiliary diagnosis of the chest radiograph. The main
Fig. 1. Algorithm architecture diagram (image preprocessing; feature extraction based on ResNet and DenseNet; feature fusion with Concatenate and Global Average Pooling; classifier with a fully connected layer and Sigmoid output giving the pneumonia probability)
structure of the algorithm is briefly introduced above, and its main components are described separately in the remainder of this chapter. After the image is preprocessed, it is input into the algorithm for feature extraction. The algorithm uses a mini-batch method to feed the data into the network in batches; the input dimension of each batch is (batch_size, 224, 224, 3), where batch_size is the number of images in each batch. The image data is input into two network structures for feature extraction.
1.2 Feature Extraction Based on ResNet
ResNet (Residual neural network) was proposed by Kaiming He et al. Through the use of residual learning modules (Residual blocks), it successfully trained a neural network with a depth of 152 layers and won the ILSVRC 2015 competition with a 3.57% top-5 error rate, while its number of parameters is lower than VGG; the effect is quite outstanding. Table 1 shows the ResNet50 framework used in this article. The image input in this article is 224 × 224 × 3, and the output is 7 × 7 × 2048.

Table 1. ResNet50 network architecture
Layer (type)           Output Shape           Param
InputLayer             (None, 224, 224, 3)    0
Resnet50 (Model)       (None, 7, 7, 2048)     23587712
Batch_Normalization    (None, 7, 7, 2048)     8192
1.3 Feature Extraction Based on DenseNet
DenseNet (Dense convolutional network) was proposed by Gao Huang of Cornell University and others. DenseNet has more skip-connections than ResNet; on the ImageNet dataset, DenseNet achieves the same recognition accuracy as ResNet with about half the parameters. The DenseNet variant of the network architecture designed for the pneumonia dataset is shown in Table 2. A 1 × 1 convolutional layer is added before the 3 × 3 convolutional layer of each Dense block, the so-called Bottleneck layer. Its purpose is to reduce the number of input feature maps, which lowers the dimensionality and the amount of computation. The features of each channel of the input feature map are merged, and the number
of channels of the preset output feature map is 4k. The DenseNet architecture with the addition of the Bottleneck layer is called DenseNet-B in the text.

Table 2. DenseNet network architecture
Layer (type)           Output Shape           Param
InputLayer             (None, 224, 224, 3)    0
Densenet121 (Model)    (None, 7, 7, 1024)     7037504
Batch_Normalization    (None, 7, 7, 1024)     4096
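The Bottleneck structure just described (a 1 × 1 convolution producing 4k feature maps ahead of the 3 × 3 convolution, followed by dense concatenation) can be illustrated with a minimal Keras sketch. The growth rate k, the BN-ReLU-Conv ordering and the function name are assumptions for illustration, not the paper's exact implementation.

```python
# Illustrative sketch of a DenseNet-B bottleneck unit (assumed growth rate k).
import tensorflow as tf
from tensorflow.keras import layers

def dense_bottleneck_block(x, growth_rate=32):
    # Bottleneck: 1 x 1 convolution compresses the input to 4k channels
    y = layers.BatchNormalization()(x)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(4 * growth_rate, kernel_size=1, padding="same", use_bias=False)(y)
    # 3 x 3 convolution produces k new feature maps
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(growth_rate, kernel_size=3, padding="same", use_bias=False)(y)
    # Dense connectivity: concatenate the new feature maps with the block input
    return layers.Concatenate()([x, y])
```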
1.4 Feature Fusion
Pneumonia recognition based on deep learning uses the ResNet and DenseNet architectures to perform feature extraction separately, and the resulting feature dimensions are (batch_size, 7, 7, 2048) and (batch_size, 7, 7, 1024). In order to make comprehensive use of the features extracted by the two networks and exploit the advantages of both, the two sets of features need to be merged. To maximize the use of the extracted features and improve their robustness, this paper adopts the fusion function of cascading first and then averaging, as shown in expression (1):

y = f(x_a, x_b) = avg(cat(x_a, x_b))    (1)

According to the fusion function in formula (1), the features extracted by ResNet and DenseNet are first cascaded, and then each channel is averaged. In the network structure of the algorithm in this paper, this is implemented with a connection layer (Concatenate) and a global average pooling layer (Global Average Pooling), as shown in Fig. 2.
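A hedged Keras sketch of this fusion path is given below: ResNet50 and DenseNet121 feature maps are concatenated and globally average-pooled into a 3072-dimensional vector, and a sigmoid unit outputs the pneumonia probability. The use of the keras.applications backbones, the untrained weights and the final dense layer are assumptions for illustration only.

```python
# Minimal sketch of the Concatenate + GlobalAveragePooling fusion described above.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50, DenseNet121

inputs = layers.Input(shape=(224, 224, 3))
resnet = ResNet50(include_top=False, weights=None, input_shape=(224, 224, 3))
densenet = DenseNet121(include_top=False, weights=None, input_shape=(224, 224, 3))

feat_a = resnet(inputs)      # (None, 7, 7, 2048)
feat_b = densenet(inputs)    # (None, 7, 7, 1024)

fused = layers.Concatenate()([feat_a, feat_b])   # (None, 7, 7, 3072)
fused = layers.GlobalAveragePooling2D()(fused)   # (None, 3072)
output = layers.Dense(1, activation="sigmoid")(fused)  # pneumonia probability

model = Model(inputs=inputs, outputs=output)
```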
Fig. 2. Feature fusion (Concatenate followed by Global Average Pooling: (batch_size, 7, 7, 2048) and (batch_size, 7, 7, 1024) features are merged into a (batch_size, 3072) feature)
As can be seen from Fig. 2, the feature dimensions output by ResNet and DenseNet are (batch_size, 7, 7, 2048) and (batch_size, 7, 7, 1024). The process of fusing the two features is divided into two steps. The first step is to concatenate the features of the two networks to obtain a feature of dimension (batch_size, 7, 7, 3072), as shown in Fig. 3.
Fig. 3. Feature cascade (7 × 7 × 2048 and 7 × 7 × 1024 feature maps are concatenated into 7 × 7 × 3072)
The second step is to use the global average pooling layer to average the characteristic data of each channel (a total of 3072 channels). The pooling process is shown in Fig. 4.
Fig. 4. Global Average Pooling
After global pooling, each image yields a one-dimensional feature of length 3072. The number of images input to the network at a time is batch_size, so the final fused feature dimension is (batch_size, 3072). After fusion, the features are more robust and more expressive than a single feature, which is more conducive to the classification of the classifier.
1.5 Loss Function
Pneumonia recognition is in fact a binary classification problem: the input is a chest X-ray image, and the output is a binary label y ∈ {0, 1} indicating the presence or absence of pneumonia. For the labels in a training batch, the weighted cross-entropy loss constructed in this paper is shown in formula (2):

CW = -(1/n) Σ [ W_P · y · ln p + W_N · (1 - y) · ln(1 - p) ]    (2)
In formula (2), y is the true label of the sample, p is the prediction of the model (a probability value), and n is the number of training samples in a batch. In order to address the problem of sample imbalance, this paper weights the loss of positive and negative samples. The positive-sample weight WP is defined in formula (3), and the negative-sample weight WN in formula (4):

WP = (P + N) / P    (3)

WN = (P + N) / N    (4)

In formulas (3) and (4), P is the number of positive samples and N is the number of negative samples. Through this weighting, the model pays more attention to the minority class, thereby effectively improving the classification accuracy.
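A minimal TensorFlow/Keras sketch of the weighted cross-entropy in formulas (2)-(4) is given below. The function name, the clipping constant and the way the class counts are passed in are assumptions for illustration, not the paper's exact implementation.

```python
# Weighted binary cross-entropy following formulas (2)-(4).
import tensorflow as tf

def weighted_bce(num_pos, num_neg):
    w_p = (num_pos + num_neg) / num_pos   # formula (3)
    w_n = (num_pos + num_neg) / num_neg   # formula (4)

    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        # weighted cross-entropy, averaged over the batch as in formula (2)
        per_sample = -(w_p * y_true * tf.math.log(y_pred) +
                       w_n * (1.0 - y_true) * tf.math.log(1.0 - y_pred))
        return tf.reduce_mean(per_sample)

    return loss

# usage: model.compile(optimizer=..., loss=weighted_bce(num_pos, num_neg))
```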
2 Deep Feature Extraction Framework Based on Transfer Learning
Current pneumonia recognition algorithms face a problem: in the feature extraction stage, a convolutional network pre-trained on the ImageNet data set, such as VGG16, is selected, its first several layers are used to extract features, and the follow-up layers are then trained. However, because of the relatively large difference between ImageNet and pneumonia X-ray image data, the extracted features do not fit the pneumonia images well, which affects the final performance of the model. In order to enable the BNnet network to extract deep features that are effective for the classification task even when pneumonia images are scarce, this article does not use the ImageNet data set for pre-training; instead, the BNnet network is pre-trained on the ChestX-ray14 data set. ChestX-ray14 consists of chest X-ray images and is highly consistent with the data set used in this article, which ensures that the extracted features better match the images here. The pre-trained model is then transferred to the pneumonia images in this article for further learning, so as to extract the deep features of the pneumonia images. Moreover, this article also adjusts the learning rate for parameter updates during transfer learning, setting the global learning rate to one-tenth of the initial learning rate used on the ChestX-ray14 data set.
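The fine-tuning step can be sketched as follows: a backbone pre-trained on chest X-rays is reloaded and fine-tuned at one tenth of the original learning rate. The saved-model file name, the learning-rate values and the momentum value are illustrative assumptions, not artifacts provided by the paper.

```python
# Hedged sketch of transfer learning with a reduced learning rate.
import tensorflow as tf

PRETRAIN_LR = 0.01                 # learning rate assumed for ChestX-ray14 pre-training
FINE_TUNE_LR = PRETRAIN_LR / 10.0  # one tenth, as stated in the text

model = tf.keras.models.load_model("chestxray14_pretrained.h5", compile=False)  # hypothetical file

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=FINE_TUNE_LR, momentum=0.9),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# model.fit(pneumonia_train_ds, validation_data=pneumonia_val_ds, epochs=...)
```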
3 Experiment and Analysis
3.1 Development Environment and Parameter Configuration
The training of the network model is carried out on a Windows 10 system. The environment consists of Python 3.5.2, CUDA 9.0, cuDNN 7.0, TensorFlow 1.10.0 (GPU version), OpenCV 4.0.0, and a GTX-1070 graphics card with 8 GB of video memory. Under this setup, an image can be processed in 0.04 s, about 25 fps, as shown in Fig. 5.
Fig. 5. Development environment
This experiment also uses several tools from Python and TensorFlow, such as TensorBoard (which visualizes the training process so that users can monitor the changes in training accuracy and loss), PyQt, config, etc. The specific environment and versions required for the experiment are shown in Table 3, and the training parameters in Table 4.

Table 3. Experimental development environment
Equipment                 Parameter
CPU                       i7-8700
GPU                       NVIDIA GTX-1070
Operating system          Windows 10
Development environment   TensorFlow-Keras
Programming language      Python 3.5
RAM                       16 GB

Table 4. Parameter configuration
Number of layers          9
Max pooling               2 × 2
Activation function       ReLU
Strides                   1
Optimizer                 Momentum
Dropout                   0.5
Initial learning rate     0.01
Classification function   Softmax
Loss function             Cross entropy
Batch size                32
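An illustrative Keras configuration following the hyperparameters in Table 4 (momentum optimizer, initial learning rate 0.01, dropout 0.5, softmax output, cross-entropy loss, batch size 32) is sketched below. The backbone, the momentum value of 0.9 and the data pipeline are placeholders; only the listed hyperparameters are taken from the table.

```python
# Training configuration sketch based on Table 4 (under stated assumptions).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(backbone: tf.keras.Model, num_classes: int = 2) -> tf.keras.Model:
    model = models.Sequential([
        backbone,
        layers.GlobalAveragePooling2D(),
        layers.Dropout(0.5),                               # Dropout from Table 4
        layers.Dense(num_classes, activation="softmax"),   # Softmax classifier
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
        loss="categorical_crossentropy",                   # cross-entropy loss
        metrics=["accuracy"],
    )
    return model

# model = build_classifier(backbone=some_pretrained_cnn)
# model.fit(train_ds, validation_data=val_ds, batch_size=32, epochs=...)
```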
3.2 Evaluation of BNnet Network Test Results This study uses a ten-fold cross-validation method to evaluate the performance of the neural network designed in this article. The neural network model designed in this article has been introduced in detail in Sect. 3. The accuracy of the BNnet network is shown in Fig. 6.
Fig. 6. BNnet network accuracy rate change curve
It can be seen from Fig. 6 that, after the series of preprocessing operations and testing on the Kaggle data set, the final training accuracy of the network in this experiment is about 99.8%. The change of the loss value is shown in Fig. 7.
Fig. 7. BNnet network loss value change curve
In order to test the actual prediction performance, 1000 chest X-ray films were divided into 10 sample groups, each containing 100 images (80 normal images and 20 pneumonia images), which were input group by group. The network
makes predictions and obtains 10 sets of verification data, from which the average values are calculated. (Here a is the number of correctly identified normal chest radiographs in each sample group, b is the number of chest radiographs misjudged as normal, c is the number of chest radiographs misjudged as pneumonia, d is the number of pneumonia chest radiographs accurately identified, m is the number of chest X-rays in each verification sample group, and n is the number of sample groups.)

Accuracy = (1/n) Σ_{k=1}^{n} (a + d)/m × 100%    (5)

Sensitivity = (1/n) Σ_{k=1}^{n} a/(a + c) × 100%    (6)

Specificity = (1/n) Σ_{k=1}^{n} d/(b + d) × 100%    (7)
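A small Python helper mirroring formulas (5)-(7) is sketched below: per-group accuracy, sensitivity and specificity averaged over the n sample groups. The variable names follow the text (a, b, c, d, m); the list-of-dicts input format and the example counts are assumptions for illustration.

```python
# Average accuracy, sensitivity and specificity over n sample groups (formulas (5)-(7)).
def average_metrics(groups):
    n = len(groups)
    acc = sum((g["a"] + g["d"]) / g["m"] for g in groups) / n * 100        # (5)
    sens = sum(g["a"] / (g["a"] + g["c"]) for g in groups) / n * 100       # (6)
    spec = sum(g["d"] / (g["b"] + g["d"]) for g in groups) / n * 100       # (7)
    return acc, sens, spec

# usage (counts are placeholders):
# average_metrics([{"a": 78, "b": 1, "c": 2, "d": 19, "m": 100}, ...])
```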
According to the above calculation formulas, the final accuracy, sensitivity and specificity obtained by using the neural network are shown in Table 5.

Table 5. BNnet model prediction results
Test sample                       Accuracy   Sensitivity   Specificity
Average of 10 groups of samples   98.08%     96.70%        98.09%
4 Conclusion
In this paper, a convolutional neural network is used to recognize pneumonia images. In the course of the experiments it was found that image noise, the small sample size and the small differences between adjacent samples are the main factors affecting the accuracy of pneumonia recognition. Therefore, a series of preprocessing operations is applied before the pneumonia data set is used for training, including image registration, the SMOTE algorithm, histogram equalization, median blur, image enhancement and data amplification, and image binarization. In the construction of the neural network, the mature image classification model VGG16 is introduced, the feature extractor and classifier are designed and trained, a batch normalization layer is introduced to improve the performance of the model, and the model is then pre-trained by transfer learning and evaluated by cross-validation. Experiments show that the method in this paper
can effectively solve the problem of data imbalance in image classification. It can also extract the essential features of pneumonia from the images, and the classification performance of the model is better. Funding Statement. The authors are highly thankful to the College Students' Innovative Entrepreneurial Training Plan Program for its financial support (No. 202010605049).
References 1. Enhui, W.: Chinese Image Medicine-Pneumonia Volume, pp. 3–12. People’s Medical Publishing House, Beijing (2002) 2. Wang, X.: Image diagnosis and evaluation of pneumonia. Chin. Foreign Med. Radiol. Technol. 4–5 (2000) 3. Rafie, M., Namin, F.S.: Prediction of subsidence risk by FMEA using artificial neural network and fuzzy inference system. Int. J. Min. Sci. Technol. 25(04), 655–663 (2015) 4. Wang Ying, L., Cuijie, Z.C.: Coal mine safety production forewarning based on improved BP neural network. Int. J. Min. Sci. Technol. 25(02), 319–324 (2015) 5. Timmers, J.M.H., et al.: The breast imaging reporting and data system (BI-RADS) in the Dutch breast cancer screening programme: its role as an assessment and stratification tool. Eur. Radiol. 22(8), 1717–1723 (2012). https://doi.org/10.1007/s00330-012-2409-2 6. Oliver, A., Freixenet, J., Marti, J.: A review of automatic mass detection and segmentation in mammographic images. Med. Image Anal. 2(2), 34–36 (2010) 7. Zhang, W., Lu, X., Wu, L., Zhang, M., Li, J.: Research progress of classification technology based on typical medical images. Prog. Laser Optoelectron. 1–16 (2018)
A Hierarchical Machine Learning Frame Work to Classify Breast Tissue for Identification of Cancer
J. Anitha Ruth1(B), Vijayalakshmi G. V. Mahesh2, R. Uma3, and P. Ramkumar4
1 SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
[email protected]
2 BMS Institute of Technology and Management, Bangalore, Karnataka, India
[email protected]
3 Sri Sairam Engineering College, Chennai, Tamil Nadu, India
[email protected]
4 Sri Sairam College of Engineering, Bangalore, Karnataka, India
[email protected]
Abstract. In this work a study is conducted on breast tissues using machine learning algorithms to identify breast cancer. The paper proposes a hierarchical two-stage classification framework utilizing the features provided by the breast tissue dataset of the UCI machine learning repository with the SVM-RBF, kNN and Decision tree algorithms. The experimental results and their analysis at the stage-1 binary classification indicate good performance of the method. Further, the analysis under multi-class classification shows the robustness of the methodology by reducing false alarm rates. Keywords: Hierarchical classification · Electrical Impedance Spectroscopy · Breast tissue · Machine learning · Breast cancer
1 Introduction
The anatomy of breast tissue plays a vital role in early cancer detection and helps doctors decide whether therapy or a surgical operation is needed for the patient. Although there are techniques such as X-rays, ultrasonography, histology, biomarkers and cytological analyses available to analyze the internal architecture of breast tissue, diagnosis through the electrical impedance method [1] has provided better insight into tissue structure for breast cancer diagnosis. At a comprehensive level, the risk factors of breast tissue can be analyzed through its internal composition, such as fat, fluids and the morphological structure of the tissue, which is revealed by the impedance method. Even though the impedance method provides good results, it is difficult to reconstruct anatomical images from electrical impedance measurements, as a high number of electrodes is needed to improve the image resolution and a high computational processing workload is required to obtain an image showing the electrical distribution of biological tissue in
concordance. To overcome the difficulty in reconstructing anatomical images [2] for identifying diseased breast tissue, we apply machine learning algorithms here for accurate classification of breast tissue as normal or pathological. Machine learning algorithms [3] are statistical models that analyze and predict from the given data very easily. They are broadly divided into two types, namely supervised and unsupervised learning algorithms: supervised learning uses labeled data for training, whereas unsupervised learning models cluster the data based on similarities. In this paper, we apply three different classifiers, namely SVM with RBF kernel, k-NN and Decision tree, to the data set taken from the UCI Machine Learning Repository to classify breast tissue as normal or pathological. This paper proposes a two-stage classification method [4] in a hierarchical manner: in the first stage a binary classification is done, and in the second stage a multi-class classification is performed on the breast tissue dataset obtained from the UCI machine learning repository. In the binary classification the breast tissue is classified as normal or pathological, and in the multi-class classification it is classified into six classes based on tissue characterization. The rest of this paper is organized as follows. Section 2 gives an overall view of the related work. Section 3 deals with the methodologies used in the work. Section 4 provides a portrayal of the datasets utilized in the simulation and the performance evaluation of the proposed method. Finally, the conclusion of the proposed method is presented in Sect. 5.
2 Related Work
In recent years various methods have been put forward by many researchers to identify or predict breast cancer using attributes obtained from different modalities such as (i) images acquired through MRI, ultrasound or mammography, (ii) biomarkers, (iii) histology and (iv) characteristics of breast tissues. [5] (Chagovets et al. 2020) proposed a method that accurately detects breast cancer with a mass spectrometry method; statistical analysis was performed on 50 samples from 25 patients, and orthogonal projections onto latent structures were used for the analysis, which separated normal tissue from cancerous tissue. [6] (Truong et al. 2015) propose a Bayesian neural network (BNN) classifier to handle small data with highly complex multi-parameters; cross-validation is employed to increase the performance, and the use of BNN for detecting breast cancer from the tissues increases the overall performance. [7] (Estrela Da Silva et al. 2000) used a minimal-insertion technique for breast cancer detection and classification using breast tissues with the help of electrical impedance spectroscopy; statistical analysis was performed on the extracted features and six classes were obtained from the tissues. In [8] (Li and Chen 2018), two datasets and five different classification algorithms are used for classifying breast cancer; the performance metrics used for comparing the models include the area under the curve, F-measure and prediction accuracy, and the algorithms used are SVM, Neural Network, Random Forest, Logistic Regression and Decision Tree. [9] (Hamsagayathri and Sampath 2017) discuss different decision-tree classifier algorithms on a breast cancer dataset; the various performance metrics are evaluated for four decision tree algorithms (J48, REP Tree, Random Forest and Random Tree). The SEER
Breast cancer dataset was used for classification; the dataset was preprocessed and seven attributes were used, with the Weka software employed for classification. From the classification results it was concluded that the REPTree algorithm performs better than the other algorithms. [10] (Rajaguru and Sannasi Chakravarthy 2019) proposed a classification technique for breast cancer using two machine learning algorithms, the Decision tree and the kNN method, on the WDBC dataset; the performance of the two methods was compared with different performance metrics, and the results showed that kNN performs better than the Decision tree algorithm. [11] (Huang MW, Chen CW, Lin WC, Ke SW, Tsai CF 2017) assess the performance of SVM classifiers on small and large datasets under various performance metrics; the performance metrics for both breast cancer datasets are compared, and the results show that ensemble SVMs perform better than a single SVM. It is also evident from the results that for small datasets SVM ensembles with boosting and bagging are suitable, while an RBF kernel based on boosting is suitable for large datasets. [12] (Zhuang et al. 2019) propose a GRA U-Net model for finding the tumor, with nipple segmentation performed to precisely locate the affected area. [13] (Zhuang et al. 2019) utilize RDAU-NET for segmenting the tumors; the results were promising for both benign and malignant tumors and strongly support clinical judgement. These findings highlight the significance of machine learning algorithms and deep learning methods in identifying breast cancer. Unlike other methods, the proposed method relies on the features extracted from electrical impedance spectroscopy of breast tissues with the distance-based traditional machine learning algorithms SVM, kNN and Decision tree.
3 Methodology The block diagram of the proposed methodology is presented in Fig. 1 and the corresponding algorithm is provided in Algorithm 1.
Fig.1. Block diagram of the proposed methodology
3.1 Data Set Information and Feature Extraction
For the work considered, the impedance was measured using electrical impedance spectroscopy (EIS) from six breast tissue types: connective, adipose, glandular, carcinoma, fibro-adenoma and mastopathy tissue. Of these, the former three are normal tissue types whereas the latter three are pathological. From the EIS plots a set of features that can represent or characterize the breast tissues is derived; these are tabulated in Table 1. The derived features are good at differentiating the tissue categories [14].

Table 1. Description of the breast tissue features
S.No   Attribute   Description
1      I0          Impedivity (ohm) at zero frequency
2      PA500       Phase angle at 500 kHz
3      HFS         High-frequency slope of phase angle
4      DA          Impedance distance between spectral ends
5      AREA        Area under spectrum
6      A/DA        Area normalized by DA
7      MAX IP      Maximum of the spectrum
8      DR          Distance between I0 and real part of the maximum frequency point
9      P           Length of the spectral curve
These features, together with the class labels, are provided to the classifier algorithms, which learn the relationship between the features and class labels to build the model. The model is further used to predict the category of an unknown breast tissue sample. The process of the methodology is provided in Algorithm 1.
ALGORITHM 1
Input: Breast tissue features
Output: Class labels of breast tissue
Step 1: Load the data set.
Step 2: Perform two-level classification on the input: binary and multiclass.
Step 3: Separate the data set into two parts for training and testing.
Step 4: Train the machine learning algorithms SVM, kNN and decision tree to build the classifier models for the two-level classification.
Step 5: Provide the classifier models with random inputs to validate the models.
Step 6: Measure the performance of the models using the metrics derived from the confusion matrix.
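A hedged scikit-learn sketch of the two-stage scheme in Algorithm 1 is given below: stage 1 separates normal from pathological tissue and stage 2 assigns the six tissue classes. The 3:2 split follows Sect. 4; the use of scikit-learn, the function name and the default hyperparameters are assumptions, since the paper does not state its implementation toolkit.

```python
# Two-stage hierarchical classification sketch (under stated assumptions).
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def run_two_stage(X, y_binary, y_multiclass):
    # 3:2 train/test split, as described in Sect. 4
    X_tr, X_te, yb_tr, yb_te, ym_tr, ym_te = train_test_split(
        X, y_binary, y_multiclass, train_size=0.6, stratify=y_multiclass, random_state=0)

    classifiers = {
        "SVM-RBF": SVC(kernel="rbf"),
        "kNN": KNeighborsClassifier(),
        "Decision tree": DecisionTreeClassifier(),
    }
    results = {}
    for name, clf in classifiers.items():
        stage1 = clf.__class__(**clf.get_params()).fit(X_tr, yb_tr)  # normal vs pathological
        stage2 = clf.__class__(**clf.get_params()).fit(X_tr, ym_tr)  # six tissue classes
        results[name] = (stage1.score(X_te, yb_te), stage2.score(X_te, ym_te))
    return results
```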
3.2 Machine Learning Algorithms
Support Vector Machine
The Support Vector Machine (SVM) was first suggested by Vapnik [15] for classification problems. In SVM, classification is based on a hyperplane that separates the data points into two classes. Any number of marginal planes can be drawn parallel to the hyperplane; the SVM selects the optimal marginal plane that has the maximum distance from the hyperplane, and the data points closest to the optimal marginal plane are called support vectors. The training data (features) are represented as {(a1, b1), (a2, b2), ..., (an, bn)}, where a1, a2, ..., an denote the data points (i.e. features) in the training set and b1, b2, ..., bn represent their classes. The equation of the hyperplane [16] is

w · a + c = 0    (1)

where w = (w1, w2, ..., wn) represents the weight vector and c denotes the bias. Binary classification can be performed by the decision formula given in Eq. (2):

D1(a) = sign(w · a + c)    (2)
In this method the SVM-RBF kernel function maps the non-linearly separable data points of the two classes from the lower-dimensional feature space to a higher-dimensional space so that the two sets of data points can be separated by a hyperplane; the kernel function thus converts a non-linear classification problem into a linear one. If ϕ(a) represents the transformation that maps the lower-dimensional data points to the higher-dimensional feature space, then the kernel function [17] is represented
below in Eq. (3):

k1(a, b) = ϕ(a) · ϕ(b)    (3)

Then the decision formula is represented as

D1(a) = sign( Σ_{i=1}^{n} αi bi k(ai, a) + c )    (4)
where αi are the Lagrange multipliers that minimize the function.

K-Nearest Neighbor (k-NN)
The k-nearest neighbor (k-NN) [18] is a standard classifier introduced by Hart in 1967 to classify features having the closest similarities. It is a non-parametric supervised machine learning algorithm that classifies an unknown sample according to its distance from similar data points in the feature space, with the Euclidean distance used to measure the distance between data points. Let a and b denote data points in the feature space, with feature vectors a = (a1, a2, ..., an) and b = (b1, b2, ..., bn), where n defines the size of the feature space. The Euclidean metric between two data points is

dist(a, b) = sqrt( Σ_{i=1}^{n} (ai - bi)^2 )    (5)

Decision Tree
Decision trees [19] used for classification tasks are called classification trees. If the dataset to be classified has a sufficient number of attributes, classification with a decision tree will be accurate. In a decision tree, the data set to be classified is initially taken at the root node and is partitioned depending on the attribute selected. The selection of the attribute is based on entropy and information gain: the attribute with the highest information gain is chosen as the partitioning attribute. The algorithm runs recursively until no more partitioning is possible, i.e. all nodes are leaf nodes; a node with zero entropy is considered a leaf node and the data is classified. Decision trees are easy to build and debug and are highly suitable for classifying numerical and categorical data. Information-gain classification uses entropy, where P(Xi) is the probability of class Xi, C is the number of classes, and the logarithm base b is 2:

Entropy = - Σ_{i=1}^{C} P(Xi) log_b P(Xi)    (6)

IG(T, A) = Entropy(T) - Σ_{v∈A} (|Tv| / |T|) · Entropy(Tv)    (7)
where T is the target, A is the variable (column) under test, and v is each value in A.
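A small illustrative computation of the entropy and information gain in Eqs. (6)-(7), using base-2 logarithms as stated above, is sketched below. The pandas-based interface is an assumption for readability, not the paper's implementation.

```python
# Entropy and information gain as in Eqs. (6)-(7).
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    p = labels.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log2(p)).sum())

def information_gain(df: pd.DataFrame, attribute: str, target: str) -> float:
    total = entropy(df[target])
    weighted = sum(
        len(subset) / len(df) * entropy(subset[target])
        for _, subset in df.groupby(attribute)
    )
    return total - weighted
```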
4 Results and Discussion
The experiment was carried out to classify breast tissue based on its internal structure using three different classifiers. The proposed method considered the breast tissue dataset taken from the UCI Machine Learning Repository. The dataset has 106 samples covering six tissue classes that fall under the normal and pathological tissue categories. The numbers of samples and the class names are provided in Table 2.

Table 2. Dataset details
Total tissue samples: 106
Pathological tissues: 54 (Carcinoma 21, Fibro-adenoma 15, Mastopathy 18)
Normal tissues: 52 (Glandular 16, Connective 14, Adipose 22)
From these tissue samples a set of 9 features (Table 1) was extracted, which have good discrimination ability for recognizing the tissue classes. To accurately identify the breast tissue, classification was done using the classifiers SVM-RBF, k-NN and Decision Tree in two stages: (i) binary classification, in which the tissue samples were classified as normal or pathological, and (ii) multi-class classification, performed on the normal and pathological tissues to identify the six classes. To validate the proposed method, the feature set extracted from each of the 106 samples was divided into training and testing sets in the ratio 3:2 for both classification stages.
4.1 Stage-1 Classification
In the training phase the features were labeled with two classes y1 = {'0', '1'}, where '0' indicates normal and '1' indicates pathological tissue. These features with class labels were provided to the machine learning algorithms to generate the classifier models.
During training the learning parameters were tuned to obtain the optimized models. The models were later tested with the features from the testing data set and the performance of the proposed method was assessed with the help of various standard metrics computed using the confusion matrix [20]. The metrics are dependent on the True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) values of the confusion matrix (Table 3). The metrics Specificity, Classification Accuracy (CA), Precision, Recall and F1-measure are calculated using the following formulas.

Table 3. Confusion matrix
                   Predicted positive   Predicted negative
Actual positive    TP                   FN
Actual negative    FP                   TN
(TP and TN are true classifications; FP and FN are misclassifications.)

Classification Accuracy = CA = (TP + TN) / (TP + TN + FP + FN)    (8)

Precision = TP / (TP + FP)    (9)

Recall (Sensitivity) = TP / (TP + FN)    (10)

Specificity = TN / (TN + FP)    (11)

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)    (12)
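The metrics in Eqs. (8)-(12) can be computed from a binary confusion matrix as sketched below with scikit-learn; treating label 1 (pathological) as the positive class is an assumption for illustration.

```python
# Confusion-matrix-based metrics following Eqs. (8)-(12).
from sklearn.metrics import confusion_matrix

def stage1_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    ca = (tp + tn) / (tp + tn + fp + fn)                 # (8)
    precision = tp / (tp + fp)                           # (9)
    recall = tp / (tp + fn)                              # (10) sensitivity
    specificity = tn / (tn + fp)                         # (11)
    f1 = 2 * precision * recall / (precision + recall)   # (12)
    return ca, precision, recall, specificity, f1
```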
The performance of stage 1 classification was assessed using the classification accuracy metric. The CA obtained from three classifier models is displayed in Fig. 2.
Fig. 2. Classification accuracy of stage-1 classification (Decision Tree 100%, kNN 95%, SVM-RBF 95%; x-axis: Classification Accuracy (%))
The results indicated the effectiveness of the classification models in classifying the breast tissue samples as normal and pathological.
4.2 Stage-2 Classification
The classification process was extended to the next stage, where the normal and pathological tissues were categorized into 6 classes. To carry out this process the feature set was again divided into training and testing sets in the ratio 3:2, and the training set was labeled with six classes y2 = {'0', '1', '2', '3', '4', '5'}, where '0' indicates Carcinoma, '1' Fibro-adenoma, '2' Mastopathy, '3' Glandular, '4' Connective and '5' Adipose, respectively. Next, the training and testing processes were run as in the stage-1 classification. The performance was assessed using the metrics CA, Specificity, Precision, Recall and F1-score, and the results are tabulated in Tables 4, 5 and 6.

Table 4. Multi class classification result of SVM-RBF classifier
PM                            1      2      3      4      5      6
Classification accuracy (%)   95.2   93     92.8   100    98     95
Precision (%)                 100    100    71.43  100    100    87
Recall (%)                    78     25     83.33  100    80     87
Specificity (%)               100    100    94.4   100    100    97
F1-Score                      0.88   0.4    0.769  1      0.89   0.87
Table 5. Multi class classification result of kNN classifier
PM                            1      2      3      4      5      6
Classification accuracy (%)   95     95     88     100    95     100
Precision (%)                 87.5   80     80     100    71     100
Recall (%)                    87.5   80     80     100    100    100
Specificity (%)               97     97     97     100    94     100
F1-Score                      0.875  0.80   0.80   1      0.83   1
Table 6. Multi class classification result of decision tree classifier
PM                            1      2      3      4      5      6
Classification accuracy (%)   97     90     88     95     92     98
Precision (%)                 100    80     40     75     100    87
Recall (%)                    85     57     50     100    100    100
Specificity (%)               100    97     92     94     100    97
F1-Score                      0.92   0.67   0.44   0.85   1      0.93
The results of the multiclass classification indicate that the proposed model was effective in discriminating the breast tissue classes with high classification accuracies. Ideally the values of Precision and Recall are 100%. The three classifiers in the proposed work provided significantly high values of Precision and Recall for multiclass classification: SVM-RBF provided an average Precision of 93.07% and an average Recall of 76%, while the kNN classifier gave 86.41% and 80.33% respectively. The Decision tree was also strong, delivering 91.25% average Precision and 82% average Recall over all 6 classes. This signifies the ability of the models to distinguish the breast tissues clearly. Further, we also evaluated the models using the F1 score, which is the harmonic mean of Precision and Recall and indicates the robustness of the classifier; the high F1 values obtained from our models imply fewer false positives and false negatives and hence fewer misclassifications.
5 Conclusion
The paper proposes a 2-stage classification framework with distance-based classifiers for classifying breast tissue into normal and pathological at stage-1 and further classifying the result of stage-1 into six categories at stage-2 to identify breast cancer. The work considered the breast tissue dataset in which the features were extracted from electrical impedance spectroscopy of breast tissues. These features or characteristics were provided as input to the SVM-RBF, kNN and Decision tree machine learning algorithms
for classification and assessment of the models. The work carried out on the breast tissue dataset with the 2-stage classification framework exhibited the potency of the impedance measurements for discriminating the tissue classes. The experimental results obtained and their analysis indicated the good performance of the methodology in terms of classification accuracy, specificity, precision, recall and F1 score. The significantly high values of Precision, Recall and F1 score highlighted the reduction in misclassifications, leading to fewer false alarms.
References 1. Jossinet, J.: Variability of impedivity in normal and pathological breast tissue. Med. Biol. Eng. Comput. 34, 346–350 (1996) 2. Sahran, S., et al.: Machine learning methods for breast cancer diagnostic. In: Bulut, N. (ed.) Breast Cancer and Surgery (2018). https://doi.org/10.5772/intechopen.79446 3. Oliver, A., et al.: A Novel Breast Tissue Density Classification Methodology. IEEE Trans. Inf Technol. Biomed. 12(1), 55–65 (2008). https://doi.org/10.1109/titb.2007.903514 4. Huang, M.-W., Chen, C.-W., Lin, W.-C., Ke, S.-W., Tsai, C.-F.: SVM and svm ensembles in breast cancer prediction. PLoS ONE 12(1), e0161501 (2017). https://doi.org/10.1371/jou rnal.pone.0161501 5. Chagovets, V.V., et al.: Validation of breast cancer margins by tissue spray mass spectrometry. Int. J. Mol. Sci. 21(12), 1–11 (2020) 6. Truong, B.C.Q., Tuan, H.D., Fitzgerald, A.J., Wallace, V.P., Nguyen, T.N., Nguyen, H.T.: Breast Cancer classification using extracted parameters from a terahertz dielectric model of human breast tissue. In: 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), November 2015, pp. 2804–2807 (2015) 7. Estrela Da Silva, J., Marques De Sá, J.P., Jossinet, J.: Classification of breast tissue by electrical impedance spectroscopy. Med. Biol. Eng. Comput. 38(1), 26–30 (2000) 8. Li, Y.: Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction. Appl. Comput. Math. 7(4), 212 (2018). https://doi.org/10.11648/j.acm.20180704.15 9. Hamsagayathri, P., Sampath, P.: Decision tree classifiers for classification of breast cancer. Int. J. Curr. Pharm. Res. 9(2), 31 (2017) 10. Rajaguru, H., Sannasi Chakravarthy, S.R.: Analysis of decision tree and k-nearest neighbor algorithm in the classification of breast cancer. Asian Pac. J. Cancer Prev. 20(12), 3777–3781 (2019). https://doi.org/10.31557/APJCP.2019.20.12.3777.1 11. Huang, M.W., Chen, C.W., Lin, W.C., Ke, S.W., Tsai, C.F.: SVM and SVM ensembles in breast can prediction. PLoS ONE 12(1), 1–14 (2017). https://doi.org/10.1371/journal.pone. 0161501 12. Zhuang, Z., et al.: Nipple Segmentation and Localization Using Modified U-Net on Breast Ultrasound Images. J. Med. Imaging Health Inform. 9(9), 1827–1837 (2019). https://doi.org/ 10.1166/jmihi.2019.2828 13. Zhuang, Z., Li, N., Raj, A.N.J., Mahesh, V.G.V., Qiu, S.: An RDAU-NET model for lesion segmentation in breast ultrasound images. PLoS ONE 14(8), 1–23 (2019). https://doi.org/10. 1371/journal.pone.0221535 14. http://archive.ics.uci.edu/ml/datasets/breast+tissue 15. Vapnik, V.N.: The Nature of Statistical Learning Theory, 1st edn. Springer, New York (1995). https://doi.org/10.1007/978-1-4757-2440-0 16. Nedra, A., Shoaib, M., Gattoufi, S.: Detection and classification of the breast abnormalities in digital mammograms via linear support vector machine. In: 2018 IEEE 4th Middle East Conference on Biomedical Engineering (MECBME) (2018). https://doi.org/10.1109/ mecbme.2018.8402422
17. Sewak, M., Vaidya, P., Chan, C.-C., Zhong-Hui, D.: SVM approach to breast cancer classification. In: Second International Multi-Symposiums on Computer and Computational Sciences (IMSCCS 2007) (2007). https://doi.org/10.1109/imsccs.2007.46 18. Nusantara, A.C., Purwanti, E., Soelistiono, S.: Classification of digital mammogram based on nearest-neighbor method for breast cancer detection. Int. J. Technol. 7(1), 71–71 (2016). https://doi.org/10.14716/ijtech.v7i1.1393 19. Yi, L., Yi, W.: Decision tree model in the diagnosis of breast cancer. In: 2017 International Conference on Computer Technology, Electronics and Communication (ICCTEC) (2017). https://doi.org/10.1109/icctec.2017.0004 20. Marom, N.D., Rokach, L., Shmilovici, A.: Using the confusion matrix for improving ensemble classifiers. In: 2010 IEEE 26-Th Convention of Electrical and Electronics Engineers in Israel (2010). https://doi.org/10.1109/eeei.2010.5662159
An Improved Method for Removing the Artifacts of Electrooculography Huimin Zhao1 , Chao Chen1 , Abdelkader Nasreddine Belkacem2 , Jiaxin Zhang1 , Lin Lu3(B) , and Penghai Li1 1 Tianjin University of Technology, Tianjin 300384, China 2 Department of Computer and Network Engineering, College of Information Technology,
UAE University, Al Ain 15551, UAE 3 Zhonghuan Information College Tianjin University of Technology, Tianjin 300380, China
Abstract. In order to solve the problem that the removal of EOG artifacts in EEG preprocessing is often unsatisfactory, a PCA-JADE-ARX method is used in this paper. Firstly, PCA is used to select the number of components, and then the JADE method is used to remove the artifacts. Based on the results of artifact removal, an ARX model is estimated to find the optimal model and correct the JADE results, so that clean and reliable real EEG signals are finally restored. The relative error, stability and effectiveness of the method are improved, which demonstrates its practical applicability. Keywords: EEG signal · PCA · JADE · ARX
1 Introduction
EEG is a weak bioelectric signal. The analysis of EEG is helpful to research in psychology, life science and medicine [1, 2]. Electroencephalogram (EEG) signals are easily interfered with by other physiological signals, among which EOG artifacts (including blinking artifacts, eye movement artifacts, etc.) are the most serious [3, 4]. At present, regression, principal component analysis (PCA) and independent component analysis (ICA) can be used to remove artifacts. However, each single method has drawbacks: the regression method easily removes normal EEG signal by mistake [5], PCA cannot completely separate potential noise with similar waveforms [6], and the artifacts separated by ICA will more or less contain weak EEG activity. This paper studies and analyzes the problem of artifact elimination in EEG signal preprocessing. PCA is used to select the number of components, and then the JADE method is used to remove the artifacts. Based on the results of artifact removal, an ARX model is estimated to find the optimal model and correct the JADE results. Finally, clean and reliable real EEG signals are restored.
2 Methods
2.1 Principal Component Analysis
Principal component analysis (PCA) is a classical method in statistical signal analysis. As one of the most commonly used data dimension reduction algorithms, PCA can reduce the dimension of data [8]. Let the original observation signal matrix of M channels and n variables be X (m × n), and let V be its covariance matrix. The eigenvectors F = [F1, F2, ..., Fm] of the covariance matrix V and all its eigenvalues λ1, λ2, ..., λm can be obtained by calculation, with the eigenvalues arranged in descending order λ1 ≥ λ2 ≥ ... ≥ λm. At this time, the corresponding m characteristic signals are Y = [y1, y2, y3, ..., ym]^T. They satisfy the following formulas:

[y1, y2, ..., ym]^T = [F1, F2, ..., Fm]^T X    (1)

diag(λ1, λ2, ..., λm) = [F1, F2, ..., Fm]^T V [F1, F2, ..., Fm]    (2)
Formula (1) is equivalent to Y = F^T X, and formula (2) is equivalent to Λ = F^T V F. Each row of Y represents a principal component of the original observation vector: the first principal component is y1, the second is y2, and so on up to the m-th principal component ym. The definitions are as follows:
(1) Contribution rate: the proportion of the eigenvalue corresponding to the i-th principal component in the sum of all the eigenvalues of the covariance matrix:

αi = λi / Σ_{i=1}^{n} λi    (3)

(2) Cumulative contribution rate: the proportion of the sum of the eigenvalues of the first j principal components in the sum of all eigenvalues:

Mj = Σ_{i=1}^{j} λi / Σ_{i=1}^{n} λi    (4)
In addition, the principal components are not correlated with each other, and they are arranged according to the size of their energy, which is a characteristic advantage of principal component analysis [9].
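A hedged numpy/scikit-learn sketch of selecting the number of components by cumulative contribution rate (formulas (3)-(4)) is given below; the 99% threshold mirrors the choice of 16 components reported later in the experiments, and the EEG array shape is an assumption.

```python
# Select the number of principal components by cumulative contribution rate.
import numpy as np
from sklearn.decomposition import PCA

def select_components(eeg, threshold=0.99):
    # eeg: array of shape (n_samples, n_channels), e.g. segmented EEG epochs
    pca = PCA().fit(eeg)
    cumulative = np.cumsum(pca.explained_variance_ratio_)   # cumulative contribution rate
    k = int(np.searchsorted(cumulative, threshold) + 1)     # smallest k reaching the threshold
    return k, PCA(n_components=k).fit_transform(eeg)
```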
2.2 Independent Component Analysis
Independent component analysis (ICA) is a blind source separation (BSS) signal processing method. The basic idea of ICA is to decompose the multi-channel observation signal into several independent components by an optimization algorithm, under the assumption that the source signals are statistically independent [10]. The basic independent component analysis model is as follows:

x(t) = As(t)    (5)

Here x(t) = [x1(t), x2(t), ..., xi(t)]^T is the i-dimensional observation signal vector, s(t) = [s1(t), s2(t), ..., sj(t)]^T is the unknown j-dimensional source signal vector, and A is the unknown i × j mixing matrix. The basic ICA model can be extended: assuming that r(t) is an i-dimensional noise vector, the extended noisy ICA model can be expressed as

x(t) = As(t) + r(t)    (6)
Under some assumptions, ICA model can be identified, that is, the mixed matrix and independent components can be estimated, but there are still some uncertainties [11]. The Blind Source Separation algorithm is an adaptive batch ICA optimization algorithm based on fourth-order cumulants. The sampling JADE algorithm can draw the corresponding brain topographic map according to the independent components under the condition of independent source signals, and then determine and delete the artifacts of the electrooculogram, so as to realize the separation of the electrooculogram interference and the real EEG signals [12]. The basic process of JADE algorithm: first, spheroidize the original signal data x with spheroidization matrix A to get z, so that the variance of each component contained in z is 1, Then, the set of fourth-order cumulant matrices of z is constructed, and the estimation of unitary matrix U = WA is obtained by joint diagonalization,The estimation of mixed matrix B = AUT is transformed into the determination of unitary matrix U. 2.3 An Improved Method In the process of removing artifacts, it is an effective method to use PCA method in EEG signal preprocessing. However, in the process of removing the artifacts based on ICA, if we can not accurately decompose the source signals of the artifacts and the EEG signals into different independent components completely and effectively. In view of the shortcomings of the above methods, this study takes principal component analysis as a preprocessing method, uses the extended similar diagonalization algorithm of independent component analysis to remove the artifacts, and combines with ARX model (auto regressive with extra) In order to recover the real EEG signal, the EEG signal obtained after JADE removing the artifacts was corrected by input. EEG signal correction method of ARX: Due to the shortcomings of JADE algorithm to remove the artifacts, the data analysis process of this study combined with the extended autoregressive model(ARX) to correct the EEG signal obtained after jade algorithm to remove the artifacts, in order to restore the clean and reliable real EEG signal.
Basic concepts of ARX model:In the correction process of EEG signal processing in this experiment, ARX model is selected to realize system identification. Based on the input and output observation data, ARX model is based on the least square criterion to determine the structure and coefficients of the model. The structure of ARX model is as follows: A(x−1 )Z(h) = B(x−1 )S(h) + e(h)
(7)
In the above formula, represents the input and output of the system respectively, which is white noise, and x −1 is the backward shift operator (U +
np i=1
Ai x−i ) · Z(h = (U +
nq
Bi x−i )S(h) + e(h)
(8)
i=1
Where U is nz × nz dimensional identity matrix, A is nz × nz dimensional matrix, and B is nz × ns dimensional matrix. nz„ ns is the input dimension and output dimension of the system, and the order of ARX model is called np, nq . The selection of ARX model order: In this paper, we find an estimation model with smaller value of Bayesian information criterion (BIC) to determine the order of the model. The BIC value is determined by the following equation: BIC = −2 ln Lmax + h ln N
(9)
Where, L max is the maximum likelihood function obtained by ARX model, the number of data estimates is n, and the number of parameter estimates is K. In the data processing of this study, the order of ARX model will be automatically selected by BIC. And this signal processing process simply set the same value. The basic idea of the method based on JADE-ARX is that on the basis of JADE removing the artifacts, ARX model estimation is used to correct the processing results of the EEG signal reconstructed by JADE [13], so as to restore the clean and reliable real EEG signal. The specific steps of JADE-ARX are as follows: (1) The original EEG signal s was processed by JADE to get the EEG signal s* after JADE processing. (2) The clean segment s2 * after JADE processing is used as the input of ARX model, and the clean segment s2 * of the original EEG signal is used as the output of ARX model. The strategy of establishing ARX model: a clean input channel corresponds to an ARX model. If the number of channels of the original EEG signal is N, there will be N ARX models corresponding to the number of channels of the marked eye artifacts, and each model corresponds to an output. (3) After modeling, there will be n corresponding models for the marked artifact channel. Then we take the "clean segment" EEG signal s2 * processed by JADE of each channel as the output of each model, and do correlation calculation with the "clean segment" EEG signal s2 of each channel of the original EEG signal respectively, and get n corresponding correlation coefficients (CC). At the same time, we select the model corresponding to the maximum correlation coefficient as the optimal
ARX model. Finally, the EEG signal s* after JADE processing is taken as the input of the optimal ARX model, and the model output is taken as the final corrected EEG signal to complete the whole JADE-ARX EEG signal correction process. Based on the above steps, ARX model estimation is used to establish the optimal model to complete the correction of EEG signals and obtain clean EEG signals.
3 Experimental Results and Comparative Analysis 3.1 Experimental Data The experimental data were EEG data of finger movement execution and motor imagery collected in Capital Medical University. The acquisition equipment was EGI EEG analyzer. The sampling requency was 1000 Hz, and 128 lead data were recorded. Each subject had 120 trials of finger movement execution and motor imagery, and 40 leads of forehead were intercepted with the most obvious interference of eye artifacts. The length of each trail time window was 5s, including 4 The time domain waveform of trials (20 s) is shown in Fig. 2.
Fig. 1. Original EEG waveform
It can be seen from Fig. 1 that lead e1–e30 electrode was interfered by obvious blinking artifacts. During the experimental trail time of 3.5 s, the recorded level amplitude was very large, reaching more than 100 µV, and then the blinking interference spread to each lead, especially the electrode located in the forehead. 3.2 Experimental Result After principal component analysis, the eigenvalues of each principal component and the corresponding variance contribution rate can be calculated. The histogram of cumulative contribution distribution of principal component components is shown in Fig. 3, where the abscissa represents the order of cumulative components and the ordinate represents the percentage of cumulative energy. Abscissa 1 shows the sum of the energy contribution rates of all 30 principal components, and its value is 1, that is 100%. Abscissa
An Improved Method for Removing the Artifacts of Electrooculography
521
2 shows the sum of the variance contribution rates of 29 principal components except the first principal component, and its value is about 0.68, that is 68%. And so on. It can be seen from Fig. 3 that the energy of the first 16 principal components is almost 99% (98.87%). Therefore, the first 16 principal components are used for independent component analysis. 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Fig. 2. Principal component cumulative contribution distribution histogram
In addition, by solving the covariance matrix of the principal component matrix, it can be seen that the principal components are not correlated with each other, that is, the decorrelation of the principal components is realized. Figure 3 and Fig. 4 record the spatial and temporal distribution of 16 independent source components separated by JADE algorithm. Figure 3 is the time series waveform of ICA decomposition components, and Fig. 4 is the spatial distribution EEG map of ICA decomposition components. According to Fig. 3, ICA1 contains obvious blink artifacts, while ICA2 has obvious horizontal eye movement artifacts, but there is no information at other time points. In addition, the EEG topographic map and local enlarged map of Fig. 5 can be used to judge, because it can reflect the pattern distribution of energy size of each independent component in brain space. We can see that the spatial pattern of ICA1 is located in the forehead, which is consistent with the location characteristics of blink artifacts, while the horizontal eye movement artifacts in ICA2 are located in the forehead and eyes. This is consistent with the introduction that the artifacts are mainly distributed near the eyes in the prefrontal brain region. In order to preserve the information of EEG as much as possible, the independent source components determined as artifacts are zeroed, and then the remaining independent source components are projected back to the spatial position of the original EEG signal to reconstruct the EEG signal after removing the artifacts.The figure is as follows: In this paper, the EEG signal before the interference of eye artifacts is regarded as a clean standard EEG signal, which is used as a reference signal for quantitative relative error calculation. Based on the result of JADE algorithm, the ARX model is used to correct the result of JADE algorithm. The EEG data collected in the experiment were
Fig. 3. Time domain waveforms of 16 independent source components
Fig. 4. The spatial distribution map of 16 independent source components and EEG topographic map of specific ICA1 and ICA2
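The artifact-removal pipeline described above (PCA dimensionality reduction, ICA separation of independent source components, zeroing of artifact components, and back-projection) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: scikit-learn's FastICA is used as a stand-in for the JADE algorithm, the input shape and the indices of the artifact components (artifact_ics) are assumptions, and the ARX correction step is not included.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

def remove_eog_artifacts(eeg, n_components=16, artifact_ics=(0,)):
    """eeg: array of shape (n_samples, n_channels); returns the reconstructed EEG."""
    # Step 1: PCA, keeping the components that carry ~99% of the energy.
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(eeg)                 # (n_samples, n_components)

    # Step 2: ICA on the retained principal components (FastICA stands in for JADE).
    ica = FastICA(n_components=n_components, random_state=0)
    sources = ica.fit_transform(scores)             # independent source components

    # Step 3: zero the components identified as ocular artifacts.
    sources[:, list(artifact_ics)] = 0.0

    # Step 4: back-project to the original channel space.
    scores_clean = ica.inverse_transform(sources)
    return pca.inverse_transform(scores_clean)

# usage on hypothetical data: 5000 samples x 40 prefrontal channels
eeg = np.random.randn(5000, 40)
print(remove_eog_artifacts(eeg).shape)
```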
The EEG data collected in the experiment were screened with a set threshold (100 µV) combined with expert recognition, and nine of the 11 subjects were selected for artifact removal. For all subjects, time-window data with a length of 5 s were intercepted (sampling frequency of 200 Hz). Each subject's data segment contained obvious electrooculogram artifact interference, with a 2.5 s clean EEG band before the interference onset. In this paper, the 2.5 s segment of the 5 s EEG before the onset of the ocular artifact is marked as the standard clean EEG signal; from it the relative error is calculated quantitatively and the ARX model is estimated. Figure 5 shows the ocular-artifact removal results of JADE and PCA-JADE-ARX for the 30 channel electrodes in the prefrontal brain area of one of the nine subjects. The area representing the ocular-artifact data band in the figure has been marked by vertical lines. Figure 5(a) shows the original EEG signal including the ocular artifact interference; there is obvious interference near the 3.5 s time point, where the EEG signal is covered by the artifact. Figure 5(b) and (c) show the reconstructed EEG signals of JADE and PCA-JADE-ARX after artifact removal. By observing Fig. 5(b) and (c), it can be seen that both JADE and PCA-JADE-ARX can effectively remove the ocular artifact interference and yield relatively clean EEG signals. In this paper, the average relative error and a single-sample t-test were computed for the artifact removal of the 9 subjects. Table 1 records the average
Fig. 5. The waveforms of the lead channels of the prefrontal EEG regions and the EEG signals after the removal of PCA-JADE-ARX (the area surrounded by the red dotted line is the time period of the interference of the eye electrical artifacts)
relative error and single-sample t-test results of the JADE and PCA-JADE-ARX methods for artifact removal in the 9 subjects. In Fig. 5, the black dotted line is the original EEG signal containing the ocular artifacts; the blue dotted line and the red solid line represent the waveforms of JADE and PCA-JADE-ARX, respectively, and the block diagram in the upper left shows the corresponding relative error values of JADE and PCA-JADE-ARX. The subgraph indicated by the green arrow is the enlarged detail of the 2.5–4.5 s data segment framed by the green dotted line. The relative error values in Table 1 are the average relative errors and standard errors over the 40 lead channels distributed in the prefrontal brain area. Differences significant at p < 0.01 are marked with the sign "※", and differences significant at p < 0.05 are marked with the sign "#". From Table 1 it can be concluded that the average relative error of PCA-JADE-ARX is lower than that of JADE, and the t-test results show that this reduction is statistically significant (p < 0.05).
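A minimal sketch of the evaluation described above (per-channel relative error against the clean reference segment, followed by a one-sample t-test on the per-channel differences) is given below. The arrays are hypothetical stand-ins for one subject's results; the exact error definition used by the authors is not specified, so the Euclidean-norm ratio here is an assumption.

```python
import numpy as np
from scipy import stats

def relative_error(reconstructed, reference):
    """Per-channel relative error; inputs shaped (n_samples, n_channels)."""
    num = np.linalg.norm(reconstructed - reference, axis=0)
    den = np.linalg.norm(reference, axis=0)
    return num / den

# err_jade, err_pja: relative errors of the two methods over 40 prefrontal channels
# (hypothetical values standing in for one subject's results)
rng = np.random.default_rng(0)
err_jade = 0.90 + 0.30 * rng.standard_normal(40)
err_pja = 0.33 + 0.20 * rng.standard_normal(40)

# one-sample t-test on the per-channel differences (H0: no improvement)
t_stat, p_value = stats.ttest_1samp(err_jade - err_pja, popmean=0.0)
print(f"mean errors: {err_jade.mean():.3f} vs {err_pja.mean():.3f}, p = {p_value:.4f}")
```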
Table 1. Average relative error and single-sample t-test results of the 9 subjects
Subject serial number   Relative error (JADE)   Relative error (PCA-JADE-ARX)   Significance
1                       0.7556 ± 0.2375         0.2374 ± 0.1682                 #
2                       0.8752 ± 0.1844         0.3140 ± 0.2167                 ※
3                       0.9884 ± 0.4292         0.4214 ± 0.2020                 #
4                       1.0573 ± 0.1385         0.2442 ± 0.1961                 ※
5                       0.8986 ± 0.4411         0.3536 ± 0.2584                 #
6                       0.8304 ± 0.4922         0.3059 ± 0.2263                 ※
7                       1.0482 ± 0.6026         0.3796 ± 0.2017                 ※
8                       0.8768 ± 0.4364         0.3363 ± 0.2352                 ※
9                       1.0161 ± 0.5175         0.3307 ± 0.1887                 #
Average                 0.9274 ± 0.9893         0.3247 ± 0.5057                 ※
4 Conclusion

In this paper, the experimental data collected in Beijing Anding Hospital were used to compare and analyze the effects of the JADE and PCA-JADE-ARX methods in removing ocular artifacts. The results show that the EEG signal obtained by the PCA-JADE-ARX method is more synchronous and consistent with the real EEG signal in the time domain than that obtained by the JADE method. At the same time, the relative error between the EEG signal obtained by PCA-JADE-ARX and the real EEG signal is smaller than that between the EEG signal obtained by JADE and the real EEG signal. The t-test also confirms the statistical superiority of the PCA-JADE-ARX method in removing the artifacts. Finally, through the analysis of the experimental data, it can be concluded that the PCA-JADE-ARX method can retain the complete EEG information as far as possible while ensuring the full removal of the artifacts.

Acknowledgment. This work was financially supported by the National Natural Science Foundation of China (61806146), the Natural Science Foundation of Tianjin City (17JCQNJC04200, 18JCYBJC95400, 19JCTPJC56000), the National Key R&D Program of China (2018YFC1314500), the Tianjin Key Laboratory Foundation of Complex System Control Theory and Application (TJKL-CTACS-201702), and the Young and Middle-Aged Innovation Talents Cultivation Plan of Higher Institutions in Tianjin.
References 1. Yang, C.: Application research of independent component analysis and wavelet threshold in epileptic EEG signal denoising. Northwest University, Xian (2017) 2. Li, M., Liu, F.: Automatic recognition and removal of EEG artifacts in EEG. Biomed. Eng. Beijing 37(6), 559–565 (2018)
3. Burger, C., van den Heever, D.J.: Removal of EOG artefacts by combining wavelet neural network and independent component analysis. Biomed. Signal Process. Control 15, 67–79 (2015) 4. Li, H., Kumar, N., Chen, R., et al.: Deep reinforcement learning. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2018) 5. Jung, T.P., Humphries, C., Lee, T.W.: Removing electroencephalographic artifacts: comparison between ICA and PCA. Neural Netw. Signal Process. 63 (1998) 6. Cichocki, A., Amari, S.: Adaptive blind signal and image processing learning algorithms and applications. Wiley, New York (2002) 7. Kierkels, J.: A model-based objective evaluation of eye movement correction in EEG recordings. IEEE Trans. Biomed. Eng. 53, 246–253 (2006) 8. Karhunen, J.: Generalization of principal component analysis, optimization problems and neural networks. J. Neural Netw. 8(4), 549–562 (1995) 9. Hyvarinen, A., Oja, E.: Independent component analysis: algorithms and applications. Neural Netw. 13(4/5), 411–430 (2000) 10. Menglin, L., Jing, C., Shaofei, C., et al.: A new reinforcement learning algorithm based on counterfactual experience replay. In: 2020 39th Chinese Control Conference (CCC), pp. 1994–2001. IEEE (2020) 11. Kezhong, J.Y., Weiguo, G., Weihong, L., Yixiong, L.: Face recognition method based on wavelet transform and ICA. J. Instrument. (2005) 12. Cardoso, J.F.: Blind signals processing: statistical principles. Proc. IEEE 86(10), 2009–2025 (1998) 13. Sha, F., Ping, L.: Adaptive predictive functional control based on time-varying ARX model. Control Eng. 4 (2017)
R-Vine Copula Mutual Information for Intermuscular Coupling Analysis Yating Wu1 , Qingshan She1(B) , Hongan Wang1 , Yuliang Ma1 , Mingxu Sun2 , and Tao Shen2 1 School of Automation, Hangzhou DianZi University, Hangzhou 310018, China
[email protected] 2 School of Electrical Engineering, University of Jinan, Jinan 250022, Shandong, China
Abstract. It is challenging to understand intermuscular coupling from surface electromyogram (sEMG) signals in the study of human movements. A mutual information (MI) estimation method based on R-Vine Copula is proposed and applied to analyze intermuscular coupling during upper-limb movement. The probability density function (PDF) of the sEMG signal may be a kind of "peak fat tail" distribution. R-Vine Copula is more flexible and effective in depicting the intermuscular dependent structure (Vuong test, p < 0.05). There is high coupling strength between the triceps brachii (TB) and posterior deltoid (PD), while the coupling degree between the biceps brachii (BB) and the other muscles is relatively low. The intermuscular coupling between the subjects is significantly correlated (p < 0.05). RVCMI provides a new research method and theoretical basis for intermuscular coupling analysis and has good application prospects. Keywords: sEMG · Intermuscular coupling · R-Vine copula · Mutual Information
1 Introduction

The neuromuscular system is highly complex. Studies have shown that the primary motor cortex, located in the precentral gyrus of the frontal lobe, receives input from several brain areas that aid in planning movement, and its principal output stimulates spinal cord neurons and skeletal muscle contraction [1]. However, how the central nervous system (CNS) coordinates complex movements, each comprising a large number of muscles and joints, to attain a behavioral goal is still a fundamental question. For example, a smooth reaching movement may be the result of coordinating the activities of shoulder flexors, elbow extensors, and posture-supporting muscles. As sEMG signals measure the summed activity of a number of motor unit action potentials (MUAP) near the electrode, they can accurately reflect the state and function of intermuscular coupling in real time. Research on the characteristics of intermuscular coupling will help to explore the potential regulation of muscles by the CNS.
At present, many methods have been used to analyze the coupling of two or more time series, such as coherence, mutual information (MI), the S estimator, and the global synchronization index (GSI). The coherence method quantifies the linear correlation between two time series in the frequency domain, but it does not consider the intrinsic non-linear properties of the signals. MI calculates the marginal and joint probability density functions (JPDF) of two time series and quantifies their linear and non-linear statistical dependence by computing various entropies. However, it is difficult to estimate the JPDF of non-Gaussian data, which hinders the practical application of MI [2]. The S estimator is a synchronization measure based on state space, which calculates the synchronization strength by analyzing the interdependence among multiple signals in the state-space reconstruction domain. Unfortunately, it does not adequately consider the effect of random and artifact components, and its accuracy has yet to be improved. The GSI can improve the performance of the S estimator for analyzing multi-dimensional neural series. However, being based on the covariance matrix, it does not capture the non-linear correlation of multiple time series and is also subject to noise interference to some extent. Lately, Ince et al. [3] combined the statistical theory of Copulas with the closed-form solution for the entropy of Gaussian variables and presented a practical Gaussian Copula Mutual Information (GCMI) estimation method. However, the Gaussian Copula function has a single form that can only describe symmetrical correlation structures, so the model fit is easily distorted and it cannot accurately describe the functional coupling characteristics between neural signals. Pair Copula constructions are flexible representations of the dependence underlying a multivariate distribution. The principle of pair Copula models is to decompose the multivariate distribution into conditional distributions and to describe these conditional distributions through bivariate Copulas, modeling the dependence of two variables at a time. A special pair Copula construction is the regular Vine (R-Vine) Copula, whose higher flexibility enables it to model a wider range of complex multivariate dependence [4]. Inspired by the GCMI method, this paper first proposes a new method for MI estimation using R-Vine Copula. It inherits the merits of R-Vine Copula and can be directly applied to arbitrary data sizes. Then, we apply the approach to sEMG signals to uncover the linear and non-linear characteristics of intermuscular coupling.
2 R-Vine Copula Mutual Information

In probability theory and information theory, MI measures the degree of linear or nonlinear interdependence between two random variables and reflects the amount of information one random variable carries about another. The MI between two random variables X and Y is defined as

MI(X, Y) = ∬ f(x, y) log( f(x, y) / (f(x) f(y)) ) dx dy   (1)

where f(x, y) is the joint probability density function, and f(x) and f(y) are the marginal probability density functions (MPDF) of X and Y, respectively. When the logarithm is taken to base 2, the unit of MI is the bit. According to the definition, MI is a non-negative
correlation statistic, MI(X, Y) ≥ 0, and MI(X, Y) = 0 if and only if X and Y are independent of each other. The larger the MI is, the more information X and Y share. However, the estimation of MI depends heavily on a precise representation of the MPDF and JPDF of the variables. Fortunately, Ma and Sun et al. [5], based on Sklar's theorem, proved that MI is essentially a kind of Copula entropy, as shown in Eq. (2), which provides a new idea for understanding and estimating MI:

MI(X, Y) = −H_c(F(x), F(y))   (2)

where H_c(F(x), F(y)) is called the two-dimensional Copula entropy, F(x) and F(y) are the marginal distribution functions of X and Y, respectively, and c is the Copula density function. Ince et al. [3] estimated MI based on Copula entropy and called it Copula MI, a term that is convenient to understand and is still used in this paper. Extended to the high-dimensional case, Eq. (2) can be written as

MI(X_1, X_2, ..., X_N) = −H_c(F(x_1), F(x_2), ..., F(x_N))   (3)
It can be seen from Eqs. (2) and (3) that the key to estimating the Copula MI lies in the Copula entropy, which can in principle be obtained directly by multiple integration. However, when there are too many variables, multiple integration becomes difficult. Therefore, this paper uses Monte Carlo simulation to approximate the Copula entropy:

H_c(F(x_1), F(x_2), ..., F(x_N)) = −E[log c(F(x_1), F(x_2), ..., F(x_N))]   (4)
where E[·] denotes the mathematical expectation. In addition, Copula density functions with different mathematical forms fit the data differently, which may lead to different analysis results. There are two main families of Copula functions: elliptical Copulas and Archimedean Copulas. However, when the variable dimension is high, these traditional high-dimensional Copula functions face the curse of dimensionality, low model accuracy, and poor flexibility in parameter estimation. The R-Vine Copula model proposed by Bedford and Cooke [6], based on graph theory, can effectively avoid these problems by decomposing the high-dimensional Copula function into a product of pair Copula functions. The two simplest types of Vines in the R-Vine Copula model are the C-Vine and the D-Vine, which have different logical structures: the C-Vine is a star structure while the D-Vine is a path (parallel) structure, and they are suitable for different kinds of data sets. For higher-dimensional variables, Morales-Napoles et al. [7] have pointed out that the R-Vine in its general form offers more flexible and diverse dependent structures than the C-Vine and D-Vine. A typical R-Vine is composed of trees, nodes and edges. Each layer of the tree contains several nodes, each representing a variable or conditional variable. The connection between nodes is called an edge, and each edge represents a pair Copula composed of the two adjacent nodes. Based on directed graphs, Dißmann et al. [8] proposed the R-Vine matrix (RVM), which makes the R-Vine Copula more convenient to compute and simulate. However, the RVM is not uniquely determined: for a given N-dimensional R-Vine there are 2^(N−1) different RVMs. In order to determine
the most appropriate R-Vine Copula structure model, we adopt the maximum spanning tree (MST) algorithm proposed by Brechmann et al. [9]. The key of this algorithm is to maximize, for each tree of the R-Vine structure, the sum of the absolute values of Kendall's τ over its edges. After the RVM is determined, the pair Copula functions in the RVM need to be specified. Akaike's Information Criterion (AIC) is used to select the best pair Copula function from a large family of candidate Copulas:

AIC = −2 ln(L) + 2k   (5)

where k is the number of parameters of the pair Copula function, and L is its maximized likelihood value. The smaller the AIC value, the better the fit of the R-Vine model. Combined with Eq. (3), the R-Vine Copula MI proposed in this paper can be expressed as

RVCMI(X_1, X_2, ..., X_N) = −H_{c_R-Vine}(F(x_1), F(x_2), ..., F(x_N))   (6)
where c_R-Vine is the R-Vine Copula density function, and maximum likelihood estimation is used to estimate the parameters of each pair Copula function.
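The copula-entropy route to MI in Eqs. (2)–(4) can be illustrated with a short numerical sketch. For simplicity it uses a bivariate Gaussian copula (the GCMI setting of [3]) instead of a fitted R-Vine model, because the Gaussian case has a closed-form MI against which the Monte Carlo estimate of Eq. (4) can be checked; in the full RVCMI, the log-density of the fitted R-Vine Copula would replace the Gaussian copula log-density below.

```python
import numpy as np

def gaussian_copula_mi_mc(rho, n_samples=200_000, seed=0):
    """Monte Carlo estimate of MI = -H_c (Eqs. (2) and (4)) for a bivariate Gaussian copula."""
    rng = np.random.default_rng(seed)
    R = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), R, size=n_samples)   # latent Gaussian samples; u = Phi(z)
    # log of the Gaussian copula density evaluated at the sampled points
    log_c = -0.5 * np.log(np.linalg.det(R)) \
            - 0.5 * np.einsum('ij,jk,ik->i', z, np.linalg.inv(R) - np.eye(2), z)
    return log_c.mean()        # E[log c] = -H_c = MI (in nats)

rho = 0.6
print("Monte Carlo MI:", gaussian_copula_mi_mc(rho))
print("closed form   :", -0.5 * np.log(1 - rho ** 2))   # -0.5 * ln(1 - rho^2)
```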
3 Data Description and Preprocessing

3.1 Data Description

The data of this experiment come from the related work of Israely et al.; for detailed information on the experimental process and protocol, please refer to the paper [10]. A total of 5 healthy subjects (H1–H5) participated in the experiment, with an average age of less than 76 years. Each subject was required to sit in front of a table and place the forearm in a comfortable position. Following a voice prompt activated every 10 s, the subjects carried out five reaching actions toward a target in front of them, and took a 10 s rest after each reaching task. The sEMG signals were collected simultaneously from eight upper-limb muscles (upper trapezius (UT), anterior deltoid (AD), medial deltoid (MD), posterior deltoid (PD), pectoralis major (PM), infraspinatus (IN), biceps (BB) and triceps (TB)) of each participant at a sampling rate of 2000 Hz, and the recorded muscle positions are shown in Fig. 1.
Fig. 1. Recorded muscle positions.
3.2 Preprocessing

Since the sEMG signal is very weak and its high-frequency band is very susceptible to noise interference, the collected sEMG signal has to be cleaned. In the preprocessing stage, five data segments with a 2.5 s time window each were first extracted manually, and the mean value and baseline drift were removed. Furthermore, a second-order IIR notch filter was used to suppress 50 Hz power-frequency interference, and a fourth-order Butterworth band-pass filter (5–200 Hz) was applied. To facilitate the subsequent analysis, the clean sEMG signals obtained after preprocessing were averaged across the experimental trials.
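A minimal sketch of this preprocessing chain (detrending, 50 Hz notch, 5–200 Hz band-pass) for one sEMG channel is shown below, assuming SciPy. The filter orders follow the description above, while the toy input signal and the zero-phase filtering choice are assumptions.

```python
import numpy as np
from scipy import signal

def preprocess_semg(x, fs=2000.0):
    """One-channel sEMG cleaning: detrend, 50 Hz notch, 5-200 Hz band-pass (zero-phase)."""
    x = signal.detrend(x)                                           # remove mean / baseline drift
    b, a = signal.iirnotch(w0=50.0, Q=30.0, fs=fs)                  # second-order 50 Hz notch
    x = signal.filtfilt(b, a, x)
    sos = signal.butter(4, [5.0, 200.0], btype='bandpass', fs=fs, output='sos')
    return signal.sosfiltfilt(sos, x)                               # Butterworth band-pass

fs = 2000.0
t = np.arange(0, 2.5, 1 / fs)                                       # one 2.5 s window
raw = np.random.randn(t.size) + np.sin(2 * np.pi * 50 * t)          # toy signal with mains hum
clean = preprocess_semg(raw, fs)
print(clean.shape)
```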
4 Results
The non-parametric kernel density estimation method has excellent properties such as asymptotic unbiasedness and mean-square consistency, so it is used to estimate the marginal cumulative distribution function (MCDF) of the 8-channel sEMG signals. The kernel function was a Gaussian kernel, and the window width was determined by the empirical rule h_n ≈ 1.06 σ̂ n^(−1/5). To better understand the distribution characteristics of the sEMG, a Q-Q diagram is used to test the Gaussianity of the 8-channel sEMG signals before estimating the MCDF. The results are shown in Fig. 2(a). It can be seen from Fig. 2(a) that the quantiles of the 8-channel sEMG signals do not lie on a straight line and are far from the standard Gaussian quantiles. Moreover, the curve bends away from the diagonal and the tails are thicker, indicating that the sEMG signal does not follow a Gaussian distribution. The MCDF estimation results are shown in Fig. 2(b). As can be seen from Fig. 2(b), the cumulative distribution function curves of the 8-channel sEMG signals increase monotonically and are very smooth. Most of the amplitudes (−0.5 × 10^−4 to 0.5 × 10^−4) are concentrated symmetrically on both sides of the zero mean. Further calculation of the kurtosis of the 8-channel sEMG signals shows that the average kurtosis value is 5.6780, larger than the kurtosis of a Gaussian distribution (3), with a standard deviation of 0.8502, indicating that the sEMG signal exhibits a "peak fat tail" distribution.
(a) Q-Q diagram; (b) MCDF estimation
Fig. 2. Gaussian test and MCDF estimation of 8-channel sEMG signals.
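The marginal CDF estimate and the uniformity check described in this section can be sketched as follows. The Gaussian-kernel smoothed CDF with the empirical bandwidth h ≈ 1.06 σ̂ n^(−1/5) and the KS test against the uniform distribution follow the text; the heavy-tailed toy data are an assumption used only to exercise the code.

```python
import numpy as np
from scipy.stats import norm, kstest, kurtosis

def gaussian_kernel_cdf(x, data, bandwidth=None):
    """Smoothed marginal CDF estimate; bandwidth defaults to h = 1.06 * sigma * n**(-1/5)."""
    data = np.asarray(data, dtype=float)
    if bandwidth is None:
        bandwidth = 1.06 * data.std(ddof=1) * data.size ** (-1 / 5)
    x = np.atleast_1d(x).astype(float)
    return norm.cdf((x[:, None] - data[None, :]) / bandwidth).mean(axis=1)

rng = np.random.default_rng(0)
semg = rng.laplace(size=2000)                        # heavy-tailed stand-in for one sEMG channel
print("kurtosis:", kurtosis(semg, fisher=False))     # > 3 suggests a 'peak fat tail' shape

u = gaussian_kernel_cdf(semg, semg)                  # pseudo-observations F(x) in (0, 1)
print(kstest(u, 'uniform'))                          # single-sample KS test against U(0, 1)
```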
Before estimating the parameters of the R-Vine Copula and characterizing the intermuscular dependent structure with it, the MCDF of the input sEMG signal is required to follow a uniform distribution on (0, 1). Therefore, the single-sample Kolmogorov–Smirnov (KS) test is used to check the fitted MCDF at a significance level of 0.05; the smaller the KS statistic, the more plausible the uniformity of the MCDF. Taking Fig. 2(b) as an example, the KS test results are shown in Table 1. As can be seen from Table 1, the KS statistics are generally small and the p-values are all greater than 0.05, so the marginal distribution functions of the 8-channel sEMG signals cannot reject the null hypothesis of a uniform distribution on (0, 1) at the 0.05 significance level. The intermuscular dependent structure was then constructed with the R-Vine Copula using the "VineCopula" package in R, and the RVM was obtained by the MST algorithm. To assess how well the R-Vine Copula describes the intermuscular interdependence structure, the C-Vine and D-Vine Copulas are used for comparison; if p is greater than 0.05, the null hypothesis that there is no difference between two models cannot be rejected. The AIC values, log-likelihood values and Vuong test results are shown in Table 2. The AIC value of the R-Vine Copula is lower, and its log-likelihood value higher, than those of both the C-Vine and the D-Vine, showing that the R-Vine Copula fits the intermuscular dependent structure better and describes the model more accurately. The Vuong tests indicate that the R-Vine Copula is significantly better than both the C-Vine and the D-Vine (p < 0.05) at the 0.05 significance level. For the C-Vine compared with the D-Vine, the Vuong statistic is positive, but the difference is not significant (p > 0.05).

Table 1. KS test results of MCDF (H1).

               UT       AD       MD       PD       PM       IN       BB       TB
KS statistic   0.0134   0.0143   0.0122   0.0094   0.0105   0.0143   0.0134   0.0111
P              0.3268   0.2538   0.4461   0.7637   0.6363   0.2563   0.3273   0.5672
Table 2. Performance comparison of three Vine Copulas.

                  R-Vine vs C-Vine       R-Vine vs D-Vine       C-Vine vs D-Vine
LL                2036.61 / 2004.50      2036.61 / 1997.48      2004.50 / 1997.48
AIC               −3965.22 / −3913.00    −3965.22 / −3900.97    −3913.00 / −3900.97
Vuong statistic   2.44                   2.40                   0.54
P                 *                      *                      0.59

Note: "*": p < 0.05
Next, we use RVCMI (Eq. (6), N = 2: bivariate; N = 8: multivariate) to measure the nonlinear coupling strength between dual-channel and multi-channel muscles, and the average results are shown in Fig. 3 and Fig. 4. The RVCMI value located on the main
diagonal of Fig. 3 is the Self-Information (SI), which measures the amount of information contained in the sEMG signals. The SIs of the 8-channel sEMG signals are very similar (about 7.36 bit). The RVCMI value of TB and PD (0.1278 bit) is significantly higher than those of the other muscle pairs, indicating a strong nonlinear coupling between TB and PD. In addition, TB has higher coupling strength with AD and MD, indicating that TB is closely coupled with the deltoid muscle. We can also find that the RVCMI values between BB and the other muscles are particularly low. The Mantel test statistics between subjects H1–H5 (> 0.65, p < 0.05; Table 3) indicate that the dual-channel intermuscular coupling strength relationships of the five subjects are relatively consistent.

Table 3. Mantel test of dual-channel intermuscular coupling strength between subjects.
Subjects   H1        H2        H3        H4        H5
H1         –         –         –         –         –
H2         0.75,**   –         –         –         –
H3         0.73,**   0.76,**   –         –         –
H4         0.66,*    0.82,**   0.65,**   –         –
H5         0.72,**   0.86,**   0.74,**   0.77,**   –

Note: The numbers in the table are the Mantel statistics for RVCMI, with the corresponding p-values indicated as "*": p < 0.05, "**": p < 0.01
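The Mantel test reported in Table 3 correlates the off-diagonal entries of two subjects' RVCMI matrices and assesses significance by permuting the muscle labels. A minimal numpy sketch with hypothetical 8 × 8 matrices is given below; the number of permutations and the one-sided p-value convention are assumptions.

```python
import numpy as np

def mantel_test(A, B, n_perm=9999, seed=0):
    """Permutation Mantel test between two symmetric coupling matrices."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)                    # off-diagonal (upper-triangular) entries
    r_obs = np.corrcoef(A[iu], B[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)                      # permute rows and columns of B jointly
        count += np.corrcoef(A[iu], B[p][:, p][iu])[0, 1] >= r_obs
    return r_obs, (count + 1) / (n_perm + 1)        # statistic and one-sided p-value

# hypothetical 8 x 8 RVCMI matrices for two subjects
rng = np.random.default_rng(1)
M = rng.random((8, 8)); M = (M + M.T) / 2
M1 = (M + 0.05 * rng.random((8, 8))); M1 = (M1 + M1.T) / 2
M2 = (M + 0.05 * rng.random((8, 8))); M2 = (M2 + M2.T) / 2
print(mantel_test(M1, M2))
```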
Fig. 3. Dual-channel intermuscular coupling strength
Fig. 4. Multi-channel intermuscular coupling strength
5 Discussion

The study of intermuscular coupling is of great significance for the theoretical analysis of human motion, the rehabilitation evaluation of motor function after stroke, and the exploration of the motor control mechanisms of the central nervous system. However, because of their own characteristics, different coupling analysis methods can reveal different intermuscular coupling relationships. This paper proposes the RVCMI method by combining R-Vine Copula theory with entropy theory. It not only inherits the advantages of R-Vine Copula and MI, but also accounts for the dependent structure of the variables: R-Vine Copula allows various dependence structures among variables, and different pair Copulas are selected to describe the dependence between different variables. Moreover, MI based on R-Vine Copula requires no prior assumption about the marginal and joint distributions of the variables and is not limited by dimension, providing a new way to estimate MI. Some studies [11] have shown that sEMG signals originate from the motor area of the cerebral cortex and that their PDF is close to a Gaussian distribution in some cases. However, using the non-parametric kernel density estimation method, we found that the PDF of the sEMG signal is a symmetrical unimodal distribution with peaked, heavy-tailed characteristics, which may be related to the spatial recruitment strategy, type distribution and electrode arrangement of the motor units in the muscle [12]. Compared with the other Vine Copulas, the R-Vine Copula has a lower AIC value and a higher log-likelihood value than the C-Vine and D-Vine Copulas, indicating that it can describe the dependent structure of multi-channel sEMG signals more accurately and effectively, which lays a good foundation for the subsequent intermuscular coupling analysis. According to RVCMI, the five subjects showed similar dual-channel intermuscular coupling strength relationships: PD and TB had a strong nonlinear coupling, which was indispensable in the intermuscular coupling, while BB had a low degree of coupling with the other muscles and did not seem to have any functional connection. This may be due to the unique physiological and anatomical structure and kinematic background of each muscle. There are still some limitations in this study. Different from the C-Vine and D-Vine, the R-Vine Copula constructs its tree structure from the actual dependence between the variables, which makes its parameter estimation process rather time-consuming. We will address this problem in future work.
6 Conclusion

In this paper, we have introduced the R-Vine Copula into intermuscular coupling analysis and proposed a novel RVCMI method based on R-Vine Copula for MI estimation. RVCMI naturally integrates dual-channel and multi-channel, linear and nonlinear coupling analysis. The experimental results show that the PDF of the sEMG signal during reaching movements may be a kind of "peak fat tail" distribution, while the R-Vine Copula, which has no fixed form, is more flexible and effective in describing the intermuscular dependent structure. The RVCMI results indicate that the intermuscular coupling is mainly reflected between MD and TB, and that the coupling patterns of the subjects are significantly correlated. Thus, RVCMI is an advanced intermuscular coupling analysis method with good application value for exploring neuromuscular information regulation strategies.
Acknowledgments. This work is supported by National Natural Science Foundation of China under Grants 61871427 and 62071161, and Key R & D Projects of Shandong Province under Grant 2019JZZY021005.
References 1. Takei, T.: Neural basis for hand muscle synergies in the primate spinal cord. Proc. Natl. Acad. Sci. U S A 114(32), 8643–8648 (2017) 2. Wen, W.: Coupling and synchronization analysis methods of EEG signal with mild cognitive impairment: a critical review. Front. Aging Neurosci. 7(54), 1–7 (2015) 3. Ince, R.A.A.: A statistical framework for neuroimaging data analysis based on mutual information estimated via a Gaussian copula. Hum. Brain Mapp. 38(3), 1541–1573 (2017) 4. Schepsmeier, U.: A goodness-of-fit test for regular vine copula models. Econ. Rev. 38(1), 25–46 (2019) 5. Ma, J.: Mutual information is copula entropy. Tsinghua Sci. Technol. 16(1), 51–54 (2011) 6. Bedford, T.: Probability density decomposition for conditionally dependent random variables modeled by vines. Ann. Math. Artif. Intell. 32(1), 245–268 (2001) 7. Morales-Napoles, O.: About the number of vines and regular vines on n nodes. Delft University of Technology (2010) 8. Dißmann, J.: Selecting and estimating regular vine copula and application to financial returns. Computat. Stat. Data Anal. 59, 52–69 (2013) 9. Brechmann, E.: Truncated and simplified regular vines and their applications. Can. J. Stat. 40(1), 68–85 (2010) 10. Israely, S.: Direction modulation of muscle synergies in a hand-reaching task. IEEE Trans. Neural Syst. Rehabil. Eng. 25(12), 2427–2440 (2017) 11. Nazmi, N.: A review of classification techniques of EMG signals during isotonic and isometric contractions. Sensors 16(8), 1304 (2016) 12. Nazarpour, K.: A note on the probability distribution function of the surface electromyogram signal. Brain Res. Bull. 90(100), 88–91 (2013)
A New Feature Selection Method for Driving Fatigue Detection Using EEG Signals Zaifei Luo1 , Yun Zheng2 , Yuliang Ma2(B) , Qingshan She2 , Mingxu Sun3 , and Tao Shen3 1 School of Electronic and Information Engineering, Ningbo University of Technology, Ningbo
315211, China 2 School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
[email protected] 3 School of Electrical Engineering, University of Jinan, Jinan 250022, China
Abstract. This study aims to extract high-level features of driving fatigue using electroencephalography (EEG). The commonly used feature selection method is the power spectral density (PSD) of five frequency bands, i.e., the Alpha, Beta, Gamma, Delta and Theta bands. This study proposes a new approach combining ensemble empirical mode decomposition (EEMD) and PSD. EEMD provides several Intrinsic Mode Function (IMF) components from which PSD features can be extracted. Multiple machine learning approaches, i.e., k-Nearest Neighbor (KNN), support vector machine (SVM), and the hierarchical extreme learning machine algorithm with Particle Swarm Optimization (PSO-H-ELM), were used to evaluate the two feature selection methods. The results show that the accuracy based on the PSD of the EEMD components is clearly superior to that of frequency-band energy-spectrum features. By comparing the accuracies, we conclude that the new feature selection method with the PSO-H-ELM classifier performed best, with the highest average accuracy of 94.58%. Keywords: Electroencephalography · Power spectral density · Ensemble empirical mode decomposition · Fatigue driving detection
1 Introduction

With the rapid development of modern industry, cars have become a popular means of transportation. Simultaneously, the fierce competition in modern society urges people to work hard, which may lead to overfatigue. Fatigue is a major cause of traffic accidents [1]. Therefore, fatigue driving detection has become an important topic in the field of driving safety. In addition, other factors have been found to contribute to fatigue, including monotonous environments, sleep deprivation, chronic drowsiness, and drug and alcohol use [2, 3]. In this paper, we used the most common of these, sleep deprivation, to induce a state of fatigue. In recent decades, the public has gradually begun to pay attention to fatigue driving, and various fatigue driving detection methods have been proposed [4]. At present, the main fatigue detection methods are as follows:
(1) Behavioral features, including the variability of lateral lane position and vehicle-heading difference metrics [5]. These methods are simple in principle, but they are difficult to implement because a common metric has not yet been standardized.
(2) Facial expression features, such as the degree of resting eye closure or nodding frequency [6, 7]. These methods are influenced by a number of factors, including image angle and illumination, which reduces their accuracy and applicability.
(3) Physiological features, including electrocardiogram (ECG) [8], heart rate, electromyogram (EMG), and electroencephalogram (EEG)-based features [9, 10]. Unlike behavioral and facial features, these serve as objective markers of the physiological changes of the human body in response to surrounding conditions.

Among these signals, EEG is considered to be the most direct, effective, and promising one for detecting driving fatigue [10]. The most accurate equipment for collecting EEG signals are wearable EEG acquisition devices; these systems enable brain-computer interfaces (BCI) to measure EEG signals and determine whether the driver is in a state of fatigue. Different feature selection methods directly affect the accuracy of fatigue driving monitoring. The commonly used method is to divide the EEG signal into the Alpha (8–15 Hz), Beta (16–24 Hz), Gamma (25–100 Hz), Delta (0.1–3 Hz) and Theta (4–7 Hz) frequency bands by the short-time Fourier transform (STFT), and to further calculate the power spectral density (PSD) of each frequency band as the feature of the signal. The limitation of this method is that the window length of the STFT is fixed. To overcome this limitation, an ensemble empirical mode decomposition (EEMD) algorithm is adopted in this paper. Empirical Mode Decomposition (EMD) is an adaptive time-frequency analysis method proposed by Huang [11]. EMD decomposes a signal according to the time-scale characteristics of the data itself, which is where it is superior to the STFT and how it avoids the STFT's limitations. Ensemble Empirical Mode Decomposition (EEMD) is a new approach improved by Wu and Huang on the basis of EMD. It consists of sifting an ensemble of white-noise-added signals and treating the mean as the final true result [12]. Adding white noise improves the distribution of the extreme points, which effectively alleviates the mode mixing of the original EMD and significantly improves the SNR of the signal. Through EEMD processing, we obtain a series of intrinsic mode function (IMF) components that contain the time-frequency features. The power spectral density of each IMF component is then calculated to build the feature space. The k-Nearest Neighbor (KNN) algorithm and Support Vector Machines (SVM) are commonly used in fatigue driving detection, but their classification accuracy is not ideal because of their simple structure. Some research groups have turned to the extreme learning machine (ELM) algorithm for classification, thanks to its fast training speed [13]. However, the shallow structure of ELM can fail to learn the signal features. To address this issue, a new ELM-based hierarchical learning framework for multilayer perceptron classification has been proposed by Huang et al. [14], known as the hierarchical extreme learning machine (H-ELM). Compared with other MLP training methods, such as the traditional ELM, the training of the H-ELM is much faster and more accurate.
In order to further improve the accuracy of driving fatigue detection, a time-frequency
domain EEG feature detection method combining particle swarm optimization (PSO) and the H-ELM algorithm is proposed. Based on the characteristics of the EEG signal, PSO [15] is used to optimize the parameters of the H-ELM kernel function and thereby improve the classification accuracy.
2 Algorithm

2.1 Feature Extraction Algorithm

EMD decomposes a signal into a series of intrinsic mode functions (IMFs), each of which has a different time scale [16]. EMD mainly includes the following six steps [17]:

(1) Extract all the local maxima/minima of the signal x(t). The upper and lower envelopes e_upper(t) and e_lower(t) are calculated by cubic spline interpolation of the local maxima and minima, respectively.
(2) Compute the envelope mean mean_1(t) by

    mean_1(t) = (e_upper(t) + e_lower(t)) / 2   (1)

(3) Let h_1(t) = x(t) − mean_1(t); h_1(t) becomes the new input signal.
(4) Repeat steps 1–3 until the number of extrema and the number of zero crossings are equal or differ by at most one [21]; then put IMF(t) ← h_{1,k}(t) and x(t) ← x(t) − h_{1,k}(t).
(5) Calculate the convergence criterion

    SD(k) = Σ_{t=0}^{N} |h_{1,k−1}(t) − h_{1,k}(t)|² / |h_{1,k−1}(t)|²   (2)

(6) Repeat steps 1–5 until SD(k) < ε (after j iterations), and obtain the IMFs as

    IMF_j(t) ← h_{j,k}(t)  (the j-th IMF)   (3)

Here, N is the time duration. The "a ← b" arrow is an assignment or substitution operation; it means that the value of the variable a is replaced by the current value of the variable b.

EEMD mainly uses the statistical characteristics of the frequency distribution of white noise to resolve the mode-mixing phenomenon of EMD. Its main principle [18] is that adding white noise to the signal changes the distribution of the extreme points of the low-frequency components so that the extreme-point intervals become uniform, which avoids mode mixing. EEMD mainly includes the following three steps:

(1) Add white noise to the original signal. Repeat this step M times.
(2) Use the EMD algorithm to decompose each noise-added signal, obtaining a set of IMF components and a residual term each time.
(3) Calculate the average of the IMFs as the final IMF components.
M is determined by the equation e = k/√M, where k is the amplitude coefficient of the added white-noise sequence and e is the standard deviation of the error after the noise-added signal is decomposed. After EEMD, the powers of the first three IMFs were extracted via the STFT and PSD. A total of 96 IMF PSD features (32 electrodes × 3 IMFs) were thereby extracted from the EEG. We adopted a Hanning-window-based discrete STFT to extract the PSD features. Suppose the EEG signal recorded by an electrode is x[n] = {x_1, x_2, ..., x_N}; the STFT of the EEG signal is

STFT{x[n]}(m, ω_k) = Σ_{n=1}^{N} x[n] w[n − m] e^(−jω_k n)   (4)
where ω_k = 2πk/N is the angular frequency, k = 0, 1, ..., N − 1, and w[n] represents a window function. The Hanning window used in this study is

w[n] = 0.5(1 − cos(2πn/(N − 1))) = sin²(πn/(N − 1))   (5)

The energy spectrum of the IMF signal is calculated as

P(ω) = F(ω) F*(ω) = |F(ω)|²   (6)

where f_n represents the IMF component, F(ω) is the Fourier transform of the IMF component, and F*(ω) is the complex conjugate of F(ω).

2.2 Classifier

As mentioned in the introduction, due to its single-hidden-layer network structure, ELM performs poorly when processing high-noise data. Therefore, several methods with multi-layer network structures, such as the multi-layer perceptron extreme learning machine (e.g., H-ELM), have been proposed. The H-ELM algorithm constructs a self-encoding function that learns and represents the original input signal through encoding and decoding, continually reducing the reconstruction error so as to represent the original signal. The principle of this function is similar to the feature extractor in a deep learning framework [19]. PSO-H-ELM: Particle Swarm Optimization (PSO) is a swarm intelligence algorithm proposed by R. Eberhart and J. Kennedy in 1995 on the basis of the BOID model [20], derived from the study of the predatory behavior of bird flocks. The idea is to simplify the bird flock into a particle swarm in space; each particle in the swarm dynamically adjusts its position and speed by tracking two extreme values in each iterative search: the optimal solution Pbest found by the particle itself and the optimal solution Gbest found by the swarm. The PSO-H-ELM algorithm uses PSO to optimize the penalty factor S and the l2-norm parameter of the H-ELM algorithm, so as to optimize the H-ELM classifier model and achieve the highest accuracy.
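The feature-extraction pipeline of Sect. 2.1 (EEMD decomposition followed by the PSD power of the first three IMFs) can be sketched as follows. This assumes the third-party PyEMD package (distributed as EMD-signal) for EEMD and uses Welch's method with a Hanning window as a practical PSD estimator; the ensemble size and segment length are assumptions rather than the authors' settings.

```python
import numpy as np
from PyEMD import EEMD                      # pip install EMD-signal (assumed dependency)
from scipy.signal import welch

def eemd_psd_features(x, fs=200.0, n_imfs=3):
    """Total Welch-PSD power of the first n_imfs IMFs of one EEG channel."""
    eemd = EEMD(trials=100)                 # ensemble of 100 noise-added decompositions
    imfs = eemd.eemd(x)                     # rows are IMF components
    feats = []
    for imf in imfs[:n_imfs]:
        f, pxx = welch(imf, fs=fs, window='hann', nperseg=256)   # Hanning-window PSD
        feats.append(pxx.sum())                                  # power of this IMF
    return np.array(feats)

x = np.random.randn(2000)                   # one 10 s sample at 200 Hz
print(eemd_psd_features(x))                 # 3 features; 32 electrodes x 3 IMFs -> 96 features
```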
3 Experiments

3.1 Purpose

The purpose of our experiment is to collect EEG data under different alertness (wakeful and fatigued) conditions. All subjects were required to take part in the experiment under both conditions for comparison.

3.2 Subjects

A total of 6 subjects (6 males, average age 24 years) with a valid driver's license were recruited to participate in the study. All the subjects were healthy. On the night before the experiment, all subjects got enough sleep to ensure the validity of the data collected.

3.3 Experiment

In this paper, driving simulation equipment and related software are used to simulate a real driving process. The driving simulator platform includes a fixed car steering wheel, brake and accelerator pedals, a large screen, a high-performance computer, simulation software, and a multifunctional data acquisition board. In order to obtain EEG data under different physiological states, subjects were required to sleep for 4 h (fatigue condition) and 8 h (awake condition) before EEG collection. Subjects worked as usual the next day and performed a driving simulation at 8:00 p.m. EEG signal acquisition began at 8:10 p.m. and stopped at 8:30 p.m.

3.4 Preprocessing

We chose to use a 10 s segment of EEG data as a sample, creating 240 samples for each subject. Data from 6 subjects were collected, resulting in a total of 1440 samples. Among these, the 240 samples from one subject were set aside as testing data, while the remaining 1200 samples from the other five subjects were used as training data. Preprocessing steps then included:

• The original sampling frequency of the EEG was 1000 Hz, resulting in samples with too much data. Down-sampling was performed to reduce the EEG sampling frequency to 200 Hz, which made classification quicker without destroying the features.
• The most informative EEG content is typically found between 0.1 Hz and 30 Hz. During signal acquisition, electrode resistance, drift, sweating, and skin potential factors can change the baseline EEG voltage slowly and continuously, so high-pass filtering at 0.1 Hz was performed to remove this slow voltage drift. Band-pass filtering of the EEG signal was therefore performed from 0.1–50 Hz to remove noise such as power-frequency noise, white noise, and EMG (the frequency of EMG signals produced by a muscle contraction is usually greater than 100 Hz). A minimal code sketch of these steps is given after this list.
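The sketch referred to in the preprocessing list is shown below: decimation from 1000 Hz to 200 Hz, 0.1–50 Hz band-pass filtering, and segmentation into 10 s samples, assuming SciPy. The filter order and the toy input are assumptions.

```python
import numpy as np
from scipy import signal

def preprocess_eeg(raw, fs_in=1000, fs_out=200, band=(0.1, 50.0), epoch_sec=10):
    """Downsample one EEG channel, band-pass filter it, and cut it into fixed-length samples."""
    x = signal.decimate(raw, q=fs_in // fs_out, zero_phase=True)        # 1000 Hz -> 200 Hz
    sos = signal.butter(4, band, btype='bandpass', fs=fs_out, output='sos')
    x = signal.sosfiltfilt(sos, x)                                      # 0.1-50 Hz band-pass
    n = fs_out * epoch_sec                                              # 10 s -> 2000 points
    n_epochs = x.size // n
    return x[:n_epochs * n].reshape(n_epochs, n)

epochs = preprocess_eeg(np.random.randn(20 * 60 * 1000))                # 20 min of toy data
print(epochs.shape)                                                     # (120, 2000)
```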
4 Results and Discussion

In this section, we process the original EEG signal to achieve the best classification performance and determine whether the proposed PSO-H-ELM offers an improvement over existing methods. All the methods were implemented in the MATLAB 2014a environment on a PC with a 3.4 GHz processor and 8.0 GB RAM. We chose the signals of one electrode from the same subject at different alertness levels to test the performance of the proposed feature extraction approach. Figure 1 shows the comparison of normal signals and fatigue signals during feature extraction. Clearly, extracting the PSD features of the IMFs from EEMD magnifies the difference between the EEG signals, which means the newly proposed approach can achieve better extraction performance with fewer features.
Fig. 1. Comparison of normal and fatigue EEG signals
We chose the 240 samples from one subject as the testing data, with the remaining 1200 samples from the five other subjects used as training data. This arrangement was chosen to avoid any possible confounds from the random selection of training and test data. Our experiment used two feature selection methods (PSD and EEMD-PSD) and three classifiers (SVM, KNN, and PSO-H-ELM); the average accuracy over the six subjects for each combination is shown in Fig. 2. Briefly, with the PSD and EEMD-PSD features respectively, the SVM model achieved accuracies of 83.13% and 90.76%, the KNN 86.88% and 94.38%, and the PSO-H-ELM 88.40% and 94.58%. From the results shown in Fig. 2, we can conclude that the PSD features of the IMF components perform better than the PSD features of the frequency bands regardless of which classifier is used, indicating that the proposed feature extraction approach is superior to the traditional ones in the detection of driving fatigue.
Fig. 2. The accuracies of KNN, SVM and PSO-H-ELM with different feature extraction approaches.
Consequently, using the EEMD-PSD features with the PSO-H-ELM classifier achieves the highest accuracy of 94.58%.
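A minimal scikit-learn sketch of the classifier comparison described above (cross-subject split, KNN and SVM on the 96-dimensional feature vectors) is shown below. The feature and label arrays are random placeholders standing in for the real EEMD-PSD features, and the classifier hyperparameters are assumptions; PSO-H-ELM is omitted because it is not available as a standard library component.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Hypothetical arrays: 1200 training samples from five subjects, 240 test samples from the
# held-out subject, each with 96 features and a binary (awake/fatigued) label.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((1200, 96)), rng.integers(0, 2, 1200)
X_test, y_test = rng.random((240, 96)), rng.integers(0, 2, 240)

for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                  ("SVM", SVC(kernel='rbf', C=1.0))]:
    model = make_pipeline(StandardScaler(), clf)      # scale features, then classify
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.4f}")
```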
5 Conclusion

In this paper, we have proposed a new feature extraction method based on PSD and EEMD. Three classifiers provided evidence that it is more effective than existing feature extraction methods and showed that the EEMD-PSD features differ more significantly between the fatigued and normal states. The PSO-H-ELM classification algorithm combined with the proposed EEMD-PSD feature extraction method achieved the highest accuracy of 94.58%. The limitation of this paper is that there are only six subjects, which may weaken the generality of the results.

Acknowledgement. This work is supported by the National Natural Science Foundation of China under grants 62071161 and 61871427, and by the Key R&D Projects of Shandong Province under grant 2019JZZY021005.
References 1. Zhang, G., Yau, K.K.W., Zhang, X., Li, Y.: Traffic accidents involving fatigue driving and their extent of casualties. Accid. Anal. Prev. 87, 34–42 (2016) 2. Nilsson, T., Nelson, T.M., Carlson, D.: Development of fatigue symptoms during simulated driving. Accid. Anal. Prev. 29, 479–488 (1997) 3. Ting, P.-H., Hwang, J.-R., Doong, J.-L., Jeng, M.-C.: Driver fatigue and highway driving: a simulator study. Physiol. Behav. 94, 448–453 (2008) 4. Milosevic, S.: Drivers' fatigue studies. Ergonomics 40, 381–389 (1997) 5. Morris, D.M., Pilcher, J.J., Switzer, I.F.S.: Lane heading difference: an innovative model for drowsy driving detection using retrospective analysis around curves. Accid. Anal. Prev. 80, 117–124 (2015)
6. D’Orazio, T., Leo, M., Guaragnella, C., Distante, A.: A visual approach for driver inattention detection. Pattern Recogn. 40, 2341–2355 (2007) 7. Bergasa, L.M., Nuevo, J., Sotelo, M.A., Barea, R., Lopez, M.E.: Real-time system for monitoring driver vigilance. IEEE Trans. Intell. Transp. Syst. 7, 63–77 (2006) 8. Wang, L., Li, J., Wang, Y.: Modeling and recognition of driving fatigue state based on R-R intervals of ECG data. IEEE Access 7, 175584–175593 (2019) 9. Ma, Y., et al.: Driving fatigue detection from EEG using a modified PCANet method. Comput. Intell. Neurosci. 2019(3), 1–9 (2019) 10. Ren, Z., et al.: EEG-based driving fatigue detection using a two-level learning hierarchy radial basis function. Front. Neurorobot. 15, 618408 (2021) 11. Roy, Y., Banville, H., Albuquerque, I., Gramfort, A., Falk, T.H., Faubert, J.: Deep learningbased electroencephalography analysis: a systematic review. J. Neural Eng. 16(5), 051001 (2019) 12. Riaz, F., Hassan, A., Rehman, S., Niazi, I.K., Dremstrup, K.: EMD-based temporal and spectral features for the classification of EEG signals using supervised learning. IEEE Trans. Neural Syst. Rehabil. Eng. 24(1), 28–35 (2016) 13. Wu, Z., Huang, N.E.: Ensemble empirical mode decomposition: a noise-assisted data analysis method. Adv. Adapt. Data Anal. 1, 11–41 (2009) 14. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: theory and applications. Neurocomputing 70, 489–501 (2006) 15. Tang, J., Deng, C., Huang, G.B.: Extreme learning machine for multilayer perceptron. IEEE Trans. Neural Netw. Learn. Syst. 27, 809–821 (2016) 16. Ma, Y., et al.: Driving drowsiness detection with EEG using a modified hierarchical extreme learning machine algorithm with particle swarm optimization: a pilot study. Electronics 9(5), 775 (2020) 17. Damerval, C., Meignen, S., Perrier, V.: A fast algorithm for bidimensional EMD. IEEE Signal Process. Lett. 12(10), 701–704 (2005) 18. Boudraa, A.O., Cexus, J.C.: EMD-based signal filtering. IEEE Trans. Instrument. Meas. 56, 2196–2202 (2007) 19. Xiao-juna, Z., Shi-qinb, L., Xue-lia, Y., Liu-juana, F.: Electroencephalogram denoising method based on improved EMD. Comput. Eng. 38(1), 151–153 (2012) 20. Tang, J., Deng, C., Huang, G.: Extreme Learning Machine for Multilayer Perceptron. IEEE Transactions on Neural Networks and Learning Systems 27(4), 809–821 (2016) 21. Maali, Y., Al-Jumaily, A.: A novel partially connected cooperative parallel PSO-SVM algorithm: Study based on sleep apnea detection, 2012 IEEE Congress on Evolutionary Computation, vol. 2012, pp. 1–8 (2012)
A New Strategy for Mental Fatigue Detection Based on Deep Learning and Respiratory Signal Jie Wang1,2,3 , Jilong Shi1,2 , Yanting Xu1,2 , Hongyang Zhong1,2 , Gang Li1,2,3(B) , Jinghong Tian1,2 , Wanxiu Xu1,2 , Zhao Gao1,2 , Yonghua Jiang1,2,4 , Weidong Jiao1,2 , and Chao Tang4 1 Key Laboratory of Urban Rail Transit Intelligent Operation and Maintenance Technology and
Equipment of Zhejiang Provincial, Zhejiang Normal University, Zhejiang 321005, China [email protected] 2 College of Engineering, Zhejiang Normal University, Jinhua 321004, China 3 College of Mathematics and Computer Science, Zhejiang Normal University, Jinhua 321004, China 4 Xingzhi College, Zhejiang Normal University, Lanxi 321100, China
Abstract. Mental fatigue is often associated with decreased mental alertness and worsening performance, but its detection is still a difficult issue due to the contradiction between practicability and accuracy. In the current study, we attempt to provide a new method for mental fatigue detection that unifies practicability and accuracy based on deep learning and the respiratory signal. To this end, respiratory signals were collected and two deep learning models, a convolutional neural network (CNN) and a Long Short-Term Memory network (LSTM), were constructed. Wavelet scale maps and the time series of the respiratory signal were used as input to the CNN and LSTM, respectively. The data set was divided into a training set, a validation set and a test set in the ratio 6:2:2. Bayesian optimization was used for the hyperparameter optimization of the CNN and LSTM. The results show that the LSTM model achieves a better test accuracy (89.16%) than the CNN (77.29%). Our findings indicate that the respiratory signal combined with an LSTM classifier may be a practical and effective strategy for mental fatigue detection. Keywords: Deep learning · CNN · LSTM · Respiratory signal · Mental fatigue
1 Introduction

Maintaining sustained attention during long-term cognitive tasks usually leads to a high degree of mental fatigue, making people prone to anxiety, depression, difficulty concentrating, slow thinking, prolonged reaction time, decreased work efficiency, increased error rate and other fatigue symptoms. Coupled with the accumulation of primitive fatigue caused by the fast pace, high pressure, poor sleep quality and other factors of modern life, mental fatigue has become one of the biggest negative factors affecting people's physical and mental health and work efficiency. Mental fatigue is an important factor in human-machine systems with high safety requirements [1], such as air traffic control, manned
aerospace, car driving, airplane piloting, etc. It is also a causative factor of certain psychogenic brain diseases, such as chronic fatigue syndrome and depression. The greater the degree of mental fatigue, the greater its negative impact. According to China Expressway, among major traffic accidents on expressways, accidents caused by driving fatigue account for more than 40%, posing a great threat to the safety of people's lives and property. Around the world, 21% of traffic accidents are caused by driving fatigue every year [2]. Therefore, developing a practical and effective method for mental fatigue detection is a very important issue. With the development of science and technology, a number of approaches have emerged to study mental fatigue, such as questionnaire surveys, psychological indicators, and physiological signals (such as the electroencephalogram (EEG), blood-oxygen signal, electrocardiograph (ECG), respiratory signal, etc.). Questionnaire surveys are highly subjective: they cannot objectively measure fatigue and are usually only used to assist in its determination. Mental fatigue detection methods based on EEG, ECG and the blood-oxygen signal are more accurate than other detection methods [3]. Unfortunately, the participants have to wear contact electrodes on the scalp or body surface during the test, which is cumbersome and can place psychological pressure on participants, resulting in weak practicality. The respiratory signal is now attracting growing interest from researchers in the field of mental fatigue detection because it is easy to record accurately and does not require direct contact with participants [4]. To our knowledge, there are no reports that use the respiratory signal alone for mental fatigue detection; it is only used in combination with other physiological signals. The main reason may be that the accuracy of using the respiratory signal alone is too low. For example, Fu et al. reported that using respiratory signals for driving fatigue detection achieves an accuracy of only 78.5%, whereas the combined use of the respiratory signal, electromyogram and EEG can achieve an accuracy of 81.9% [4]. Therefore, it is very important to find an effective classifier to improve the accuracy of respiratory-signal-based mental fatigue detection. In recent years, with the development of artificial intelligence technology, more and more machine learning and deep learning methods have been applied to biomedical signal processing [5]. As is well known, machine learning generally requires complicated feature extraction steps and can obtain better classification accuracy through appropriate feature selection algorithms, whereas deep learning does not require feature extraction steps and its classification accuracy is higher. Therefore, in the present study, we attempt to use the respiratory signal with a deep learning classifier to achieve high mental fatigue detection accuracy.
2 Methods and Materials

2.1 Participants

Four male and four female engineering undergraduate students participated in this experiment. They were aged 20.4 ± 1.4 years, and their mean body mass index was 20.8 ± 0.9 kg/m2. All participants read and signed the informed consent form before the experiment, and the Zhejiang Normal University Ethics Committee approved this study. Every enrolled subject had a regular life, normal eyesight, and no brain
disorders. They were also asked not to stay up late and not to drink alcohol or take drugs within three days before the test, and they were prohibited from smoking and from having coffee or tea within 8 h of the experiment.

2.2 Respiratory Signal Collection and Preprocessing

Respiratory signals were collected with a BIOPAC MP150 from all participants during the whole task (see Fig. 1(a)). The task is a continuous mental arithmetic task with 200 different problems (a double-digit number between 60 and 99 plus another random double-digit number between 60 and 99, then multiplied by a random single digit between 6 and 9), which can successfully induce mental fatigue among the participants according to our previous study [6]. Every problem was designed to be at the same difficulty level and to be accomplished within 35 s.
Fig. 1. The flow chart of mental fatigue detection based on deep learning and respiratory signal. (a) Original respiratory signal collected. (b) Original data is filtered and downsampled. (c) Generate wavelet scale map for CNN. (d) Bayesian optimization for hyperparameters. (e) Mental fatigue detection with CNN and LSTM.
The collected respiratory signal is a low-frequency signal that is usually mixed with interference caused by the subject's body movement, so it is necessary to remove the artifacts. In order to improve the classification accuracy, the baseline drift was first removed from the respiratory signal, then a second-order Butterworth band-pass filter between 0.1 Hz and 1.5 Hz was applied to remove the artifacts, and finally the data were downsampled to 100 Hz (see Fig. 1(b)). In this study, the first 603 s of the experimental data were chosen as the wake state, and the last 603 s were regarded as the fatigue state. In order to characterize the information contained in the respiratory signal more fully, a time window of 4 s with 75% overlap was used. We constructed two different deep learning models, a Convolutional Neural Network (CNN) and a Long Short-Term Memory network (LSTM). CNN is a network structure that is excellent for image processing, and LSTM is good at processing time-series signals. Because the two network structures take different input forms, wavelet scale maps (time-frequency images obtained from the wavelet transform, see Fig. 1(c)) and time series (see Fig. 1(b)) of the respiratory signal were used as input to the CNN and LSTM respectively. 2.3 CNN Model A CNN uses convolutional layers to extract high-dimensional features, avoiding the complicated feature extraction process of machine learning. CNN has three major characteristics: weight sharing, pooling and local perception, which greatly reduce the number of parameters and the complexity of the network. With the rapid development of deep learning, more and more excellent deep learning models have emerged. GoogLeNet is the champion model of the 2014 ImageNet Challenge. Different from other CNN structures, GoogLeNet uses the inception structure to improve performance by increasing the width and depth of the network. In addition, GoogLeNet improves the traditional inception structure by adding three 1 × 1 convolution kernels (see Fig. 2), which greatly reduces the number of network parameters while still learning abundant features. In this study, we build a CNN model based on the GoogLeNet model, as shown in Fig. 3.
Fig. 2. Inception structure
For the output layer, the number of neurons in the fully connected layer is set to 2, the activation function is the softmax function, and the final classification layer uses the cross-entropy loss function.
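The paper's model was built in Matlab on top of GoogLeNet; purely as an illustration of the inception idea with 1 × 1 channel reductions and the two-neuron softmax head described above, a minimal PyTorch sketch could look as follows (layer sizes are illustrative choices, not the paper's):

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Minimal inception block: parallel 1x1, 3x3 and 5x5 branches plus pooling,
    with 1x1 convolutions used to reduce channels (cf. Fig. 2)."""
    def __init__(self, in_ch, c1, c3r, c3, c5r, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, c3r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c3r, c3, 3, padding=1), nn.ReLU(inplace=True))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c5r, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c5r, c5, 5, padding=2), nn.ReLU(inplace=True))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

class ScalogramCNN(nn.Module):
    """GoogLeNet-style classifier for wavelet scale maps with two output neurons."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 64, 7, stride=2, padding=3),
                                  nn.ReLU(inplace=True), nn.MaxPool2d(3, 2))
        self.inc = Inception(64, 32, 48, 64, 8, 16, 16)   # 32+64+16+16 = 128 channels
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(128, n_classes))

    def forward(self, x):
        return self.head(self.inc(self.stem(x)))

model = ScalogramCNN()
criterion = nn.CrossEntropyLoss()  # softmax plus cross-entropy, as in the output layer above
```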
Fig. 3. CNN network structure diagram based on GoogleNet
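Looking back at Sect. 2.2, the signal preparation that produces the CNN and LSTM inputs (band-pass filtering, downsampling to 100 Hz, 4-s windows with 75% overlap, and wavelet scale maps) can be sketched in Python as follows; SciPy and PyWavelets are assumed, and the original sampling rate and the mother wavelet are illustrative choices, not values stated in the paper:

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate
import pywt

def preprocess(resp, fs=1000):
    """Band-pass filter, downsample, and segment a respiratory recording.

    Follows Sect. 2.2: 0.1-1.5 Hz second-order Butterworth band-pass,
    resampling to 100 Hz, 4-s windows with 75% overlap. The original
    sampling rate fs is an assumption."""
    b, a = butter(2, [0.1, 1.5], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, resp)            # remove baseline drift and artifacts

    factor = int(fs // 100)
    filtered = decimate(filtered, factor) if factor > 1 else filtered
    fs_new = 100                               # downsampled rate

    win, hop = 4 * fs_new, 1 * fs_new          # 4-s windows, 75% overlap -> 1-s hop
    segments = [filtered[s:s + win] for s in range(0, len(filtered) - win + 1, hop)]
    return np.asarray(segments), fs_new

def scalogram(segment, fs=100, n_scales=64):
    """Continuous wavelet transform 'wavelet scale map' used as CNN input."""
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(segment, scales, "morl", sampling_period=1.0 / fs)
    return np.abs(coeffs)                      # (n_scales, time) time-frequency image
```

The same segments, kept as raw time series, serve as input to the LSTM model described next.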
2.4 LSTM Model As we all know, mental fatigue accumulates continuously: subsequent states are affected by the previous state, so the signal has a strong sequential nature. LSTM is a variant of the recurrent neural network (RNN). LSTM has a unique memory unit, which allows it to store and extract temporal information from a timing signal and can solve the long-term dependence problem of timing signals [7, 8]. In this study, we build the LSTM model shown in Fig. 4. Since the number of training samples and features is not particularly large, good training results can be achieved with only a single-layer LSTM, and the number of hidden-layer neurons is set to 100. For the output layer, the settings are the same as in the CNN model.
Fig. 4. LSTM network structure
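A minimal PyTorch sketch of such a single-layer LSTM with 100 hidden neurons and a two-class output follows; the paper's implementation was in Matlab, so this is only an illustrative equivalent:

```python
import torch
import torch.nn as nn

class RespLSTM(nn.Module):
    """Single-layer LSTM with 100 hidden units and a two-class output,
    mirroring the description in Sect. 2.4; sizes are illustrative."""
    def __init__(self, n_features=1, hidden=100, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])       # classify from the last time step

model = RespLSTM()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, as used for the LSTM
```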
2.5 Bayesian Optimization In deep learning, it is also necessary to define model hyperparameters for the training process. The selection of hyperparameters has a great influence on the classification results of the model. However, hyperparameters cannot be obtained automatically through the learning process; they can only be set before training. Therefore, the hyperparameters need to be adjusted and optimized to improve the performance of the model. Since the tuning of hyperparameters is a black-box problem, the tuning process cannot be observed directly. Researchers have developed many algorithms for hyperparameter optimization, such as grid search, random search, tree-structured Parzen estimator, sequential model-based optimization for general algorithm configuration, particle swarm optimization, Bayesian optimization, and so on.

Table 1. Bayesian optimization parameters and their ranges

Optimization variable | Variable data type | Value range
Momentum (CNN) | Float | [0.8, 1]
Gradient decay factor (LSTM) | Float | [0.8, 1]
Mini-batch size | Int | [20, 128]
Initial learn rate | Int | [1E-4, 1]
Learn rate drop factor | Float | [0, 1]
Learn rate drop period | Int | [1, 4] × 10
Max epochs | Int | [2, 4] × 10
L2 regularization | Float | [1E-10, 1E-2]
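The paper performs this search in Matlab; purely as an illustration, a similar search over the ranges in Table 1 could be set up with the scikit-optimize library in Python. The function train_and_validate is a hypothetical placeholder for a full training run that returns validation accuracy:

```python
from skopt import gp_minimize
from skopt.space import Real, Integer

# Search space roughly following Table 1 (names are illustrative).
space = [
    Real(0.8, 1.0, name="momentum"),
    Integer(20, 128, name="mini_batch_size"),
    Real(1e-4, 1.0, prior="log-uniform", name="initial_learn_rate"),
    Real(0.0, 1.0, name="learn_rate_drop_factor"),
    Integer(10, 40, name="learn_rate_drop_period"),
    Integer(20, 40, name="max_epochs"),
    Real(1e-10, 1e-2, prior="log-uniform", name="l2_regularization"),
]

def objective(params):
    """Train with these hyperparameters and return validation error;
    train_and_validate is a placeholder, not a function from the paper."""
    val_acc = train_and_validate(*params)
    return 1.0 - val_acc

result = gp_minimize(objective, space, n_calls=120, random_state=0)
print("best hyperparameters:", result.x, "best validation error:", result.fun)
```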
In this study, Bayesian optimization is used for CNN and LSTM model (see Fig. 1(d)). Compared with other model-free optimization algorithms, Bayesian optimization pays
more attention to optimization efficiency; it aims to obtain near-optimal solutions with a small number of objective function evaluations. Before model training, it is necessary to define the parameter categories and ranges. Bayesian optimization uses a surrogate model to fit the real objective function and, based on the fitting results, estimates and selects the next set of parameter combinations. Given enough iterations, in theory the global optimal solution will eventually be found [9]. The optimized hyperparameters and their constraints are given in Table 1, and the number of optimization iterations is set to 120. 2.6 Model Training and Predicting In this study, the CNN and LSTM models were built in the Matlab 2019b software environment. In order to improve the training speed of the models, two NVIDIA GeForce RTX 2080 Ti GPUs were used, and the Matlab parallel pool was enabled to speed up the calculation. In total, there are 1200 samples (600 samples each for the wake and fatigue states). The data set for each state is divided randomly into training, validation and test sets in the ratio 6:2:2 to avoid overfitting. In addition, the CNN model and the LSTM model use the SGDM and Adam gradient optimizers respectively to adjust the gradient during training. SGDM adds first-order momentum to SGD, simulating the momentum of moving objects in physics: when the gradient keeps updating in the same direction, the momentum accumulates and accelerates the gradient descent; when the gradient direction changes, the momentum is reduced, which damps the oscillation of the gradient during the update. Adam adds second-order momentum on top of the first-order momentum, so it can adaptively update the learning rate and set different learning rates for different parameters, allowing the model to obtain better learning effects. In addition, the mini-batch gradient descent strategy is adopted in the training process, and the training set and validation set are shuffled in each training epoch, which makes full use of the information in the data and helps avoid falling into a local optimum during training. Finally, the performance of the trained models is evaluated with four indicators for mental fatigue detection (see Fig. 1(e)): Accuracy [(TP + TN)/(TP + TN + FP + FN)], Precision [TP/(TP + FP)], Recall [TP/(TP + FN)], and F1 [2TP/(2TP + FP + FN)], where TP is true positive, FP is false positive, FN is false negative, and TN is true negative.
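The four indicators follow directly from the confusion counts; a minimal Python sketch (the counts shown are made up for illustration only):

```python
def fatigue_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the confusion counts
    defined at the end of Sect. 2.6."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1        = 2 * tp / (2 * tp + fp + fn)
    return accuracy, precision, recall, f1

# example with made-up counts
print(fatigue_metrics(tp=112, fp=18, fn=8, tn=102))
```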
3 Results and Discussion In this study, we use the respiratory signal and two deep learning models, CNN and LSTM, to detect mental fatigue. The wavelet scale map generated by the wavelet transform of each preprocessed signal segment is input into the CNN model, and the preprocessed signal segment is directly input into the LSTM model. The results are given in Fig. 5 and Table 2, indicating that the LSTM model is better than the CNN model for mental fatigue detection with the respiratory signal. As shown in Fig. 5, the accuracies of the validation set and test set of the LSTM model are higher than those of the CNN model except for the sixth subject. Table 2 shows that the average accuracy, precision, recall, and F1 of the LSTM model
for the test set are 89.16%, 86.00%, 93.13%, and 89.31% respectively, while those of the CNN model are 77.29%, 75.55%, 76.88%, and 75.00% respectively. In another study, Ge et al. [10] also reported that an LSTM model (accuracy 85.5%, recall 84.2%) for sleep apnea detection performed better than a CNN model (accuracy 80.7%, recall 78.1%). It can be inferred that the LSTM model may be more reliable and effective than the CNN model in dealing with timing signals, especially the respiratory signal, owing to the LSTM's unique memory unit.
Fig. 5. Verification and Test results of LSTM and CNN. The subjects are arranged from small to large according to the test accuracy of LSTM.
Table 2. Evaluation indexes of the CNN model and LSTM model

Model | Accuracy | Precision | Recall | F1
LSTM | 89.16% | 86.00% | 93.13% | 89.31%
CNN | 77.29% | 75.55% | 76.88% | 75.00%
In addition, we achieved a high test accuracy of 89.16% for mental fatigue detection using only a single-channel respiratory signal. Liu et al. reported a driving fatigue classification accuracy of 92.70% with complex feature extraction from three EEG channels using machine learning classifiers [11]. Ogino et al. [12] obtained a classification accuracy of 72.7% using one EEG channel. Wei et al. [13] introduced a non-hair-bearing method with four EEG channels for driving fatigue detection and obtained a classification accuracy of 80.0%. Fu et al. used the respiratory signal for driving fatigue detection with machine learning and achieved an accuracy of 78.5% [4]. From the above studies we can infer that using the respiratory signal with an LSTM classifier can achieve a classification accuracy close to that obtained with a few EEG channels, and a higher accuracy than using the respiratory signal with machine learning. We can conclude that
using respiratory signal with LSTM model is a practical and effective strategy for mental fatigue detection.
4 Conclusion In this study, we proposed a new strategy for mental fatigue detection based on deep learning and the respiratory signal. Both CNN and LSTM models were constructed for comparison. The results showed that the LSTM model is better than the CNN model for mental fatigue detection using the respiratory signal, and we achieved a high accuracy of 89.16% with the LSTM model. Our findings indicate that the respiratory signal combined with an LSTM classifier may be a practical and effective strategy for mental fatigue detection. Acknowledgments. This work was partly supported by the National Natural Science Foundation of China (No. 82001918), Zhejiang Provincial Natural Science Foundation of China (No. LQ19E050011), Key Laboratory of Urban Rail Transit Intelligent Operation and Maintenance Technology & Equipment of Zhejiang Province Independent Research Project (No. ZSDRTZZ2020002), Key Research and Development Program of Zhejiang Province (No. 2019C01134), and the National Undergraduate Innovation and Entrepreneurship Training Program (No. 202010345042).
References 1. Cao, J., Shi, J., Li, G., Guo, Y., Shan, P.: Research on ECG respiratory monitoring system of automobile driver based on PVDF. Mod. Electron. Technol. 42(10), 79–82+87 (2019) 2. Li, G., et al.: The Maximum eigenvalue of the brain functional network adjacency matrix: meaning and application in mental fatigue evaluation. Brain Sci. 10(2) (2021) 3. Hu, S., Peters, B., Zheng, G.: Driver fatigue detection from electroencephalogram spectrum after electrooculography artefact removal. IET Intell. Transp. Syst. 7(1), 105–113 (2013) 4. Fu, R., Tian, Y., Wang, S., Wang, L.: The recognition of driver’s fatigue based on dynamic Bayesian estimation. Chin. J. Biomed. Eng. 38(06), 759–763 (2019) 5. Dissanayake, T., Fernando, T., Denman, S., Sridharan, S., Fookes, C.: Deep learning for patient-independent epileptic seizure prediction using scalp EEG signals. IEEE Sens. J. 21(7), 9377–9388 (2021) 6. Li, G., Li, B., Wang, G., Zhang, J., Wang, J.: A new method for human mental fatigue detection with several EEG channels. J. Med. Biol. Eng. 37(2), 240–247 (2017) 7. Guo, X., Chen, L., Shen, C.: Hierarchical adaptive deep convolution neural network and its application to bearing fault diagnosis. Measurement 93(7), 490–502 (2016) 8. Felix, A.G., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Comput. 12(10), 2451–2471 (2000) 9. Cui, J., Yang, B.: Survey on Bayesian optimization methodology and applications. J. Softw. 29(10), 3068–3090 (2018) 10. Ge, J., Liu, Z.: The algorithm based on CNN and LSTM for sleep apnea syndrome detection. Electron. Sci. Technol. 34(02), 21–26 (2021) 11. Liu, X., et al.: Toward practical driving fatigue detection using three frontal EEG channels: a proof-of-concept study. Physiol. Meas. (2021). https://doi.org/10.1088/1361-6579/abf336
12. Ogino, M., Mitsukura, Y.: Portable drowsiness detection through use of a prefrontal single-channel electroencephalogram. Sensors 18(12), 4477 (2018) 13. Wei, C., Wang, Y., Lin, C., Jung, T.: Toward drowsiness detection using non-hair-bearing EEG-based brain-computer interfaces. IEEE Trans. Neural Syst. Rehabil. Eng. 26(2), 400–406 (2018)
The Analysis and AI Prospect Based on the Clinical Screening Results of Chronic Diseases Lingfeng Xiao1 , Yanli Chen1(B) , Yingxin Xing1 , Lining Mou1 , Lihua Zhang1 , Wenjuan Li2 , Shuangbo Xie3 , and Mingxu Sun3 1 Jinan Central Hospital, Jinan 250013, China 2 Jinan Vocational College of Nursing, Jinan 250102, China 3 University of Jinan, Jinan 250022, China
Abstract. The incidence of chronic diseases is increasing year after year in China, which has caused a heavy burden of disease. Clinical screening helps identify patients with chronic diseases, thereby reducing the mortality rate of chronic diseases through early diagnosis and treatment. Clinical trials have shown that AI (Artificial Intelligence) improves the accuracy of chronic disease detection. In our study, a high-risk-factor questionnaire survey was conducted among residents aged 40–69 of 25 community health service centres in a district, and the residents identified as high risk received clinical screening. The researchers then analysed and evaluated the high-risk rate of the chronic diseases and the positive detection rate of the clinical screening. A total of 4036 residents completed the high-risk assessment in 2019. The high-risk rates of lung cancer, breast cancer, upper gastrointestinal cancer, colorectal cancer, liver cancer, cardio-cerebrovascular diseases and cataracts were 57.85%, 22.13%, 74.63%, 41.65%, 53.39%, 83.94% and 20.24%, respectively. AI-aided chronic disease screening increases the utilization rate of screening resources and the detection rate of chronic diseases; exploring a new model of chronic disease screening based on AI plus medical care is the direction of future efforts. Keywords: Chronic disease · Early diagnosis and treatment · Clinical screening · Artificial Intelligence
This work is supported by Science and Technology Development Projects of Jinan Health Commission under grant 2020-3-04 and by Key R & D Projects of Shandong Province under grant 2019JZZY021005.
1 Introduction Chronic diseases seriously threaten the health of the country's residents and have become a significant public health problem affecting national economic and social development in China. According to the latest statistics from the National Health Commission of the People's Republic of China, there are 260 million people with chronic diseases in China, and the affected population is becoming younger. Chronic disease deaths account for
86.6% of all deaths in China, and the disease burden accounts for 70% of the total burden of disease [1]. The incidence of chronic diseases such as cancer, cardio-cerebrovascular diseases, diabetes and chronic respiratory diseases has been increasing year by year and has become a significant challenge for China and indeed the world. It is estimated that the proportion of people over 60 years old will be twice as high as it is at present by 2050 [2], so the situation of chronic disease prevention and control is serious. How to effectively prevent and control the occurrence and development of chronic diseases has become a common demand for people's health at this stage. In 2017 the General Office of the State Council issued a national plan for chronic disease prevention and control, which proposes the implementation of early diagnosis and treatment to reduce the risk in the high-risk population. Moreover, it includes, as a public health measure, disease screening technology that is clinically diagnosable, available for treatment, acceptable to the masses and affordable for the country [3]. Our city has always focused on the screening and early diagnosis and treatment of chronic diseases, and took the lead in launching the 'Early Diagnosis and Treatment Program for Major Chronic Diseases' in a district in January 2019. This project is mainly carried out for high-risk populations with a high incidence of chronic diseases such as lung cancer, breast cancer, upper gastrointestinal cancer, colorectal cancer, liver cancer, cardio-cerebrovascular diseases, and cataract. Since the start of the project, 4036 community residents have completed the risk assessment, 3574 have completed clinical screening, and 25 cases of malignant tumours have been confirmed, achieving phased results. This study analyses the screening results of major chronic diseases using the screening data collected by the major chronic disease early diagnosis and treatment project in 2019.
2 Objects and Methods 2.1 Objects Twenty-five community health service centres in a district were selected as program sites. A cluster sampling method was adopted to screen eligible populations, who filled in the questionnaire on high-risk chronic disease factors. The inclusion criteria were: aged 40 to 69 years (according to the date of birth on the ID card); resident of the district; full capacity for conduct; voluntary signing of informed consent; and no history of malignancy. In the end, 4036 people participated in the questionnaire survey. 2.2 Evaluation of High-Risk Groups The high-risk-factor questionnaire designed by the program team was used to assess the high-risk population; it covers basic information, living environment, eating habits, lifestyle, psychological state, medical history, family history, female menstrual history, and fertility history. The questionnaire was filled out by the surveyed residents on a mobile phone client, or a paper questionnaire was entered and submitted by the staff, and the answers were evaluated by the risk assessment system and background software. According to the evaluation results, high-risk subjects were organized to receive free clinical screening in the hospital.
2.3 Clinical Screening For the assessed high-risk populations, the community doctors make appointments at designated hospitals for the corresponding screening for lung cancer, breast cancer, upper gastrointestinal cancer, colorectal cancer, liver cancer, cardio-cerebrovascular diseases, and cataracts. CT (computed tomography) is chosen for people at high risk of lung cancer, abdominal colour Doppler ultrasound for people at high risk of liver cancer, electronic gastroscopy for people at high risk of upper gastrointestinal cancer, electronic colonoscopy for people at high risk of colorectal cancer, and breast colour Doppler ultrasound and mammography for people at high risk of breast cancer. Moreover, after confirming the pathology by analysing biopsied or removed polyps and ulcers under the microscope, doctors also provide treatment and follow-up recommendations for the patients. 2.4 Quality Control 1) Questionnaire survey: the project team's experts conducted unified training for the staff of the 25 community health service centres on how the questionnaire should be completed. The staff then guided the surveyed residents to fill it out on the mobile phone application or entered and submitted paper questionnaires, and quality was ensured through process quality control. 2) Clinical screening: professional technical staff qualified in examination and diagnosis performed the clinical screening in accordance with the relevant disease diagnosis and treatment norms and standards. 2.5 Statistical Analysis The data were organised and analysed with the SPSS 23.0 software package. Frequencies (n) and percentages (%) were used for the statistical description of count data, and the chi-square test was chosen to compare rates among groups. All statistical tests were two-sided, and P < 0.05 was considered statistically significant.
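As an illustration of the chi-square comparison of rates between groups, the following SciPy sketch builds a 2 × 2 contingency table from the male/female lung cancer high-risk counts reported later in Sect. 3.1 and approximately reproduces the χ2 of about 288.8 given in Table 1; the layout of the table and the use of scipy are assumptions, not the authors' code:

```python
from scipy.stats import chi2_contingency

# rows: male, female; columns: assessed high risk, not high risk (lung cancer)
table = [[1167, 1568 - 1167],
         [1168, 2468 - 1168]]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.3f}, P = {p:.3f}")  # significant if P < 0.05
```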
3 Results 3.1 Distribution of High-Risk Groups of Major Chronic Diseases By December 2019, a total of 4036 residents had completed the high-risk assessment questionnaire, including 1568 males (38.85%) and 2468 females (61.15%). 2335 people (1167 males, accounting for 49.98%, and 1168 females, accounting for 50.02%) were assessed by the risk assessment system as having a high risk of lung cancer, a high-risk rate of 57.85%. 893 cases were assessed as having a high risk of breast cancer (3 males, accounting for 0.33%, and 890 females, accounting for 99.67%), a high-risk rate of 22.13%. 3012 people (1255 males, accounting for 41.67%, and 1757 females,
accounting for 58.33%) were assessed as having a high risk of upper gastrointestinal cancer, with a high-risk rate of 74.63%. 1681 people (692 males (41.17%) and 989 females (58.83%)) were assessed as having a high risk of colorectal cancer, with a high-risk rate of 41.65%. 2155 people (968 males (44.92%) and 1187 females (55.08%)) were assessed as having a high risk of liver cancer, with a high-risk rate of 53.39%. 3388 people (1319 males, accounting for 38.93%, and 2069 females, accounting for 61.07%) were assessed as having a high risk of cardio-cerebrovascular disease, a high-risk rate of 83.94%. 817 people (291 males (35.62%) and 526 females (64.38%)) were assessed as having a high risk of cataract, with a high-risk rate of 20.24%, as shown in Table 1.

Table 1. Distribution of high-risk groups of major chronic diseases

Gender | Lung cancer | Upper gastrointestinal cancer | Colorectal cancer | Liver cancer | Breast cancer | Cardio-cerebrovascular disease | Cataract
Male | 1167 | 1255 | 692 | 968 | 3 | 1319 | 291
Female | 1168 | 1757 | 989 | 1187 | 890 | 2069 | 526
χ2 | 288.800 | 39.635 | 6.502 | 71.676 | 713.924 | 0.059 | 4.505
P | 0.000 | 0.000 | 0.011 | 0.000 | 0.000 | 0.809 | 0.034
3.2 Participation in Clinical Screening In this study, 4036 residents participated in the high-risk-factor questionnaire survey, 4002 were assessed as being at high risk, and 3574 actually participated in clinical screening, so 89.31% of the screening tasks were completed (one person may be assessed as being at high risk for several diseases).

Table 2. Participation in clinical screening

Screening site | High-risk population | Actual screening number (actual screening rate %)
Lung cancer | 2335 | 1987 (85.10)
Upper gastrointestinal cancer | 3012 | 1634 (54.25)
Colorectal cancer | 1681 | 837 (49.79)
Liver cancer | 2155 | 1951 (90.53)
Breast cancer | 893 | 657 (73.57)
Cardio-cerebrovascular diseases | 3388 | 2186 (64.52)
Cataract | 817 | 353 (43.21)
χ2 | 1547.688 |
P | 0.000 |

The chi-square test compared the actual screening
rates of the 7 chronic diseases, and the difference was statistically significant (χ2 = 1547.688, P = 0.000). The actual screening rate of liver cancer was the highest, followed by lung cancer, while the actual screening rates of colorectal cancer and cataract were low, as shown in Table 2. 3.3 Clinical Screening Results A total of 3574 people were screened clinically, and 25 cancer patients were found, a detection rate of 0.69%; this result is similar to that of Haifan Xiao [4]. Among them, 1987 people at high risk of lung cancer participated in clinical lung screening; 1018 were found to have positive pulmonary nodules, and there were 13 cases of lung cancer, accounting for 0.65% of those examined. 1951 people at high risk of liver cancer participated in clinical liver screening; 1053 cases of fatty liver and 1 case of metastatic liver cancer were found. 1634 people at high risk of upper gastrointestinal cancer participated in upper gastrointestinal screening; 910 cases of atrophic gastritis, 356 gastric and duodenal polyps, 149 cases of gastric and duodenal mucosal bulge (9.12% of those screened), 100 cases of gastroduodenal ulcer, 22 cases of gastrointestinal metaplasia and 6 cases of upper gastrointestinal cancer were found. 837 people at high risk of colorectal cancer participated in colorectal screening; 221 were found to have colon polyps, 190 had colorectal polyps (22.70% of those screened), and 5 cases of colorectal cancer were found, accounting for 0.60% of those screened. A total of 657 people participated in breast clinical screening; 131 cases of breast nodules were found (19.94% of those examined), and 26 cases of nodules of BI-RADS category 3 or above were found (3.96% of those examined). Among the 2186 participants in cardio-cerebrovascular screening, 1181 cases of carotid plaque (54.02%) and 87 cases of carotid stenosis (3.97%) were found. A total of 353 people participated in cataract screening, and 35 cases were found (9.92%), as shown in Table 3.
Table 3. Positive results of screening for major chronic diseases

Serial number | Screening items | Special project | Total | Number of positives | Positive ratio
1 | Lung CT | Pulmonary nodules (≥5 mm) | 1987 | 1018 | 51.23%
  |  | Lung cancer |  | 13 | 0.65%
2 | Gastroscopy | Gastric and duodenal polyps | 1634 | 356 | 21.79%
  |  | Gastric and duodenal mucosa bulge |  | 149 | 9.12%
  |  | Gastric and duodenal ulcer |  | 100 | 6.12%
  |  | Gastrointestinal metaplasia |  | 22 | 1.35%
  |  | Atrophic gastritis |  | 910 | 55.69%
  |  | Esophageal malignancy |  | 3 | 0.18%
  |  | Gastric malignancy |  | 3 | 0.18%
3 | Colonoscopy | Colon polyps | 837 | 221 | 26.40%
  |  | Colorectal polyps |  | 190 | 22.70%
  |  | Colorectal malignancy |  | 5 | 0.60%
4 | Breast ultrasound | Breast nodules | 657 | 131 | 19.94%
  |  | Nodules BI-RADS classification level 3 or above |  | 26 | 3.96%
5 | Abdominal color Doppler ultrasound | Fatty liver | 1951 | 1053 | 53.97%
  |  | Liver cancer (primarily colon cancer with liver metastasis) |  | 1 | 0.05%
6 | Cardio-cerebrovascular | Carotid plaque | 2186 | 1181 | 54.02%
  |  | Carotid artery stenosis |  | 87 | 3.97%
7 | Cataract | Cataract | 353 | 35 | 9.92%
4 Conclusion and Discussion 4.1 Conclusion This study analyses and evaluates the high-risk rate and the positive detection rate of the screening for major chronic diseases using the screening data collected by the major chronic disease early diagnosis and treatment project in 2019. The high-risk rates of lung cancer, breast cancer, upper gastrointestinal cancer, colorectal cancer, liver cancer, cardio-cerebrovascular diseases and cataracts were 57.85%, 22.13%, 74.63%, 41.65%, 53.39%, 83.94% and 20.24%, respectively. The researchers' further direction is to increase the positive detection rate of clinical screening and reduce the number of false positives with the help of AI, so as to achieve the goals of early prevention, early diagnosis and early treatment and to reduce the mortality rate of chronic diseases; this will in turn help improve the quality of life of people with chronic diseases. 4.2 The High-Risk Rate Detected by Screening is High In this study 4036 people were assessed and 4002 were classified as high risk, a high-risk rate of 99.16%. The high-risk rates for lung cancer, breast cancer, upper gastrointestinal cancer, colorectal cancer, liver cancer, cardio-cerebrovascular diseases and cataracts were 57.85%, 22.13%, 74.63%, 41.65%, 53.39%, 83.94% and 20.24%, respectively. The '2017–2018 Zhenjiang City Major Chronic Disease Epidemic Situation' report listed more than 30000 residents over the age of 18 without chronic diseases such as hypertension, diabetes, malignant tumours and cardio-cerebrovascular diseases as screening targets and found that 44.0% of them were at high risk of chronic diseases [5]; compared with the screening results of this study, the difference is statistically significant (χ2 = 1124.083, P = 0.000). Because the screened population was aged 40 to 69 and the screening was based on community health service centres, the community doctors could provide accurate judgement and guidance on the chronic disease risk of the residents living in their communities; this is the reason why the high-risk rate detected by the screening is high. 4.3 Clinical Screening Rate The chi-square test compared the clinical screening rates of the 7 chronic diseases in this study, and the difference was statistically significant (χ2 = 1547.688, P = 0.000). The clinical screening rate of liver cancer was the highest, followed by lung cancer, while the clinical screening rate of colorectal cancer was relatively low. Testing for liver cancer and lung cancer with abdominal colour Doppler ultrasound and CT is painless and non-invasive, whereas testing for colorectal cancer with colonoscopy is an invasive examination with complex preparation and a painful process. In this situation it is natural that individuals resist the examination, which hinders screening compliance [6].
4.4 Prospect of AI Plus Healthcare In the traditional approach for chronic diseases, the high-risk population is assessed on the basis of the high-risk-factor questionnaire and the risk assessment system and is then sent to the hospital for clinical screening. This approach requires experienced clinicians to analyse lung CT images, gastrointestinal endoscopy images, abdominal ultrasonography and other images. The drawbacks are a heavy workload, long time consumption and strong subjectivity. By contrast, artificial intelligence models have significant advantages in the field of image analysis, such as fast processing speed, high accuracy and strong stability, and these advantages have been verified under many conditions [7]. In detail, AI-assisted cancer screening will help doctors check thousands of lung CT and endoscopic images; it can effectively make up for the shortage of human resources in prediction and judgement and address missed and delayed diagnoses. As shown in Table 4, there is an increasing trend towards applying artificial intelligence in the field of cancer screening.

Table 4. Artificial intelligence techniques for cancer prediction
Author | Algorithm | Type of cancer | Data type | Results
Hsu [8] | ANN (Artificial Neural Network) | Lung cancer | Sex, age, pattern of nodules, size of partially solid nodules, ground-glass nodules | Sensitivity 75.0%, Specificity 85.0%, AUC 0.87
Stéphane [9] | ANN | Lung cancer | Diameter and radiological aspect of lung nodules | Sensitivity 90%, Positive predictive value 0.95
Shi [10] | ANN | Lung cancer | CT images of lung cancer containing 3 texture features and 10 fractal features | Sensitivity 90.9%, Specificity 100%, Accuracy 95.1%
Moritz [11] | ANN | Lung cancer | 3936 lung positron emission tomography images | Sensitivity 95.9%, Specificity 98.1%
Pang [12] | CNN (Convolutional Neural Network) | Lung cancer | 2219 lung CT images | Accuracy 86.84%
Gustavo [13] | CNN | Lung cancer | Size, shape, and three-dimensional volume of lung nodules | AUC 0.913
Wu [14] | CNN | Cataract | 37638 cataract slit lamp photographs | AUC 0.996
Wu [15] | Multivariate logistic regression | Colorectal cancer | Age, gender, constipation, blood stool, fecal occult blood test positive | Sensitivity 60.2%, Specificity 61.9%, AUC 0.67
Huang [16] | Decision tree | Cardio-cerebrovascular disease | Occupation, hypertension, regular exercise, hypertension family history | Sensitivity 74.8%, Specificity 74.2%, Accuracy 74.5%, AUC 0.822
Based on the analysis of the algorithms used for cancer prediction in Table 4, it will become common in the future to predict and analyse such data with artificial neural network cancer prediction models. The data include gender, age, family history of cancer, smoking history, liver cirrhosis, hypertension, stroke history and poor eating habits, and these types of data are effective for predicting whether patients have major chronic diseases. Achieving a cancer prediction model with high accuracy requires verifying, optimizing and improving the model through cross-validation. When the goals of early diagnosis and treatment are achieved, the cost of cancer treatment can be reduced and the early diagnosis and treatment rate and the survival rate of patients can be improved.
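As a sketch of how such an ANN-based prediction model could be trained and verified with cross-validation, the following scikit-learn example uses random placeholder data rather than the project's screening data; the feature list and model sizes are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One row per resident with risk factors such as gender, age, family history of
# cancer, smoking history, cirrhosis, hypertension, stroke history, eating habits.
rng = np.random.default_rng(0)
X = rng.random((500, 8))        # placeholder feature matrix
y = rng.integers(0, 2, 500)     # 1 = chronic disease detected, 0 = not

model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000))

# 5-fold cross-validation as a way to verify and tune the prediction model
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print("mean AUC:", scores.mean())
```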
References 1. Xiong, Z.: The challenges and countermeasures of prevention and control of chronic diseases in China. Prev. Control Chronic Dis. China 27(09), 720–721 (2019) 2. Chen, W.: Population aging in China from an international perspective. J. Beijing Univ. 53(06), 82–92 (2016) 3. Wan, S.-P., Yi, F., Wang, Q.-Q.: Current situation and countermeasures of cancer health education in China. Cancer Prev. Treat. 32(11), 955–961 (2019) 4. Xiao, H.-F., Yan, S.-P., Xing, K.: Preliminary analysis of the clinical screening results of the urban cancer early diagnosis and treatment project in Hunan Province from 2012 to 2018. Chin. Cancer 28(11), 807–815 (2019)
5. Gu, X.-Y.: Investigation on the relationship between the prevalence of chronic diseases and risk factors in Zhenjiang. J. Dentiol. 14(3), 208–210 (2019) 6. Yuan, Y.-N., Yang, L., Zhang, X.: Analysis of the screening results of the early diagnosis and treatment of colorectal cancer in Beijing from 2014 to 2017. Chin. Public Health 36(01), 33–35 (2020) 7. Fazal, M.I., Patel, M.E., Tye, J.: The past, present and future role of artificial intelligence in imaging. Eur. J. Radiol. 105, 246–250 (2018) 8. Hsu, Y.-C., Tsai, Y.-H., Weng, H.-H.: Artificial neural networks improve LDCT lung cancer screening: a comparative validation study. BMC Cancer 20(1), 1023 (2020) 9. Chauvie, S., et al.: Artificial intelligence and radiomics enhance the positive predictive value of digital chest tomosynthesis for lung cancer detection within SOS clinical trial. Eur. Radiol. 30(7), 4134–4140 (2020). https://doi.org/10.1007/s00330-020-06783-z 10. Shi, H., Yang, F., Huang, J.-H: An algorithm of lung cancer CT image segmentation based on artificial neural network model. Chin. Med. Equipment 34(10), 86–89+93 (2019) 11. Moritz, S., Daniela, A.F., Urs, J.M.: Automated detection of lung cancer at ultralow dose PET/CT by deep neural networks-Initial results. Lung Cancer 126, 170–173 (2018) 12. Pang, S.-C., Meng, F., Wang, X.: VGG16-T: a novel deep convolutional neural network with boosting to identify pathological type of lung cancer in early stage by CT images. Int. J. Comput. Intell. Syst. 13(1), 771–780 (2020) 13. Perez, G., Arbelaez, P.: Automated lung cancer diagnosis using three-dimensional convolutional neural networks. Med. Biol. Eng. Comput. 58(8), 1803–1815 (2020). https://doi.org/ 10.1007/s11517-020-02197-7 14. Wu, X.-H., Huang, Y.-L., Liu, Z.-Z.: Universal artificial intelligence platform for collaborative management of cataracts. Lancet 103(11), 1553–1560 (2019) 15. Wu, W.-M., Xu, D.-L., Li, X.-Q.: A predictive model for colorectal cancer based on artificial neural network. China Tumor 28(08), 621–628 (2019) 16. Huang, X.-X., Yan, Y.-J., Wei, M.-Q.: Comparison of screening group with high risk of stroke among logistic regression, decision trees and neural networks. Prev. Control Chronic Dis. China 24(6), 412–415 (2016)
Spatio-Temporal Evolution of Chinese Pharmaceutical Manufacturing Industry Based on Spatial Measurement Algorithms Fang Xia, Yanyin Cui, Jinping Liu, and Shuo Zhang(B) School of Health Management, Changchun University of Chinese Medicine, Changchun 130117, China
Abstract. This study calculates the total factor productivity of the innovation efficiency of the Chinese pharmaceutical manufacturing industry and analyses its spatio-temporal evolution characteristics. Data Envelopment Analysis-Malmquist was used to measure the total factor productivity (TFP) of the product research and development (R&D) efficiency and the achievement transformation efficiency of the pharmaceutical manufacturing industry in 29 provinces of China from 2010 to 2018, and Kernel Density Estimation and Standard Deviation Ellipse analysis were used to analyse its spatial and temporal characteristics and evolution. The results indicate the following. The product R&D efficiency and the achievement transformation efficiency of the Chinese pharmaceutical manufacturing industry are on the rise. The polarization of technological innovation efficiency between provinces has become smaller, while there is a phased imbalance between regions. The elliptical centres of gravity of the product R&D efficiency and the achievement transformation efficiency are both located in Henan Province, and the directions of centre-of-gravity movement are "Southwest-Southeast-Northwest" and "Southeast-Northwest-Southeast" respectively. Keywords: Pharmaceutical manufacturing · Efficiency of technological innovation · Distribution of the area · Space-time evolution
1 Introduction Innovation capability has gradually become a new competitive advantage in industry along with the growth of the Chinese market economy. Against the background of rapid reconstruction of the domestic supply chain and industrial chain, competition in the pharmaceutical manufacturing industry is becoming more and more fierce, and the improvement of technological innovation capability is crucial to the development and extension of the pharmaceutical manufacturing industry [1, 2]. The sales value of Chinese pharmaceutical manufacturing increased from 1.117 trillion yuan in 2010 to 3.517 trillion yuan in 2018, and the number of patent applications accepted increased from 4324 in 2010 to 21698 in 2018; the Chinese pharmaceutical manufacturing industry is developing at a rapid pace. This has led to a
growing imbalance in regional development, and the level of innovation efficiency of the pharmaceutical manufacturing industry also differs among regions. This is mainly reflected in the regional characteristics of innovation efficiency in the pharmaceutical manufacturing industry, the low efficiency of the transformation of innovation achievements and the lack of core technology [3, 4]. In order to ensure the balanced and high-quality development of the Chinese pharmaceutical manufacturing industry, it is particularly important to locate accurately the weak links of innovation development and to identify the areas with low innovation efficiency. Existing studies have examined industrial technological innovation capability from different perspectives. Firstly, starting from the early BCC and CCR models, researchers gradually paid attention to the dynamic study of innovation efficiency evaluation, measuring the level of innovation efficiency and its changes [5, 6]. Secondly, from the perspective of regional economics, scholars generally believe that regional location, resources and economic operating conditions have an important impact on innovation, and that there are differences in the level of innovation among regions [7, 8]. Thirdly, from the perspective of spatial evolution, studying the dynamic evolution of the spatial hot spots of innovation efficiency can effectively make up for the lack of spatial research in existing work [9, 10]. Based on a review of the relevant research, this paper explores the spatio-temporal dynamic evolution of scientific and technological innovation efficiency in the different stages of the Chinese pharmaceutical manufacturing industry from 2010 to 2018 along the dimensions of "national and regional, time and space". First, the paper analyses the efficiency levels of the product R&D stage and the achievement transformation stage of the Chinese pharmaceutical manufacturing industry to study the trend of temporal evolution; secondly, the distribution of technological innovation efficiency is analysed using the kernel density estimation method; finally, the spatial distribution characteristics and pattern evolution of technological innovation efficiency are explored by means of standard deviation ellipse analysis.
2 Data and Model 2.1 Theory and Methods Data Envelopment Analysis-Malmquist Model. Data Envelopment Analysis (DEA), established by Charnes in 1978, is an efficiency evaluation method that can evaluate the relative efficiency of a set of decision-making units (DMUs) with multiple inputs and multiple outputs by using linear programming, but it cannot measure the change of efficiency values across periods [11]. In 1994, Fare and others combined the Malmquist index theory with the DEA method and proposed the DEA-Malmquist model [12]. Suppose that (x^t, y^t) represents the inputs and outputs of period t, (x^{t+1}, y^{t+1}) denotes the inputs and outputs of period t + 1, D_c^t(x^t, y^t) and D_c^t(x^{t+1}, y^{t+1}) are the output distance functions under the technology of the corresponding period, and c can be considered
as indicating constant returns to scale (CRS). The TFP (Malmquist) index is defined as

TFP = M^{t+1}(x^{t+1}, y^{t+1}, x^t, y^t) = [ (D_c^t(x^{t+1}, y^{t+1}) / D_c^t(x^t, y^t)) × (D_c^{t+1}(x^{t+1}, y^{t+1}) / D_c^{t+1}(x^t, y^t)) ]^{1/2}   (1)
When TFP > 1, it indicates an increase in total factor productivity; when TFP < 1, it indicates a decrease; when TFP = 1, total factor productivity remains the same. Under CRS, TFP can be decomposed into efficiency change (Effch) and technical change (Tech) as follows:

Effch = D_c^{t+1}(x^{t+1}, y^{t+1}) / D_c^t(x^t, y^t)   (2)

Tech = [ (D_c^t(x^t, y^t) / D_c^{t+1}(x^t, y^t)) × (D_c^t(x^{t+1}, y^{t+1}) / D_c^{t+1}(x^{t+1}, y^{t+1})) ]^{1/2}   (3)

TFP = M^{t+1}(x^{t+1}, y^{t+1}, x^t, y^t) = Effch × Tech   (4)
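Given the four distance-function values for a province in two adjacent periods (obtained by solving the DEA linear programs, which are not shown here), the decomposition in Eqs. (1)-(4) reduces to simple arithmetic; a small Python sketch with illustrative values:

```python
def malmquist(d_t_t, d_t_t1, d_t1_t, d_t1_t1):
    """TFP decomposition of Eqs. (1)-(4).

    d_a_b is the output distance function D_c^a evaluated at period-b
    inputs/outputs, e.g. d_t_t1 = D_c^t(x^{t+1}, y^{t+1}); the distance
    values themselves come from the DEA linear programs."""
    effch = d_t1_t1 / d_t_t                                  # Eq. (2)
    tech = ((d_t_t / d_t1_t) * (d_t_t1 / d_t1_t1)) ** 0.5    # Eq. (3)
    tfp = effch * tech                                       # Eq. (4), equal to Eq. (1)
    return effch, tech, tfp

print(malmquist(d_t_t=0.82, d_t_t1=0.95, d_t1_t=0.78, d_t1_t1=0.90))
```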
Kernel Density Estimation. Kernel Density Estimation (KDE) is a non-parametric method commonly used to estimate probability density and to evaluate variables whose evolution is dynamically uneven [13]. Suppose x_1, x_2, x_3, ..., x_n are n sample points drawn independently from a distribution F; the probability density function f(x) is estimated as

f(x) = (1/n) Σ_{i=1}^{n} K_h(x_i − x)   (5)
where h is the bandwidth, a smoothing parameter, and K_h(·) is the kernel function, which is non-negative, integrates to 1 and has mean 0. The kernel used in this paper is the Epanechnikov kernel:

K(u) = (3/4)(1 − u²) for |u| ≤ 1, and K(u) = 0 otherwise   (6)

Standard Deviation Ellipse Analysis. Standard Deviation Ellipse Analysis is a spatial calculation method that describes the spatial distribution of features in geographic space by using the centre, the two axes and the azimuth of the ellipse as parameters, and it can accurately show the geospatial distribution and variation characteristics of the features [14]. The standard deviation ellipse parameters are generated according to the following functions, with x'_i = x_i − x_ave and y'_i = y_i − y_ave:

tan θ = [ (Σ_{i=1}^{n} W_i² x'_i² − Σ_{i=1}^{n} W_i² y'_i²) + sqrt( (Σ_{i=1}^{n} W_i² x'_i² − Σ_{i=1}^{n} W_i² y'_i²)² + 4 (Σ_{i=1}^{n} W_i² x'_i y'_i)² ) ] / ( 2 Σ_{i=1}^{n} W_i² x'_i y'_i )   (7)

δ_x = sqrt( Σ_{i=1}^{n} (W_i x'_i cos θ − W_i y'_i sin θ)² / Σ_{i=1}^{n} W_i² )   (8)

δ_y = sqrt( Σ_{i=1}^{n} (W_i x'_i sin θ − W_i y'_i cos θ)² / Σ_{i=1}^{n} W_i² )   (9)
where (x_ave, y_ave) denotes the mean centre of the coordinates (x_i, y_i), W_i is the regional product R&D efficiency or achievement transformation efficiency used as the weight, (x'_i, y'_i) is the coordinate of each point relative to the regional centre, the azimuth of the distribution pattern is obtained from tan θ, and δ_x and δ_y are the standard deviations along the X and Y axes, respectively.
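A small NumPy sketch of Eqs. (7)-(9), assuming the centre is taken as the weighted mean of the provincial coordinates; function and variable names are illustrative:

```python
import numpy as np

def standard_deviational_ellipse(x, y, w):
    """Weighted standard deviation ellipse of Eqs. (7)-(9).

    x, y are provincial coordinates, w the efficiency weights W_i.
    Returns the weighted mean centre, the rotation angle theta and the
    standard deviations along the two axes."""
    x, y, w = map(np.asarray, (x, y, w))
    xc, yc = np.average(x, weights=w), np.average(y, weights=w)   # ellipse centre
    dx, dy = x - xc, y - yc

    a = np.sum(w**2 * dx**2) - np.sum(w**2 * dy**2)
    b = np.sum(w**2 * dx * dy)
    theta = np.arctan((a + np.sqrt(a**2 + 4 * b**2)) / (2 * b))   # Eq. (7)

    denom = np.sum(w**2)
    sx = np.sqrt(np.sum((w * dx * np.cos(theta) - w * dy * np.sin(theta))**2) / denom)  # Eq. (8)
    sy = np.sqrt(np.sum((w * dx * np.sin(theta) - w * dy * np.cos(theta))**2) / denom)  # Eq. (9)
    return (xc, yc), theta, sx, sy
```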
2.2 Constructing the Target System The growth of any industry is staged, and the technological innovation of the pharmaceutical manufacturing industry is long-term, complex, high-risk and discontinuous. On the basis of previous research, this paper divides the innovation process of the pharmaceutical manufacturing industry into two stages, a product R&D stage and a technological transformation stage [15, 16]. Input and output variables covering personnel, finance and materials are selected to establish a measurement system for innovation efficiency in the pharmaceutical manufacturing industry (see Table 1), which aims to explore the weak links of innovation efficiency.

Table 1. Technological innovation efficiency measurement index system
Variable | Product R&D stage | Technological transformation stage
Input variable | Full-time equivalent of R&D personnel | Expenditure on new products development
 | Expenditure on R&D | Full-time equivalent of R&D personnel
 | R&D institutions | New products
 |  | Expenditure for technical renovation
Output variable | Patent applications | Sales revenue of new products
2.3 Data Sources Taking the provinces as the basic decision-making units, this paper selects the relevant data on the pharmaceutical manufacturing industry from the China Statistical Yearbook on High Technology Industry from 2010 to 2018. The regional grouping follows the China Statistical Yearbook on High Technology Industry and divides the country into four parts. The eastern region includes Beijing, Tianjin, Shanghai, Hebei, Jiangsu, Zhejiang, Fujian, Shandong, Guangdong and Hainan. The central region includes Shanxi, Anhui, Jiangxi, Henan, Hubei and Hunan. The western region includes Inner Mongolia, Guangxi, Chongqing, Sichuan, Guizhou, Yunnan, Qinghai, Shaanxi, Ningxia, Tibet, Gansu and Xinjiang. The northeast region includes Liaoning, Jilin and Heilongjiang. Tibet, Qinghai, Hong Kong, Macao and Taiwan, where statistics are missing or abnormal, are excluded, and the data of the remaining 29 provinces are analysed.
3 Results and Discussions 3.1 Analysis of the Changing Trend of Innovation Efficiency in Chinese Pharmaceutical Manufacturing Industry The Malmquist index dynamically reflects changes in the efficiency of technological innovation in the pharmaceutical manufacturing industry. The product R&D efficiency and the technological transformation efficiency of the 29 provinces and the four sub-regions of China in 2010–2018 were calculated, and comparing the changes leads to the following results. The overall efficiency of product R&D in the Chinese pharmaceutical manufacturing industry increased during 2010–2018 (Fig. 1 and Fig. 3). Specifically, the trend of change in the western region is consistent with the national average. Except in 2010–2012, the TFP in the central region is greater than 1, indicating that the product R&D efficiency in the central region is rising steadily. The product R&D efficiency in the eastern and northeastern regions has increased considerably, and the TFP in 2016–2018 is much larger than the national average, indicating that the product R&D efficiency in the eastern and northeastern regions has developed well in recent years. Among the provinces, the product R&D efficiency of Gansu, Ningxia and Shaanxi in the western region, Henan and Jiangxi in the central region, Beijing, Guangdong and Shanghai in the eastern region, and Heilongjiang in the northeast improved.
Fig. 1. Average change in the TFP for the product R&D stage, 2012–2018.
The overall technological transformation efficiency of the Chinese pharmaceutical manufacturing industry showed an upward trend over the study period (Fig. 2 and Fig. 4). Specifically, the trends in the western and central regions are basically in line with the national average. The TFP in the eastern and northeastern regions has been greater than 1 since 2012 and is much higher than the national average, indicating that the technological transformation efficiency of pharmaceutical manufacturing achievements in the eastern and northeastern regions has remained at a medium-to-high level in recent years. Among the provinces, the technological transformation efficiency of Chongqing in the western region, Inner Mongolia and Hunan in the central region, Beijing, Guangdong, Shandong, Shanghai, Tianjin and Hebei in the eastern region, and Jilin in the northeast improved well.
Fig. 2. Average change in the TFP for the technological transformation stage, 2012–2018.
Fig. 3. Changes in the product R&D efficiency in China, 2012–2018.
3.2 The Geographical Distribution of Innovation Efficiency in Chinese Pharmaceutical Manufacturing Industry Kernel Density Estimation (KDE) is a non-parametric estimation method that can estimate the probability density of random variables and reveal their distribution pattern. This section uses KDE to analyse the distribution and dynamic evolution of the product R&D efficiency and the technological transformation efficiency of the Chinese pharmaceutical manufacturing industry.
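A minimal sketch of such an Epanechnikov-kernel density estimate for one period's provincial efficiency values, assuming scikit-learn; the bandwidth and the example data are illustrative choices, not values from the paper:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def efficiency_density(values, bandwidth=0.1, n_grid=200):
    """Epanechnikov kernel density estimate (Eqs. (5)-(6)) over a value grid."""
    values = np.asarray(values).reshape(-1, 1)
    grid = np.linspace(values.min() - 0.5, values.max() + 0.5, n_grid).reshape(-1, 1)
    kde = KernelDensity(kernel="epanechnikov", bandwidth=bandwidth).fit(values)
    return grid.ravel(), np.exp(kde.score_samples(grid))   # density on the grid

# e.g. TFP values of the 29 provinces for one period (random placeholders here)
xs, density = efficiency_density(np.random.default_rng(1).normal(1.0, 0.2, 29))
```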
Fig. 4. Changes in the technological transformation efficiency in China, 2012–2018.
Figure 5 plots the KDE curves of the product R&D efficiency in the regions of the Chinese pharmaceutical manufacturing industry in 2012–2018. Over the observation period, the peak of the national KDE curve of product R&D efficiency first rises and then falls, and the width of the main peak first narrows and then widens. This shows that the level of product R&D efficiency in the domestic pharmaceutical manufacturing industry first improved and then declined, and that the differences in product R&D efficiency among the provinces first narrowed and then widened. The KDE peaks in the western and eastern regions experienced "up-down" fluctuations, while those in the central and northeastern regions experienced "up-down-up" fluctuations, and the width of the main peak is narrower in 2018 than in 2012. The eastern region as a whole shows a decreasing main peak with a smaller peak width, while the northeastern, western and central regions show a rising main peak with a smaller peak width. This indicates that the product R&D efficiency of the pharmaceutical manufacturing industry in the northeastern, western and central regions fluctuated but improved overall, that the product R&D level in the eastern region has reached a bottleneck and its efficiency has decreased, and that the polarization of the product R&D level across regions has decreased. From 2010 to 2018, the KDE curve of product R&D efficiency in the eastern region changed from "twin peaks" to a "single peak", indicating that the polarization of product R&D efficiency in the eastern region is gradually decreasing. The KDE curve in the central region developed from a "single peak" into "twin peaks", indicating that the R&D efficiency level of the pharmaceutical manufacturing industry in the central region is gradually polarizing. The KDE curves of product R&D efficiency in the northeastern and western regions remained single-peaked during the observation period, which shows that there is no two-tier
differentiation in the level of product R&D efficiency of pharmaceutical manufacturing industry in northeast and western China.
Fig. 5. KDE for the product R&D efficiency in China, 2012–2018.
Figure 6 shows the KDE curves of the technological transformation efficiency in the regions of the Chinese pharmaceutical manufacturing industry in 2012–2018. The peak of the national KDE curve of technological transformation efficiency rose and then decreased slightly during the observation period, and the width of the peak continued to decrease. This shows that the technological transformation efficiency of the domestic pharmaceutical manufacturing industry declined slightly after a gradual improvement, and that the polarization between provinces became smaller. The changes in the KDE peak and peak width of the western region are basically in line with the national level, and the KDE curves in both the central and western regions experienced "up-down" fluctuations. In the northeast region, except in 2012–2014, when the main peak of the KDE curve increased significantly and the peak width narrowed markedly, the curve remained in a wide, low-peak state, which shows that the technological transformation efficiency of the pharmaceutical manufacturing industry in Northeast China is very unstable. The KDE curve of the technological transformation efficiency in the central region extended to the right in 2012–2014, showed twin peaks in 2014–2016, and in 2016–2018 its right tail contracted to the left while its left tail extended. This means that the technological transformation efficiency of the pharmaceutical manufacturing industry in the central region was polarized in 2014–2016 and that the polarization eased through adjustment in 2016–2018. From 2010 to 2018,
the KDE curve of the technological transformation efficiency in the western region changed overall from a "single peak" into "twin peaks", indicating that the technological transformation efficiency of pharmaceutical manufacturing in the western region is gradually polarizing. The KDE curve of the technological transformation efficiency in the eastern region remained single-peaked, which shows that there is no polarization in the technological transformation efficiency of the pharmaceutical manufacturing industry in the eastern region.
Fig. 6. KDE for the technological transformation efficiency in China, 2012–2018.
3.3 The Spatial Evolution of Technological Innovation Efficiency in Chinese Pharmaceutical Manufacturing Industry In order to explore intuitively the spatial agglomeration characteristics and spatial balance of the technological innovation efficiency of the Chinese pharmaceutical manufacturing industry, the standard deviation ellipse method is used to plot the elliptical distribution and centre-of-gravity movement of the product R&D efficiency and the technological transformation efficiency for 2010–2018. Combining Fig. 7 and Table 2, in terms of the centre-of-gravity distribution, the elliptical centre of gravity of the product R&D efficiency of the Chinese pharmaceutical manufacturing industry in 2010–2018 is located in the northwest of Henan Province and moves between 33.73°N–34.27°N and 111.5°E–113.24°E. The deflection direction of the elliptical centre of gravity during the observation period is "Southwest-Southeast-Northwest", the azimuth varies between 61.17° and 97.57°, and the centre-of-gravity shift distance first increases and then decreases. In terms of the shape of the ellipse, the short semi-axis/long semi-axis ratio fluctuated from
572
F. Xia et al.
0.75 in 2010–2012 to 0.80 in 2016–2018, and the shape is getting closer and closer to the positive circle. This shows that the spatial distribution of product R&D efficiency tends to be balanced, and the interval difference decreases overall, but the differences become larger between 2010–2012 and 2012–2014. From the change of the area of the standard deviation ellipse, the change trend of the standard deviation elliptical area in 2010–2018 is 1.23 in 2016–2018 and 2010–2012, which indicating that the regional pull effect and radiation range of the product R&D efficiency of Chinese pharmaceutical manufacturing industry are increasing.
Fig. 7. Standard deviation ellipse in the product R&D efficiency, 2012–2018.
Table 2. Standard deviation elliptical parameters for the product R&D efficiency.

Year      | Shape length | Shape area | Center coordinate longitude/(°) | Center coordinate latitude/(°) | Azimuth/(°) | Short semi-axis/long semi-axis
2010–2012 | 58.75        | 266.26     | 112.18                          | 34.27                          | 61.17       | 0.75
2012–2014 | 76.84        | 449.49     | 111.54                          | 33.73                          | 97.57       | 0.71
2014–2016 | 65.26        | 332.53     | 113.24                          | 33.59                          | 77.53       | 0.80
2016–2018 | 64.82        | 328.48     | 112.72                          | 33.92                          | 74.97       | 0.80
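The ellipse parameters in Tables 2 and 3 follow Lefever's standard deviational ellipse [14]. The sketch below uses one common (ArcGIS-style) formulation of the mean center, rotation angle and semi-axes; the coordinates and weights are hypothetical provincial centroids and scores, not the paper's data, and the axis convention for the reported azimuth may differ from the authors' software.

```python
import numpy as np

def standard_deviational_ellipse(x, y, w=None):
    """Mean center, rotation angle and semi-axes of the standard deviational ellipse
    (one common formulation; the angle's mapping to an azimuth is convention-dependent)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    w = np.ones_like(x) if w is None else np.asarray(w, float)
    xm, ym = np.average(x, weights=w), np.average(y, weights=w)   # weighted mean center
    dx, dy = x - xm, y - ym
    a = np.sum(w * (dx**2 - dy**2))
    b = np.sum(w * dx * dy)
    theta = np.arctan((a + np.sqrt(a**2 + 4 * b**2)) / (2 * b))   # rotation of the major axis
    sx = np.sqrt(np.sum(w * (dx * np.cos(theta) - dy * np.sin(theta))**2) / w.sum())
    sy = np.sqrt(np.sum(w * (dx * np.sin(theta) + dy * np.cos(theta))**2) / w.sum())
    return (xm, ym), np.degrees(theta) % 180, sx, sy

# Hypothetical centroids (lon, lat) weighted by efficiency scores -- placeholders only
lon = [113.5, 116.4, 104.1, 108.9, 121.5, 114.3]
lat = [34.8, 39.9, 30.7, 34.3, 31.2, 30.6]
eff = [0.62, 0.71, 0.55, 0.58, 0.80, 0.66]
center, angle, semi_x, semi_y = standard_deviational_ellipse(lon, lat, eff)
print(center, round(angle, 2), round(min(semi_x, semi_y) / max(semi_x, semi_y), 2))
```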
According to Fig. 8 and Table 3, the elliptical center of gravity of the technological transformation efficiency of the Chinese pharmaceutical manufacturing industry in 2010–2018 is located in the northern part of Henan Province and moves between 33.37° N–35.13° N and 112.95° E–113.63° E. The deflection direction of the elliptical center of gravity during the observation period is "Southeast–Northwest–Southeast", the azimuth changes between 55.50° and 68.69°, and the center-of-gravity shift distance gradually increases. Judging from the shape of the standard deviation ellipse, the short semi-axis/long semi-axis ratio decreased, with fluctuations, from 0.87 in 2010–2012 to 0.76 in 2016–2018, and the shape gradually deviated from a circle, indicating that the spatial distribution of technological transformation efficiency in the pharmaceutical manufacturing industry tends to polarize. The ratio of the standard deviation ellipse area in 2016–2018 to that in 2010–2012 is about 1.09, indicating that the regional influence range of the technological transformation efficiency of the Chinese pharmaceutical manufacturing industry is increasing.
Fig. 8. Standard deviation ellipse in the technological transformation efficiency, 2012–2018.
Table 3. Standard deviation elliptical parameters for the technological transformation efficiency.

Year      | Shape length | Shape area | Center coordinate longitude/(°) | Center coordinate latitude/(°) | Azimuth/(°) | Short semi-axis/long semi-axis
2010–2012 | 60.47        | 288.93     | 112.95                          | 35.13                          | 66.64       | 0.87
2012–2014 | 62.53        | 302.54     | 113.63                          | 33.54                          | 57.41       | 0.76
2014–2016 | 70.19        | 383.83     | 113.04                          | 33.59                          | 68.69       | 0.79
2016–2018 | 63.88        | 315.48     | 113.07                          | 33.37                          | 55.50       | 0.76
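The "about 1.23" and "about 1.09" area-change figures quoted above are simply the ratios of the 2016–2018 ellipse area to the 2010–2012 area in Tables 2 and 3; a quick check:

```python
# Ratios of the 2016-2018 ellipse area to the 2010-2012 area (Tables 2 and 3)
rd_ratio = 328.48 / 266.26      # product R&D efficiency ellipse
tt_ratio = 315.48 / 288.93      # technological transformation efficiency ellipse
print(round(rd_ratio, 2), round(tt_ratio, 2))   # -> 1.23 1.09
```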
4 Conclusion
This paper analyzes data on the Chinese pharmaceutical manufacturing industry from 2010 to 2018 along two dimensions, "nation and region" and "time and space", and finds that (1) the technological innovation efficiency of the Chinese pharmaceutical manufacturing industry is rising overall, with both product R&D efficiency and technological transformation efficiency improving. However, the trend of polarization of innovation efficiency between provinces in the Chinese pharmaceutical manufacturing industry has become smaller, and there is a phased imbalance between regions. According to the theory of regional balanced development [17], this phenomenon may be related to the volatility and phased regularity of economic development among Chinese regions; (2) the elliptical centers of gravity of product R&D efficiency and technological transformation efficiency in the Chinese pharmaceutical manufacturing industry are both located in Henan Province; the deflection direction of the product R&D efficiency ellipse center during the observation period is "Southwest–Southeast–Northwest", while that of the technological transformation efficiency ellipse is "Southeast–Northwest–Southeast". This paper only describes the temporal trend, spatial distribution and spatio-temporal transfer of technological innovation efficiency in the Chinese pharmaceutical manufacturing industry; follow-up research can introduce more geographical factors or related variables to explore the spatial pattern of technological innovation efficiency and analyze its driving mechanism. In addition, systematically comparing different regions and revealing the changing trends of the innovation efficiency pattern, spatial aggregation and spatial differences across regions is also an important research problem.
References
1. Li, W., Wang, S., Chen, Y.: Relationship between regional level of pharmaceutical economic development and R&D capital investment in pharmaceutical manufacturing industry in China. Chin. J. New Drugs 30(1), 6–12 (2021)
2. Hussaim, A.B., Nurul, W.A.L.: The impact of technological innovation and governance institution quality on Malaysia's sustainable growth: evidence from a dynamic relationship. Technol. Soc. 54, 27–40 (2018)
3. Wu, L., Wang, X., Yin, X.: A comparative analysis of the service supply of knowledge-intensive business services and the development of pharmaceutical industry in China and America. Forum Sci. Technol. China 33(5), 180–185 (2017)
4. Gu, S., Wu, H., Wu, Q., et al.: Innovation-driven and core technology breakthroughs are the cornerstones of high-quality development. China Soft Sci. 33(10), 9–18 (2018)
5. You, T., Chen, X., Holder, M.: Efficiency and its determinants in pharmaceutical industries ownership: R&D and scale economy. Appl. Econ. 17, 2217–2241 (2010)
6. Xia, M., He, Q., Jiang, L.: Evaluation of technological innovation ability in pharmaceutical manufacturing industry. Stat. Decis. 36(18), 175–179 (2020)
7. Xu, W., Fang, L.: An empirical study on influence of high-tech development zones innovation on regional economic growth in East China. Econ. Geogr. 35(2), 30–36 (2015)
8. MacKinnon, D., Cumbers, A., Chapman, K.: Learning, innovation and regional development: a critical appraisal of recent debates. Prog. Hum. Geogr. 3, 293–311 (2002)
9. Liu, Y., Ouyang, Y.: The space-time characteristics and dynamic evolution of innovation efficiency of Chinese new generation information technology industry. J. Hunan Univ. (Soc. Sci.) 34(5), 52–61 (2020)
10. Zhu, X.H., Li, Y., Zhang, P.F., et al.: Temporal-spatial characteristics of urban land use efficiency of Chinese 35 mega cities based on DEA: decomposing technology and scale efficiency. Land Use Policy 88, 1–13 (2019)
11. Charnes, A., Cooper, W.W., Rhodes, E.: Measuring the efficiency of decision making units. Eur. J. Oper. Res. 2(6), 429–444 (1978)
12. Fare, R., Grosskopf, S., Norris, M.: Productivity growth, technical progress, and efficiency change in industrialized countries. Am. Econ. Rev. 84(5), 1040–1044 (1994)
13. Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)
14. Lefever, D.W.: Measuring geographic concentration by means of the standard deviational ellipse. Am. J. Sociol. 1, 88–94 (1926)
15. Meng, W., Li, C., Shi, X.: The innovation efficiency of Chinese high-tech industry is analyzed in stages – based on the three-stage DEA model. Macroeconomics 41(2), 78–91 (2019)
16. Dong, H., Zhang, R.: The spatial pattern and evolution characteristics of innovation efficiency in Chinese high-tech industry. Stat. Decis. 37(3), 106–111 (2021)
17. Yuan, F., Chen, Z.: New urbanization towards balanced development: an analysis of coupling coordination model of population-land-finance. J. Central China Normal Univ. (Hum. Soc. Sci.) 57(3), 1–16 (2018)
Evaluating the Spatial Aggregation and Influencing Factors of Chinese Medicine Human Resources in China: A Spatial Econometric Approach Fang Xia, Jinping Liu, Yanyin Cui, and Hongjuan Wen(B) School of Health Management, Changchun University of Chinese Medicine, Changchun 130117, China
Abstract. Based on data from the National Statistical Yearbook from 2012 to 2019, the spatial pattern and changes of Traditional Chinese Medicine (TCM) human resources in China were measured, and the main factors affecting the spatial distribution of TCM human resources were identified by means of agglomeration degree analysis, spatial autocorrelation analysis and a spatial econometric model. The results indicate that: (1) the spatial distribution of TCM human resources in China is unbalanced, and regional differences are significant; (2) TCM human resources show significant spatial autocorrelation and agglomeration characteristics, and the spatial aggregation patterns are mainly of the "low-high" and "low-low" types; (3) per capita health care expenditure is the most important economic environment factor, and the number of consultations in TCM-type health institutions is the most critical health care factor. Keywords: Chinese medicine human resources · Agglomeration · Spatial measurement
1 Introduction
The reform of China's medical and health system has brought unprecedented opportunities and challenges to the development of traditional Chinese medicine and health care. Traditional Chinese medicine (TCM) human resources play a central role in safeguarding residents' health and promoting the development of TCM, and they directly affect the quality and effectiveness of TCM services [1]. With the rapid development of the Chinese economy, the distribution of TCM human resources presents regionally unbalanced spatial agglomeration. The imbalanced distribution of TCM human resources will lead to unfairness in regional health services and, at the same time, have a negative effect on local integration. Most of the literature on TCM human resources in the field of public health in China focuses on describing the current situation, equity analysis and planning evaluation. For example, Xiang Nan [2] used agglomeration degree and the Theil index to analyze the allocation level of TCM human health resources; Xu Yue [3] used descriptive analysis and the gray model method to analyze the current situation of TCM human resources in China and predicted their future number; Lu Xiufang [1] used a combination of the HRDI, the Lorenz curve, the Gini coefficient and the Theil index to measure the equity of TCM human resource allocation in terms of population and geography, respectively. Spatial analysis can further explore the causes of uneven resource allocation on this basis. Based on panel data for 31 Chinese provinces (excluding Hong Kong, Macao and Taiwan) from 2012 to 2019, this paper analyzes the spatial aggregation pattern of TCM human resources in China using an agglomeration index and a spatial econometric model, explores the internal mechanism affecting the spatial distribution of TCM human resources, and provides data support for relevant government departments to formulate policies.
2 Data and Model
2.1 Theory and Methods
Health Resources Agglomeration Degree. The health resources agglomeration degree (HRAD) was introduced by scholars in the health field, who brought the concept of agglomeration from economics into health resource evaluation [4]; it reflects the concentration of health resources in an area that accounts for 1% of the country's land area. HRAD is defined as:

$$\mathrm{HRAD}_i = \frac{(HR_i / HR_n) \times 100\%}{(A_i / A_n) \times 100\%} = \frac{HR_i / A_i}{HR_n / A_n} \qquad (1)$$
Where HRAD_i is the concentration of TCM human resources in region i, HR_i is the number of TCM human resources in region i, HR_n is the total number of TCM human resources in the country, A_i is the land area of region i, and A_n is the land area of the country.
Spatial Autocorrelation Test. Spatial autocorrelation is used to test for possibly non-random spatial distributions and spatial autocorrelation features, as a way to determine the degree of correlation between the observations of a spatial sample and its surroundings [5]. The existence of spatial autocorrelation requires the development of a spatial econometric model. We use the Moran's I index as the indicator for the spatial autocorrelation test, and its formula is as follows:

$$I = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}\,(Y_i - \bar{Y})(Y_j - \bar{Y})}{S^2 \sum_{i=1}^{n}\sum_{j=1}^{n} W_{ij}} \qquad (2)$$

In the formula, $S^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ and $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, where Y_i is the observation in region i, n is the total number of regions, and W_ij is the neighboring spatial weight matrix, whose elements are defined by an adjacency or distance criterion so as to describe the mutual neighboring relationship of spatial objects: W_ij = 1 if two regions are neighboring and 0 if they are not.
Moran's I takes values in the range [−1, 1]. If Moran's I > 0, there is positive spatial autocorrelation among the variables; if Moran's I < 0, there is negative spatial autocorrelation; if Moran's I = 0, there is no spatial autocorrelation and the variables are randomly distributed in space.
Model Construction. Spatial regression models can be used to analyze the spatial heterogeneity and aggregation of the TCM human resource distribution and to identify the spatial behavior of the variables and disturbance terms. The Spatial Lag Model (SLM) and the Spatial Error Model (SEM) are often used to verify the spatial effects of spatial correlation [6]. This paper constructs an OLS (Ordinary Least Squares) model, a spatial lag model and a spatial error model for the spatial clustering of TCM human resources, in the following forms.
Ordinary Least Squares (OLS): The OLS model is a linear regression model with the general form

$$y = \beta_0 + \beta_1 x + \varepsilon \qquad (3)$$

where y is the explained variable, x is the independent variable, β_0 is the intercept parameter, β_1 is the slope parameter, and ε is the random error vector.
Spatial Lag Model (SLM): The spatial lag model reveals whether the dependent variable exhibits a diffusion phenomenon or spillover effect in a given region. Its expression is

$$y = \rho W y + X\beta + \mu \qquad (4)$$

where W is the spatial weight matrix and ρ is the parameter of the spatial lag term, which measures the degree of spatial interaction between observations.
Spatial Error Model (SEM): In the spatial error model, spatial dependence is present in the error disturbance term; it measures the effect of error shocks to the dependent variable in neighboring regions on the observations in the region. Its expression is

$$y = X\beta + \mu, \quad \mu = \lambda W\mu + \varepsilon \qquad (5)$$

where μ and ε are N × 1 vectors of error disturbance terms, which obey the assumption of independent identical distribution; λ is the spatial error autocorrelation coefficient, which reflects the impact on a region of error shocks to the dependent variable in neighboring regions; and the parameter β reflects the impact of the independent variables on the dependent variable.
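As a minimal sketch of formulas (1) and (2), the HRAD index and the global Moran's I can be computed directly with NumPy. The arrays below are illustrative placeholders, not the paper's data; in practice the binary contiguity matrix W would be built from the provincial adjacency relationships.

```python
import numpy as np

def hrad(hr, area):
    """Health resources agglomeration degree, formula (1):
    (HR_i / HR_n) / (A_i / A_n) for each region i."""
    hr, area = np.asarray(hr, float), np.asarray(area, float)
    return (hr / hr.sum()) / (area / area.sum())

def global_moran(y, w):
    """Global Moran's I, formula (2), with S^2 = (1/n) * sum((Y_i - Ybar)^2)."""
    y = np.asarray(y, float)
    n = y.size
    z = y - y.mean()
    s2 = (z ** 2).sum() / n
    return (z @ w @ z) / (s2 * w.sum())

# Illustrative placeholders (not the paper's data)
hr = [5.1, 3.2, 8.7, 2.4]            # TCM human resources per region
area = [16.7, 23.8, 10.2, 49.3]      # land areas (10,000 km^2)
y = np.array([1.3, 0.9, 2.1, 0.4])   # agglomeration index per region
w = np.array([[0, 1, 1, 0],          # binary contiguity matrix (1 = neighboring)
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], float)
print(hrad(hr, area))
print(global_moran(y, w))
```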
2.2 Variables and Data Sources
In this paper, the number of provincial TCM human resources from 2012 to 2019 was selected as the explained variable, and GDP per capita, the proportion of financial allocations for medical institutions of Chinese medicine, health care expenditure per capita,
the number of consultations in TCM-type medical institutions, the number of beds in TCM-type medical institutions, the number of TCM-type medical and health institutions, the aging rate, and the urbanization rate were selected as explanatory variables for the analysis of the factors influencing the spatial aggregation of TCM human resources in China. The original data were obtained from the China Health and Family Planning Statistical Yearbook, the China Statistical Yearbook, and the official data published on the website of the National Bureau of Statistics. Owing to data availability and comparability, Hong Kong, Macao and Taiwan are not considered in this paper (Table 1).

Table 1. Table of variables.

Influencing factors        | Indicators                                                                           | Variable name | Variable description
Economic development level | Ln GDP per capita                                                                    | Ln Pgdp       | ln (regional GDP/total population)
Economic development level | The proportion of financial allocations for medical institutions of Chinese medicine | Ln TCMfp      | Financial allocation for TCM-type health institutions/financial allocation for the health sector
Economic development level | Ln health care spending per capita                                                   | Ln PHS        | –
Health care factors        | Ln visits to TCM medical institutions                                                | Ln TCMMIV     | –
Health care factors        | Ln number of beds in TCM medical institutions                                        | Ln TCMNB      | –
Health care factors        | Ln number of medical and health institutions of Chinese medicine                     | Ln TCMNH      | –
Other social factors       | Ageing rate                                                                          | Ar            | Population over 65 years old/total population
Other social factors       | Urbanization rate                                                                    | Ur            | Number of urban population/total population by province
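A hedged sketch of how the log-transformed explanatory variables in Table 1 might be prepared from a provincial panel; the column names (`gdp`, `population`, `tcm_finance`, and so on) are hypothetical and would need to match the actual yearbook extract.

```python
import numpy as np
import pandas as pd

# Hypothetical provincial panel; column names are illustrative, not from the yearbooks
df = pd.DataFrame({
    "province": ["A", "B"], "gdp": [4.1e12, 2.3e12], "population": [9.8e7, 6.1e7],
    "tcm_finance": [1.2e9, 0.7e9], "health_finance": [2.1e10, 1.3e10],
    "health_spend_pc": [4300.0, 3800.0], "tcm_visits": [5.2e7, 2.9e7],
    "tcm_beds": [6.1e4, 3.3e4], "tcm_institutions": [2100, 1400],
    "pop_65plus": [1.3e7, 0.8e7], "urban_pop": [6.2e7, 3.4e7],
})

df["ln_pgdp"] = np.log(df["gdp"] / df["population"])                # Ln GDP per capita
df["ln_tcmfp"] = np.log(df["tcm_finance"] / df["health_finance"])   # TCM financial allocation share
df["ln_phs"] = np.log(df["health_spend_pc"])                        # Ln health care spending per capita
df["ln_tcmmiv"] = np.log(df["tcm_visits"])                          # Ln visits to TCM institutions
df["ln_tcmnb"] = np.log(df["tcm_beds"])                             # Ln beds in TCM institutions
df["ln_tcmnh"] = np.log(df["tcm_institutions"])                     # Ln number of TCM institutions
df["ar"] = df["pop_65plus"] / df["population"]                      # ageing rate
df["ur"] = df["urban_pop"] / df["population"]                       # urbanization rate
```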
3 Results and Discussions
3.1 General Characteristics of the Spatial Distribution of TCM Human Resources
Figure 1 shows the characteristics of the spatial distribution of TCM human resources at four time points: 2012, 2014, 2017 and 2019. Overall, the number of TCM human resources in China shows a clear upward trend.
Fig. 1. The spatial distribution characteristics of Chinese medicine personnel in China.
3.2 Analysis of the Spatial Aggregation of Chinese Medicine Human Resources in China
Using inter-provincial panel data for China from 2012–2019, this paper provides a comparative analysis of the agglomeration of TCM human resources in each province and examines its agglomeration status and dynamic distribution characteristics. Figures 2 and 3 show the distribution of TCM human resource agglomeration at the provincial level in China in 2012 and 2019. Except for nine provinces, namely Heilongjiang, Ningxia, Tibet, Inner Mongolia, Gansu, Yunnan, Jilin, Qinghai, and Xinjiang, the TCM human resource agglomeration index of all other provinces in 2012 and 2019 is greater than 1, indicating that the equity of TCM human resource allocation in the remaining 22 provinces is relatively good.
Fig. 2. 2012 TCM human resource aggregation in China by Province
It can be seen that the development of TCM human resources in China is relatively unbalanced, with significant regional differences. The TCM human resource concentration areas are mainly distributed in the southeast coastal provinces and some provinces in the middle reaches of the Yangtze River, while the sparse areas are mainly distributed in the vast western region. The eastern coastal region has formed an obvious TCM human resource agglomeration zone, and the overall pattern shows a clear "East-West" decreasing agglomeration gradient.
Fig. 3. 2019 TCM human resource aggregation in China by Province
Throughout the observation period, the spatial distribution characteristics and agglomeration level of TCM human resources show that the quantity of TCM human resources in China has improved significantly, but there are large disparities in agglomeration levels between provinces and regions. Except for some more developed regions where the agglomeration level is higher, agglomeration in other regions is still at a low level; a few regions are still at the initial stage, where TCM human resources are extremely scarce and equilibrium is difficult to reach in the short term.

3.3 Spatial Autocorrelation Analysis of the TCM Human Resource Agglomeration Level
By measuring the agglomeration level above, we initially concluded that the development of TCM human resources shows certain agglomeration characteristics in space. We used Moran's test and the Moran scatter plot to determine whether the explanatory variables exhibit spatial correlation. The results of the significance test of the Moran indices are shown in Table 2. The Moran's I index from 2012–2019 ranged from 0.084 to 0.092; the global Moran's I values were all positive, passed the significance test at the 95% level, and the Z-scores lay between 1.694 and 1.785. The results indicate a significant positive spatial correlation in the distribution of TCM human resources in China, that is, the spatial clustering of TCM human resource agglomeration in China is significant.

Table 2. Global Moran's I test for the agglomeration of Chinese medicine human resources in China.

Year  | 2012  | 2013  | 2014  | 2015  | 2016  | 2017  | 2018  | 2019
Moran | 0.084 | 0.087 | 0.090 | 0.086 | 0.091 | 0.092 | 0.091 | 0.089
Z     | 1.694 | 1.736 | 1.770 | 1.713 | 1.774 | 1.785 | 1.764 | 1.744
P     | 0.045 | 0.041 | 0.038 | 0.043 | 0.038 | 0.037 | 0.039 | 0.041
Figure 4 shows the local Moran's I scatter plots of the Chinese TCM human resource agglomeration index in 2012 and 2019. The spatial correlation of China's TCM human resource agglomeration level can be further analyzed based on the local Moran's I scatter plot. As can be seen from Fig. 4, the Moran's I points for all 31 Chinese provinces appear in all four quadrants of the coordinate system, concentrated mainly in the second quadrant (low-high agglomeration) and the third quadrant (low-low agglomeration). The results of the global analysis support that the Chinese TCM human resource agglomeration index is characterized by positive spatial autocorrelation, which means that the slope of the fitted line is positive, but most provinces are still dominated by low values of agglomeration. This indicates that more regions themselves and their neighboring
regions are in the low value range of TCM human resource agglomeration, and the proportion of such low value agglomeration in the distribution areas is significantly higher than that of high value agglomeration.
Fig. 4. Moran scatter plots of China's Chinese medicine human resource agglomeration index, 2012 and 2019.
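The quadrant reading of Fig. 4 (high-high, low-high, low-low, high-low) follows from each province's standardized value and its spatial lag. A minimal sketch with a row-standardized weight matrix, using illustrative data only:

```python
import numpy as np

def moran_quadrants(y, w):
    """Classify regions into Moran scatter plot quadrants:
    HH (high-high), LH (low-high), LL (low-low), HL (high-low)."""
    y = np.asarray(y, float)
    z = (y - y.mean()) / y.std()                 # standardized agglomeration index
    w_row = w / w.sum(axis=1, keepdims=True)     # row-standardized weights
    lag = w_row @ z                              # spatially lagged value for each region
    labels = np.where(z >= 0, np.where(lag >= 0, "HH", "HL"),
                              np.where(lag >= 0, "LH", "LL"))
    return z, lag, labels

# Illustrative placeholders (not the paper's data)
y = np.array([1.6, 0.7, 0.5, 1.9, 0.4])
w = np.array([[0, 1, 0, 1, 0],
              [1, 0, 1, 0, 1],
              [0, 1, 0, 0, 1],
              [1, 0, 0, 0, 1],
              [0, 1, 1, 1, 0]], float)
z, lag, labels = moran_quadrants(y, w)
print(list(labels))   # in the paper, most provinces fall into the LH and LL quadrants
```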
3.4 Analysis of Factors Influencing the Spatial Distribution of TCM Personnel
Based on the spatial autocorrelation test above, we further estimated the OLS model and the spatial econometric SLM and SEM models for the factors influencing the spatial clustering of TCM human resources. Optimal model selection proceeded as follows. First, the LM diagnostics were performed on the OLS residuals; the spatial dependence tests show that LM-err is larger and more significant than LM-lag, so the spatial error model is the more reasonable choice (Table 3). Second, the spatial estimation results show that the goodness-of-fit of both the spatial lag model (SLM) and the spatial error model (SEM) improves compared with the OLS regression, indicating that the SLM and SEM models are overall better than the OLS regression model. Finally, according to the log-likelihood value and the AIC criterion, the SEM slightly outperforms the SLM and OLS models, so the SEM model was chosen to explain the factors influencing the spatial agglomeration of TCM human resources. The specific results are shown in Table 4.

Table 3. Test of spatial dependence of Chinese medicine personnel in China.

Spatial autocorrelation diagnosis | MI/DF | Value  | Probability
Moran's I                         | –     | 7.231  | 0.000
LMLAG                             | 1     | 12.816 | 0.000
Robust LMLAG                      | 1     | 2.275  | 0.006
LMERR                             | 1     | 42.847 | 0.000
Robust LMERR                      | 1     | 30.306 | 0.000
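The model-selection workflow described above (LM diagnostics on the OLS residuals, then comparison of SLM and SEM by log-likelihood and AIC) can be reproduced with the PySAL ecosystem. The sketch below assumes the `libpysal`/`spreg` API and uses toy contiguity and data, so it is an outline of the procedure rather than the authors' exact workflow.

```python
import numpy as np
from libpysal.weights import W
from spreg import OLS, ML_Lag, ML_Error

# Toy contiguity structure (a ring of 12 regions) and random data, purely illustrative.
neighbors = {i: [(i - 1) % 12, (i + 1) % 12] for i in range(12)}
w = W(neighbors)
w.transform = "r"                    # row-standardize the weights

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))         # three explanatory variables
y = X @ np.array([[0.5], [1.2], [-0.3]]) + rng.normal(size=(12, 1))

# OLS with spatial diagnostics: the summary reports LM-lag, LM-error and their
# robust forms, which are used to choose between SLM and SEM.
ols = OLS(y, X, w=w, spat_diag=True, moran=True)
print(ols.summary)

# Maximum-likelihood SLM and SEM; their log-likelihoods and AIC values support
# the comparison reported in the text (where the SEM is preferred).
slm = ML_Lag(y, X, w=w)
sem = ML_Error(y, X, w=w)
print(slm.summary)
print(sem.summary)
```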
Table 4. Spatial measurement results of factors influencing the aggregation of TCM personnel from 2012–2019.

Variables         | OLS        | SLM        | SEM
Ln Pgdp           | −0.6212*** | −1.1610    | −0.8822
TCMfp             | 0.4706***  | 0.4009***  | 0.3699***
Ln PHS            | 4.7004***  | 6.0146***  | 6.3408***
Ln TCMMiv         | 2.6681***  | 2.6998***  | 2.7559***
Ln TCMnh          | 7.2090***  | 5.8479***  | 5.7615***
Ln TCMnb          | 5.2526***  | 3.9898***  | 3.9081***
Ur                | −0.0139    | −0.0164    | −0.0139
Ar                | 0.0437     | 1.7176     | 0.9357
R2                | 0.7642***  | 0.8089***  | 0.8069***
Spatial lag (ρ)   |            | −0.1152    |
Spatial error (λ) |            |            | −0.4691**
Log Likelihood    | −597.1648  | −583.9017  | −581.8986
AIC               | 1211.262   | 1187.803   | 1183.79

Note: *, **, *** indicate significance at the 10%, 5%, and 1% levels, respectively
The spatial econometric analysis results for the factors influencing the agglomeration of TCM human resources in China are shown in Table 4. First, the economic environment is a key factor influencing the aggregation of TCM human resources. The effect of the proportion of financial allocation to TCM-type institutions on the aggregation of TCM human resources is significant, with an effect coefficient of 0.370, indicating that this proportion affects the spatial aggregation choices of TCM human resources. Per capita health care expenditure has a large positive impact on the concentration of TCM human resources in China: driven by market factors and by increases in all types of factors (health care resources, income, systems, policies, etc.), the population's demand for health products and services grows, thereby absorbing the supply of TCM human resources [7]. The proportion of financial allocation to TCM-type institutions and per capita health care expenditure reflect the importance attached to TCM in the region [8]; the more importance the government attaches to TCM, the more favorable the environment for TCM human resource work. Second, health care factors are the guarantee factors influencing TCM human resource agglomeration. The number of TCM-type health institutions, the number of consultations in TCM-type health institutions, and the number of beds in TCM-type health institutions all have significant positive effects on TCM human resource agglomeration, with positive coefficients in both the OLS model and the spatial econometric models. These three indicators represent the level of urban medical care in the region; a region with more TCM health resources is attractive to TCM human resources from both the local and neighboring provinces [9]. Quality
TCM health resources provide talent support for the development of the TCM sector and enhance the talent aggregation capacity of the region.
4 Conclusion
Based on a spatial econometric model, this paper makes an empirical analysis of the agglomeration of and differences in TCM human resources in China. The conclusions are as follows. The overall human resources of TCM in China show a relatively obvious upward trend, but the spatial distribution is strongly imbalanced, with significant regional differences: the eastern region has a high degree of TCM human resource concentration, while the central and western regions have a low degree of concentration. The global autocorrelation results show that the Moran's index presents positive spatial autocorrelation, indicating spatial clustering and spatial dependence of Chinese TCM human resources, and the local autocorrelation results show that the spatial agglomeration pattern is dominated by the low-high and low-low types, with the proportion of low-value agglomeration areas significantly higher than that of high-value agglomeration areas. The spatial regression results of the SEM show that the spatial agglomeration of TCM human resources is driven by exogenous economic environment factors and endogenous health care factors, while other social factors show little effect on TCM human resources. In terms of economic externalities, per capita health care expenditure has the main influence among the variables related to TCM human resources; in terms of intrinsic health care drivers, the number of visits to TCM-type health institutions has the greatest influence on the agglomeration of TCM human resources, while the numbers of TCM-type health institutions and beds are weaker, indicating that the agglomeration of TCM human resources is mainly influenced by the demand for health care backed by the ability to pay. This paper analyzes the agglomeration status of TCM human resources and its influencing factors using a spatial econometric model; however, the analysis only considers the provincial quantity of TCM human resources, while data at the prefecture and municipal levels, as well as micro survey data on TCM human resources, are important supports needed for in-depth research on changes in the spatial agglomeration of TCM human resources. Future research should refine the research units to provide theoretical support for further optimizing the spatial layout of TCM human resources.
References
1. Lu, X., Liu, C., Li, C., et al.: A study on the equity of TCM staffing in China: based on Gini coefficient and Thiel index. Chinese Health Econ. 36(10), 46–50 (2017)
2. Xiang, N., Xu, R., Yang, Y., et al.: Aggregation degree and Thiel index of equity of human resource allocation in Chinese medicine. Educ. Modern. 6(39), 185–186 (2019)
3. Xu, R., Hong, B., Li, Z., et al.: Analysis of the current situation and development forecast of Chinese medicine human resources nationwide. Chin. J. Tradit. Chin. Med. Inf. 25(06), 1–5 (2018)
4. Yuan, S., Gu, F., Liu, W., et al.: A methodological exploration of using agglomeration to evaluate the equity of health resource allocation. China Hosp. Manage. 35(02), 3–5 (2015)
5. Wei, N., Yu, C., Bao, J., et al.: Analysis of spatial aggregation of total health costs per capita and its influencing factors in China. Chin. Health Serv. Manage. 33(03), 190–192 (2016)
6. Ahmad, M., Jabeen, G.: Dynamic causality among urban agglomeration, electricity consumption, construction industry, and economic performance: generalized method of moments approach. Environ. Sci. Pollut. Res. 27(2), 2374–2385 (2019). https://doi.org/10.1007/s11356019-06905-1
7. Yang, Z., Ding, Q.Y., Wang, Y.: Spatial and temporal differences in health expenditures and environmental technology elasticity of Chinese residents. J. Central China Normal Univ. (Nat. Sci.) 51(02), 247–252 (2017)
8. Song, C., Xu, A., Wang, D.: Research on human resource demand forecasting of Chinese medicine hospitals based on gray GM(1,1) model. China Res. Hosp. 5(06), 1–5 (2018)
9. Yu, Z., Zhao, L., Xu, R., et al.: Current situation and efficiency of Chinese medicine and health resources allocation in Beijing, Tianjin and Hebei. Chin. Health Resour. 24(01), 59–61+70 (2021)
Spatial Distribution of Human Resources Allocation Level of Chinese Traditional Medicine Jinping Liu, Fang Xia, Yanyin Cui, Ziying Xu, and Hongjuan Wen(B) School of Health Management, Changchun University of Chinese Medicine, Changchun 130117, China
Abstract. Objective. The purpose of the study is to explore the spatial distribution pattern and evolution characteristics of traditional Chinese medicine human resources in order to provide a scientific basis for the construction of TCM talent. Methods. Spatial auto-correlation and the relative development rate were used to measure the spatial distribution pattern and change characteristics of human resources in TCM. Results. The number of TCM human resources in China increased year by year from 2012 to 2019, but the ratio of TCM physicians to TCM pharmacists was unbalanced. The global Moran's I showed that TCM human resources in China present spatial auto-correlation, and the Moran index of all indicators first declined and then showed an increasing trend. The local Moran's I showed that the aggregation characteristics of TCM human resources are obvious and that the aggregation mode changed greatly. Conclusion. The human resources of TCM in China are unevenly distributed, and there is obvious regional agglomeration in some provinces. Improving the rationality of the distribution of TCM human resources in China can promote balanced development among regions. Keywords: Human resources of Chinese medicine · Configuration level · Space layout
1 Introduction
The Healthy China 2030 Plan calls for giving full play to the unique advantages of Chinese medicine, improving the service capacity of Chinese medicine, and developing Chinese medicine health care and treatment services in order to further the inheritance and innovation of Chinese medicine [1]. As the primary resource for TCM inheritance and innovation, the quantity, structure and distribution of traditional Chinese medicine (TCM) human resources directly affect the development of the TCM industry [2]. Although the quality structure of Chinese medicine human resources has improved, there are differences in the distribution of Chinese medicine human resources between regions, with both "insufficiency" and "surplus" situations [3–5]. The long-term existence of this imbalance has seriously affected the overall supply efficiency of Chinese medicine human resources. The existing literature on TCM human resources mostly focuses on the current situation and fairness of the allocation of TCM human resources in a certain region of China [6–8]. There is still a lack of research on the spatial distribution pattern and evolution of TCM human resources for the country as a whole. The spatial agglomeration and distribution characteristics of TCM human resources are important clues for analyzing uneven regional development. This study measures the spatial agglomeration and flow of TCM human resources from a spatial perspective, identifies the mechanisms behind their agglomeration or diffusion, and grasps the overall development trend of TCM human resources from a macro perspective, so as to provide a theoretical basis for reducing the spatial imbalance in the allocation of TCM human resources.
2 Data and Model
2.1 Theory and Methods
Spatial Autocorrelation. This paper performs global and local spatial auto-correlation analysis with GeoDa software in order to test the spatial concentration and spatial distribution pattern of TCM human resources in China. Global spatial auto-correlation uses the Global Moran's I index to measure the spatial distribution and concentration of all TCM human resources in the study area. Moran's I ranges from −1 to 1: Moran's I > 0 indicates positive spatial auto-correlation, and Moran's I < 0 indicates negative spatial auto-correlation; the larger the absolute value, the stronger the correlation. If Moran's I = 0, the observations are randomly distributed and spatially independent of each other. It is calculated as follows:

$$I = \frac{N\sum_{i=1}^{N}\sum_{j=1}^{N} W_{ij}\,(X_{it}-\bar{X}_t)(X_{jt}-\bar{X}_t)}{\sum_{i=1}^{N}\sum_{j=1}^{N} W_{ij}\;\sum_{i=1}^{N}(X_{it}-\bar{X}_t)^2} \qquad (1)$$

In the formula, W_ij is the spatial weight between i and j, X_it and X_jt are the observations for regions i and j in year t, and X̄_t is the mean value of X_it in year t. Local spatial auto-correlation uses the Local Moran's I index to detect the similarity or correlation between the TCM human resources of each province and those of its neighbors. I_i > 0 means that the location and its neighborhood both have high (or both low) values; I_i < 0 means that a high (low) value area is surrounded by low (high) values. It is calculated as follows:

$$I_i = \frac{(X_{it}-\bar{X}_t)\sum_{j=1}^{N} W_{ij}\,(X_{jt}-\bar{X}_t)}{\sum_{i=1}^{N}(X_{it}-\bar{X}_t)^2} \qquad (2)$$

In the formula, W_ij, X_it, X_jt and X̄_t have the same meaning as in formula (1).
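GeoDa computes these statistics directly. An equivalent hedged sketch in Python, assuming the `libpysal`/`esda` API, shows how a queen-style contiguity matrix with the extra Hainan–Guangdong link used later in Sect. 3.3 could be built and how global and local Moran's I would then be obtained; the neighbor lists and values below are illustrative only, not the full 31-province matrix or the paper's data.

```python
import numpy as np
from libpysal.weights import W
from esda.moran import Moran, Moran_Local

# Illustrative contiguity lists for a handful of provinces (not the full matrix).
neighbors = {
    "Guangdong": ["Guangxi", "Hunan", "Fujian", "Jiangxi"],
    "Guangxi":   ["Guangdong", "Hunan", "Guizhou", "Yunnan"],
    "Hunan":     ["Guangdong", "Guangxi", "Jiangxi", "Guizhou"],
    "Fujian":    ["Guangdong", "Jiangxi", "Zhejiang"],
    "Jiangxi":   ["Guangdong", "Hunan", "Fujian", "Zhejiang"],
    "Guizhou":   ["Guangxi", "Hunan", "Yunnan"],
    "Yunnan":    ["Guangxi", "Guizhou"],
    "Zhejiang":  ["Fujian", "Jiangxi"],
    "Hainan":    [],
}
# Hainan is an island; the paper treats it as adjacent to Guangdong.
neighbors["Hainan"].append("Guangdong")
neighbors["Guangdong"].append("Hainan")

w = W(neighbors)
w.transform = "r"                                    # row-standardized weights

values = {"Guangdong": 4.1, "Guangxi": 2.3, "Hunan": 2.8, "Fujian": 3.5, "Jiangxi": 2.6,
          "Guizhou": 1.9, "Yunnan": 1.7, "Zhejiang": 3.9, "Hainan": 2.2}  # e.g. per 10,000 population
x = np.array([values[name] for name in w.id_order])  # align values with the weight order

mi = Moran(x, w)                                     # global Moran's I, cf. formula (1)
lisa = Moran_Local(x, w)                             # local Moran's I, cf. formula (2)
print(round(mi.I, 3), round(mi.p_sim, 3))
print(lisa.q)                                        # quadrant codes: 1=HH, 2=LH, 3=LL, 4=HL
```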
2.2 Data Sources
This paper takes China's 31 provinces, municipalities, and autonomous regions (excluding the Hong Kong Special Administrative Region, the Macau Special Administrative Region and Taiwan) as the analysis units. The human resources data on TCM licensed (assistant) physicians, trainee TCM physicians, and Chinese pharmacists (personnel) are from the 2012–2019 China Health and Family Planning Statistical Yearbook. The population numbers of the provinces, municipalities, and autonomous regions are from the 2012–2019 National Statistical Yearbook.
3 Results and Discussions
3.1 The Time Trend of Human Resources of Chinese Medicine in China
The human resources of Chinese medicine in China, both in total and per 10,000 population, increased year by year from 2012 to 2019. The number of licensed (assistant) TCM physicians increased from 368,264 to 624,783, an average annual growth rate of 7.84%; the number of trainee TCM physicians grew at an average annual rate of 2.96%; and the stock of Chinese pharmacists (personnel) was small and grew the most slowly, with an average annual growth rate of only 2.41%. It can be seen that licensed (assistant) TCM physicians are growing rapidly, which is consistent with the long-standing situation of "emphasizing physicians over pharmacists" in China (Table 1).

Table 1. 2012–2019 time trend of Chinese TCM human resources.

Year | Population (10,000 persons) | TCM licensed (assistant) physicians (persons) | per 10,000 population | Trainee TCM physicians (persons) | per 10,000 population | Chinese pharmacists (personnel) (persons) | per 10,000 population
2012 | 135404 | 368264 | 2.72 | 12473 | 0.09 | 107630 | 0.79
2013 | 136072 | 398284 | 2.93 | 13992 | 0.10 | 110243 | 0.81
2014 | 136782 | 418573 | 3.06 | 14686 | 0.11 | 111991 | 0.82
2015 | 137462 | 452190 | 3.29 | 14412 | 0.10 | 113820 | 0.83
2016 | 138271 | 481590 | 3.48 | 11482 | 0.08 | 116622 | 0.84
2017 | 139008 | 527037 | 3.79 | 16218 | 0.12 | 120302 | 1.45
2018 | 139653 | 575454 | 4.12 | 15570 | 0.11 | 123913 | 0.89
2019 | 140385 | 624783 | 4.45 | 15302 | 0.11 | 127154 | 0.91
3.2 The Spatial Distribution of Human Resources in Traditional Chinese Medicine in China
The results of the global spatial auto-correlation analysis (Fig. 1) show that the Global Moran's I values corresponding to the three types of Chinese medicine human resources are all positive and significant at the 1% level, indicating that since 2012 TCM human resources have shown a certain degree of spatial correlation at the provincial level, manifested concretely as spatial agglomeration of high or low values to different degrees. The Global Moran's I index of TCM practicing (assistant) physicians and of Chinese pharmacists (personnel) both showed a gradual decrease followed by an increase, and the index of the former is higher than that of the latter, indicating that the initial spatial agglomeration of the former is higher. The spatial agglomeration of trainee TCM physicians first rose, then decreased, and then tended to rise again after short-term dynamic fluctuations: in 2013–2016 the spatial agglomeration gradually weakened, and thereafter the trend of agglomeration gradually strengthened. From the perspective of the time-series evolution, although the Global Moran's I of Chinese TCM human resources fluctuates, it shows an overall upward trend.
Fig. 1. The global Moran's I index and its change trend for each category of Chinese TCM human resources from 2012 to 2019 (series: practicing (assistant) physicians in traditional Chinese medicine; trainee TCM physicians; Chinese pharmacists (persons)).
3.3 Local Spatial Auto-correlation of Human Resources in Chinese Medicine
This paper defines the spatial weight matrix using the Queen first-order adjacency matrix, with Hainan and Guangdong set as adjacent [9–12]. Local spatial auto-correlation analysis of TCM human resources in 2012, 2015, and 2019 was carried out and the LISA maps were obtained (Fig. 2), in order to further analyze interprovincial differences in TCM human resources; the local spatial auto-correlation index is used to study each province's spatial distribution pattern and its agglomeration evolution and transition characteristics.

Fig. 2. Chinese TCM human resources LISA agglomeration maps (panels: practicing (assistant) physicians in traditional Chinese medicine per 10,000 population; trainee TCM physicians per 10,000 population; Chinese pharmacists (persons) per 10,000 population).

From the perspective of the evolution of the provincial temporal and spatial pattern, the agglomeration and development of the various categories of TCM human resources have significant spatial differentiation characteristics. The spatial agglomeration mode is mainly of the high-high and low-low types, and the characteristics of dynamic evolution over time are obvious. ➀ In terms of practicing (assistant) TCM physicians per 10,000 population, the local spatial autocorrelation has not changed significantly over the period, as shown by the "high-high" development pattern in Tianjin, Beijing and Inner Mongolia. ➁ In terms of trainee TCM physicians per 10,000 population, the provinces with the "high-high" development model are mainly concentrated in some provinces in the southwest, including Yunnan, Guizhou and Guangxi, while the provinces with the "low-low" agglomeration pattern are mainly distributed in East China, mainly Anhui and Shanghai. ➂ In terms of Chinese pharmacists (persons) per 10,000 population, there is low-low agglomeration in Tibet, low-high agglomeration in Shanghai, high-high agglomeration in Henan and Zhejiang, and high-high agglomeration in Shandong after 2015. On the whole, from 2012 to 2019 only a few provinces, accounting for about 10%, have obvious spatial agglomeration characteristics; the agglomeration characteristics of most spatial units are not obvious.
4 Conclusion
From 2012 to 2019, the number of Chinese medicine human resources increased year by year. Although the supply level has improved, there is still a certain gap from the development goal, proposed in the 13th Five-Year Plan for the development of Chinese medicine talents, that the total number of Chinese medicine professional and technical human resources should reach 893,300 by 2020 [2]. In terms of overall growth, the growth rate of licensed (assistant) TCM physicians is 3.04 times that of Chinese pharmacists. According to the national organization and staffing standards for Chinese medicine hospitals (Trial), the ratio of physicians to pharmacists should be 1:0.8 [12], while in 2019 the ratio of TCM physicians to Chinese pharmacists was only 1:0.2, which shows that the proportion of TCM pharmacists is seriously unbalanced and is not conducive to improving the accessibility of TCM services. In order to promote the healthy and sustainable development of traditional Chinese medicine, the government and Chinese medicine colleges and universities should strengthen the training of TCM professionals, balance the proportions among categories of TCM practitioners, and optimize the structure of TCM talent. The results of the global spatial autocorrelation show that the Moran's I index of all indicators in all years is positive, indicating that since 2012 there has been spatial agglomeration of high or low values to different degrees. The research finds that the high
value areas are mainly concentrated in North China and the southwestern provinces. In most provinces, the level of TCM human resource agglomeration is relatively low, but the provinces with a higher degree of agglomeration show an increasing trend, indicating that the development of TCM human resources in China is unbalanced and that TCM human resources in some areas are developing rapidly. The local analysis results show that the agglomeration development of TCM human resources in China presents obvious regional characteristics and that the trend of agglomeration imbalance is strengthening, showing the evolution characteristics of central growth. Provinces with a higher development level of TCM health human resources tend toward a more concentrated distribution, and this agglomeration trend is gradually strengthening, while the agglomeration degree of provinces with a lower development level shows a generally stable trend. With differences in urban development stages, the contradiction between the increase of TCM human resource concentration in developed areas and its decrease in underdeveloped areas is further highlighted. This spatial agglomeration may be related to differences in the attention paid in policy-making and in the support for TCM human resources caused by the imbalance of regional development in China [13]. The analysis results show that the gap in TCM human resources in China is large and that the distribution of resources is not balanced; the construction of TCM talent in North China is obviously more complete, and the regional imbalance may also be related to the flow of talent caused by different development conditions across regions [14]. Economically developed areas have relatively complete supporting facilities, large room for promotion, and attractiveness to talent, so they have richer TCM human resources [15]. Therefore, it is necessary to further increase investment in economically underdeveloped regions and to take measures to improve the coordination of regional development, reduce the gap in TCM human resources between regions, and improve the rationality of the distribution of TCM human resources in China. To sum up, the allocation of TCM human resources increased significantly from 2012 to 2019, but the imbalance in the proportions of TCM practitioners and the increase in the spatial agglomeration of TCM human resources are still prominent, and regional differences still dominate. On the one hand, an incentive mechanism suited to the characteristics of TCM talent should be established and improved, the flow of surplus, high-quality TCM human resources to economically underdeveloped areas should be actively guided, and the gap in the allocation of TCM human resources across regions should be narrowed, so as to improve the fairness of TCM human resource allocation. On the other hand, for economically underdeveloped provinces, the government should further improve preferential policies, rely more on efficiency rather than on balance alone, internalize externalities, and pay attention to the impact of factors such as gaps in economic level on residents' demand for TCM services, so as to remove obstacles to the optimization of the spatial distribution of TCM human resources.
However, spatial analysis is essentially a descriptive analysis [16], and the conclusions of this study are relatively simple. In the future, spatial measurement models should be comprehensively used to analyze the deeper mechanisms affecting the temporal and spatial migration of human resources in TCM.
References
1. Hu, L., Weng, Y., Cheng, L., et al.: Analysis of the allocation of human resources for nursing in Chinese medicine hospitals in China. Chin. Hosp. 24(07), 37–39 (2020)
2. Lu, X., Liu, C., Li, C., Yan, A.: Study on the fairness of Chinese medicine staffing in China: based on Gini coefficient and Thiel index. Chin. Health Econ. 36(10), 46–50 (2017)
3. Liu, Z., Yu, L., Gao, Y., et al.: Hefei Community Health Institutions, the status quo of Chinese medicine talent survey and analysis. Chin. J. Chin. Med. Inf. 20(7), 1–2 (2013)
4. Yang, C., Ruan, S., Chen, X., et al.: Analysis of the allocation of human resources of Chinese medicine in Fujian Province. J. Chin. Med. Manage. 19(11), 1017–1021 (2011)
5. Feng, Y.: Study on the Status and Needs of Chinese Medicine Health Human Resources in Gansu Province. Lanzhou University, Lanzhou (2012)
6. Chang, G., Sun, Y., Ren, X., et al.: A study on the fairness of the allocation of health resources in Ningxia based on the concentration index. Chin. Health Manage. 34(05), 350–353+372 (2017)
7. Zhou, K., Zhang, X., Ding, Y., et al.: Inequality trends of health workforce in different stages of medical system reform (1985–2011) in China. Hum. Resour. Health 13(1), 30–32 (2015)
8. Lu, X., Yin, C., Li, C., et al.: Equity analysis of the allocation of health human resources in maternal and child health hospitals in our country: based on the assumption of resource homogeneity. Chin. Health Manage. 35(04), 263–265+281 (2018)
9. Chen, X.: Regional disparities in Zhejiang coastal areas and their causes since 1990. Geogr. Sci. 29(1), 22–29 (2009)
10. Jiang, Q., Zhao, F.: Application of spatial autocorrelation analysis method in epidemiology. Chin. J. Epidemiol. 06, 539–546 (2011)
11. Wang, Q.: Practical Methods of Regional Economic Research. Economic Science Press, Beijing (2014)
12. Xu, Y., Hong, B., Li, Z., et al.: Analysis of current status and development forecast of human resources in traditional Chinese medicine in China. Chin. J. Chin. Med. Inf. 25(06), 1–5 (2018)
13. Wu, X., Shen, S., Tian, S.: Analysis on the equity of Chinese medicine and health resource allocation from 2013 to 2017. Hygiene Soft Sci. 34(01), 55–59 (2020)
14. Zhan, D., Zhang, X.: Spatial analysis of the number of general practitioners and influencing factors in China. Chin. Gen. Pract. 22(22), 2660–2665 (2019)
15. Xu, Y., Man, X., He, W., et al.: Research on the allocation of Chinese medicine resources in Beijing based on the relief of non-capital functions. Chin. Med. Herald 26(04), 39–42 (2020)
16. Wang, X., Chen, Y., Liu, B.: A spatial statistical analysis of China's regional real estate economic development level: a combined study of global Moran's I, Moran scatter plot and LISA cluster plot. Math. Stat. Manage. 33(01), 59–71 (2014)
The Improvement Path of E-health Literacy of Undergraduates in Jilin Province Based on the Structural Equation Model Peixu Cui, Fang Xia, Jinping Liu, and Xin Su(B) School of Health Management, Changchun University of Chinese Medicine, Changchun 130117, China
Abstract. This paper aims to analyze the status quo of the E-health literacy of undergraduates in Jilin Province and to study paths for improving E-health literacy. Methods: A questionnaire survey was conducted among 1631 undergraduates in Jilin Province, and a structural equation model was used to analyze the factors influencing their E-health literacy. Results: Self-management ability is a key factor in the formation of E-health literacy and has an important impact on the level of E-health literacy; health information utilization and general self-efficacy are mediating variables in the formation of E-health literacy. Conclusion: The E-health literacy level of undergraduates in Jilin Province still needs to be improved. Appropriate intervention measures can be taken to improve students' self-management ability, health information utilization ability and general self-efficacy, thereby improving undergraduates' E-health literacy level. Keywords: Undergraduates · Structural equation model · E-health literacy
1 Introduction
According to the definition in the "Healthy People 2010" report of the U.S. Department of Health and Human Services, health literacy is "the degree to which an individual has the ability to acquire, process and understand the basic health information and services needed to achieve appropriate health" [1]. The rapid development of the Internet has made electronic health (E-health) information resources increasingly abundant, and the rational use of E-health information resources is conducive to maintaining and improving personal health. E-health literacy refers to the ability of individuals to obtain, understand, evaluate and apply online health information or services to maintain and promote their own health [2]. According to a variety of studies, people with higher levels of E-health literacy have a good sense of self-efficacy, are better able to search for and use health information [3], and better perform self-management and participate in healthy behaviors [4]. This paper investigates the status quo of the E-health literacy of undergraduates in Jilin Province, analyzes the key paths and effect relationships in improving undergraduates' E-health literacy, and provides a theoretical reference for undergraduates to carry out health management and E-health literacy promotion under the new situation.
2 Data Sources and Methods
2.1 Research Object
Using a stratified random sampling method, a total of 2,000 students from 6 colleges and universities in Jilin Province were surveyed, and 1631 valid questionnaires were collected, for an effective response rate of 81.5%.
2.2 Questionnaire Design
Based on the "2012 National Residents' Health Literacy Survey Questionnaire", the final questionnaire was formed with appropriate modifications for E-health behavior. The questionnaire contains four parts: E-health literacy, self-management ability, health information utilization and general self-efficacy. The questionnaire was filled out anonymously; the person in charge supervised the students as they filled out the questionnaire independently, and the questionnaires were collected on the spot. A questionnaire was judged valid if the completion rate of its main part was not less than 90%. There are a total of 51 items in the formal questionnaire. The Cronbach's alpha coefficient is 0.841 and the KMO value is 0.844, indicating good reliability and validity.
2.3 Measuring Tools
E-health Literacy Assessment Scale: The E-health literacy assessment scale was translated into Chinese by Guo Shuaijun and others [5]. It contains a total of 8 items and uses the Likert 5-level scoring method: the options "very inconsistent", "non-conforming", "unclear", "conforming" and "very conforming" are scored 1, 2, 3, 4, and 5 points respectively. The scores for all items add up to the total score; the total score is 40 points, 32 points or more is qualified, and the higher the score, the higher the level of E-health literacy. In this survey, the overall Cronbach's α coefficient of the scale is 0.980, indicating that the scale has good reliability.
General Self-efficacy Scale, Chinese Version: The GSES has a total of 10 items, involving self-confidence when individuals encounter setbacks or difficulties. It adopts a Likert 5-point scale; the options "very inconsistent", "non-conforming", "unclear", "conforming", and "very conforming" are scored 1, 2, 3, 4, and 5 points respectively. The scores of all questions add up to the total score; the total score is 50 points, and the Cronbach's α coefficient of the scale is 0.973.
Health Self-management Ability Scale: This is a self-rating scale. It adopts a Likert-scale format with different rating standards according to the content and expression of the specific items of the different subscales. Subscale 1 and subscale 3 are marked with "always, often, sometimes, occasionally, never"; the options of subscale 2 are "agree, more agree, not sure, disagree, totally disagree" and "confident, more confident, uncertain, less confident, not confident". They are scored 1, 2, 3, 4, and 5 points respectively. The Cronbach's α coefficient of the scale is 0.971.
Health Information Utilization Capacity Scale: The E-health information utilization questionnaire was adapted based on the 2012 American Health Information Utilization Trend Scale. An 11-item E-health information utilization scale for undergraduates was formed after repeated translation and back-translation, literature integration and expert consultation. Each item uses the Likert 5-level scoring method, with a full score of 44 points; the higher the score, the higher the utilization of E-health information. The Cronbach's α coefficient of the scale is 0.874.
2.4 Statistical Analysis

Data were entered with EpiData 3.1. SPSS 23.0 was used for descriptive analysis and Pearson correlation analysis, with the test level set at α = 0.05; the structural equation model was constructed with AMOS.
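The paper performs these analyses in SPSS and AMOS; the sketch below is a hypothetical Python equivalent of the descriptive and Pearson correlation step, with invented column names for the four scale totals:

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical data frame: one row per respondent, one column per scale total
df = pd.read_csv("survey_scores.csv")  # assumed file and column names

print(df[["ehealth_literacy", "self_management",
          "info_utilization", "self_efficacy"]].describe())

# Pearson correlation of E-health literacy with each other scale (test level alpha = 0.05)
for col in ["self_management", "info_utilization", "self_efficacy"]:
    r, p = pearsonr(df["ehealth_literacy"], df[col])
    print(f"{col}: r = {r:.3f}, p = {p:.4f}")
```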
3 Results

3.1 Basic Status of Survey Subjects

Among the 1631 survey subjects, 681 were male (41.8%) and 950 were female (58.2%); 596 were freshmen (36.5%), 677 were sophomores (41.5%), 305 were juniors (18.7%), and 51 were seniors (3.3%).
Table 1. Comparison of E-health literacy scores of students with different demographic characteristics (x ± s, points)

Variables               Categories                                                       n (%)         Mean ± SD
Gender                  Male                                                             681 (41.8)    29.21 ± 7.61
                        Female                                                           950 (58.2)    29.71 ± 7.64
Birthplace              City                                                             441 (27.1)    30.32 ± 7.64
                        Town                                                             315 (19.3)    29.74 ± 7.66
                        Rural area                                                       875 (53.6)    29.19 ± 7.62
Father's education      Elementary school and below                                      283 (17.4)    29.00 ± 7.63
                        Junior high school                                               738 (45.2)    29.62 ± 7.66
                        High school/technical secondary school/vocational high school    399 (24.5)    29.75 ± 7.58
                        College and undergraduate                                        120 (7.4)     29.81 ± 7.44
                        Master degree and above                                          91 (5.6)      30.45 ± 7.60
Mother's education      Elementary school and below                                      414 (25.4)    29.37 ± 7.57
                        Junior high school                                               666 (40.8)    29.38 ± 7.65
                        High school/technical secondary school/vocational high school    388 (23.8)    30.07 ± 7.71
                        College and undergraduate                                        93 (5.7)      29.05 ± 7.68
                        Master degree and above                                          70 (4.3)      31.27 ± 7.44
Family monthly income   Below 1000 yuan                                                  185 (11.3)    28.83 ± 7.76
                        1000–2999 yuan                                                   510 (31.3)    28.65 ± 7.73
                        3000–4999 yuan                                                   486 (30.0)    29.88 ± 7.75
                        5000–9999 yuan                                                   329 (20.2)    30.88 ± 7.55
                        10,000 yuan or more                                              121 (7.4)     30.22 ± 7.87
Family structure        Original family                                                  1387 (85.0)   29.62 ± 7.43
                        One-parent family                                                156 (9.6)     29.54 ± 7.76
                        Reorganized family                                               88 (5.4)      29.50 ± 7.81
Self-rated health       Very good                                                        636 (39.0)    30.67 ± 7.60
                        Well                                                             628 (38.5)    29.53 ± 7.58
                        Good                                                             330 (20.2)    27.99 ± 7.72
                        General                                                          33 (2.0)      25.97 ± 7.18
                        Bad                                                              4 (0.3)       34.00 ± 7.44
The results show statistically significant differences in E-health literacy scores across gender, birthplace, parents' educational level, monthly family income, family structure, and self-rated health status. The results are shown in Table 1.

3.2 E-health Literacy Status of Undergraduates in Jilin Province

The total E-health literacy score of undergraduates in Jilin Province is (29.6 ± 6.74), slightly higher than the Chinese norm (28.58 ± 7.00). Only 438 respondents scored 32 or more, so the passing rate of E-health literacy is only 26.9%, indicating that their E-health literacy level still needs to be improved. The results are similar to those of Guo Shuaijun and Tian Xiuxiang [6], but still lower than the results reported in other countries, which may be explained by differences between China and those countries in the development of E-health and in the use of electronic resources [7]. The total score of undergraduates' self-management ability is (144.78 ± 18.84) points, lower than the Chinese norm (153.60 ± 20.21 points). The average item scores of its dimensions are: self-management behavior (3.45 ± 0.61), slightly higher than the Chinese norm (3.42 ± 0.74); self-management awareness (4.24 ± 0.82), higher than the Chinese norm (4.05 ± 0.80); and self-management environment (3.74 ± 0.89), slightly lower than the Chinese norm (3.88 ± 0.82) [8]. This shows that the self-management ability of undergraduates needs to be improved: although undergraduates have good self-management cognition, the ability to transform it into self-management behavior is still lacking. The average score of general self-efficacy is (3.71 ± 0.80), and the average score of health information utilization is (2.99 ± 1.02). Table 2 indicates that undergraduates' ability to use electronic resources to obtain and utilize health information needs to be strengthened.

Table 2. Comparison of undergraduates' self-management ability scores and norms, health information utilization and general self-efficacy scores

                            Scores (x ± s), n = 1631   Norm, n = 1205
Self-management behavior    3.45 ± 0.61                3.42 ± 0.74
A and task A->B
From Fig. 2, the accuracy of variant 2 is higher than that of variant 1, because variant 2 uses a class-wise cross-domain alignment method and maintains category information. Variant 1 and variant 3 differ only in the generation method, yet the accuracy of variant 3 is 1.8% higher than that of variant 1, which indicates that the bi-directional cross-domain generation method is better. The results of variant 3 and variant 4 are very close, while the accuracy of variant 5 is 5.34% higher than that of variant 3. This shows that the class-wise cross-domain alignment method and the dual consistent classifiers are effective. The accuracy of BCGAN-DA is higher than that of all other variants, which indicates that all modules in BCGAN-DA contribute to the unlabeled classification task.

The Convergence of the Model. The experiment was carried out on task B->A. "without_otherloss" means that the loss function only includes L_GAN^{s/t}; "with_otherloss" means that the loss function includes not only L_GAN^{s/t} but also L_MMD^{s/t} and L_con.
Fig. 3. The loss in the process of task B->A training
As shown in Fig. 3, when the loss function only includes L_GAN^{s/t}, the loss decreases gradually as training proceeds, then fluctuates and even increases to a certain extent. This shows that the final result may be worse than the optimal result. When the loss function also includes L_MMD^{s/t} and L_con, the loss decreases over time and finally converges to a small stable value. This shows that the three loss terms effectively solve the above problem and verifies the convergence of the method. In order to further verify the influence of the three loss terms on convergence, we observe the accuracy during training.
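The exact definitions of L_GAN^{s/t}, L_MMD^{s/t} and L_con are given earlier in the paper and are not repeated here; the sketch below only illustrates, under assumed forms (a single-bandwidth RBF-kernel MMD estimate and an L1 consistency term between the two classifiers), how such terms could be combined into a total training loss:

```python
import torch

def rbf_mmd(x, y, sigma=1.0):
    """Single-bandwidth RBF-kernel MMD estimate between two (batch, dim) feature sets."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def total_loss(l_gan, src_feat, tgt_feat, logits_c1, logits_c2,
               lambda_mmd=1.0, lambda_con=1.0):
    """Combine the adversarial term with assumed MMD and consistency terms."""
    l_mmd = rbf_mmd(src_feat, tgt_feat)
    # consistency between the two classifiers' predictions on the same samples
    l_con = torch.mean(torch.abs(torch.softmax(logits_c1, dim=1)
                                 - torch.softmax(logits_c2, dim=1)))
    return l_gan + lambda_mmd * l_mmd + lambda_con * l_con
```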
Fig. 4. The accuracy in the process of task B->A training
As can be seen from Fig. 4, accuracy increases with training time and finally stabilizes at about 0.74. At the same time, by observing Figs. 3 and 4, it can be found that while the total loss is decreasing, the accuracy is also improving, which shows that adding L_MMD^{s/t} and L_con effectively avoids divergence as training time is extended.

The Overall Performance of the Model. We compare BCGAN-DA with MADA and DAAN.

Table 1. Performance of different methods in task B->A

Method      Accuracy   Precision   Recall   F1 score
MADA        81.98      78.14       82.58    80.32
DAAN        86.32      82.77       86.93    84.78
BCGAN-DA    88.48      83.79       88.12    85.90
Table 1 shows the average performance of each method on task B->A; BCGAN-DA performs best on all indicators. Compared with MADA, BCGAN-DA's accuracy is increased by 6.5%, precision by 5.65%, recall by 5.54%, and F1 score by 5.58%. Compared with DAAN, accuracy is increased by 2.16%, precision by 1.02%, recall by 1.19%, and F1 score by 1.12%. This shows that BCGAN-DA has good classification performance for network traffic anomaly detection without labels.
7 Conclusion

In this paper, we propose BCGAN-DA for convergence network traffic anomaly detection with no labeled data. The proposed method includes bi-directional cross-domain
generators, which make full use of the two domains and reduce the domain discrepancy in both directions; dual consistent classifiers, which complete discrimination and classification at the same time; and class-wise cross-domain alignment, which improves classification performance by quantifying the distribution discrepancy between domains and aligning the source and target domains. Experiments on public network traffic datasets indicate that our method is effective and has better performance.

Acknowledgment. This work is supported by the National Key R&D Program of China (2019YFB2103202, 2019YFB2103200).
Encrypted Traffic Identification Method Based on Multi-scale Spatiotemporal Feature Fusion Model with Attention Mechanism

Yonghua Huo1, Hongwu Ge1, Libin Jiao1, Bowen Gao2, and Yang Yang2(B)

1 Science and Technology on Communication Networks Laboratory, The 54th Research Institute of CETC, Shijiazhuang, Hebei, China
2 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
[email protected]
Abstract. With the increasing complexity of encryption protocols in recent years, existing network traffic identification and classification methods face great challenges. Researchers have found that most improvements to deep learning methods are achieved by increasing the width and depth of the network; however, a large number of parameters must then be calculated during training, which increases the computational complexity of the algorithm. To address the problem that features differ in importance, this paper proposes a new classification model: the original traffic is first preprocessed, spatial features are extracted by CNN layers and temporal features by LSTM layers, the multi-scale features are then fused, and the fused features are fed into an attention mechanism to improve the ability of feature representation. Finally, we validate the effectiveness of our method on the ISCXTor2016 dataset. The experimental results show that the classification algorithm proposed in this paper achieves the lowest loss and the highest accuracy in the identification of encrypted traffic.

Keywords: Tor · Encrypted traffic identification · Deep learning · Multi-scale feature fusion · Attention mechanism
1 Introduction

With the rapid development of network technology and the sharp rise of network traffic in recent years, it is necessary to develop a fast and accurate automatic classification method for network traffic. Meanwhile, due to the wide use of network encryption algorithms, the existing network traffic identification and classification methods are facing great challenges. In order to better manage Internet traffic, optimize network quality, understand real-time network traffic distribution, and prevent network attacks, better network traffic classification methods are needed.
The main obstacle to the identification and classification of network traffic is encryption. Encryption of network traffic is critical to protecting the privacy and security of Internet users and is regarded as a key technology in various privacy-enhancing tools. For example, Tor, originally called the Onion Router, is the most widely used anonymous communication software with low latency and high quality. Tor forwards traffic through three nodes to achieve anonymity. Because the three forwarding nodes are located in different regions of the world and the anonymous communication system transmits TCP streams through these channels, it is difficult for Internet censorship to trace the traffic. However, with the increasing complexity of network applications and the wide use of encryption protocols, traditional traffic classification methods can no longer accurately identify traffic characteristics. At present, most improvements to deep learning methods are achieved by increasing the width and depth of the network. Although this can improve classification accuracy, a large number of parameters need to be calculated, which increases the computational complexity of the algorithm, and problems such as overfitting and low training efficiency arise. To solve these problems, this paper proposes a multi-scale spatiotemporal feature fusion model with attention mechanism. Firstly, the original traffic data is preprocessed to obtain statistical features. Secondly, the spatial features of the traffic data are extracted by convolution layers with kernels of different sizes. Then, the features extracted by the CNN layers are input to LSTM layers to extract temporal features. Multi-scale feature fusion can merge multiple modal features, reduce the heterogeneity between modes, provide more information for model decision-making, and improve the accuracy of the overall classification results; traffic features of different granularity are therefore fused to enrich the feature representation. Finally, the fused feature vectors are input into the attention mechanism to learn the importance of different inputs, so that the weights of inputs at different time steps are computed dynamically during classification, which helps to improve the final classification effect.
2 Related Works

Network traffic classification is prevalent in research and applications and has been widely used in various fields. The main methods in this field are based on port matching, deep packet inspection (DPI), behavior characteristics, and machine learning (ML).

Jia et al. [1] presented a hierarchical classification approach for Tor anonymous traffic. An improved decision tree algorithm (Tor-IDT) was used to identify Tor anonymous traffic in mixed traffic, and the Tri-Training algorithm was then used to classify the identified anonymous traffic at the application level. Experiments showed that this hierarchical classification algorithm had wider applicability and higher classification accuracy. Kim et al. [2] proposed an approach to classify Tor traffic using hexadecimal raw packet headers and a convolutional neural network model. Compared with competitive machine learning algorithms, the approach showed remarkable accuracy.
Calvo et al. [3] described a machine learning methodology to identify Tor network traffic using C5.0 decision trees and Random Forest. They followed a white-box approach, achieved prediction accuracy of over 95% with both models, and presented an analysis of the importance of the top predictor variables. Sarkar et al. [4] presented a deep neural network (DNN) based system for the detection and classification of encrypted Tor traffic, achieving accuracy 6.2% higher than previous work on the same dataset. Additionally, the robustness of the proposed DNN classifier was evaluated using adversarial samples generated by a Generative Adversarial Network (GAN). Wang et al. [5] proposed a Tor traffic identification and multilevel classification framework based on network flow features, which identifies anonymous traffic (L1), traffic types of anonymous traffic (L2), and applications (L3) on mobile and PC platforms respectively, and further analyzed the differences between the two platforms. Gurunarayanan et al. [6] developed a machine learning model to identify Tor traffic. Random oversampling and random undersampling were performed to remove data imbalance, and k-fold cross-validation and grid search were used for hyperparameter tuning. Results showed that they achieved more than 90% accuracy with random sampling and hyperparameter tuning.
3 Encrypted Traffic Identification Method

The multi-scale spatiotemporal feature fusion model with attention mechanism is an end-to-end encrypted traffic classification model used to determine whether traffic is Tor traffic. It includes two stages. In the preprocessing stage, the pcap files storing the original Tor traffic are converted into standard formatted csv files. In the classification stage, spatiotemporal features are captured to identify traffic types through the hybrid neural network model proposed in this paper.

3.1 Preprocessing of Encrypted Traffic

In this paper, a flow is defined as a sequence of packets with the same values for {Source IP, Destination IP, Source Port, Destination Port, Protocol}. We first generate bidirectional flows from the original pcap files according to this definition, and then extract 23 features from them. Table 1 shows the extracted features. In total, we obtained 67,828 records with the 23 features shown in Table 1, including 59,784 regular traffic records labeled NonTor and 8,044 Tor traffic records.
Table 1. Traffic extracted features
Feature           Description
Flow Duration     The duration of the flow
Flow Bytes/s      Flow bytes per second
Flow Packets/s    Flow packets per second
Flow IAT          Flow Inter Arrival Time, the time between two packets sent in either direction (mean, min, max, std)
Fwd IAT           Forward Inter Arrival Time, the time between two packets sent in the forward direction (mean, min, max, std)
Bwd IAT           Backward Inter Arrival Time, the time between two packets sent backwards (mean, min, max, std)
Active            The amount of time a flow is active before it becomes idle (mean, min, max, std)
Idle              The amount of time a flow is idle before it becomes active (mean, min, max, std)
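The features in Table 1 are computed per bidirectional flow after grouping packets by the 5-tuple. A minimal sketch of that grouping step, assuming the packets have already been parsed into simple records (timestamp, addresses, ports, protocol, length) rather than read directly from pcap files:

```python
from collections import defaultdict

def flow_key(pkt):
    """Bidirectional 5-tuple key: both directions of a flow map to the same key."""
    a = (pkt["src"], pkt["sport"])
    b = (pkt["dst"], pkt["dport"])
    return (min(a, b), max(a, b), pkt["proto"])

def flow_features(packets):
    flows = defaultdict(list)
    for pkt in packets:          # pkt: dict with ts, src, dst, sport, dport, proto, length
        flows[flow_key(pkt)].append(pkt)
    rows = []
    for key, pkts in flows.items():
        pkts.sort(key=lambda p: p["ts"])
        duration = pkts[-1]["ts"] - pkts[0]["ts"]
        n_bytes = sum(p["length"] for p in pkts)
        iat = [b["ts"] - a["ts"] for a, b in zip(pkts, pkts[1:])]
        rows.append({
            "flow_duration": duration,
            "flow_bytes_per_s": n_bytes / duration if duration > 0 else 0.0,
            "flow_packets_per_s": len(pkts) / duration if duration > 0 else 0.0,
            "flow_iat_mean": sum(iat) / len(iat) if iat else 0.0,
        })
    return rows
```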
3.2 Classification Process of Encrypted Traffic

In the classification stage, the spatiotemporal features are captured to identify traffic types through the hybrid neural network model proposed in this paper. Compared with existing deep learning traffic classifiers based on single-mode input, this paper proposes a traffic classification framework based on multi-scale feature fusion with an attention mechanism, which fuses features of different granularity and identifies traffic, so as to adapt to dynamic network environments. The steps for building the neural network model used to identify Tor traffic are as follows:

(1) Normalize the traffic features extracted in the preprocessing stage and encode the labels with one-hot coding.
(2) Randomly divide the ISCXTor2016 dataset into a training set and a test set with a ratio of 9:1.
(3) Use three convolution layers with different kernel sizes to extract the spatial features of traffic.
(4) Feed the features extracted by each CNN layer into a separate LSTM layer to extract the temporal features of traffic.
(5) Fuse the features of different granularity to enrich the feature representation.
(6) Use the attention mechanism to handle the differing importance of the spatiotemporal features, which existing structures ignore, and extract significant fine-grained features.
(7) Finally, flatten the feature map into a one-dimensional vector, use it as the input of the fully connected layer, and apply the softmax function for classification.
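Steps (1) and (2) can be sketched as follows; the file names are placeholders, and the normalization choice (min-max scaling) is an assumption, since the text does not fix it:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# X: 23 extracted flow features, y: "Tor" / "NonTor" labels (assumed input files)
X = np.load("features.npy")   # shape (67828, 23), hypothetical
y = np.load("labels.npy")     # shape (67828,), hypothetical

X = MinMaxScaler().fit_transform(X)                                   # step (1): normalize
y_onehot = np.stack([y == "NonTor", y == "Tor"], axis=1).astype("float32")  # one-hot labels

# step (2): random 9:1 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size=0.1, random_state=42)
```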
Figure 1 shows the multi-scale spatiotemporal feature fusion model with attention mechanism defined in this paper.
Fig. 1. Multi-scale spatiotemporal feature fusion model with attention mechanism
Table 2 shows the symbol definitions used in this section. Network traffic is a one-dimensional byte stream with a hierarchical structure; the structure of bytes, packets and sessions in traffic is very similar to that of characters, words and sentences in natural language processing. Deep convolutional neural networks are now mainly used in computer vision and natural language processing, and it has been found that one-dimensional convolutional neural networks are suitable for sequence data while two-dimensional convolutional neural networks are good at processing images. This kind of sequence data is therefore very suitable for traffic classification with a one-dimensional convolutional neural network.
Table 2. Description of symbols
Symbol      Description
x           Model input, a 67828 × 23 dimensional feature matrix
C()         One-dimensional convolution layers used to obtain spatial features
L()         LSTM layers used to obtain temporal features
d()         Dropout regularization added to prevent overfitting
concat()    Fusion of features of different granularity to improve classification accuracy
A()         Attention mechanism used to extract salient multi-modal features and improve the output quality of the dynamic structure
f()         Flattening of the feature map into a one-dimensional vector
g()         Dense layer with softmax function used for classification
Y           Model output, whose value is Tor or NonTor
Therefore, this paper uses C_i(x) (i = 1, 2, 3) to represent the one-dimensional CNNs that extract the spatial features of the payload. Considering that network traffic is essentially time-series data, the variable s_i represents the result of feeding the features extracted by the i-th CNN branch into an LSTM layer; the LSTM model is used to extract the latent temporal features that strongly influence traffic classification.

s_i = L_i(C_i(x))    (1)
Multi-scale feature fusion processes multiple different types of features, makes features of different granularity complement each other, and resolves possible redundancy, so as to obtain richer fused features. In the fusion process, the detailed features and the overall features of network traffic are fused into new features. The model can thus learn spatiotemporal features simultaneously and exploit the complementarity of features at multiple scales, so it learns features more efficiently and improves classification accuracy. We use the variable ms to represent the feature vector after multi-scale feature fusion.

ms = concat(d(s_1), d(s_2), d(s_3))    (2)
In the study of human vision, scholars proposed the attention mechanism, which aims to allocate feature processing efficiently. Because the extracted spatiotemporal features differ in importance, the salient features often contain more information and have a greater impact on the actual classification result. After multi-scale feature fusion, if the model is given the ability to pay more attention to highly important features, the effective features that contribute to classification can be emphasized. Therefore, the variable ams represents the result of applying the attention mechanism to extract the salient multi-modal features and improve the output quality of the dynamic structure.

ams = A(ms)    (3)
Finally, we use dropout regularization to prevent overfitting, flatten the feature map into a one-dimensional vector, use it as the input of the fully connected layer, and apply the softmax function for classification.

Y = g(f(d(ams)))    (4)
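Equations (1)–(4) map directly onto a layer graph. The Keras sketch below is one possible realization; the kernel sizes (3, 5, 7), layer widths, dropout rate, and the use of Keras' built-in dot-product attention for A(·) are assumptions not fixed by the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_mcla(n_features=23, n_classes=2):
    inp = tf.keras.Input(shape=(n_features, 1))       # x: each flow as a length-23 sequence

    branches = []
    for k in (3, 5, 7):                                # assumed kernel sizes for C1, C2, C3
        c = layers.Conv1D(32, k, padding="same", activation="relu")(inp)  # C_i(x)
        s = layers.LSTM(32, return_sequences=True)(c)                     # s_i = L_i(C_i(x))
        branches.append(layers.Dropout(0.3)(s))                           # d(s_i)

    ms = layers.Concatenate()(branches)                # ms = concat(d(s1), d(s2), d(s3))
    ams = layers.Attention()([ms, ms])                 # A(ms): dot-product self-attention (one possible choice)

    out = layers.Flatten()(layers.Dropout(0.3)(ams))   # f(d(ams))
    out = layers.Dense(n_classes, activation="softmax")(out)  # g(.)
    return Model(inp, out)

model = build_mcla()
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```

With the preprocessing sketch above, training would then reshape each 23-feature record into a length-23 sequence with one channel, e.g. model.fit(X_train[..., None], y_train, validation_data=(X_test[..., None], y_test)).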
4 Simulation

4.1 Datasets

We chose the ISCXTor2016 dataset for the experiment. In the ISCXTor2016 dataset, all flows are TCP, since Tor does not support the UDP protocol. The dataset is provided by UNB (University of New Brunswick) and contains two traffic types: Tor traffic and normal traffic labeled NonTor.

4.2 Simulation Results

To prove the effectiveness of the multi-scale spatiotemporal feature fusion model with attention mechanism, we identify Tor traffic in the ISCXTor2016 dataset containing normal traffic and Tor traffic. To demonstrate the performance of the proposed model, we use loss and accuracy as metrics (Table 3).

Table 3. Confusion matrix

                   Actual positive   Actual negative
Predicted true     TP                FP
Predicted false    FN                TN
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (5)
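Given a confusion matrix in the form of Table 3, Eq. (5) and the related precision, recall and F1 metrics follow directly; a small helper, for illustration only (the example counts are hypothetical):

```python
def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical counts, only to show the call
print(classification_metrics(tp=700, fp=60, fn=100, tn=5900))
```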
In this experiment, the dataset is randomly divided: 90% of the samples are used for training and 10% for testing. To verify the prediction performance of the proposed multi-scale spatiotemporal feature fusion model with attention mechanism, we compare the following four models in the same experimental environment:

(1) MCLA: the multi-scale spatiotemporal feature fusion model with attention mechanism proposed in this paper.
(2) MCL: MCLA with the attention mechanism removed.
(3) CL: MCL with multi-scale spatiotemporal feature fusion removed.
(4) C: a model using only CNN.

Fig. 2. The training loss and accuracy

Figure 2(a) shows the training loss of the ISCXTor2016 dataset for the four traffic classifiers. The classification algorithm proposed in this paper achieved the lowest loss and was the most effective in identifying Tor traffic, while the other three algorithms achieved almost the same results. Without the attention mechanism, the three comparison algorithms are unstable and perform poorly, which may be because the extracted features are too redundant. Figure 2(b) shows the training accuracy of the ISCXTor2016 dataset for the four traffic classifiers; the algorithm proposed in this paper ranked first and was much better than the other algorithms. In addition, the accuracy of traffic identification using multi-scale feature fusion without the attention mechanism is lower than that without multi-scale feature fusion, indicating that multi-scale feature fusion should be combined with the attention mechanism to obtain high-quality comprehensive features. Collectively, the multi-scale spatiotemporal feature fusion model with attention mechanism proposed in this paper achieves the highest accuracy and performs most remarkably.
Fig. 3. The test loss and accuracy
Figure 3(a) shows the test loss of the ISCXTor2016 dataset for the four traffic classifiers. The test loss of the MCLA model was the lowest, below 0.121, while the test loss of the other models was above 0.25, which shows that the model proposed in this paper is more efficient. Some fluctuations appear in the loss curve, since ISCXTor2016 is a complex dataset with many dimensions and different categories have different sample numbers. Figure 3(b) shows the test accuracy of the ISCXTor2016 dataset for the four traffic classifiers. The combination of multi-scale feature fusion and the attention mechanism helped the classifier achieve first place in accuracy, and the LSTM layers helped the classifiers perform better, indicating that the temporal features captured by the LSTM model help improve the classification of encrypted traffic. However, we found that the multi-scale feature fusion method needs to be combined with the attention mechanism to get the best improvement in classification.

Table 4. Comparison of accuracy of four models
Model       MCLA     CLA      CL       C
Accuracy    94.90%   88.68%   88.60%   87.47%
Table 4 shows the accuracy of the four traffic classifiers on the overall ISCXTor2016 dataset. The combination of multi-scale feature fusion and the attention mechanism again gave the highest accuracy on the overall dataset, and the LSTM layers helped the classifiers perform better, indicating that the temporal features captured by the LSTM model help improve the classification of traffic data. Again, multi-scale feature fusion needs to be combined with the attention mechanism to get the best improvement in classification. Compared with the three benchmark models mentioned above, the classification accuracy of the multi-scale spatiotemporal feature fusion model with attention mechanism on the same dataset is improved by 7.43%, 6.30% and 6.22% respectively. The model is effective because it first uses CNN to extract local spatial features, then uses LSTM to extract temporal information, uses multi-scale fusion to obtain a comprehensive view of the traffic features, and finally uses the attention mechanism to weight the importance of the features. Meanwhile, the model adopts a parallel structure, which effectively mitigates the vanishing and exploding gradient problems caused by the complexity of a serial model.
5 Conclusion

In this paper, we propose a new multi-scale spatiotemporal feature fusion model with attention mechanism for the classification of encrypted traffic. Network traffic is first expressed as flows. The local spatial features of traffic are extracted by CNN layers of different granularity, and the temporal features are extracted by LSTM layers. The features of different granularity are then fused and input into the attention mechanism to assign different attention to the traffic, so as to reduce the loss of original feature information. Using three comparison models, this paper verifies that the proposed model can better represent the characteristics of encrypted network traffic and effectively identify Tor traffic among regular traffic. It also appears that multi-scale feature fusion needs to be combined with the attention mechanism to achieve remarkable results. In the next step, we will further explore other representation forms of network traffic, so as to further improve the accuracy of network traffic classification.

Acknowledgment. This work is supported by the Open Subject Funds of Science and Technology on Communication Networks Laboratory (6142104200106).
References

1. Jia, L., Liu, Y., Wang, B., Liu, H., Xin, G.: A hierarchical classification approach for tor anonymous traffic. In: 2017 IEEE 9th International Conference on Communication Software and Networks (ICCSN), pp. 239–243, May 2017
2. Kim, M., Anpalagan, A.: Tor traffic classification from raw packet header using convolutional neural network. In: 2018 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII), pp. 187–190, July 2018
3. Calvo, P., Guevara-Coto, J., Lara, A.: Classifying and understanding tor traffic using tree-based models. In: 2020 IEEE Latin-American Conference on Communications (LATINCOM), pp. 1–6, May 2020
4. Sarkar, D., Vinod, P., Yerima, S.: Detection of tor traffic using deep learning. In: 2020 IEEE/ACS 17th International Conference on Computer Systems and Applications (AICCSA), pp. 1–8, November 2020
5. Wang, L., Mei, H., Sheng, V.: Multilevel identification and classification analysis of tor on mobile and PC platforms. IEEE Trans. Ind. Inf. 17(2), 1079–1088 (2021)
6. Gurunarayanan, A., Agrawal, A., Bhatia, A., Vishwakarma, D.: Improving the performance of machine learning algorithms for tor detection. In: 2021 International Conference on Information Networking (ICOIN), pp. 439–444, January 2021
Power Terminal Data Security and Efficient Management Mechanism Based on Master-Slave Blockchain

Shaoying Wang1, Huifeng Yang1, Lifang Gao1, Qimeng Li1, Pengpeng Lv1, Xin Lu1, and Peng Lin2(B)

1 ICT Branch, State Grid Hebei Electric Power Co., Ltd., Shijiazhuang 050000, Hebei, People's Republic of China
2 Beijing VectInfo Technologies Co., Ltd., Beijing 100088, People's Republic of China
[email protected]
Abstract. How to realize the safe and efficient use of power terminal equipment data is a key research question for enhancing the value of smart grid data. To solve this problem, based on master-slave blockchain theory, this paper designs an intelligent terminal data management system architecture built on a master-slave blockchain. The architecture includes three modules: the master blockchain, the slave blockchains, and the system service interface. To improve data security during use, this paper studies two dimensions, secure data storage and data access control, and proposes a secure data storage mechanism and a data access control mechanism. To improve query efficiency when the amount of data is large, this paper adopts a grouped Bloom filter strategy and a B+ tree index strategy to reduce the number of data lookups. In the experimental part, the performance analysis of the grouping-based improved Bloom filter and of the B+ tree index based block query mechanism verifies that the proposed data usage mechanism achieves high block creation and data query efficiency.

Keywords: Smart grid · Power terminal · Data security · Blockchain
1 Introduction

With the rapid development and application of smart grid technology, the number of smart power terminal devices is increasing rapidly. To improve the management and application performance of the smart grid, the data resources generated by smart power terminal devices need to be used more effectively. However, due to the increase in network attacks, the security of power terminal data is seriously threatened. To improve the safety of power terminal data use, existing studies have proposed effective solutions in the fields of data reliability improvement,
data standard processing, and access control. As blockchain technology has the advantages of decentralization and tamper resistance, it has been used in the field of power services [1]. Reference [2] applies blockchain to energy transaction scenarios and designs a regional energy transaction model based on smart contracts to meet the higher data reliability requirements of such scenarios. To further enhance the practicability of blockchain-based energy transaction models, reference [3] takes large-scale energy transactions as the research object and proposes a distributed energy transaction system model. To improve the security of power data in mobile scenarios, reference [4] proposes a handheld power data acquisition and analysis device that exploits the high bandwidth, low latency, and high reliability of mobile networks. To solve the security problem of data use across multiple trust domains, reference [5] adopts trust theory and proposes a trust-based cross-domain authentication solution for service entities. Reference [6] proposes a data processing model for a smart grid big data analysis system from the perspective of data use standardization, which improves the effect of smart grid data processing. Regarding the data risks that may exist in the use of power data, reference [7] uses an entropy weight-gray model based on known data risk characteristics to propose a power data risk prediction model, which reduces the uncertainty of data risk in the use of power data.
2 Safety Management System Architecture In order to improve the efficiency and security of power terminal data use, this paper proposes an intelligent terminal data management system architecture based on the master-slave blockchain. The architecture model is shown in Fig. 1, including three modules: master blockchain, slave blockchain, and system service interface. According to the location of the smart terminal, the blockchain platform can be deployed. On each slave blockchain platform, the smart terminal can save the generated data to the slave blockchain platform. The data generated by the smart terminal includes production data and maintenance data. Among them, production data is mainly related data used in power
business. Maintenance data mainly refers to the data generated during equipment maintenance. The slave blockchain platform manages smart terminal data within the scope of its own chain and includes two functions: intelligent terminal management and data management. Intelligent terminal management is mainly completed by the endorsement node, and data management by the submitting node. The endorsement node guarantees the legitimacy of a smart terminal through the terminal identity verification module. The system service interface is implemented with RESTful technology, which improves the compatibility of the system. The master blockchain realizes unified management of the slave blockchains through the PoA algorithm. To improve the intelligence and security of the system, smart contracts based on the identity authentication function are deployed on the blockchain.
Fig. 1. Smart terminal data management system architecture based on master-slave blockchain
To join the blockchain, a smart terminal needs to submit a registration request to the blockchain platform, and for security its identification information must be verified during registration. The registration process includes three steps: the smart terminal issues a registration request, the slave blockchain platform verifies the terminal's identity, and the security of the terminal is managed through a key mechanism. When issuing a registration request, the smart terminal registers with the permitted blockchain platform based on its location information and the management rules of the company to which it belongs. When verifying the identity of the smart terminal, the slave blockchain platform relies mainly on the agreement between the platform and the power company. In the security management step, each piece of smart terminal information stored on the slave blockchain platform is protected by an asymmetric encryption mechanism, thereby improving the security of the smart terminal. After the smart terminal has registered on the slave blockchain platform, its generated data can be submitted as agreed.
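The three registration steps can be sketched as follows. This is only an illustration of the described mechanism, using the Python cryptography package and invented terminal identifiers; the paper does not specify the concrete key algorithm, so RSA signing stands in here for the asymmetric key mechanism:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

REGISTERED = {}   # terminal_id -> public key, kept by the slave blockchain (simplified)

def register_terminal(terminal_id: str, location: str, allowed_locations: set):
    """Slave-chain side: verify identity/location, then bind a key pair to the terminal."""
    if location not in allowed_locations:             # permission check (simplified)
        raise PermissionError("terminal not permitted on this slave chain")
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    REGISTERED[terminal_id] = key.public_key()
    return key                                        # private key handed back to the terminal

def verify_submission(terminal_id: str, data: bytes, signature: bytes) -> bool:
    """Check that submitted data really comes from a registered terminal."""
    pub = REGISTERED[terminal_id]
    try:
        pub.verify(signature, data, padding.PKCS1v15(), hashes.SHA256())
        return True
    except InvalidSignature:
        return False

# Terminal side: sign data before submitting it to the slave chain (hypothetical values)
priv = register_terminal("terminal-001", "Hebei", {"Hebei"})
sig = priv.sign(b"meter reading 42 kWh", padding.PKCS1v15(), hashes.SHA256())
assert verify_submission("terminal-001", b"meter reading 42 kWh", sig)
```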
3 Safe and Efficient Data Management Mechanism

3.1 Safe Data Usage Mechanism

To improve security in the process of data use, this paper considers two dimensions: secure data storage and data access control. Data storage includes three steps: data collection, data storage request, and execution of data storage. In the data collection step, the smart terminal generates power business data according to business needs. In the data storage request step, the smart terminal finds the slave blockchain to which it belongs according to the data storage rules of its power company and sends a data storage request to that slave blockchain node. After receiving the request, the slave blockchain node authenticates the smart terminal that made it; once identity verification passes, the security of data transmission and storage is ensured by distributing the public key.

The data access control process includes three steps: data query request, data user authentication, and data query with return of the result to the requester. In the data query request step, the data user sends the attributes of the data it needs and its own identity attributes to the blockchain master node. The master node verifies the identity information and data attribute information of the data user according to the query request. After verification passes, the data query is performed with the fast query algorithm proposed in this paper, and the result is returned to the data requester.

3.2 Efficient Data Query Mechanism

Data in the blockchain is stored in chronological order. As storage time grows, the certificates of some data expire and must be renewed, so the same data may have multiple versions in the blockchain. To improve search efficiency, data can be retrieved from the blockchain's levelDB. Because levelDB writes quickly, its read speed must be improved to satisfy the growing number of data requests. This paper improves the Bloom filter to raise the query efficiency of data in levelDB, and uses multiple B+ tree indexes to raise the query efficiency of data on each edge block. To speed up queries, a global ID is defined for each piece of data. The efficient data management process has two main steps: query by data ID, and use of the improved Bloom filter to find block information in levelDB; based on the block information, the B+ tree index is then used to locate the block quickly.

A Bloom filter is a special hash structure consisting of an n-bit array and k hash functions. When an element x needs to be stored, the k hash functions are first computed and the corresponding bits are set according to the results, which speeds up data search. Although the Bloom filter can speed up queries, when the amount of data is large its performance is easily affected by the data size. To solve this problem, this paper reduces the number of lookups needed per query. Let m denote the length of the data, decompose it into r segments of length l_g, and map each segment to k characters, where r is
a power of 2. When data needs to be stored, it is first placed into a certain segment and then mapped to k characters. When data is queried, the segment where the data is located is obtained by computing x_g = x_s % r. After the segment is obtained, formula (1) is used to calculate the characters mapped by the data.

x_i = x_s & Σ_{t = i·l_g/k}^{(i+1)·l_g/k} 2^t ,  0 ≤ i < k    (1)
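A minimal sketch of the grouping idea: the bit array is split into r equal segments, x_s % r selects the segment, and only k positions inside that segment are set or tested on insert and query. The segment-local hash construction below is an illustrative choice, not the paper's exact mapping:

```python
import hashlib

class GroupedBloomFilter:
    def __init__(self, m_bits=1 << 16, r_segments=64, k_hashes=4):
        assert r_segments & (r_segments - 1) == 0, "r must be a power of 2"
        self.seg_len = m_bits // r_segments
        self.r = r_segments
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key: int):
        seg = key % self.r                     # x_g = x_s % r: segment selection
        base = seg * self.seg_len
        for i in range(self.k):                # k positions inside the chosen segment
            h = hashlib.blake2b(f"{key}:{i}".encode(), digest_size=8).digest()
            yield base + int.from_bytes(h, "big") % self.seg_len

    def add(self, key: int):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: int) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

# Example: index global data IDs so levelDB lookups can skip blocks that surely lack them
bf = GroupedBloomFilter()
bf.add(123456)
print(bf.might_contain(123456), bf.might_contain(654321))
```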