Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

Volume 533

Editorial Board Members
Ozgur Akan, Middle East Technical University, Ankara, Türkiye
Paolo Bellavista, University of Bologna, Bologna, Italy
Jiannong Cao, Hong Kong Polytechnic University, Hong Kong, China
Geoffrey Coulson, Lancaster University, Lancaster, UK
Falko Dressler, University of Erlangen, Erlangen, Germany
Domenico Ferrari, Università Cattolica Piacenza, Piacenza, Italy
Mario Gerla, UCLA, Los Angeles, USA
Hisashi Kobayashi, Princeton University, Princeton, USA
Sergio Palazzo, University of Catania, Catania, Italy
Sartaj Sahni, University of Florida, Gainesville, USA
Xuemin Shen, University of Waterloo, Waterloo, Canada
Mircea Stan, University of Virginia, Charlottesville, USA
Xiaohua Jia, City University of Hong Kong, Kowloon, Hong Kong
Albert Y. Zomaya, University of Sydney, Sydney, Australia
The LNICST series publishes ICST's conferences, symposia and workshops. LNICST reports state-of-the-art results in areas related to the scope of the Institute. The types of material published include:

• Proceedings (published in time for the respective event)
• Other edited monographs (such as project reports or invited volumes)

LNICST topics span the following areas:

• General Computer Science
• E-Economy
• E-Medicine
• Knowledge Management
• Multimedia
• Operations, Management and Policy
• Social Informatics
• Systems
Bing Wang · Zuojin Hu · Xianwei Jiang · Yu-Dong Zhang Editors
Multimedia Technology and Enhanced Learning 5th EAI International Conference, ICMTEL 2023 Leicester, UK, April 28–29, 2023 Proceedings, Part II
Editors

Bing Wang, Nanjing Normal University of Special Education, Nanjing, China
Zuojin Hu, Nanjing Normal University of Special Education, Nanjing, China
Xianwei Jiang, Nanjing Normal University of Special Education, Nanjing, China
Yu-Dong Zhang, University of Leicester, Leicester, UK
ISSN 1867-8211, ISSN 1867-822X (electronic)
Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
ISBN 978-3-031-50573-7, ISBN 978-3-031-50574-4 (eBook)
https://doi.org/10.1007/978-3-031-50574-4

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2024

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.
Preface
We are delighted to introduce the proceedings of the fifth edition of the European Alliance for Innovation (EAI) International Conference on Multimedia Technology and Enhanced Learning (ICMTEL 2023). This conference brought together researchers, developers and practitioners from around the world who are leveraging and developing multimedia technologies and enhanced learning. The theme of ICMTEL 2023 was "Human Education-Related Learning and Machine Learning-Related Technologies".

The technical program of ICMTEL 2023 consisted of 119 full papers, including 2 invited papers, in oral presentation sessions at the main conference tracks. The conference tracks were: Track 1, AI-Based Education and Learning Systems; Track 2, Medical and Healthcare; Track 3, Computer Vision and Image Processing; and Track 4, Data Mining and Machine Learning. Aside from the high-quality technical paper presentations, the technical program also featured three keynote speeches and three technical workshops. The keynote speeches were given by Steven Li (Swansea University, UK) on "Using Artificial Intelligence as a Tool to Empower Mechatronic Systems", Suresh Chandra Satapathy (Kalinga Institute of Industrial Technology, India) on "Social Group Optimization: Analysis, Modifications and Applications", and Shuihua Wang (University of Leicester, UK) on "Multimodal Medical Data Analysis". The workshops were organized by Xiaoyan Jiang and Xue Han of Nanjing Normal University of Special Education, China, and by Yuan Xu and Bin Sun of the University of Jinan, China.

Coordination with the Steering Committee Chairs, Imrich Chlamtac, De-Shuang Huang and Chunming Li, was essential for the success of the conference, and we sincerely appreciate their constant support and guidance. It was also a great pleasure to work with such an excellent organizing committee; we appreciate its hard work in organizing and supporting the conference. In particular, we thank the Technical Program Committee, led by our TPC Chairs Shuihua Wang and Xin Qi, which completed the peer-review process of the technical papers and put together a high-quality technical program. We are also grateful to the Conference Manager, Ivana Bujdakova, for her support, and to all the authors who submitted their papers to the ICMTEL 2023 conference and workshops.

We strongly believe that the ICMTEL conference provides a good forum for researchers, developers and practitioners to discuss all science and technology aspects relevant to multimedia technology and enhanced learning. We also expect that future events will be as successful and stimulating as indicated by the contributions presented in this volume.

October 2023
Bing Wang Zuojin Hu Xianwei Jiang Yu-Dong Zhang
Organization
Steering Committee

Imrich Chlamtac, Bruno Kessler Professor, University of Trento, Italy
De-Shuang Huang, Tongji University, China
Chunming Li, University of Electronic Science and Technology of China (UESTC), China
Lu Liu, University of Leicester, UK
M. Tanveer, Indian Institute of Technology, Indore, India
Huansheng Ning, University of Science and Technology Beijing, China
Wenbin Dai, Shanghai Jiaotong University, China
Wei Liu, University of Sheffield, UK
Zhibo Pang, ABB Corporate Research, Sweden
Suresh Chandra Satapathy, KIIT, India
Yu-Dong Zhang, University of Leicester, UK
General Chair

Yu-Dong Zhang, University of Leicester, UK
General Co-chairs

Ruidan Su, Shanghai Advanced Research Institute, China
Zuojin Hu, Nanjing Normal University of Special Education, China
Technical Program Committee Chairs

Shuihua Wang, University of Leicester, UK
Xin Qi, Hunan Normal University, China
Technical Program Committee Co-chairs

Bing Wang, Nanjing Normal University of Special Education, China
Yuan Xu, University of Jinan, China
Juan Manuel Górriz, Universidad de Granada, Spain
M. Tanveer, Indian Institute of Technology, Indore, India
Xianwei Jiang, Nanjing Normal University of Special Education, China
Local Chairs

Ziquan Zhu, University of Leicester, UK
Shiting Sun, University of Leicester, UK
Workshops Chair

Yuan Xu, University of Jinan, China
Publicity and Social Media Chair

Wei Wang, University of Leicester, UK
Publications Chairs

Xianwei Jiang, Nanjing Normal University of Special Education, China
Dimas Lima, Federal University of Santa Catarina, Brazil
Web Chair

Lijia Deng, University of Leicester, UK
Technical Program Committee

Abdon Atangana, University of the Free State, South Africa
Amin Taheri-Garavand, Lorestan University, Iran
Arifur Nayeem, Saidpur Government Technical School and College, Bangladesh
Arun Kumar Sangaiah, Vellore Institute of Technology, India
Carlo Cattani, University of Tuscia, Italy
Dang Thanh, Hue College of Industry, Vietnam
David Guttery, University of Leicester, UK
Debesh Jha, Chosun University, Korea
Dimas Lima, Federal University of Santa Catarina, Brazil
Frank Vanhoenshoven, University of Hasselt, Belgium
Gautam Srivastava, Brandon University, Canada
Gonzalo Napoles Ruiz, University of Hasselt, Belgium
Hari Mohan Pandey, Edge Hill University, UK
Hong Cheng, First Affiliated Hospital of Nanjing Medical University, China
Jerry Chun-Wei Lin, Western Norway University of Applied Sciences, Bergen, Norway
Juan Manuel Górriz, University of Granada, Spain
Liangxiu Han, Manchester Metropolitan University, UK
Mackenzie Brown, Perdana University, Malaysia
Mingwei Shen, Hohai University, China
Nianyin Zeng, Xiamen University, China
Contents – Part II

Computer Vision and Image Processing

Nodes Deployment Optimization for Indoor Localization Using FIR Filter . . . . 3
    Ruohan Yang, Yuan Xu, Rui Gao, and Kaixin Liu

Security Management Method of Power Communication Access Network Based on EPON Technology . . . . 12
    Chengfei Qi, Chaoran Bi, Yan Liu, Tongjia Wei, Xiaobo Yang, and Licheng Sha

Image Recognition Technology of UAV Tracking Navigation Path Based on ResNet . . . . 23
    Lulu Liu, Degao Li, Junqiang Jiang, Shibai Jiang, Linan Yang, and Xinyue Chen

Intelligent Extraction of Color Features in Architectural Space Based on Machine Vision . . . . 40
    Zhengfeng Huang and Liushi Qin

Stability Tracking Detection of Moving Objects in Video Images Based on Computer Vision Technology . . . . 57
    Ningning Wang and Qiangjun Liu

Virtual Display Method of Garment Design Details Based on Computer Vision . . . . 73
    Shu Fang and Fanghui Zhu

Reliability Testing Model of Micro Grid Soc Droop Control Based on Convolutional Neural Network . . . . 88
    Zhening Yan, Chao Song, Zhao Xu, and Yue Wang

Pedestrian Detection in Surveillance Video Based on Time Series Model . . . . 104
    Hui Liu and Liyi Xie

Computer Vision Based Method for Identifying Grouting Defects of Prefabricated Building Sleeves . . . . 119
    Shunbin Wang and Lin Wu

Stability Detection of Building Bearing Structure Based on Bim and Computer Vision . . . . 137
    Lin Wu and Shunbin Wang

Intelligent Integration of Diversified Retirement Information Based on Feature Weighting . . . . 151
    Ye Wang and Yuliang Zhang

Recognition Method of Abnormal Behavior in Electric Power Violation Monitoring Video Based on Computer Vision . . . . 168
    Mancheng Yi, Zhiguo An, Jianxin Liu, Sifan Yu, Weirong Huang, and Zheng Peng

Damage Identification Method of Building Structure Based on Computer Vision . . . . 183
    Hongyue Zhang, Xiaolu Deng, Guoliang Zhang, Xiuyi Wang, Longshuai Liu, and Hongbing Wang

Automatic Focus Fusion Method of Concrete Crack Image Based on Deep Learning . . . . 200
    Chuang Wang, Jiawei Pang, Xiaolu Deng, Yangjie Xia, Ruiyang Li, and Caihui Wu

Teaching Effect Evaluation Method of College Music Course Based on Deep Learning . . . . 212
    Lin Han and Yi Liao

Recognition of Running Gait of Track and Field Athletes Based on Convolutional Neural Network . . . . 224
    Qiusheng Lin and Jin Wang

Research on Action Recognition Method of Traditional National Physical Education Based on Deep Convolution Neural Network . . . . 239
    Liuyu Bai, Wenbao Xu, Zhi Xie, and Yanuo Hu

Personalized Recommendation Method for the Video Teaching Resources of Folk Sports Shehuo Based on Mobile Learning . . . . 254
    Ying Cui and Yanuo Hu

Intelligent Monitoring System of Electronic Equipment Based on Wireless Sensor . . . . 268
    Minghua Duan and Caihui Wu

Construction Site Inspection System Based on Panoramic Image Cloud Processing Technology . . . . 285
    Caihui Wu, Xiuyi Wang, Bin Chen, Xiaolu Deng, and Minghua Duan

Author Index . . . . 299
Computer Vision and Image Processing
Nodes Deployment Optimization for Indoor Localization Using FIR Filter

Ruohan Yang, Yuan Xu(B), Rui Gao, and Kaixin Liu

School of Electrical Engineering, University of Jinan, Jinan 250022, Shandong, China
xy [email protected]

Abstract. To improve the positioning accuracy of an ultra-wideband (UWB) positioning system, the usual approach is to reduce the measurement error or to choose a better positioning algorithm; however, the geometric relationship between the blind node and the reference nodes is also an important factor affecting positioning accuracy. In this paper, the minimum mean geometric dilution of precision (GDOP) over the trajectory is proposed as the criterion for judging the optimal geometric layout of the reference nodes. After the optimal geometric layout of the reference nodes is obtained by this criterion, a finite impulse response (FIR) filter is selected to filter the observed values, removing noise and further improving system performance. Simulation shows that this method can effectively reduce the positioning error and improve the positioning accuracy.

Keywords: Mean geometric dilution of precision · Ultra-wideband · Finite impulse response

1 Introduction
Since the 20th century, the rapid development of information technology and artificial intelligence has constantly pushed all walks of life forward, and location information has become increasingly important to people's daily life and work; various positioning technologies continue to emerge on the market [4]. Among indoor positioning technologies, ultra-wideband (UWB) positioning stands out for its strong penetration ability, high resolution, and centimeter-level positioning accuracy [5].

In UWB indoor positioning, the points with known coordinates are called reference nodes (RNs), and the points to be located are called blind nodes (BNs) [2]. Besides ranging accuracy and the positioning algorithm, the geometric layout of the RNs is also an important factor affecting positioning accuracy [1]. The geometric dilution of precision (GDOP) provides a theoretical basis for judging the geometric layout of the RNs. Originating from Loran-C, GDOP is widely used to quantify the influence of satellite geometry on single-point positioning estimates; it is a dimensionless quantity
that is often used to estimate the expected positioning accuracy and has shown superior performance in wireless sensor network (WSN) applications [7]. GDOP is equally applicable to UWB positioning. In this paper, the positioning area is a known trajectory, so the minimum mean GDOP over the trajectory is proposed as the criterion for judging the geometric layout of the RNs, from which the optimal layout is obtained [6]. On this basis, an experimental simulation is carried out, and a finite impulse response (FIR) filter is used to process the observed values. FIR filters have attracted attention since the 1950s and have been continuously developed and promoted [2]. Research shows that the robustness of the FIR filter is higher than that of the Kalman filter (KF) in the presence of external interference [8]; therefore, this filter is selected in this paper to further improve the positioning accuracy.

The structure of this paper is as follows. Section 2 analyzes the definition and calculation method of GDOP. Section 3 discusses the establishment of the optimal geometric model of the RNs and the reasons for selecting the FIR filter. Section 4 presents the software simulation that verifies the proposed method. Section 5 concludes the paper.
2 Analysis of Geometric Dilution of Precision

2.1 Positioning Error Analysis
The covariance matrix of the positioning error is [3]:

Var(Δx) = (H^T H)^{-1} H^T R H (H^T H)^{-1} = (H^T H)^{-1} σ_e^2        (1)

where H is the observation matrix of the system and R is the covariance matrix of the observation error. The following two assumptions are made for the observation error model:

1) The observation errors from the BN to each RN follow the same normal distribution: zero-mean Gaussian white noise with variance σ_e^2.
2) The observations from the BN to different base stations are uncorrelated with each other.

By assumption 2, the covariance matrix of the observation error is a diagonal matrix, with the specific form shown in Eq. (2). Four RNs are used in this paper, so it is a 4th-order diagonal matrix:

    [ σ_e^2    0       0       0     ]
R = [   0    σ_e^2     0       0     ] = σ_e^2 I        (2)
    [   0      0     σ_e^2     0     ]
    [   0      0       0     σ_e^2   ]
where I is the 4th-order identity matrix. With the above two assumptions, Eq. (1) is further simplified to:

Var(Δx) = (H^T H)^{-1} H^T σ_e^2 I H (H^T H)^{-1} = (H^T H)^{-1} σ_e^2        (3)

It can be seen from Eq. (3) that the covariance of the positioning error is the product of the matrix G = (H^T H)^{-1} and the variance of the observation error. Therefore, the positioning performance of the system can be improved from the following directions:

1) Observation error: the larger σ_e^2 is, the greater the positioning error.
2) Geometric layout of the RNs: the matrix G depends on the number of RNs involved in the solution and their geometric layout relative to the BN. The smaller the elements of G, the less the observation error is amplified and the smaller the positioning error.

There are thus two ways to reduce the positioning error. The first is to reduce the observation error, which can be done by improving the performance of the equipment and the mathematical model. The second is to improve the geometric distribution of the RNs, which is the subject of this paper.

2.2 Definition and Calculation of Geometric Dilution of Precision
To represent the magnification from observation error to positioning error, the concept of GDOP is proposed. In this paper it is used as the standard for measuring the layout of the RNs: under the same observation error, the larger the GDOP, the greater the positioning error, and the smaller the GDOP, the smaller the positioning error. According to Eq. (3), in three-dimensional positioning:

[ σ_x^2  σ_xy   σ_xz   σ_xt  ]   [ G_11  G_12  G_13  G_14 ]
[ σ_yx   σ_y^2  σ_yz   σ_yt  ] = [ G_21  G_22  G_23  G_24 ] σ_e^2        (4)
[ σ_zx   σ_zy   σ_z^2  σ_zt  ]   [ G_31  G_32  G_33  G_34 ]
[ σ_tx   σ_ty   σ_tz   σ_t^2 ]   [ G_41  G_42  G_43  G_44 ]

where σ_x^2, σ_y^2, σ_z^2, σ_t^2 are the positioning error components in the covariance matrix of the positioning error, and G_is (i, s = 1, 2, 3, 4) are the elements of the matrix G. The defining formula of GDOP is then:

GDOP = sqrt(σ_x^2 + σ_y^2 + σ_z^2 + σ_t^2) / σ_e = sqrt(G_11 + G_22 + G_33 + G_44) = sqrt(tr[(H^T H)^{-1}])        (5)

Since this paper deals with two-dimensional positioning, the system observation matrix H is obtained as follows [1]:

r_i = sqrt((x − x_i)^2 + (y − y_i)^2) + ct_e,  i = 1, 2, 3, 4        (6)
where r_i is the distance observation, (x, y) are the true coordinates of the BN, (x_i, y_i) are the coordinates of the RNs, and ct_e is the error caused by the time deviation between the BN and the RNs. A Taylor series expansion of Eq. (6) is performed at the approximate position (x̂, ŷ) of the BN, ignoring second- and higher-order terms. Because the simulation area in this paper is small, the time delay error between the RNs and the BN can be ignored, which gives:

r_i − r̂_i = ((x_i − x̂)/r̂_i)(x − x̂) + ((y_i − ŷ)/r̂_i)(y − ŷ)        (7)

where r̂_i = sqrt((x̂ − x_i)^2 + (ŷ − y_i)^2). Writing Eq. (7) in matrix form:

L = HX        (8)

where

    [ (x_1 − x̂)/r̂_1   (y_1 − ŷ)/r̂_1 ]        [ r_1 − r̂_1 ]
H = [ (x_2 − x̂)/r̂_2   (y_2 − ŷ)/r̂_2 ],   L = [ r_2 − r̂_2 ],   X = [ x − x̂ ]
    [ (x_3 − x̂)/r̂_3   (y_3 − ŷ)/r̂_3 ]        [ r_3 − r̂_3 ]       [ y − ŷ ]
    [ (x_4 − x̂)/r̂_4   (y_4 − ŷ)/r̂_4 ]        [ r_4 − r̂_4 ]

Therefore, with

G = (H^T H)^{-1} = [ G_11  G_12 ]
                   [ G_21  G_22 ]

the geometric dilution of precision in this paper is computed as:

GDOP = sqrt(tr[(H^T H)^{-1}]) = sqrt(G_11 + G_22)        (9)
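To make Eq. (9) concrete, here is a minimal sketch (Python with NumPy; the function name and interface are our own, not from the paper) that builds H from the node geometry and evaluates the 2-D GDOP for a single blind node:

```python
import numpy as np

def gdop_2d(bn, rns):
    """2-D GDOP of Eq. (9) for one blind node.

    bn  : (x, y) approximate position of the blind node
    rns : sequence of (x_i, y_i) reference-node coordinates
    """
    bn = np.asarray(bn, dtype=float)
    rns = np.asarray(rns, dtype=float)
    diff = bn - rns                       # displacement from each RN to the BN
    r_hat = np.linalg.norm(diff, axis=1)  # estimated ranges
    H = diff / r_hat[:, None]             # rows of the observation matrix H
    G = np.linalg.inv(H.T @ H)            # G = (H^T H)^(-1)
    return float(np.sqrt(np.trace(G)))    # sqrt(G_11 + G_22)

# Example with the RN D position reported for Fig. 1; the three fixed RN
# positions here are illustrative assumptions, not values from the paper.
rns = [(0.0, 0.0), (6.0, 0.0), (6.0, 6.0), (-2.4828, 4.2414)]
print(gdop_2d((3.0, 3.0), rns))
```

Note that each row of H enters H^T H only through its outer product, so the overall sign convention of Eq. (7) does not affect the GDOP value.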
3 Analysis of Optimal Geometric Layout Model

3.1 Establishment of Objective Function
From the above analysis, the value of GDOP is a function of the RN coordinates and the BN coordinates only. The GDOP formula obtained from the system observation matrix H shows that GDOP is defined for each individual BN. However, the indoor location service space is a region, and what is solved in this paper is the trajectory within the location area. Therefore, the minimum mean GDOP over the track is proposed as the criterion to evaluate the optimal geometric layout of the RNs. When the RN locations are known, a GDOP(x, y) value can be calculated for each BN in the location space. Assume that there are N track coordinate
sampling points; substituting each track coordinate point into Eq. (9) gives GDOP(x_j, y_j), j = 1, 2, ..., N. To optimize the RN geometric layout of the UWB positioning system, the minimum mean GDOP is used as the criterion; Eq. (10) and Eq. (11) compute the mean and the minimum mean of GDOP, respectively:

E(GDOP_1, GDOP_2, ..., GDOP_N) = (1/N) Σ_{j=1}^{N} GDOP_j        (10)

F = min (1/N) Σ_{j=1}^{N} GDOP_j        (11)
3.2 Control Variable

Taking the RN D coordinate (x_4, y_4) as the control variable of the optimal geometric layout model, the optimal solution of the RN D coordinate, namely (x_m, y_m), can be obtained by solving Eq. (11).

3.3 Restraint Condition
In this paper, the positioning area is a 6 × 6 square, and the candidate region of RN D is a 3 × 3 square, which accounts for 1/4 of the positioning area. The constraints on RN D are:

−3 ≤ x_4 ≤ 0,  3 ≤ y_4 ≤ 6        (12)
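With the `gdop_2d` sketch given earlier, the objective of Eqs. (10)-(12) can be evaluated by a direct grid search over the candidate region of RN D; a 30 × 30 grid reproduces the 900 sample points used in the simulation of Sect. 4 (the helper names and grid density are our own choices):

```python
import numpy as np

def mean_gdop(rn_d, fixed_rns, track):
    """Mean GDOP of Eq. (10) over the track for one candidate RN D position."""
    rns = np.vstack([np.asarray(fixed_rns, float), np.asarray(rn_d, float)])
    return float(np.mean([gdop_2d(p, rns) for p in track]))

def best_rn_d(fixed_rns, track, n=30):
    """Grid search implementing Eq. (11) within the constraints of Eq. (12)."""
    xs = np.linspace(-3.0, 0.0, n)
    ys = np.linspace(3.0, 6.0, n)
    grid = [(x, y) for x in xs for y in ys]   # n*n candidate RN D positions
    return min(grid, key=lambda c: mean_gdop(c, fixed_rns, track))
```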
3.4 Finite Impulse Response Filtering Algorithm
The KF is usually used to process data in satellite positioning and navigation. However, this filter has a disadvantage: to obtain accurate estimates of the position information and the state variables, accurate noise statistics are required for the system noise in the state equation and the observation noise in the observation equation. Such accurate noise statistics are difficult to obtain in a real environment. The FIR filter selected in this paper is less affected by noise, has a stable linear phase, and needs no feedback; research shows that its robustness is higher than that of the KF in the presence of external interference. The state equation of the system, x_t = F_{t−1} x_{t−1} + W_t, is:

[ P_{x,t} ]   [ 1  ∇t  0  0  ] [ P_{x,t−1} ]
[ V_{x,t} ] = [ 0  1   0  0  ] [ V_{x,t−1} ] + W_t        (13)
[ P_{y,t} ]   [ 0  0   1  ∇t ] [ P_{y,t−1} ]
[ V_{y,t} ]   [ 0  0   0  1  ] [ V_{y,t−1} ]
where P_{x,t} and P_{y,t} are the positions of the BN at time t, V_{x,t} and V_{y,t} are the velocities of the BN at time t, and W_t ~ N(0, Q_t) is the system noise. The original observations are distances; the position of the target is obtained from the raw distance data by the least-squares solution, and this position is used as the observation in the observation equation, Y_t = M_t x_t + V_t:

                          [ P_{x,t} ]
[ P_{x,t} ]   [ 1 0 0 0 ] [ V_{x,t} ]
[ P_{y,t} ] = [ 0 0 1 0 ] [ P_{y,t} ] + V_t        (14)
                          [ V_{y,t} ]

where P_{x,t} and P_{y,t} on the left are the observed positions of the BN on the X and Y axes at time t, and V_t ~ N(0, R_t) is the observation noise. The pseudocode of the FIR filter is given as Algorithm 1.
Algorithm 1: FIR filter
Input: Y_t, W, b
Output: X̂_t
 1: for t = W − 1 : number do
 2:   m = t − W + 1, s = m + b − 1
 3:   X̃_s = Y_s if s < W − 1, otherwise X̃_s = X̂_s
 4:   G_s = I
 5:   for s = m + b : t do
 6:     X̃_{s|s−1} = F_s X̃_{s−1}
 7:     G_s = [M_s^T M_s + (F_s G_{s−1} F_s^T)^{-1}]^{-1}
 8:     K_s = G_s M_s^T
 9:     X̃_s = X̃_{s|s−1} + K_s (Y_s − M_s X̃_{s|s−1})
10:   end for
11:   X̂_t = X̃_s
12: end for
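Below is a runnable sketch of Algorithm 1 for the model of Eqs. (13)-(14) (Python with NumPy; the horizon length W, the parameter b, the sampling interval, and the zero-velocity initialization used for line 3 are simplifying assumptions of ours, not values from the paper):

```python
import numpy as np

def ufir_track(Y, W=10, b=2, dt=0.1):
    """Iterative FIR filtering of (n, 2) position fixes Y over a W-sample horizon."""
    F = np.array([[1, dt, 0, 0],
                  [0,  1, 0, 0],
                  [0,  0, 1, dt],
                  [0,  0, 0, 1]], float)      # state transition of Eq. (13)
    M = np.array([[1, 0, 0, 0],
                  [0, 0, 1, 0]], float)       # observation matrix of Eq. (14)
    Y = np.asarray(Y, float)
    estimates = []
    for t in range(W - 1, len(Y)):
        m = t - W + 1
        s = m + b - 1
        x = np.array([Y[s, 0], 0.0, Y[s, 1], 0.0])  # line 3, simplified init
        G = np.eye(4)                                # line 4: G_s = I
        for k in range(m + b, t + 1):
            x = F @ x                                               # line 6
            G = np.linalg.inv(M.T @ M + np.linalg.inv(F @ G @ F.T)) # line 7
            K = G @ M.T                                             # line 8
            x = x + K @ (Y[k] - M @ x)                              # line 9
        estimates.append(x)                          # line 11: X̂_t = X̃_t
    return np.array(estimates)                       # rows: [Px, Vx, Py, Vy]
```

A property visible in the sketch is that no noise covariances Q_t or R_t enter the recursion, which is exactly why the FIR filter does not need the accurate noise statistics that the KF requires.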
4 Experimental Simulation Analysis
In the experimental simulation, four RNs are used to locate the BN. First, RN A, RN B and RN C are used as fixed points, and RN D is used as a moving point. Within the given constraints, the best coordinate point of RN D is selected according to
the criterion of minimum mean GDOP. Then the coordinate points of the four RNs are substituted into the least-squares algorithm, and the coordinate values of the BN are calculated; the FIR filter is then used to remove the noise of the BN coordinates to further improve the positioning accuracy. The positioning area is set to a 6 × 6 square, so the constraint area of RN D is set to a 3 × 3 square, which is gridded evenly into 900 subareas; that is, there are 900 sample points in the constraint area of RN D. Substituting the 900 sample points into Eq. (9) yields a 900 × 1 array, from which the minimum mean GDOP is selected by Eq. (11); the corresponding sample point is the desired RN D position. The software simulation shows that choosing the RN coordinates by the minimum mean GDOP and adding the FIR filter improves the performance of the positioning system. The error CDF diagrams are shown in Fig. 3, and Table 1 gives the RMSEs of UWB and FIR under different mean GDOPs. Figure 1 shows the simulation when the mean GDOP reaches its minimum: the result is 1.5718, and the corresponding RN D coordinate is (−2.4828, 4.2414). Figure 2 shows the simulation when the mean GDOP reaches its maximum: the result is 1.7497, and the corresponding RN D coordinate is (−0.10345, 6).
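The least-squares position solution mentioned above can be sketched as an iteration of Eq. (8) (a minimal sketch; the function name, initial guess and iteration count are our own choices):

```python
import numpy as np

def ls_position(ranges, rns, x0=(1.0, 1.0), iters=10):
    """Solve Eq. (8) iteratively for the BN position from the range observations."""
    x = np.asarray(x0, dtype=float)
    rns = np.asarray(rns, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    for _ in range(iters):
        diff = x - rns
        r_hat = np.linalg.norm(diff, axis=1)        # predicted ranges
        H = diff / r_hat[:, None]                   # linearized observation matrix
        L = ranges - r_hat                          # range residuals
        x = x + np.linalg.inv(H.T @ H) @ H.T @ L    # X = (H^T H)^(-1) H^T L
    return x
```

Its output at each epoch is then fed to the FIR filter as the observation Y_t.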
Fig. 1. Mean GDOP = 1.5718.
Fig. 2. Mean GDOP = 1.7497.
Fig. 3. CDFs of UWB and FIR under different mean GDOP.
Table 1. The RMSE (m) generated by FIR and UWB under different mean GDOP

Mean GDOP   UWB RMSE   FIR RMSE
1.5718      0.11814    0.079218
1.7497      0.14112    0.089583

5 Conclusion
This paper first analyzed the positioning error of the system. On this basis, the concept of minimum mean GDOP was proposed and used to measure the geometric layout of the RNs, from which the optimal layout was selected. Finally, an FIR filter was introduced to filter the observed values. The results show that adding the FIR filter plays a positive role in improving the positioning accuracy.
References

1. Ascher, C., Zwirello, L., Zwick, T., Trommer, G.: Integrity monitoring for UWB/INS tightly coupled pedestrian indoor scenarios. In: 2011 International Conference on Indoor Positioning and Indoor Navigation, pp. 1–6. IEEE (2011)
2. Bu, L., Zhang, Y., Xu, Y.: Indoor pedestrian tracking by combining recent INS and UWB measurements. In: 2017 International Conference on Advanced Mechatronic Systems (ICAMechS), pp. 244–248. IEEE (2017)
3. Feng, G., Shen, C., Long, C., Dong, F.: GDOP index in UWB indoor location system experiment. In: 2015 IEEE SENSORS, pp. 1–4. IEEE (2015)
4. Langley, R.B., et al.: Dilution of precision. GPS World 10(5), 52–59 (1999)
5. Sharp, I., Yu, K., Guo, Y.J.: GDOP analysis for positioning system design. IEEE Trans. Veh. Technol. 58(7), 3371–3382 (2009)
6. Tongyu, Z., Aihua, X., Rui, S.: Evaluation on user range error and global positioning accuracy for GPS/BDS navigation system. In: Proceedings of 2014 IEEE Chinese Guidance, Navigation and Control Conference, pp. 680–685. IEEE (2014)
7. Wang, C., Ning, Y., Wang, J., Zhang, L., Wan, J., He, Q.: Optimized deployment of anchors based on GDOP minimization for ultra-wideband positioning. J. Spat. Sci. 67(3), 455–472 (2022)
8. Xu, Y., Shmaliy, Y.S., Li, Y., Chen, X.: UWB-based indoor human localization with time-delayed data using EFIR filtering. IEEE Access 5, 16676–16683 (2017)
Security Management Method of Power Communication Access Network Based on EPON Technology

Chengfei Qi1(B), Chaoran Bi1, Yan Liu1, Tongjia Wei1, Xiaobo Yang1, and Licheng Sha2

1 Metrology Center of State Grid Jibei Electric Power Co., Ltd., Beijing 100045, China
[email protected]
2 State Grid Beijing Electric Power Company, Beijing 100031, China
Abstract. Ensuring network security generally comes at the expense of network performance. In view of this problem, this paper applies EPON technology to the security management of the power communication access network. It is found that EPON technology can provide a key with a length of 48 bits, i.e., 2^48 possible keys, which is close to DES encryption in key length and is well suited to the service-carrying requirements of the power communication access network.

Keywords: EPON Technology · Power Communication · Safety Management
1 Introduction

EPON adopts a point-to-multipoint structure and passive optical fiber transmission to provide a variety of services over Ethernet. EPON technology was standardized by the IEEE 802.3 EFM working group, which issued the EPON standard IEEE 802.3ah in June 2004 (incorporated into the IEEE 802.3-2005 standard in 2005) [1]. In this standard, Ethernet and PON technology are combined: PON technology is adopted in the physical layer, the Ethernet protocol is used in the data link layer, and Ethernet access is realized using the topology of the PON. EPON therefore combines the advantages of PON and Ethernet: low cost, high bandwidth, strong scalability, compatibility with existing Ethernet, and convenient management [2–4].

The concept of the passive optical network has a long history. It saves optical fiber resources, is transparent to network protocols, and plays an increasingly important role in the optical access network. At the same time, after 20 years of development, Ethernet technology has almost completely dominated the LAN with its simplicity, practicality and low price; in fact, it has proved to be the best carrier for IP packets. With the increasing proportion of IP services in metro and trunk
transmission, Ethernet is gradually infiltrating into access, metro and even backbone networks through improvements in transmission rate and manageability. The combination of Ethernet and PON produces the Ethernet passive optical network [5, 6], which has the advantages of both and is becoming a hot technology in the field of optical access networks.

EPON is a new broadband access technology. It realizes comprehensive service access of data, voice and video through a single optical fiber access system and has good economy. Industry insiders generally believe that FTTH is the ultimate solution for broadband access and that EPON will become a mainstream broadband access technology [7]. Due to the characteristics of the EPON network structure, its special advantages for broadband access, and its natural organic combination with computer networks, experts generally agree that the passive optical network is the best transmission medium for realizing the "integration of three networks" and solving the "last kilometer" of the information highway.

The access network is also called the user access network. For a long time, the service of the communication network was mainly voice, and the traditional access method was to connect the user terminal to the switch of the local exchange with twisted-pair copper cable. With the continuous improvement of social informatization, communication services have developed from a single voice service toward data, image and comprehensive services, and the traditional twisted-pair access mode can hardly adapt to this trend, so many access methods are being developed. Combined with the overall development ideas, principles and strategies, this paper analyzes the current situation, existing problems, development objectives, networking technology selection, EPON introduction and evolution, as well as the principles, application scenarios and technical schemes of network planning, and designs the security management method of the power communication access network based on EPON technology as follows.
2 Design of Security Management Method of Power Communication Access Network Based on EPON Technology

2.1 Security Risk Analysis of Power Communication Network

Power communication network security consists mainly of physical security and information security.

Physical Security Risks. Physical security mainly refers to line security, communication equipment security, communication network structure security and other aspects, covering the transmission network, service network and access network. Link line safety mainly refers to the safety of wired or wireless lines such as communication optical cable, microwave, wireless links and power line carrier; interference factors include lightning strikes, external damage, frequency interference, etc. Communication equipment security refers to the reliability of SDH, ONT, microwave, carrier and other transmission
equipment in terms of configuration, redundancy protection technology, performance indexes, aging degree, electromagnetic compatibility, etc. In terms of security structure, power enterprises generally configure servers, switches and working hosts with dual-route backup protection, dual power supply backup protection and dual board backup protection [8]. The physical security risks of power enterprises also include the security protection of the machine room against natural disasters such as fire or earthquake; whether the network cables laid in the machine room have electromagnetic radiation shielding, live-wire grounding and other measures; dual UPS power supply configuration; preventive measures against man-made damage to power communication equipment; and disaster recovery and backup of important data.

Information Security Risks. Information security mainly refers to data transmission channels, Internet boundary protection, host equipment and terminal port security, data information security, etc. At the network and business level based on the IP protocol, such as the data network, there are risks such as malicious code propagation, Internet boundary penetration, illegal control of information network equipment, data source camouflage, transmission message tampering, and transmission message eavesdropping [9, 10]. The security risks of IMS and other new technology and business networks include insufficient protection of core equipment, unclear application, lack of security protection, illegal outreach, untimely software version updates, unclear security positioning, and insufficient protection awareness. The security risks of the supporting network include host equipment security, Internet border security, data security and so on. The security risks of the access network include illegal authentication, overly complex networking, illegal access, excessive information, malicious attacks, and eavesdropping, interception and monitoring of information.

2.2 Zoning and Hierarchical Management Mode

According to the above analysis results, the information security defense mode of the power communication access network is designed. According to the implementation standards of power grid operation and management and the different application directions and use requirements of the various professional networks, the defense mode of the power communication access network is built as "three layers and four areas": the carried business is divided vertically into three layers (automation, production management and information management) and into four areas (real-time control, non-controlled production, production management and management information), which correspond to the structure of the power communication access network, as shown in Fig. 1.
Fig. 1. Information protection layer of power communication access network
Hierarchical management: technical means are used to manage equipment, systems and networks separately at the three levels of automation, production management and information management, so that the three levels operate independently without affecting each other. At the same time, network connections are used, with isolation devices of different strengths between zones; data can be accessed and read by information technology means but cannot be modified or written.

Partition management: physical network equipment, such as authentication systems, identity recognition [11] and firewalls, is used to isolate each area, focusing protection on key business nodes such as the real-time control area. Security isolation devices with different permissions and different functional strengths are adopted according to the importance of each area, so as to cover the information security protection of the carried businesses.

Among the four regions, the real-time control area includes sensing and receiving devices that display electric quantity values and marketing data indexes in real time. The non-controlled production area includes businesses that provide data services but have no control function in production and marketing, such as simulation and work ticket generation. The production management area realizes coordination and cooperation between disciplines and departments. The management information area realizes collaborative office work and external contact; therefore, this area should be able to log in to the Internet on the basis of ensuring no disclosure.

Different areas are distinguished by physical isolation. For example, identity verification systems, access control and keys divide the office areas so that personnel in different areas cannot reach other areas without identity authorization, ensuring the geographical isolation of personnel. Between different levels, different levels of access rights are granted to different users in the management operating system. Access to other professionally controlled information systems must be confirmed through the firewall and data interface, and the information stored under each user name is technically encrypted and certificate-authenticated; users without the corresponding qualifications cannot access the information held by other users [12]. In this way, low-level users cannot obtain the data held by high-level users, while high-level users can control and manage the
identity permissions of low-level users, achieving the ultimate goal of zoning and hierarchical management.

2.3 Physical Security Management Based on EPON Technology

According to the requirements of the relevant technical standards and specifications for power communication access network equipment, and considering the safety risks of EPON equipment technical indicators, reliability risks, and electrical and electromagnetic compatibility safety risks, the following safety protection points are put forward:

(1) According to the security risk factors of the technical indicators of access network equipment, the OLT MAC function and performance, ONU MAC performance, uplink bandwidth allocation function, ONU loopback operation, Ethernet performance, QoS function [13, 14], multicast function and performance, service interface function and performance, and optical interface characteristics of EPON equipment are required to meet the relevant standards and specifications for network access detection; equipment manufacturers with a high reputation should be selected.
(2) In view of the safety risk factors of electromagnetic compatibility, the electromagnetic compatibility radiation and immunity of access network equipment are required to meet the standards and specifications for network access detection, including port withstand voltage detection, impulse voltage detection and immunity detection.
(3) According to the reliability requirements and safety risk factors, the topology of the EPON networking optical cable shall be designed according to the power grid structure, the importance of the carried business and the construction of the optical cable, supporting three switching modes: trunk optical fiber protection, full protection and hand-in-hand protection. A 10 kV communication access network should adopt a ring, tree or star topology, and a 0.4 kV communication access network should adopt a bus, tree or star topology. For an EPON network [15, 16], the networking structure carrying important distribution network services (such as distribution automation) shall adopt a double-star backbone ring, double-chain backbone ring or hand-in-hand networking mode. Important EPON boards shall be redundant, e.g., OLT main control board and power board redundancy protection.
(4) Safety management: carry out safety management of the environment, assets and network, carry out safety management of network operation and maintenance, and formulate security incident disposal plans and emergency plans.
(5) Personnel safety management: formulate a communication personnel management system, including personnel departure, assessment, safety training, etc.

2.4 Information Security Management Based on EPON Technology

The EPON network of the power communication network carries the data transmission of remote protection services and the power information collection service, which require high confidentiality and controllability of information; at the same time, the carried services are sensitive to time delay. Therefore, the technical means of encryption authentication, service isolation,
message filtering, access control and security management are adopted in EPON networking to ensure the integrity, confidentiality, availability and controllability of the data transmitted in the network.

(1) Encryption authentication. EPON downlink data is protected by triple-churning and de-churning processing to isolate information among users. Uplink MAC (media access control) frames and OAM (operation, management and maintenance) frames are processed by the triple-churning encryption algorithm to prevent malicious users from forging MAC or OAM frames in the data channel to change the original configuration of the system or destroy the system.
(2) Service isolation. EPON shall support VPNs based on VLAN technology according to user services [17], and shall support VLANs based on port or MAC address. The number of MAC addresses received by a user port shall be limited, or the user's MAC address shall be bound to the specified port; at the same time, the user's MAC and IP addresses shall be bound.
(3) Message filtering. In the EPON system, an ACL stream-filtering mechanism is adopted so that the system supports source and destination MAC address frame filtering on ports, as well as packet filtering on upper-layer protocol fields such as source and destination IP address, VLAN ID and TCP port number (see the sketch after this list).
(4) Access control. Access control for servers, equipment terminals, etc. shall be realized in EPON networking, and network access control means such as 802.1x shall be used for authentication before access to the network.
(5) Safety management. For EPON networking, a secure remote management mode and a secure network management system shall be adopted for monitoring and management.
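The message-filtering behavior of item (3) can be illustrated with a toy rule table (the rule format and field names below are our own illustration, not an EPON vendor API):

```python
# Each rule matches any header fields it names; the first matching rule decides.
RULES = [
    {"action": "deny",   "src_mac": "00:11:22:33:44:55"},       # drop a known-bad source MAC
    {"action": "permit", "vlan_id": 100, "dst_tcp_port": 502},  # allow the service VLAN's traffic
    {"action": "deny"},                                         # default: drop everything else
]

def filter_frame(frame, rules=RULES):
    """Return True if the frame is forwarded, False if it is dropped."""
    for rule in rules:
        if all(frame.get(k) == v for k, v in rule.items() if k != "action"):
            return rule["action"] == "permit"
    return False

# A frame described by its header fields:
frame = {"src_mac": "aa:bb:cc:dd:ee:ff", "vlan_id": 100, "dst_tcp_port": 502}
assert filter_frame(frame) is True
```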
3 Experimental Analysis

An integrated service communication access test platform for the distribution network is built, and the application effect of EPON technology in power communication access network security management is verified through analysis of the test data.

3.1 Experimental Platform

The integrated service communication access test platform of the distribution network simulates the integrated access of real-time and non-real-time services such as distribution automation, power consumption information acquisition and video monitoring, and realizes comprehensive monitoring and management of the various communication devices in the distribution communication network. The test platform consists of a hardware platform and a software platform. The hardware platform includes the EPON system; the distribution automation system, i.e., data transmission units, feeder terminals and contactors; the power consumption information acquisition system (concentrators, collectors); three-layer Industrial Ethernet switches; servers; front-end computers;
workstations, camera terminals, communication power supplies, etc. The software platform includes three parts: the distribution network communication network management system, the distribution automation system and the remote meter reading system.

The test platform is composed of a master station layer, a convergence layer and an access layer. The master station layer includes the integrated network management system, the distribution automation master station system, the power consumption information acquisition master station system and the integrated access hardware platform. The convergence layer is a ring network composed of three three-layer Industrial Ethernet switches. The access layer is the EPON system. The test platform realizes the remote control, remote signaling and telemetry functions of distribution automation and the remote meter reading function of power consumption information acquisition. At the same time, all communication equipment of the platform can be centrally monitored and managed through the distribution network communication network management system, realizing the comprehensive access of multiple communication technologies as well as of the distribution automation and power consumption information acquisition services; the power consumption information acquisition and distribution automation services are isolated through different PON ports.

3.2 Experimental Setup

A test platform is established, and we log in to the test platform network management system to ensure normal monitoring and control of all equipment on the network and normal data transmission. The MAC address of each ONU is bound to the OLT. During data transmission, the data is further processed using the above design method, the key update cycle is set to 10 s, and the peak bandwidth and minimum guaranteed bandwidth of each ONU are allocated according to service requirements. Data transmission is observed, and changes in system throughput and forwarding delay are recorded.

Secondly, to ensure the reliability of power communication EPON networking services, trunk optical fiber redundancy protection or full protection is usually adopted. Therefore, on the test platform, these two protection measures are applied to the EPON networks carrying the distribution automation service and the power consumption acquisition service, respectively; link or equipment failure is simulated, transmission services are switched, and the protection switching time is recorded. First, backbone optical fiber protection or full protection is created on the EPON equipment, and any two PON ports in the PON service board on the OLT are bound so that the two PON ports have the same configuration. Secondly, the bandwidth of each ONU is allocated according to the bandwidth requirements of the carried service.
3.3 Analysis of Experimental Results

Throughput Test. In the power communication access network, services with packet lengths of 64, 128, 256, 512, 1024, 1280 and 1518 bytes are transmitted, and the uplink and downlink throughput of the network is tested with the EPON technical safety protection measures in place. The recorded results are shown in Table 1.
Table 1. Throughput

Byte   Traditional (down)   After EPON processing (down)   Traditional (up)   After EPON processing (up)
64     978.4                978.4                          989.4              989.4
128    981.8                981.8                          974.3              974.3
256    993.5                993.5                          988.4              988.4
512    995.2                995.2                          986.2              986.2
1024   994.7                994.7                          982.4              979.7
1280   995.4                995.4                          977.1              978.5
1518   998.3                998.3                          977.0              978.4
It can be seen from Table 1 that processing the downlink data (including data frames and control frames) and the uplink MAC and OAM frames with EPON technology has no impact on system throughput.

Forwarding Delay Test. With the EPON security measures in place, services at 90% of throughput are forwarded through the network, i.e., services with the seven packet lengths of 64, 128, 256, 512, 1024, 1280 and 1518 bytes are forwarded respectively, to test whether the forwarding delay is affected by the security measures; the average forwarding delays of multiple groups are recorded for comparison. It can be seen from Table 2 that, in networking, EPON processing of the downlink data (data frames and control frames) has little impact on the forwarding delay:

T_avg(down) ≈ 0.25 µs        (1)
Table 3 shows that EPON processing of the uplink MAC and OAM frames likewise has little impact on the transmission delay:

T_avg(up) ≈ 0.05 µs        (2)
Table 2. Downlink forwarding delay (µs)

Byte   Traditional                  After EPON processing
64     17.564   17.621   17.623    17.845   17.816   17.884
128    20.465   20.549   20.654    20.847   20.647   20.716
256    25.504   25.648   25.624    25.843   25.741   25.769
512    36.142   36.249   36.654    36.491   36.287   36.347
1024   57.814   57.864   57.889    57.984   57.915   57.934
1280   68.945   68.951   68.934    69.087   69.051   69.054
1518   79.148   79.201   79.219    79.413   79.148   79.201
Table 3. Uplink forwarding delay (µs)

Byte   Traditional               After EPON processing
64     1.287   1.301   1.301    1.342   1.331   1.323
128    1.306   1.304   1.306    1.387   1.337   1.314
256    1.325   1.324   1.304    1.399   1.357   1.387
512    1.354   1.351   1.327    1.471   1.367   1.397
1024   1.465   1.475   1.413    1.487   1.447   1.448
1280   1.423   1.416   1.447    1.489   1.466   1.487
1518   1.489   1.476   1.487    1.497   1.504   1.516
Class B Protection Test. The tester sends uplink and downlink data through the ONU/OLT with a frame length of 1230 bytes at a rate of 100 Mbps, and the data service is normal. Disconnection or abnormality of the working optical fiber is then simulated to check whether the service automatically switches to the standby fiber; the uplink and downlink packet loss is checked and recorded, and converted into the service switching time according to the rate:

T_a = (f_m − f_x) / v        (3)
where T_a is the service switching time, in s; f_m is the total number of bytes sent, in MB; f_x is the total number of bytes received, in MB; and v is the transmission rate, in Mbps. It can be seen from Table 4 that, to ensure the safety of EPON networking services, when class B protection (backbone optical fiber protection) is adopted for important lines and the optical fiber is damaged or becomes abnormal, the transmission services are switched to the standby link; data is lost during switching, and the packet loss rate does not increase with the increase of traffic.
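As a worked example of Eq. (3) (a trivial sketch with the units exactly as stated in the paper; note that f_m and f_x are given in megabytes while v is in megabits per second, so a strict conversion to seconds would additionally multiply by 8):

```python
def switching_time(fm_mb, fx_mb, v_mbps):
    """Service switching time Ta = (fm - fx) / v of Eq. (3)."""
    return (fm_mb - fx_mb) / v_mbps

# e.g. 100 MB sent, 99.95 MB received at 100 Mbps:
print(switching_time(100.0, 99.95, 100.0))  # 0.0005 in the paper's stated units
```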
Table 4. Average switching time of backbone optical fiber protection (ms)

Number of ONUs   Down   Up
1                2      6
2                1.5    5.25
3                2      6.25
4                2      6.2
Class C Protection Test. Class C protection is full protection: standby protection is used for both the trunk optical fiber and the branch optical fiber. The tester sends uplink and downlink data through the ONU/OLT with a frame length of 1230 bytes at a rate of 100 Mbps, and the data service is normal. Disconnection or abnormality of the working optical fiber is then simulated to check whether the service automatically switches to the standby fiber; the uplink and downlink packet loss is checked and recorded, and converted into the service switching time according to the rate. It can be seen from Table 5 that, to ensure the security of EPON networking services, full protection is adopted for important lines. Because EPON uses broadcast mode in the downlink and time-division multiplexing in the uplink, if the optical fiber is damaged or becomes abnormal, the uplink switching time is significantly greater than the downlink switching time. The switching time does not increase with the increase of business volume and is less than 50 ms.

Table 5. Average switching time of full protection (ms)

Number of ONUs   Down   Up
1                3.80   24.50
2                3.70   30.10
3                5.80   31.00
4                4.10   30.23
4 Conclusion

With the development and application of national, local and large enterprise and institutional communication networks, all kinds of communication networks are required not only to provide secure information transmission channels but also to secure the nodes through which information passes, because the security equipment providing protection functions is usually located at network nodes such as terminals, routers, gateways and front-end computers. This is a problem that must be considered in the future development of the power communication network. We hope the above contents can provide an effective reference for related research.
References

1. Jiang, Y., Zou, X., Yan, X., et al.: Point-to-multipoint phase-stabilized microwave signal transmission in optical fiber links using passive phase compensation. Acta Optica Sinica 39(09), 86–92 (2019)
2. Yan, X., Yu, P., Nan, Y., et al.: Experimental analysis of four-wave mixing effect on next generation Ethernet passive optical network. Opt. Eng. 59(7), 1–14 (2020)
3. Li, Y., Zhang, Z.: Network security risk loss assessment method based on queuing model. Comput. Simulat. 38(04), 258–262 (2021)
4. Zhang, R., Li, Y., Tian, G., Li, T.: Research on the management methods of network security risk in colleges and universities. J. Changsha Telecommun. Technol. Vocat. College 20(02), 29–31, 56 (2021)
5. Wang, Z., Zhou, J.: Cyber security risk assessment and construction scheme design of community hospital. Inf. Secur. Technol. 12(Z2), 50–56 (2021)
6. Li, X., Guo, T., Xiang, Y., Ning, H.J., et al.: Application of blockchain technology in industrial internet and analysis on its network security risks. Indust. Technol. Innov. 08(02), 37–42 (2021)
7. Jiang, R.: Design of network security risk detection system based on N-gram algorithm. Mod. Electron. Techniq. 44(01), 25–28 (2021)
8. Cui, W., Duan, P., Zhu, H., et al.: Security risk assessment of attack graph and HMM industrial control network. Comput. Moderniz. 2020(07), 32–37, 49 (2020)
9. Zhou, S.: Design of novel optical fiber communication electronic system and big data prediction method of its loss. J. Nanoelectron. Optoelectron. 16(8), 1308–1316 (2021)
10. Jin, H.S.: Analysis of network security risk detection based on immunity. Comput. Eng. Softw. 41(10), 201–203 (2020)
11. Tang, F., Luo, Y., Cai, Y., et al.: Arc length identification based on arc acoustic signals in GTA-WAAM process. Int. J. Adv. Manuf. Technol. 118(5/6), 1553–1563 (2022)
12. Huang, M., Qi, H., Jiang, C.: Coupled collaborative filtering model based on attention mechanism. J. South China Univ. Technol. (Nat. Sci. Edn.) 49(07), 59–65 (2021)
13. Kalibatiene, D., Miliauskaite, J.: A dynamic fuzzification approach for interval type-2 membership function development: case study for QoS planning. Soft Comput. 25(16), 11269–11287 (2021)
14. Laki, S., Nadas, S., Gombos, G., et al.: Core-stateless forwarding with QoS revisited: decoupling delay and bandwidth requirements. IEEE/ACM Trans. Netw. 29(2), 503–516 (2021)
15. Roy, D., Dutta, S., Datta, A., et al.: A cost effective architecture and throughput efficient dynamic bandwidth allocation protocol for fog computing over EPON. 4(4), 998–1009 (2020)
16. Thangappan, T., Therese, B., Suvarnamma, A., et al.: Review on dynamic bandwidth allocation of GPON and EPON. J. Electron. Sci. Technol. 18(4), 297–307 (2020)
17. Rayapati, B.R., Rangaswamy, N.: Bridging electrical power and entropy of ONU in EPON. Optoelectron. Lett. 17(2), 102–106 (2021)
Image Recognition Technology of UAV Tracking Navigation Path Based on ResNet Lulu Liu1(B) , Degao Li1 , Junqiang Jiang1 , Shibai Jiang1 , Linan Yang1 , and Xinyue Chen2 1 State Grid Xinjiang Electric Power Co., Ltd., Information and Communication Company,
Urumqi 830001, China [email protected] 2 The Hong Kong University of Science and Technology, Hong Kong 999077, China
Abstract. The image of a UAV tracking navigation path is easily affected by noise, leading to low image recognition accuracy and poor image enhancement. Therefore, a ResNet-based image recognition technology for UAV tracking navigation paths is proposed. Image features are extracted, and image color and shape are analyzed. The Laplacian operator is used to process the image and enhance its edges. Using bilinear interpolation, the image is scaled and converted to grayscale, and noise is removed by wavelet transform. A ResNet-based recognition model is built; a multi-resolution octree hierarchy is used to render each node and output the image coordinates of any node. Global pooling is performed on the input feature map to mitigate image degradation. The gradient image is binarized. The characteristics of the UAV tracking navigation path are fully considered: the statistical averaging method is used to obtain the average interference amplitude and phase, and the interference characteristic distance is calculated. The iterative threshold selection method is used to obtain the image recognition results. The experimental results show that this technology extracts comprehensive image information with a high signal-to-noise ratio, achieving the purpose of image enhancement; the highest image recognition accuracy obtained is 0.96, with accurate recognition results. Keywords: ResNet · UAV Tracking · Navigation Path · Image Recognition
1 Introduction

The UAV is a kind of aircraft that can independently complete specific tasks under unmanned conditions. It is the product of the continuous development of information technology, mainly involving sensor technology, communication technology, information processing technology, intelligent control technology, aviation propulsion technology and other advanced technologies. In its early days, the UAV was widely used in the military field because of its small size, strong mobility, and ability to effectively avoid casualties. UAV tracking navigation is a navigation technology based on computer
vision. Due to its advantages of complete autonomy, resistance to electromagnetic interference, and high navigation accuracy when approaching the target, UAV tracking navigation has received extensive attention and research at home and abroad, and has become a navigation mode with broad prospects. Some scholars have discussed using UAV tracking navigation to achieve autonomous landing of UAVs. UAV tracking navigation calculates the motion parameters of the carrier according to the real-time images taken by the camera on the carrier, so as to navigate and control the carrier. The premise of navigation solution using target image information is to accurately obtain the features of the target image, and then quickly use the obtained features to calculate the position and attitude of the UAV; this process requires image recognition [1]. Reference [2] proposed an image recognition method for anti-UAV systems based on a convolutional neural network. Using self-made optical acquisition equipment, pictures of different types of UAVs and birds were collected, and a convolutional neural network and a support vector machine for small-sample recognition of UAVs were designed. The designed convolutional neural network was used to recognize the MNIST dataset, UAV images and bird images, the support vector machine was also used to recognize UAV and bird images, and comparative experiments were carried out. Reference [3] proposed a UAV image recognition technology system: based on the YOLO v3 algorithm framework, a high-precision YOLO v3-SE target detection algorithm was constructed by introducing the attention module SE, forming a UAV image recognition technology system that has been successfully applied to the detection and recognition of abnormal features in massive UAV images of many water conservancy projects. Reference [4] proposed a parallel feature pyramid network target detection algorithm for UAV aerial images based on Cascade R-CNN: parallel branches are added to the feature pyramid network, feature information from two different sampling methods is used to enhance the expression of small-target features, and Cascade R-CNN is added to strengthen small-target positioning. Reference [5] proposed applying machine learning algorithms to UAVs to obtain high-precision aerial images: the images are divided into training images and experimental images, the trained models are evaluated with an independent test dataset, and experiments show that the images obtained by this method have high accuracy. Reference [6] proposed an improved UAV identification and detection method based on an ensemble learning approach with a hierarchical concept: the method consists of four classifiers working hierarchically, where the first checks the practicability of the UAV, the second detects the UAV type, and the last two process the relevant samples; evaluation on a public dataset shows good recognition and detection efficiency. As far as the current state of development is concerned, this field is still at an initial stage, and the image recognition algorithms used have not yet been well applied in image recognition related fields.
Some applied algorithms still have many problems, such as low recognition accuracy, poor generalization performance, low processing speed, etc. Faced with the above problems, the image recognition technology of UAV tracking and navigation path based on ResNet is proposed. The color and shape features of the image are extracted, and
the image is processed by edge enhancement, scaling gray scale and denoising. Build a recognition model based on ResNet, input the preprocessed image, render the image with multi-level detail rendering, and perform global pooling processing to improve image degradation. The gradient image is binarized by the binarization method. According to the path characteristics of UAV, the image recognition results are obtained by combining the statistical average method and iterative threshold selection method.
2 Image Preprocessing of UAV Tracking and Navigation Path

Before UAV tracking navigation path image recognition, it is necessary to preprocess the image to improve image quality, which can effectively reduce the randomness of the UAV tracking process. The tracking navigation path image can be obtained through the acquisition card, and the noise interference it receives directly affects the accuracy of the recognition and analysis results. Therefore, image preprocessing can remove the background and redundant parts that are not recognized targets. Image preprocessing includes image graying, image scaling, image filtering, image edge detection and image binarization.

2.1 Image Feature Extraction and State Analysis

Image recognition is the main technology for UAV tracking navigation path images, and feature extraction is the basis of recognition. Image features include color and shape, among which shape features are the most intuitive and can describe basic attributes. Among the image shape features, the image moments have rotation and translation invariance and can be used as recognition features. For a binary image g(m, n), the (a + b)-order geometric moment and central moment of the image region are expressed as follows:

E_ab = Σ_m Σ_n m^a n^b g(m, n)   (1)

O_ab = Σ_m Σ_n (m − m̄)^a (n − n̄)^b g(m, n)   (2)
In the formulas, m̄ and n̄ represent the barycentric coordinates. The central moment can be further defined from Formula (2). In the discrete case, for the preprocessed image, the analysis first obtains the moment-invariant feature vector of the recognition target; this feature vector is insensitive to noise, has good stability, and can effectively identify and analyze the UAV tracking navigation path image. In the field of target recognition, the background of UAV tracking and navigation path image recognition is extremely complex, and the illumination changes at any time, so the recognition accuracy cannot reach a satisfactory level on its own. Therefore, the support vector machine is used, which offers strong generalization and good real-time performance [7]. When the support vector machine is used for recognition and classification, the kernel function needs to be selected first. Samples are mapped from the sample space into the kernel-induced space, and then hyperplane classification is carried out; according to the classification results, the appropriate hyperplane can
be selected and the support vectors can be given; the problem of classification and recognition is transformed into the problem of solving functional relations. Because there are many kinds of UAV tracking and navigation path images, different kinds of classifiers need to be combined into a multi-classifier. Although the training time is extended, the constraints between different samples are fully reflected. Image moment features can be used as the input vector to identify and analyze images and ensure the accuracy of recognition results [8]. In the process of image acquisition, poor focusing of the UAV introduces noise, which seriously affects image quality and brings difficulties to image recognition and analysis [9]. The UAV obtains the video image from the received image and first performs preprocessing. Preprocessing improves image quality, segments the target area where the substation equipment is located, and yields the main characteristics for UAV tracking and navigation path image recognition after binary processing. Image invariant moment features are mainly reflected in different image types. The feature vector is input into the support vector machine, and the UAV tracking and navigation path can be recognized after training [10]. The application of artificial intelligence methods can greatly improve the level of automatic recognition. Based on this, an image recognition model is established.

2.2 Image Edge Enhancement Processing

The Laplacian operator can be used to process the image to improve image quality. Binary processing greatly simplifies the image and speeds up computation. Formula (3) converts the image g(m, n) into a binary image, as shown below:

g(m, n) = { 0, g(m, n) < α or g(m, n) > β ; 1, α ≤ g(m, n) ≤ β }   (3)

In Formula (3), α and β represent the lower and upper thresholds respectively, determined according to the image attributes. If the gray image histogram is roughly similar to the expected distribution under the operating conditions, it indicates that the image contains both the target and the background, and an appropriate threshold can be determined. Only in this way can the target to be identified be effectively separated from the background. The Laplacian operator is then used to enhance the image edges.

2.3 Image Scaling Grayscale Processing

When video files are used as image sources, the decoded bitmap frame sequence is a true color image. For unified processing, the acquisition format of the image acquisition card is also set to 24-bit true color. Before further preprocessing, the obtained image is grayed. The graying formula is:

f = R/λ1 + G/λ2 + B/λ3   (4)
In Formula (4), λ1, λ2, λ3 represent the three color component coefficients; R, G and B are the red, green and blue grayscale components of the original image, each ranging from 0 to 255. In order to avoid occupying CPU time by allocating space during each image processing step, static storage space is used to store image data, which requires the image size to be fixed. Therefore, the width and height of the image are reduced to fixed values; the size of the scaled image is 400 × 300. The method used for image scaling is bilinear interpolation. Because the image noise collected by the video capture card is relatively large, the zoomed gray image needs to be denoised. These noises are random noises [11].

2.4 Image Denoising

Image recognition technology first monitors the recognized image and intercepts a frame as the input sample for processing [12]. The features of the captured input image are extracted and output in digital form. However, due to noise interference, there are many noise scatters in the feature extraction results, so denoising is required. Because the images transmitted along the UAV tracking navigation path have different frequency characteristics, the recognition result is the superposition and coupling of multiple frequencies. In addition, when the UAV is interfered by noise during tracking navigation path image recognition, small frequency pulses appear, and these are easily obscured by background noise and large-amplitude natural frequency components. Therefore, a single wavelet cannot meet the needs of image decomposition; multi-wavelets must be used to extract complementary features of the image from multiple angles to improve recognition accuracy. Different wavelet bases differ greatly in their analysis results, and in order to obtain complete reconstruction from the wavelet decomposition, appropriate wavelet bases must be selected. By selecting an appropriate wavelet base, the image can be processed by wavelet transform to achieve noise reduction. Low-pass filters are usually used in conventional noise reduction methods. When wavelet transform is used for noise reduction, the energy of the noise components is spread over many small wavelet coefficients while the energy of the real image is concentrated on a few large wavelet coefficients; the appropriate wavelet coefficients are therefore determined by threshold setting, estimated by the threshold function mapping method, and reconstructed to reduce the noise. For any square-integrable function, its wavelet transform can be expressed as:

ψ(ϕ, L) = (1/√|ϕ|) ∫_{−∞}^{+∞} f(t) ψ*((t − L)/ϕ) dt   (5)

In Formula (5), ϕ represents the wavelet scaling factor; L represents the relative displacement factor; ψ* represents the conjugate wavelet basis; t represents the transformation time. The discrete noisy image can be expressed as:

g′(t) = r(t) + ς r′(t)   (6)
In Formula (6), r(t) represents the effective image; r′(t) represents the noise image; ς represents the noise coefficient. Using the wavelet analysis method, an appropriate threshold is selected: when a discrete noisy image coefficient is less than this threshold, it is regarded as noise and set to 0; the remaining coefficients, which carry the effective image, are retained and reconstructed, thereby completing the noise reduction processing of the image.
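To make the graying and denoising steps concrete, the following is a minimal Python sketch (my illustration; the paper does not name an implementation). The weighted graying follows Formula (4), and the soft-threshold denoising uses the PyWavelets library; the wavelet base, decomposition level, graying coefficients and universal threshold are assumed choices.

```python
import numpy as np
import pywt  # PyWavelets

def to_gray(rgb, lam=(3.0, 1.5, 9.0)):
    """Weighted graying f = R/l1 + G/l2 + B/l3, cf. Formula (4) (coefficients assumed)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return r / lam[0] + g / lam[1] + b / lam[2]

def wavelet_denoise(img, wavelet="db4", level=2):
    """Soft-threshold wavelet denoising of a grayscale image, cf. Formulas (5)-(6)."""
    coeffs = pywt.wavedec2(img.astype(float), wavelet, level=level)
    # Noise level estimated from the finest diagonal detail band.
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(img.size))   # universal threshold
    out = [coeffs[0]]                             # approximation band kept unchanged
    for cH, cV, cD in coeffs[1:]:
        out.append(tuple(pywt.threshold(c, thr, mode="soft") for c in (cH, cV, cD)))
    return pywt.waverec2(out, wavelet)
```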
3 Path Image Recognition Technology Based on ResNet

The deep residual network ResNet is a convolution-based network model that improves accuracy by deepening the network. The number of ResNet model parameters is reduced by replacing a large convolution kernel with several small convolution kernels while increasing the number of nonlinear activation functions, which reduces the computational complexity of the model and improves computational efficiency. If the sizes of the input and output feature maps of a convolutional layer are the same, the number of filters remains unchanged; if the feature map size is halved, the number of filters is doubled, and the stride of the pooling layer is set to 2. The main difference between ResNet models of different depths is the number of convolutional layers. Some image recognition classification projects achieve high recognition accuracy because their network models have a deep structure, which extracts richer features by increasing the number of convolutional layers, so the recognition effect is better. However, simply stacking layers causes the gradient to vanish. To solve this problem, the deep residual network adopts standard initialization and regularization to retain all feature variables, so that the network model does not lose accuracy. When the network structure becomes deeper, overfitting occurs, which leads to poor recognition. For the above problems, the ResNet network structure is combined with the residual learning method to optimize the deep network and improve recognition accuracy and learning speed.

3.1 Construction of Recognition Model Based on ResNet

The characteristics of small inter-class differences and large intra-class differences determine the importance of detailed features in the recognition process. In order to solve the problem that recognition accuracy is difficult to improve due to this characteristic, an image recognition model that can enhance detailed features is proposed. According to the characteristics of the image, it aims to enhance the ability to extract features from local to global; it is composed of deep-shallow shared attention modules and enhancement blocks. The attention modules improve the feature extraction effect by learning the weights of different channels of the image and are embedded in the network from shallow to deep to maximize the enhancement of global features. The enhancement block is used to improve the ability to extract local details of the image; it uses asymmetric convolution to construct an enhancement kernel to replace the ordinary convolution kernel. At the same time, it establishes connections between different layers of the network through the idea of skip connections, and realizes
local feature enhancement while reducing degradation. Based on this, the recognition model based on ResNet is constructed as shown in Fig. 1.
Fig. 1. Recognition model based on ResNet
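As a concrete illustration of the channel-attention idea in Fig. 1 (and of the global pooling described later in Sect. 3.3), below is a minimal PyTorch sketch of a squeeze-and-excitation style attention module; the layer sizes and reduction ratio are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: global pooling (cf. Formula (10)) then two excitations."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # U = mean over the A x B positions
        self.fc = nn.Sequential(                 # two excitations -> channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        u = self.pool(x).view(b, c)              # global information per channel
        w = self.fc(u).view(b, c, 1, 1)          # recalibration weights
        return x * w                             # re-weighted feature map
```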
If the identity mapping is optimal, it is easier to push the residual toward zero than to fit the identity mapping with a stack of nonlinear layers; in other words, when the original mapping is harder to learn than the residual mapping, learning the residual better approximates the desired mapping. An object that realizes identity mapping is therefore added to the feedforward network; in this way, the number of network parameters does not change, the computation does not become cumbersome, and the network model can still be trained end-to-end with back propagation.

3.2 Detail Visualization Rendering

Due to the huge number of point clouds obtained by the UAV tracking the navigation path, it is not practical to achieve image rendering by simply loading them all at once. Therefore, a multi-level detail rendering method based on the ResNet algorithm is proposed. The ResNet algorithm regards each group of networks mapped by the residual network as a building block, which is defined as:

D = g(v, t)   (7)
In Formula (7), v represents the input vector, D represents the output vector, and g(v, t) is the residual mapping to be trained. The dimensions of v and g in the formula need to be the same; otherwise, a linear projection needs to be added to match the dimensions. There are some differences between a plain convolutional neural network and the ResNet algorithm: ResNet directly connects the network input to deeper layers so that the network layers learn residuals better. The convolutions of a plain convolutional neural network cause information loss during training, which leads to a decline in recognition accuracy; ResNet bypasses the input information to the output of the convolution layer to ensure data integrity, reducing tedious work and improving the accuracy of the algorithm. The ResNet algorithm has the following network design principles: ➀ Layers
with the same feature map size have the same number of convolution kernels; ➁ when entering the pooling layer, the feature map size becomes half of the original. In the residual network, dotted and solid connections are used to distinguish whether the network dimensions match. If there is a dimensional mismatch in the network, there are two options: the first is to use zero padding directly to add dimensions, and the second is to multiply by a projection matrix (e.g., a 1 × 1 convolution) that maps the input into the new space. In multi-level detail modeling, the hierarchical structure of the multi-resolution octree is used, as shown in Fig. 2.
Fig. 2. Schematic diagram of multi-resolution octree construction
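The two shortcut options just described can be made concrete with a small PyTorch sketch (my illustration, not the authors' code): an identity shortcut when dimensions match, and a 1 × 1 convolution projection when they do not.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output = g(v) + shortcut(v), cf. Formula (7)."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.body = nn.Sequential(                  # residual mapping g(v)
            nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        if stride != 1 or in_ch != out_ch:          # dimension mismatch:
            self.shortcut = nn.Sequential(          # option 2: 1x1 projection
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()           # identity shortcut
        self.relu = nn.ReLU(inplace=True)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(v) + self.shortcut(v))
```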
The octree is usually used to segment the leading point cloud. The basic idea is: first, obtain a minimum bounding cube that contains all the point cloud data and corresponds to the root node of the octree; then subdivide the cube into eight smaller cubes, which can continue to split until no further subdivision is needed or the cube reaches the size corresponding to the leaf nodes of the octree. The node data to be rendered is selected within the current leaf node range, and the rendering degree of each node is automatically adjusted through the octree. The image output coordinates of any node can be expressed as:

x = Ji (xmax − xmin) / 2^(zi − 1)   (8)

y = Ji (ymax − ymin) / 2^(zi − 1)   (9)
In the above, xmax and xmin represent the maximum and minimum values of the horizontal coordinates respectively; ymax and ymin represent the maximum and minimum
values of the vertical coordinates respectively; zi represents the level of the i-th node in the octree; Ji represents the index position of the i-th node. The detailed data to be rendered is stored in the vertex buffer, and the position and size are set by the vertex shader. Each primitive must be located within the three-dimensional volume that can be displayed on the screen; if it cannot, it must be clipped, and if it lies entirely beyond the visible range it is removed. After clipping and deletion, the position coordinates of the vertices are transformed. At this point the primitive is ready for rasterization, and the coordinates of the actual pixels are obtained through raster element mapping. Since the pixels initially have no color information, a shader is required for coloring. Finally, in the frame buffering stage, the color information of each layer is written into the frame buffer, thereby completing the image visualization enhancement processing.

3.3 Image Degradation Improvement Processing

The attention residual module embeds the attention module in both the deep and shallow layers of the network by combining the attention module and the enhancement module, improving the learning effect at every stage of the network. The algorithm performs global pooling on an input feature map of size A × B with C channels, obtaining the global information across the C channels, which can be described as:

U = (1 / (A × B)) Σ_{x=1}^{A} Σ_{y=1}^{B} m(x, y)   (10)
In Formula (10), m(x, y) represents the value of the feature map m of the C-th channel at position coordinate (x, y). Secondly, in order to limit the complexity of the model and ensure its generalization ability, U is excited twice to obtain the weight values of the feature map. Finally, the weights are applied to U to obtain the recalibrated feature map U′, which can be fed directly into the enhancement block. When the attention module is in the shallow layers of the network, it excites informative features in a class-independent manner and reinforces shared universal features; when the network is deep, the attention module tends to specialize and responds to different inputs in a class-specific manner. ResNet connects the above attention module and enhancement block alternately four times, so that both the deep and shallow layers of the network are embedded with the attention module, which not only stimulates valuable features and improves the learning effect but also reduces image degradation.

3.4 Image Recognition Process

The superpixel method is used to segment the UAV tracking and navigation path image to obtain superpixels. The gradient image is binarized, with appropriate thresholds set for different application scenarios: when a pixel value exceeds the threshold, the pixel is regarded as an edge point; if it is below the threshold, it is considered a non-edge point. The binary gradient image is segmented
by the morphological dilation and erosion operators. The dilation operator is calculated as follows:

g ⊕ γ = { i | (γ)i ∩ g ≠ ∅ }   (11)

In Formula (11), g represents the image to be dilated; γ represents the structural element. The calculation formula of the erosion operator is:

g ⊖ γ = { i | (γ)i ⊆ g }   (12)

In Formula (12), g represents the image to be eroded. The target image and the segmented image are masked to obtain the area where the UAV tracks the navigation path. In order to further narrow the area, adjacent nodes are moved, and the control node is placed at the minimum position of the neighborhood gradient. All pixel values in the seed point neighborhood are calculated, the node position information is updated, cluster labels are assigned according to the node position information, and the initial distance metric is set. After that, the minimum distance from each node in the neighborhood to the pixel point is calculated; if the total distance is less than the original distance, the pixel label is updated to the current node, and the total distance replaces the initial distance. After the node traversal is completed, the node is located based on the classification labels of the pixels. In the grayscale image, the grayscale value and average coordinates of the pixels under each classification label are used as the new nodes; the optimization is repeated until the cluster labels no longer change or the maximum number of iterations is reached, which improves the connectivity. The superpixels in the image are scanned left-to-right, top-down, and connected components consisting of isolated single pixels or regions that are too small are reclassified. Fully considering the characteristics of the UAV tracking and navigation path, the statistical averaging method is used to obtain the average interference amplitude. The formula is:

μ = Σ_{i=1} d(i) / (Q |d|²)   (13)
In Formula (13), d represents the distance between the target and the background; Q represents the energy parameter. The formula for calculating the average interference phase is:

ϑ = θ Σ_{i=1} d(i)   (14)
In Formula (14), θ represents the phase angle. On the two-dimensional interference characteristic plane, since the interference amplitude and phase are physical quantities of different dimensions, the interference characteristic distance defined by the Gaussian kernel function can be expressed as:

d = exp( ‖ (H − Hs) ⊕ Gp ‖_p )   (15)
In Formula (15), H and Hs represent the potential target interference feature and the background interference feature respectively; Gp represents the bandwidth matrix; p represents the norm of the interference eigenvector; ⊕ represents the vector outer product. The bandwidth matrix can normalize physical quantities of different dimensions and is suitable for measuring the distance between the target and the background. In the interference space, if the difference between the target and the background is obvious, the farther a vector is from the background feature, the more it represents the path image. Image recognition usually has many different methods for different applications. According to the characteristics of the target and background of the UAV image, grayscale segmentation is used to achieve image recognition. At the same time, an interface is designed so that different recognition algorithms can be implemented for different target characteristics, giving the system good scalability. The grayscale segmentation threshold is selected by the iterative threshold selection method, whose iterative formula is:

d_{i+1} = (1/2) [ Σ_{h=0}^{di} nh h / Σ_{h=0}^{di} nh + Σ_{h=di+1}^{m−1} nh h / Σ_{h=di+1}^{m−1} nh ]   (16)
In Formula (16), m represents the number of gray levels and nh represents the number of pixels whose gray value is h. The initial threshold is taken as 0.5m, and the iterative formula is applied until the threshold no longer changes; when the threshold converges to a fixed value, iteration stops, and this value is the optimal segmentation threshold. After threshold segmentation, the image is divided into two parts, background and target, and the center of gravity of the target is taken as the position of the recognition target. The center of gravity is calculated as follows:

x0 = (1/k) Σ_{h=1}^{k} xh   (17)

y0 = (1/k) Σ_{h=1}^{k} yh   (18)
In the above formulas, k represents the total number of pixels occupied by the target; the recognition result is thereby obtained. In summary, the recognition model is built based on ResNet, the details of the preprocessed image are rendered with multi-level detail visualization, and global pooling is carried out to improve the image degradation phenomenon; image recognition is then completed by the binarization method and the statistical averaging method.
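For illustration, here is a minimal NumPy sketch of the final segmentation steps — the iterative threshold selection of Formula (16) and the center-of-gravity computation of Formulas (17)–(18); the convergence tolerance is an assumed detail.

```python
import numpy as np

def iterative_threshold(gray: np.ndarray, tol: float = 0.5) -> float:
    """Iterative (intermeans) threshold selection, cf. Formula (16)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    levels = np.arange(256)
    d = 0.5 * 255                                   # initial threshold 0.5 m
    while True:
        low, high = levels <= d, levels > d
        mean_low = (hist[low] * levels[low]).sum() / max(hist[low].sum(), 1)
        mean_high = (hist[high] * levels[high]).sum() / max(hist[high].sum(), 1)
        d_next = 0.5 * (mean_low + mean_high)
        if abs(d_next - d) < tol:                   # threshold has converged
            return d_next
        d = d_next

def target_centroid(gray: np.ndarray):
    """Center of gravity of the segmented target, cf. Formulas (17)-(18)."""
    mask = gray > iterative_threshold(gray)
    ys, xs = np.nonzero(mask)                       # pixels occupied by the target
    return xs.mean(), ys.mean()                     # (x0, y0)
```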
4 Experimental Analysis

4.1 Experimental Setup

The UAV azimuth target positioning, tracking and navigation system is mainly divided into two subsystems: the UAV visual navigation system and the UAV flight control system. The visual navigation system includes two parts: the camera and the Raspberry Pi. The airborne camera captures the target image information. The Raspberry Pi is an independent visual processing system: it not only performs image processing, but also processes the captured image information in real time to obtain information such as target orientation and angle; at the same time, it can estimate the target motion state and then provide reference information for UAV flight. The flight control system carries out control operations according to the information given by the visual navigation system, and controls the UAV to accurately complete target tracking. The UAV navigation path tracking and navigation system is shown in Fig. 3.
Fig. 3. UAV navigation path tracking and navigation system
As shown in Fig. 3, the UAV is equipped with a visual navigation system to track ground moving targets in real time. The Raspberry Pi and the onboard camera serve as the visual navigation system and are installed directly under the UAV to detect moving targets on the ground in real time during flight. The Raspberry Pi is connected to the UAV flight controller through the serial port; the visual navigation system obtains the deviation information of the target, completes the angle solution, and provides navigation information for the flight controller through the target estimation strategy and control strategy. According to the navigation information provided by the visual navigation system, the flight controller performs control operations to realize real-time tracking of the target by the UAV. The overall block diagram of the experimental device is shown in Fig. 4. The designed visual navigation system mainly includes a Raspberry Pi and an external camera. The camera collects real-time images, and the Raspberry Pi, as the airborne image processing module, performs target detection and image processing on the collected images.
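As a concrete illustration of this setup, the following sketch shows how such a Raspberry Pi pipeline might read camera frames and forward target-deviation data to the flight controller over a serial port, using OpenCV and pyserial; the port name, baud rate, detection placeholder and message format are assumptions for illustration, not the authors' specification.

```python
import cv2
import serial  # pyserial

# Assumed port and baud rate for the Pi-to-flight-controller link.
link = serial.Serial("/dev/ttyAMA0", baudrate=115200, timeout=0.1)
cam = cv2.VideoCapture(0)  # onboard camera

while True:
    ok, frame = cam.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Placeholder target detection: centroid of the brightest region.
    _, mask = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)
    m = cv2.moments(mask)
    if m["m00"] > 0:
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
        # Deviation of the target from the image center, sent as CSV.
        dx = cx - frame.shape[1] / 2
        dy = cy - frame.shape[0] / 2
        link.write(f"{dx:.1f},{dy:.1f}\n".encode())
```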
Fig. 4. Overall block diagram of the experimental device (STM32F main processor, inertial measurement unit, camera, motor drive module, wireless communication module, remote control module)
4.2 Experimental Area and Data

The image of the drone tracking the navigation path was taken at 8:00 on April 4, 2020; the weather was clear with some fog. A fixed-wing UAV manufactured by a company was used, which can quickly and efficiently generate data such as topographic maps, orthophoto maps, and ground digital models. The fuselage is equipped with a Sony DSC-WX220 camera, whose lens is Sony's G lens. The image of the UAV tracking the navigation path is shown in Fig. 5.
Fig. 5. UAV tracking and navigation path annotation results
Taking the target annotation results shown in Fig. 5 as the research object, the experimental verification analysis is carried out.
4.3 Experimental Results and Analysis

In the process of identifying image information, the problem is not limited to the collected image data; image distortion must also be considered. Therefore, the convolutional neural network method, the YOLO v3 algorithm, the Cascade R-CNN method and the ResNet recognition technology were used to compare the image recognition results of the UAV tracking navigation path, as shown in Fig. 6. It can be seen from Fig. 6 that the images extracted by the convolutional neural network method, the YOLO v3 algorithm and the Cascade R-CNN method are not comprehensive, resulting in partial loss of information, while the images extracted by the ResNet recognition technology are comprehensive, consistent with the selected areas shown in Fig. 5, and retain all image information. In order to prove that the proposed technology has a good anti-interference effect, the signal-to-noise ratios of the interference images of these four methods are compared, with the results shown in Fig. 7. It can be seen from Fig. 7 that the convolutional neural network method, the YOLO v3 algorithm and the Cascade R-CNN method have low signal-to-noise ratios, while the ResNet recognition technology improves the signal-to-noise ratio of the interfering image, the highest being 30.5 dB, enriching the important information of the image and achieving the image enhancement effect. On this basis, the four methods are used to compare image recognition accuracy, as shown in Table 1. It can be seen from Table 1 that the highest image recognition accuracies of the convolutional neural network method, the YOLO v3 algorithm, and the Cascade R-CNN method are 0.52, 0.56, and 0.69 respectively, while the highest image recognition accuracy of the ResNet method is 0.96, indicating that the recognition accuracy of the proposed method is high.
Fig. 6. Comparative analysis of image extraction results by different methods: (a) convolutional neural network method; (b) YOLO v3 algorithm; (c) Cascade R-CNN method; (d) ResNet recognition technology
Fig. 7. Comparative analysis of signal-to-noise ratio (SNR/dB) versus number of images/frame for the ResNet recognition technology, convolutional neural network method, YOLO v3 algorithm and Cascade R-CNN method
Table 1. Analysis of recognition accuracy of the recognition process by different methods

Iterations/time | Convolutional neural network method | YOLO v3 algorithm | Cascade R-CNN method | ResNet
200  | 0.39 | 0.49 | 0.65 | 0.92
400  | 0.32 | 0.48 | 0.67 | 0.93
600  | 0.48 | 0.45 | 0.69 | 0.93
800  | 0.45 | 0.49 | 0.67 | 0.95
1000 | 0.45 | 0.53 | 0.69 | 0.96
1200 | 0.52 | 0.56 | 0.69 | 0.93
5 Conclusion

The ResNet-based UAV tracking and navigation path image recognition technology uses the actual image to obtain position and attitude information. The technique avoids overfitting and converges quickly, and the residual learning network makes deep network training easy. The highest image recognition accuracy is 0.96. The ResNet algorithm has good recognition effect and robustness, and is promising for other machine vision application fields. Although the ResNet algorithm achieves the expected image recognition effect, there are still shortcomings: the number of images used in this experiment is small and the noise is relatively low, so there may be some differences in the actual recognition process. Future research needs to continuously improve the recognition effect of the proposed method to cope with the recognition of massive path images and the interference of greater noise.
References

1. Li, G.: Design of UAV navigation system based on image recognition. J. Agricult. Mech. Res. 43(01), 114–118 (2021)
2. Xue, S., Zhang, Z., Lv, Q., et al.: Image recognition method of anti UAV system based on convolutional neural network. Infrared Laser Eng. 49(07), 250–257 (2020)
3. Zhao, X.: The study and application of a UAV image recognition technology system. China Rural Water Hydropower 05, 195–200 (2022)
4. Liu, Y., Yang, F., Hu, P.: Parallel FPN algorithm based on cascade R-CNN for object detection from UAV aerial images. Laser Optoelectron. Prog. 57(20), 302–309 (2020)
5. Lou, X., Huang, Y., Fang, L., et al.: Measuring loblolly pine crowns with drone imagery through deep learning. J. Forest. Res. 33, 1–12 (2021)
6. Nemer, I., Sheltami, T., Ahmad, I., et al.: RF-based UAV detection and identification using hierarchical learning approach. Sensors 21(6), 1947–1955 (2021)
7. Wan, J., Ma, L.: Optimal generation method of GPS navigation UAV transportation path. Comput. Simulat. 38(11), 37–41 (2021)
8. Wang, Z., Peng, H.: Voronoi diagram-based optimized traversal path of unmanned aerial vehicle for acquisition data of nodes in wireless sensor networks. Mod. Mach. Tool Automat. Manuf. Techniq. 2021(09), 67–70+74 (2021)
9. Shang, K., Zheng, X., Wang, M., et al.: Image semantic segmentation-based navigation method for UAV auto-landing. J. Chin. Inertial Technol. 28(05), 586–594 (2020)
10. Jia, H., Wang, L., Fan, D.: The application of UAV LiDAR and tilt photography in the early identification of geo-hazards. Chin. J. Geolog. Hazard Control 32(2), 60–65 (2021)
11. Triet, N.A., Phuong, N.D., O'Regan, D., et al.: Approximate solution of the backward problem for Kirchhoff's model of parabolic type with discrete random noise. Comput. Math. Appl. 80(3), 453–470 (2020)
12. Yuan, H., Zhang, S., Chen, G., et al.: Underwater image fish recognition technology based on transfer learning and image enhancement. J. Coast. Res. 105(SI), 124–128 (2020)
Intelligent Extraction of Color Features in Architectural Space Based on Machine Vision Zhengfeng Huang1(B) and Liushi Qin2 1 Polytechnic Institute, Guangxi Agricultural Vocational and Technical University,
Nanning 530000, China [email protected] 2 College of Art and Design, Nanning University, Nanning 530000, China
Abstract. Architecture itself has artistic characteristics, and architectural color changes with environmental parameters. Analyzing the color characteristics of architectural space is conducive to the color analysis and auxiliary design of buildings. Current color feature extraction methods are prone to abrupt changes in signal-to-noise ratio, resulting in a low extraction rate. In order to solve these problems, a new intelligent extraction method for architectural space color features is designed based on machine vision. A color information transmission channel model is established, and the transmission path function is introduced to obtain a mathematical model for the intelligent extraction of architectural space color features; intelligent extraction is realized through a reliability calculation model. Using machine vision, color information and texture information are organically combined to extract construction land information. The experimental results show that in the process of extracting color features, the extraction results of this method exhibit no mutation points, stability is well guaranteed, and the extraction accuracy can reach more than 99%, indicating that this method has a good application effect. Keywords: Machine Vision · Architectural Space · Space Color · Color Characteristics · Feature Extraction · Intelligent Extraction · Extraction Method
1 Introduction

Architectural color is a complex theoretical system involving many dynamic and variable environmental parameters. In the real environment, architectural color depends on the architectural form and is affected by various factors, such as the nature of the sky, green space, water, sunshine intensity and changes in sunshine angle, and the purity of the air. In addition, the regional color background is also a factor that cannot be ignored. Architecture itself has artistic characteristics, and architectural color changes with environmental parameters. Analyzing the color characteristics of architectural space is conducive to the color analysis and auxiliary design of buildings. However, because architecture has the characteristics of an art discipline and involves a large amount of perceptual knowledge and understanding, some factors
that affect architectural color are difficult to quantify, or no good quantification method exists for them [1, 2]. To solve this problem, this research extracts the color features of architectural space based on machine vision. This paper focuses on innovation in research ideas and methods, concentrating on the color of the single building itself. The general research ideas are as follows: ➀ establish a color information transmission channel model; ➁ introduce the transfer path function to establish the color feature extraction model of architectural space, and realize intelligent extraction through the reliability calculation model. In collecting research materials for this study, the following situation is inevitably encountered: the image content includes not only the building itself but also the surrounding environment of the building. These environmental elements differ greatly from buildings in color perception; if they are not distinguished and are included in the calculation of architectural color, they will undoubtedly have a great impact on the analysis results. In order to highlight the building as the main target in the image and reduce the interference of other content, an image preprocessing step should be added before using computer-aided methods to study the collected materials. Image preprocessing consists of two steps: first, extract the content of research value in the image and eliminate irrelevant and interfering factors; second, select an identifier that is easy to recognize in program calculation and substitute it for the excluded content in the picture, so that the marked content is excluded from the calculation process.
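As an illustration of this two-step preprocessing, the sketch below masks non-building regions with a sentinel identifier so they can be excluded from later color statistics — a minimal NumPy sketch under the assumption that a binary building mask is already available (e.g., drawn by hand or produced by a segmentation model).

```python
import numpy as np

SENTINEL = -1  # identifier substituted for excluded (non-building) content

def mask_non_building(image: np.ndarray, building_mask: np.ndarray) -> np.ndarray:
    """Step 1: keep building pixels; Step 2: mark everything else with a sentinel."""
    out = image.astype(np.int16).copy()
    out[~building_mask.astype(bool)] = SENTINEL
    return out

def mean_building_color(marked: np.ndarray) -> np.ndarray:
    """Color statistics that skip sentinel-marked pixels."""
    valid = (marked != SENTINEL).all(axis=-1)
    return marked[valid].mean(axis=0)
```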
2 Building a Mathematical Model for Intelligent Extraction of Architectural Space Color Features

The mathematical model for intelligent extraction of architectural space color features established in this paper is based on the principle of machine vision. Under this principle, the intelligent camera sends network data to the data network according to the image detection results. The intelligent camera is the server of the communication network and can save the architectural space color information to different memory spaces in the communication network. When configuring the architectural space color information, the network data parameters in the memory space need to be allocated to different offset addresses. The operating terminal in machine vision is used to detect the architectural space color information parameters at the offset addresses; after detection is completed, the detection results are output. The communication terminal in machine vision reads and writes the architectural space color information detection results, displays them on the intelligent camera, and obtains the configuration information of the communication data [3]. In order to calculate the reliability of the architectural space color information of machine vision, it is necessary to build a mathematical model for the intelligent extraction of architectural space color features. Through this reliability mathematical model, the data processing capability of the transmission path of architectural space color information can
be evaluated, so as to realize reliable transmission in the extraction of architectural space color information. First, a model of the transmission channel of architectural space color information is established:

PL = A + 10γ lg(d/d0) + Xf + s   (1)
where γ represents the data transmission capacity coefficient; d represents the distance between the operator and the communication terminal in the machine vision system; s indicates the buffer size of the transmission path; A represents the packet loss rate of architectural space color information during data transmission; Xf is the time interval of the data transmission path. γ, A and Xf are respectively expressed by the following formulas:

γ = a − b·hb + c/hb
A = 20 × lg(4π d0/λ)
Xf = 6 × lg(f/2000)   (2)
Among them, a, b, c, hb represent natural numbers between 0 and 10. Usually, the loss of the architectural space color information transmission path is closely related to the communication time of the communication terminal in the machine vision system, and this loss affects the stability of the color information transmission channel. The transmission path is used to calculate the rate at which the architectural space color information is sent to the communication terminal, detect the transmission status of the other transmission paths, evaluate the ability of different transmission paths to process data, and dynamically configure the architectural space color information flow according to the level of capability. The discrete components in the data transmission path are received, and the following formula is used to calculate the transmission path function of architectural space color information:

f(r) = K (r/σ²) exp(−(r² + rd²)/(2σ²)) I0(r rd/σ²)   (3)

where rd represents the loss of the transmission path of architectural space color information; σ represents the delivery rate of color information packets; r represents the architectural space color information without loss; I0 is the attenuation coefficient of the data transmission channel; K represents the instantaneous power generated by transmitting part of the architectural space color information [4]. The detection process of the communication terminal is shown in Fig. 1.
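To make Formulas (1)–(3) concrete, here is a small NumPy sketch evaluating the path-loss model and the Rician-shaped transmission path function. The use of scipy's modified Bessel function i0 for the I0 term is my implementation choice (the document describes I0 as a coefficient), so this reading is an interpretive assumption.

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of the first kind, order 0

def path_loss(d, d0, gamma, A, Xf, s):
    """Formula (1): PL = A + 10*gamma*lg(d/d0) + Xf + s."""
    return A + 10 * gamma * np.log10(d / d0) + Xf + s

def path_function(r, rd, sigma, K=1.0):
    """Formula (3), read as a Rician-shaped transmission path function."""
    return K * (r / sigma**2) * np.exp(-(r**2 + rd**2) / (2 * sigma**2)) \
             * i0(r * rd / sigma**2)
```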
Intelligent Extraction of Color Features in Architectural Space
Transmission path function
Monitoring Center
Data transfer direction Machine vision
43
U
W=AW+BW
Buffer space
Fig. 1. Communication terminal detection process
information The transmission path characteristics conform to the Rice distribution, and the time interval of the current data transmission path is determined, and the reliability mathematical model is obtained by combining the building space color information transmission path function: √ r (4) F(r) = 1 − Q 2K, σ where, Q represents the reliability coefficient of color information in building space, and the relationship between the transmission path of color information in building space and the loss rate of data packets can be intuitively expressed through the reliability mathematical model [12, 13].
3 Intelligent Extraction and Calculation of Architectural Space Color Features Based on Machine Vision According to the mathematical model of intelligent extraction of architectural space color features, the intelligent extraction of architectural space color features based on machine vision is calculated [5, 6]. The Monte Carlo statistical technology is used to obtain the distribution of building space color information transmission channel and data transmission path distribution, the distribution analysis method is used to calculate the success probability of building space color information transmission in the transmission path, and Rice distribution is used to calculate the transmission time of building space color information. For description, the single transmission data volume of building space color information follows the N (μ, σ 2 ) distribution, and the transmission rate of building space color information is determined. The ability of spatial color information is good, which can ensure the stable and safe transmission of architectural spatial color information in the communication network. Under the condition that the communication link maintains normal connection, the probability of normal transmission of architectural spatial color information to the communication terminal is: alink = P(t ≤ td )
(5)
44
Z. Huang and L. Qin
Assume that the distribution of effective components of building space color information communication link is fTR (tR ), and B fr (t) represents the distribution of communication channels for single transmission of building space color information. According to the distribution characteristics of building space color information transmission channels, the relationship between the distribution of communication links and the transmission time of building space color information is exponential. The distribution between them can be expressed by the following formula: f (tR , t) = fTR (tR )fT (t)
(6)
The following formula is used to express the successful transmission probability of the communication channel where the architectural space color information is located: t al = ∫∞ 0 ∫0 RT (t)fTR (tR )dtR
(7)
where, tR represents the probability of communication link congestion. When calculating the transmission capacity of the communication channel where the building space color information is located, it is necessary to obtain the busy probability of the communication link, and combine it with the probability of successful transmission of the communication channel to obtain the transmission efficiency of a data node in the building space color information. According to the transmission performance of the building space color information transmission channel, Obtain the transmission performance index of the communication link and data transmission path, which can calculate whether the path is connected or busy when the communication link transmits the color information of the building space, and the probability that the color information of the building space can be reliably transmitted to the communication terminal, which can be calculated through the main channel in the data transmission channel [7]. The reliability calculation model is shown in Fig. 2.
Average model
threshold test
Statistics module
Communication link congestion
data node
Fig. 2. Reliability calculation model
Through this calculation model, the ability and effect of the communication channel to transmit architectural space color information can be calculated. During the communication process of network data, the time interval samples for the transmission of
Intelligent Extraction of Color Features in Architectural Space
45
architectural space color information on the unit transmission channel are x1 , . . . , xn , i is the transmission channel, and N is the time. The total number of interval samples, use the following formula to calculate the average value of the probability of successful transmission on this transmission channel: XN =
N
xi /N
(8)
i=1
where, xi represents the time taken for building space color information to be transmitted in the communication link, and XN represents the average value of the probability of successful transmission of building space color information in the transmission channel, In order to fairly calculate the reliability of building space color information, it is necessary to fuse the probability of successful transmission of building space color information in the communication link and transmission channel. The average iterative value of successful transmission probability is: XN +1 =
XN × N + xN +1 N +1
(9)
In the fusion process, the transmission path of building space color information conforms to normal distribution, and the transmission channel conforms to Rice distribution. After fusion, the reliability of building space color information obtained in different paths and different transmission channels is:
N 2 xi − XN /(N − 1) (10) SN = i=1
where, SN represents the reliability of the color information of the building space. After the reliability calculation of building space color information is completed, the transmission path and channel of building space color information with the highest transmission success rate are selected as the optimal data transmission link [8–10].
4 Analysis on Color Characteristics of Architectural Space The recognition principle of the GRB color space positioning method is to use the color difference of the image to use the HIS color space visual perception model to convert the data of each frame of the image into a signal for output. The conversion principle is shown in Fig. 3. It can be seen from Fig. 3 that for the color signal of the building designed in this paper, the signal lamp will be affected by the external light, weather and mirror at every moment, making the correct color transmission of the signal lamp deviate. GRB color space positioning method is the most effective color signal recognition method to solve the above problems. Red, green and yellow are set as the most relevant colors in the GRB color space signal recognition side, so as to achieve the highest sensitivity of the method. After the GRB color space positioning method is used to identify the color
46
Z. Huang and L. Qin
Fig. 3. Effect of space conversion of color signal
signal of the building space, the signal feature is extracted with the following formula, as shown below: 2μ (11) y =p× 3k Among them, p represents the space vector represented by various colors; μ represents the signal conversion factor; k represents the color signal component.
5 Building Abnormal Signal Alarm Judgment The accuracy of machine vision is higher than that of traditional human vision. The signal captured by machine vision is mainly completed through camera calibration and image edge testing [11]. In order to achieve the effect of machine vision signal conversion, first of all, the signal centralized monitoring transmits the alarm information of the equipment in the mechanical room to the patrol robot in real time. The patrol robot automatically arrives at the location of the alarm equipment according to the received alarm information, and automatically recognizes the alarm board card, Check the accuracy of alarm information through image contrast intelligent analysis, and transmit the review results and confirmation pictures to the centralized signal monitoring system; Then, through the centralized signal monitoring terminal, it can realize remote manual control of the patrol robot to reach the designated position, take videos or pictures as required, and transmit them to the monitoring and reading terminal through the centralized signal monitoring network channel in real time. The machine vision signal conversion process is shown in Fig. 4 below.
Fig. 4. Flowchart of machine vision signal conversion (image data acquisition, color feature conversion, color feature analysis, geometric feature analysis, data transformation)
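As a minimal illustration of the image-comparison check that the patrol robot performs on alarm board cards, the sketch below flags an alarm when the fraction of changed pixels between a reference image and a captured image exceeds a threshold; the file paths, the gray-level threshold of 30 and the 2% ratio are illustrative assumptions, not values from the paper.

```python
import cv2
import numpy as np

def verify_alarm(reference_path: str, captured_path: str, diff_ratio: float = 0.02) -> bool:
    """Compare a captured board image with a reference; confirm the alarm when the
    changed-pixel ratio exceeds diff_ratio."""
    ref = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    cap = cv2.imread(captured_path, cv2.IMREAD_GRAYSCALE)
    cap = cv2.resize(cap, (ref.shape[1], ref.shape[0]))   # align image sizes
    diff = cv2.absdiff(ref, cap)                          # per-pixel difference
    _, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    changed = np.count_nonzero(mask) / mask.size
    return changed > diff_ratio
```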
6 Building Space Color Information Reconstruction Algorithm

Firstly, the reconstruction algorithm for building space color information is established, and data reconstruction of the key index system of building space color information is realized through correlation analysis, so as to extract the key data features of the color information. On this basis, centered on the data center, the important big data of building space color information is mined in a distributed manner; adaptive adjustment of the big data mining is realized by adjusting the data stream content and the data package ID [12]. Then the state vector of the building space color information data subsystem is analyzed, the mathematical model of the critical state is designed, and the clustering analysis of the important index data is completed under the optimal objective function condition. The reconstruction result $a$ for the unstructured high-dimensional big data of building space color information is given by formula (12):

$a = w \left[ I(x, y) - I(x + x_1, y + y_1) \right]^2$   (12)

Among them, $x$, $y$, $x_1$ and $y_1$ are the coordinate values of the color information sampling in the building space before and after the change; $I$ is the frequency corner; $w$ is the coefficient of the function calculation. Feature partition technology is used to realize optimal mining of the color information in the building space: fuzzy C-means clustering establishes the classification query objective function for the unstructured multi-dimensional big data, and fuzzy feature extraction mines the data sets generated in different dimensions layer by layer, realizing the reconstruction of the big data of building space color information and completing the feature extraction. A minimal clustering sketch is given below.
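The fuzzy C-means step named above can be sketched as follows; this is a minimal generic implementation (random initialization, Euclidean distances, fuzzifier m = 2), not the paper's exact clustering program.

```python
import numpy as np

def fuzzy_c_means(x, c=3, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy C-means: x is (n_samples, n_features); returns (centers, u),
    where u[i, j] is the membership of sample i in cluster j."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    u = rng.random((n, c))
    u /= u.sum(axis=1, keepdims=True)              # memberships sum to 1 per sample
    for _ in range(max_iter):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0)[:, None]        # fuzzy-weighted means
        d = np.linalg.norm(x[:, None, :] - centers[None], axis=2) + 1e-10
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)          # membership update
        if np.abs(u_new - u).max() < tol:
            u = u_new
            break
        u = u_new
    return centers, u

# Example: cluster (hypothetical) RGB pixel values into 3 color groups.
pixels = np.random.default_rng(1).random((500, 3))
centers, u = fuzzy_c_means(pixels, c=3)
```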
To improve the effect of the data analysis feature extraction algorithm in a multi-dimensional environment, this paper designs a multi-dimensional image data acquisition model combined with the entropy method. First, a sliding window is used to realize dynamically updated data analysis and reduce the adverse interference of abnormal data on the incremental primary metadata. At the same time, the reverse k-nearest-neighbor algorithm is used to monitor outliers among the real data in the sliding window, so that abnormal values affecting the calculation can be effectively removed, and judgment extraction is implemented for the data analysis. In the acquisition process, to avoid divergence of the data analysis scatter matrix, a decomposition operation is performed on its eigenvalues: the space is obtained by detecting all data analysis information and its characteristics in the principal element, the effective characteristic space of the pivot element is detected according to the entropy method, and the multi-dimensional big data analysis of the current window is projected to determine the data type, thereby realizing incremental image data acquisition. An outlier-filtering sketch follows; the dimension feature extraction process is shown in Fig. 5.
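A minimal sketch of the reverse k-nearest-neighbor outlier filter for one sliding window is given below; a sample is kept only if it appears in at least min_hits other samples' k-neighborhoods. The parameter values are illustrative assumptions.

```python
import numpy as np

def reverse_knn_filter(window: np.ndarray, k: int = 5, min_hits: int = 2) -> np.ndarray:
    """Drop points that appear in fewer than min_hits other points' k-neighborhoods.

    window: (n, d) samples from the current sliding window."""
    n = window.shape[0]
    d = np.linalg.norm(window[:, None] - window[None], axis=2)   # pairwise distances
    np.fill_diagonal(d, np.inf)                                  # ignore self-distances
    knn = np.argsort(d, axis=1)[:, :k]                           # each row's k nearest
    hits = np.bincount(knn.ravel(), minlength=n)                 # reverse-kNN counts
    return window[hits >= min_hits]
```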
Fig. 5. Dimensional feature extraction process (modify the feature vector, test for the best projection orientation, determine the image extraction capability, test the eigenvalue against the threshold, obtain the secondary incremental properties)
Step 1: Use the principal component analysis method to add and modify the feature vectors of the internal covariance matrix data, substitute the feature vectors into the benchmark function and measure their recognition ability. To select the best projection orientation, a standard model is established using the datum function.

Step 2: Use the entropy method to determine the weighting and judgment ability of image data extraction and the overall contribution of the main components. The sample matrix of the network with $m$ indexes and $n$ targets is set to $f$, the system is evaluated after normalization, and the entropy value and entropy weight of the $i$-th index are counted. The calculation process is given by formula (13):

$h = -w \sum_{i=1}^{n} f_{ma}$   (13)
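Because formula (13) is only partially legible in the source, the sketch below uses the standard entropy-weight formulation (per-index entropy with k = 1/ln n, weights proportional to 1 - h) for an m-index by n-target sample matrix f.

```python
import numpy as np

def entropy_weights(f: np.ndarray) -> np.ndarray:
    """Entropy-weight method for an (m indexes) x (n targets) sample matrix f."""
    m, n = f.shape
    p = f / (f.sum(axis=1, keepdims=True) + 1e-12)   # normalize each index row
    k = 1.0 / np.log(n)
    h = -k * np.sum(np.where(p > 0, p * np.log(p), 0.0), axis=1)  # entropy per index
    w = (1.0 - h) / np.sum(1.0 - h)                  # entropy weights, summing to 1
    return w
```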
Step 3: Complete the acquisition of level-2 incremental attributes. First, detect and obtain from the database system all principal component information attributes with large comprehensive scores and overall contribution values, and at the same time perform subspace projection, transformation and update operations, with the principal element as the execution object. When high-dimensional data streams appear in window data containing high-dimensional features, the data type is controlled so that it remains far lower than the number of dimensions. At this point the value of the intra-class divergence matrix is largest, that is, the highest-dimensional subspace can be solved under the operation order, and the maximum function value can be obtained from the window data. For the calculation of high-dimensional representation values and representation vector resolution in the database, since matrix $J$ is included and the eigenvectors corresponding to matrix $J$ are the same, the method of directly solving the matrix sample $f$ can be converted into solving matrix $J$, that is, directly finding the eigenvalues of the matrix. The scatter matrix improvement method is used: assuming a time sample in which the instantaneous average of each type is $p$ and the total data flow at time $t$ is $p(t)$, the updated value of the average is as shown in formula (14):

$p(t) = \dfrac{n(t+1)}{n(t-1) + 2}$   (14)

The machine vision method can realize intelligent acquisition on the basis of an in-depth study of the structural characteristics of the unstructured multi-dimensional big data of building space color information. Therefore, this paper proposes a machine-vision-based optimization model for extracting color information from the unstructured multi-dimensional big data of architectural space. The machine vision extraction optimization model is shown in Fig. 6.

Fig. 6. Machine vision extraction optimization model (load pretrained weights; save model weights and model configuration; generate a sequence by outputting a specific decoding strategy)

According to Fig. 6, the weights are trained in the machine vision feature optimization model to obtain the relevant model configuration, and the corresponding sequence is generated according to that configuration. Association rule mining is used to mine useful architectural space color information, machine vision methods are used for deep learning, and big data analysis enriches the machine vision training programs. Under machine vision training, the training task is set to $T$, the color information extraction task of the building space is set to $d$, and the time required for the extraction task is allocated. The completion time of the feature extraction scheme for building space color information is shown in formula (15):

$T(x) = \max \sum_{d=1}^{n} (t-1)$   (15)
Using the parallel data stream reconstruction method, combined with the deep learning calculation of machine vision, the key index features of building space color data analysis are extracted, so that feature extraction and analysis of the acquired data are achieved at the time-limit level, and a temporal feature extraction algorithm for the color information flow of big data analysis is established. Assuming that the time-limit sequence scalar of the image data extraction part of the big data analysis association rule information is $o$, the characteristic quantification association rules are applied, maximum likelihood estimation is used on the obtained data, and the range of the modified association rule feature extraction method is locked. The maximum difference $G$ generated by the feature extraction method for the data flow content at the time-limit level is then obtained through fuzzy inference, as shown in formula (16):

$G = \dfrac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} t$   (16)
Through the method of automatic correlation coefficient, the correlation rules of nodes are extracted, and the automatic correlation coefficient is calculated. Under the
framework of time-dimension data, data flow rules related to the nodes of the distributed network structure are provided, and data flow models are obtained at the time-sequence level through the nonlinear time series method. Then, through the search of building space information, the fuzzy C-means clustering algorithm is used to complete the allocation and arrangement of information under linear time conditions. The flow field distribution function of the association rule data structure represents the fuzzy nodes under the data attribute distribution. Based on the rule data, the combination form obtained after analysis and matching through the nonlinear time series method can be represented as:

$C = \min \{ \max Z \}$   (17)

In the formula, $C$ represents the reliability of data access and $Z$ represents the order of the samples taken according to the association rules. This provides a primary spatial framework for the analysis information flow of the acquired data from the time perspective, and the clustering calculation $C$ is implemented in the primary spatial structure, optimizing the feature extraction of architectural space color information analysis from the time perspective.
7 Experiment and Result Analysis

To verify the effectiveness of the machine-vision-based intelligent extraction method for building space color features proposed in this paper, the method is compared with traditional intelligent color feature extraction methods. Starting from a set of building space color information packets entering the communication link, through the selection of the optimal transmission path, to the whole process of successful transmission to the communication terminal, the machine-vision-based reliability model of building space color information constructed above and the reliability evaluation method based on it are used to calculate the packet loss rate and packet delivery rate of the color information; the traditional calculation method is then applied to the same quantities for comparison. The experimental parameters are as follows: the transmission rate of building space color information is set to 2216 Mbit/s, the success rate of packet transmission is 85%, the standard packet length is 140284 bit, and the number of transmission nodes is 150. Formula (18) is used to calculate the packet loss rate of each color information extraction node:

$\text{Packet loss rate} = 1 - \dfrac{\text{Total number of packets received by the relay node}}{\text{Total number of packets sent by the node}}$   (18)
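Formula (18) and the complementary packet delivery rate reduce to the following one-liners; the packet counts in the example are illustrative.

```python
def packet_loss_rate(packets_sent: int, packets_received: int) -> float:
    """Eq. (18): loss rate = 1 - received / sent."""
    return 1.0 - packets_received / packets_sent

def packet_delivery_rate(packets_sent: int, packets_received: int) -> float:
    """Complementary delivery rate used alongside Eq. (18)."""
    return packets_received / packets_sent

# Illustrative values only.
print(packet_loss_rate(10_000, 9_150))   # 0.085
```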
The results of packet loss rate calculated by this method and traditional methods are shown in Fig. 7.
Fig. 7. Comparison of the method in this paper and the practical method in the calculation of the packet loss rate (packet loss rate in % versus iterations and time in s)
It can be seen from Fig. 7 that, with the reliability calculation method for building space color information proposed in this paper, the packet loss rate generated by the color information nodes gradually increases with transmission time and then stabilizes near the standard loss rate, with which it is highly consistent. Under the comparison (practical) method, by contrast, the packet loss rate keeps increasing with transmission time, far exceeds the standard loss rate, and the color information remains in an unbalanced state. This verifies that the packet loss rate generated by the color information nodes under the proposed method is far lower than under the traditional method, and is more stable. In the absence of packet loss, the transmission delays generated when architectural space color information is transmitted at different times are shown in Fig. 8.
Fig. 8. Data transmission delay experimental results (propagation delay versus amount of sensitive data and number of experiments, for the method of this paper and the actual calculation)
The analysis of Fig. 8 shows that, with this method, the delay generated during transmission remains small as the amount of color information in the building space continuously increases, while during a single transmission the packet delivery rate of the color information gradually decreases; the calculated result differs little from the real result and coincides with it to a high degree. This is because, when analyzing the factors influencing the reliable transmission of color information in building space, this paper introduces variables such as transmission path loss and network data buffer space, substitutes them into the reliability model, calculates the successful transmission probability of the communication link, and performs effective multipath attenuation operations on the transmission path. The transmission path distribution and communication link distribution of the color information are obtained using Monte Carlo techniques, the reliability model is constructed using the principle of machine vision communication, and it is effectively evaluated and calculated. Because the factors affecting reliability are comprehensively analyzed, the final calculation results are highly consistent with the standard results. In the traditional method, the transmission delay of building space color information is relatively high, the packet delivery rate increases with the amount of color information, and the calculated results differ considerably from the standard results. On this basis, the accuracy of the extraction results is verified with the average relative error as the indicator, calculated as follows:

$\varepsilon = \dfrac{1}{M} \sum_{m=1}^{M} \left| \dfrac{y_m - y_w^*}{y_m} \right|$   (19)
where $\varepsilon$ represents the calculated relative error; $M$ represents the number of spatial color samples; $y_m$ represents the actual value obtained when calculating the $m$-th model; and $y_w^*$ represents the result of the extraction model. A sketch of this error computation is given below. To form an experimental comparison, the traditional graph-model-based extraction method and the PSOEM-LSSVM extraction method are used as comparison methods to complete the performance verification together with the method in this paper. The color feature extraction results of the building space obtained by the three methods are shown in Fig. 9, and the relative errors calculated from the figure are shown in Table 1. According to Table 1, as the data volume increases, the extraction errors of the traditional graph-model-based and PSOEM-LSSVM-based methods keep growing, while the extraction ability of the machine-vision-based method proposed in this paper is significantly higher than that of the two traditional extraction methods. Its extraction process is not affected by external factors, and its relative extraction error never exceeds 0.02%, giving it high applicability in practice.
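A direct transcription of Eq. (19) is sketched below, expressed as a percentage to match Table 1; the sample arrays are hypothetical.

```python
import numpy as np

def mean_relative_error(actual: np.ndarray, extracted: np.ndarray) -> float:
    """Eq. (19): epsilon = (1/M) * sum |(y_m - y*_w) / y_m|, as a percentage."""
    return float(np.mean(np.abs((actual - extracted) / actual)) * 100.0)

# Hypothetical sample: actual vs. extracted color data volumes (GB).
actual = np.array([2.5, 5.0, 7.5, 10.0])
extracted = np.array([2.5, 5.0, 7.49, 10.01])
print(mean_relative_error(actual, extracted))
```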
Fig. 9. Results of color feature extraction in architectural space: (a) graph model extraction results; (b) PSOEM-LSSVM extraction results; (c) extraction results of this method. Each panel plots forecast and actual color data volume (GB) against extraction time (min).
Table 1. Relative error calculation results

Sample number    Relative error/%
                 Graph model    PSOEM-LSSVM    Machine vision
1                1.43           2.58           0.01
2                1.58           3.02           0.02
3                1.61           3.69           0.01
4                1.66           4.28           0.01
5                1.67           4.99           0.01
6                1.82           5.22           0.02
7                1.94           5.74           0.01
8                1.99           6.10           0.02
9                2.01           6.66           0.01
10               2.20           6.93           0.01
11               2.25           7.24           0.02
8 Conclusion

This paper analyzes the current state of architectural color analysis and auxiliary design and proposes optimizations and innovations to the existing analysis and research methods. Through this research, a new idea and breakthrough point for extending the digital analysis of architectural color is put forward, namely, using fuzzy cluster analysis to make architectural color analysis more intelligent and efficient. RGB images of single-building color are selected as research samples, combined with computer image processing and recognition technology and the fuzzy clustering method, with MATLAB 7.0 as the software tool. The most commonly used RGB color model and the HSV color model best suited to human visual perception are selected to facilitate the algorithm design of the feature extraction and fuzzy clustering programs. The examples show that the color feature extraction program can be applied well to the analysis of digital color images of buildings and can extract the feature data of single-building color, thus preliminarily realizing computer-aided color quantization. In future research, we will consider further optimizing this work from the perspective of reducing the extraction time.
References

1. Zhao, Y.F., Xie, K., Zou, Z.Z., et al.: Intelligent recognition of fatigue and sleepiness based on InceptionV3-LSTM via multi-feature fusion. IEEE Access 8, 144205–144217 (2020)
2. An, Z., Li, S., Jiang, X., et al.: Adaptive cross-domain feature extraction method and its application on machinery intelligent fault diagnosis under different working conditions. IEEE Access 8, 535–546 (2020)
3. Lin, X., Wang, X., Li, L.: Intelligent detection of edge inconsistency for mechanical workpiece by machine vision with deep learning and variable geometry model. Appl. Intell. 50(7), 2105–2119 (2020)
4. Liu, Z., Zhang, X., Zhang, L., Liu, Q.: Research on the progress and trend of architectural color in China from the perspective of mapping knowledge domain. Decorate 2022(01), 136–138 (2022)
5. Hao, Y., Chen, F.: Application of color aesthetics in architectural facade design. Build. Sci. 37(11), 177–182 (2021)
6. Gao, Z., Zou, G.: Study on the facade color design of urban residential buildings based on extension data mining. Math. Pract. Theory 49(09), 11–19 (2019)
7. Xin, Y., Cui, M., Liu, C., et al.: A bionic piezoelectric tactile sensor for features recognition of object surface based on machine learning. Rev. Sci. Instrum. 92(9), 95–103 (2021)
8. Cao, J., Ye, L.: Multi-view 3D reconstruction method of virtual scene in building interior space. Comput. Simul. 37(09), 303–306+381 (2020)
9. Mehdizadeh, S.A.: Machine vision based intelligent oven for baking inspection of cupcake: design and implementation. Mechatronics 82(1), 102–116 (2022)
10. Wang, C., Zhou, J., Wu, H., et al.: Research on the evaluation method of eggshell dark spots based on machine vision. IEEE Access 16(8), 1–11 (2020)
11. Gao, J., Wang, Z.: The artistic embodiment of color aesthetics in modern architectural design. Ind. Constr. 51(07), 287 (2021)
12. Jiang, S.: Rational selection of color elements in architectural design. Ind. Constr. 51(02), 218–219 (2021)
Stability Tracking Detection of Moving Objects in Video Images Based on Computer Vision Technology

Ningning Wang1(B) and Qiangjun Liu2,3

1 Aba Teachers University, Wenchuan 623002, China
[email protected]
2 Krirk University, Bangkok 10220, Thailand 3 Guangxi Vocational and Technical College, Nanning 530226, China
Abstract. In order to accurately collect images of moving targets and improve the accuracy of target tracking and detection, this paper proposes a new method for stably tracking and detecting moving targets in gray-scale images based on computer vision technology. Camera parameters and light source parameters are set, and moving target images are accurately collected through computer vision technology to improve the accuracy of moving target tracking and detection. The video image is preprocessed; the specific preprocessing steps include image enhancement and edge detection. A random forest is used as the classifier to eliminate the background and generate a rough target ROI map, and scale recognition is applied to the ROI area to recognize moving objects in video images. Combining the Camshift algorithm and the Kalman filter, the existing moving target tracking method is improved and stable tracking of the moving target is implemented. Stability detection of moving targets is implemented by the background difference method, specifically the ViBe algorithm. The test results show that the designed method correctly handles more frames, has a higher accuracy rate, processes more frames per second, and has lower tracking and detection errors; its average tracking detection time is less than 3000 s.

Keywords: Computer Vision Technology · Video Image · Edge Detection · Moving Target · Tracking Detection
1 Introduction

Human perception of the external world is obtained mainly through the five senses, of which more than 80% of information is obtained through the eyes, which is what we call vision. As we all know, the human eye is an important sensory organ and one of the main channels through which humans obtain information from the objective world [1]. Visual image information therefore plays a vital role in human life and work, and image processing has naturally become one of the most popular research fields. Among
them, the application and research of high-tech image processing plays a vital role in security detection, aerospace, robot vision, vehicle navigation, global positioning, and even medical and military applications [2]. In real life, images can be divided mainly into moving and stationary ones, and in most cases people are more interested in moving objects. Research on the tracking and detection of moving objects is therefore of great significance and has broad application prospects.

With the rapid development of digital signal detection, transmission and processing technology and of computer science, and with the continuous maturation of theory and improvement of hardware, people have begun to use cameras, video cameras, digital cameras and other high-precision image acquisition tools to obtain image information and convert it into digital signals; computer algorithms and digital image processing methods are then used to process the visual information so as to meet various needs. Intelligent video surveillance is a new application direction that has risen rapidly in recent years. It has become an advanced, high-tech field and a frontier topic of image research worldwide, characterized by interdisciplinary and technical integration. An intelligent video monitoring system, simply put, understands, analyzes and processes the video signal, further controls the video monitoring system, and can raise alarms and perform other processing when necessary. Intelligent monitoring technology involves computer science, psychology, physics and applied mathematics, as well as sensor technology, detection technology, signal processing, image processing and other disciplines.

The tracking and detection of moving objects in video is not only one of the core technologies of intelligent video surveillance systems but also a solid foundation of artificial intelligence research, and one of the key technologies for realizing intelligent robots. It has potential economic value and broad application prospects in intelligent monitoring systems, advanced human-machine interfaces, human motion analysis, virtual reality, and content-based image retrieval and storage. Therefore, this paper focuses on the stable tracking and detection of moving objects in video images and designs a stability tracking and detection technique for moving objects in video images based on computer vision technology. At present, research on moving target detection and tracking at home and abroad is still limited by the particularity of environmental factors: most tracking algorithms or systems suffer from tracking loss and drift when the scene is complex or the target scale changes greatly [3], which reduces the accuracy of moving target image acquisition. The detection and tracking of moving objects in complex environments thus remains a major open problem in this field.
2 Gray-Scale Image Moving Target Stability Tracking Detection

2.1 Computer Vision Acquisition of Moving Target Images

A computer vision detection system is designed to implement image acquisition. The hardware of the system consists of a VT-LT2-HR1260W ring light source and a DFK23G445 camera.

To meet the lighting needs of the CCD camera and obtain high-quality images, the light source must be chosen carefully. Considering the characteristics of the various lighting methods and the application at hand, flat LED scattered light sources improve the rendering of parts with irregular appearance. The selected flat LED array illumination source is the VT-LT2-HR1260W ring light source. In this light source, the light-emitting diodes are closely arranged on a ring-shaped housing divided into seven areas with a total of seven circles; the LEDs in any circle and any area can be switched on and off through control buttons. If a single area is lit, its optical characteristics resemble parallel light in a certain direction; when all areas are lit, the light is uniformly scattered and vertically irradiated, which helps to avoid part shadows. The diameter of the inner circle of the ring is larger than the diameter of the CCD lens, so the lens can be moved up and down within the light source, allowing the distance between lens and workpiece to be adjusted. In use, the light source is fixed on the bracket frame of the light source stand, which consists mainly of a chassis, a bracket and a bracket frame; the bracket forms an angle of about 70° with the chassis, and the chassis and bracket frame are parallel to the horizontal plane. The light source is installed on the bracket frame facing down over the assembly line, above the camera lens; the bracket frame moves up and down to adjust the distance between light source and lens, and the light source controller is connected to the light source line port.

The selected charge-coupled device is a DFK23G445 camera, supported on a camera bracket consisting mainly of a chassis, a bracket and a runner; the bracket forms an angle of about 70° with the chassis, and the chassis is parallel to the horizontal plane. After installation, the CCD camera is perpendicular to the assembly line, directly above the parts flowing on it. The camera network terminal is connected to the PC network, the camera power supply is connected to a power socket, and the runner is used to adjust the focal length between the CCD lens and the parts to be inspected. To obtain a better image, the positions of the light source and the CCD are very important; they can be determined by visual inspection before the measurement and then adjusted precisely according to the actual shooting conditions. The DFK23G445 camera measures 29 mm (height) × 29 mm (width) × 57 mm (length).

2.2 Image Processing

The gray-scale image is preprocessed; the specific preprocessing steps include image enhancement and edge detection [4].
60
N. Wang and Q. Liu
The processing method used for image enhancement is gray-scale transformation. A gray-scale transformation function based on particle swarm optimization is designed to implement the gray-scale transformation of gray-scale images.

1) Normalize the original image. The image to be processed and the enhanced image are represented by $d$ and $r$, and the image pixel with coordinates $(X, Y)$ is represented by $d(X, Y)$. Normalizing $d$ yields the image $D$:

$D(X, Y) = \dfrac{d(X, Y) - K_{min}}{K_{max} - K_{min}}$   (1)
In formula (1), $K_{max}$ refers to the maximum gray value of the image $d$, and $K_{min}$ to its minimum gray value.

2) Using the particle swarm algorithm PSO, initialize the speed $a_j$ and position $b_j$ of each particle, and update the individual optimal position $L_{best}$ and the optimal position $U_{best}$ passed by the group according to the fitness function; at the same time, update the corresponding $a_j$ and $b_j$ until the optimal solution $c$ and $d$ is found [5]. When contrast transformation is used for image enhancement, the variance-like measure shown below is used as the fitness function of the particle swarm optimization algorithm to determine the optimal position of the particles:

$E(D_a) = \dfrac{1}{m} \sum_{X=1}^{Q} \sum_{Y=1}^{P} \left[ K(X, Y) - \dfrac{1}{m} \sum_{X=1}^{Q} \sum_{Y=1}^{P} K(X, Y) \right]^2$   (2)
In formula (2), $Q$ is the width of the image, $P$ is the height of the image, $m$ is the number of particles, and $K(X, Y)$ is the gray value of the image $d$. The larger $E(D_a)$, the higher the image quality and the better the visual effect of the image.

3) After the particle swarm optimization algorithm has found the two optimal parameters $c$ and $d$ of the Beta function, formula (3) is used to transform the image $D$ to obtain $O$, and formula (4) transforms $O$ back into the gray space to obtain the final enhanced image $R$:

$O = O(c, d; D)$   (3)

$R(X, Y) = (K_{max} - K_{min}) \times O + K_{min}$   (4)
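The paper identifies the transform in Eq. (3) only as a Beta-function mapping with PSO-optimized parameters c and d; the sketch below assumes the common choice of the regularized incomplete Beta function and takes c and d as given rather than running PSO.

```python
import numpy as np
from scipy.special import betainc  # regularized incomplete Beta function

def beta_enhance(img: np.ndarray, c: float, d: float) -> np.ndarray:
    """Contrast transform sketch following Eqs. (1), (3), (4):
    normalize, map through O = I_x(c, d), then denormalize."""
    img = img.astype(np.float64)
    k_min, k_max = img.min(), img.max()
    norm = (img - k_min) / (k_max - k_min + 1e-12)   # Eq. (1)
    out = betainc(c, d, norm)                        # Eq. (3), assumed Beta mapping
    return (k_max - k_min) * out + k_min             # Eq. (4)
```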
Edge detection uses an improved edge detection method.

1) Extension of the Sobel operator. In view of the defects of the conventional Sobel operator, the edge detection algorithm can be improved. First, the Sobel operator lacks correlation information of adjacent points outside the horizontal and vertical directions, resulting in the loss of
image edge information. To ensure the integrity of the image edge information, the Sobel operator is extended to calculate image gradient information in multiple directions: it is extended in the 45° and 135° directions, giving 4 expanded operators in total, which adds the gradient information of adjacent points and makes the gradient operation more complete. Since the gradient is a vector, it contains both magnitude and direction information; however, synthesizing strictly according to the vector synthesis rule would require an extremely large amount of computation, so the gradient synthesis is approximated as follows: for the amplitude, the maximum value over the four directions is selected as the gradient value of the point; for the direction, the direction with the largest amplitude is taken as the gradient direction of the point. The gradient direction matrix is D(x, y): D(x, y) = 1 means the gradient direction is 0°, D(x, y) = 2 means 45°, D(x, y) = 3 means 90°, and D(x, y) = 4 means 135°. The gradient is thus represented by a two-dimensional vector composed of amplitude and direction.

2) Edge refinement. To locate the image edge information accurately, non-maximum suppression can be used. The gray-value jump at the edge of the original image is not an ideal step signal but has a transition interval; when the Sobel operator is applied, all pixels in this interval are judged as edge parts, so the detected edges are wide. Non-maximum suppression effectively solves this problem: it finds the local maximum in the direction perpendicular to the image edge, that is, in the direction of the gradient maximum.

3) Binarization threshold conforming to the visual characteristics of the human eye. For the problem of setting an adaptive binarization threshold, a fitting scheme that conforms to human vision is proposed on the basis of the relevant literature and a large number of tests. Two aspects of human vision need to be considered: the sensitivity of the human eye to image structure, and its sensitivity to gray values. According to the noise structure masking theory, the human visual system has low sensitivity to noise at image edges or where the image structure is fine, and high sensitivity to noise in smooth areas of the image. The gradient image reflects the structure of the image: where the gradient values are large, the structure is fine and the human eye has low sensitivity to noise, while areas with small gradient values are smooth areas where the human eye is highly sensitive to noise [6]. Therefore, the gradient image of the image to be processed should be considered when setting the threshold. The mean of the data in a 3 × 3 window of the gradient image is examined: if the mean in the window is large, the original image changes strongly in this area, which belongs to a structured area or an image edge where the human eye is insensitive to noise, so a small binarization threshold is set to preserve edge integrity.
If the mean value in the window is small, it means that this area of the original image is a
smooth area, where the human eye is more sensitive to noise, so a larger threshold should be set. Because the sensitivity of the human eye differs across gray values, two pixels with the same gradient may not both be perceived as image edges. Therefore, to better fit the human visual characteristics, a smaller threshold should be used in gray-scale intervals with high visual resolution and a larger threshold in intervals with low visual resolution.

2.3 Moving Target Recognition

The problem of target recognition can be regarded as a target/background classification problem, which separates the target from the background. A random forest is used as a classifier to eliminate the background and generate a rough target ROI map, and corresponding scale recognition is then carried out on the ROI area to recognize moving targets in gray-scale images [7]. First, in the training process, a large number of positive and negative samples are fed to the random forest, which is generated according to FHOG-Lab features. Then FHOG-Lab features are extracted from the video frames to be recognized; keeping their number low accelerates the execution of the random forest. The trained random forest then yields the target probability, from which a target probability map is generated; finally, the target center ROI area is enhanced by threshold segmentation and morphological filtering.

Let the training set be $\alpha$ and the total number of samples be $M$. $V$-dimensional FHOG-Lab features are extracted from each training sample, so the training data are represented by a matrix $\beta^{M \times V}$ with $M$ rows and $V$ columns. A random forest is composed of multiple decision trees, each of which can be viewed as a set of root, branch and leaf nodes. The root node can be regarded as a special branch node; branch nodes act as weak classifiers, mainly containing a feature number $v \in \{1, 2, \cdots, V\}$ and threshold information; leaf nodes are used to calculate the target probability and mainly contain information such as the samples reaching the node. During training, $V^*$ columns of the $V$ features are randomly selected, the best column $v^*$ (with the smallest class impurity) is selected according to the corresponding classification mechanism as the feature number of the current node, and the threshold is calculated. First, the class impurity of the current node is calculated; if it is greater than zero, the current node is a branch node and can continue to split, with samples smaller than the threshold split to the left child node and the others classified to the right child node. When the class impurity is close to zero or the number of samples falls below the minimum sample number, the termination condition is met and the current node is a leaf node. Generally, the expected information value (entropy) is used to calculate the class impurity of a branch node:

$Z = - \sum_{j=1}^{l} q(\chi_j, \delta) \log_2 q(\chi_j, \delta)$   (5)
In formula (5), $l$ refers to the number of categories, and $q(\chi_j, \delta)$ to the probability that $\delta$ belongs to class $j$. Assuming that the trained random forest consists of $m_f$ decision trees $R_f$, $f \in \{1, \cdots, m_f\}$, and that each detection window is represented by a feature vector $t_i$, the target probability produced by the leaf node reached in each tree is:

$w_g(t_i) = q(\chi_{target}, \delta_{leaf})$   (6)

In formula (6), $q(\chi_{target}, \delta_{leaf})$ refers to the probability that the training samples reaching the leaf node belong to the target class. The final target probability is then:

$w(t_i) = \sum_{g=1}^{m_f} w_g(t_i)$   (7)
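Eqs. (6) and (7) can be mimicked with an off-the-shelf random forest, whose per-tree leaf class frequencies play the role of w_g(t_i); the sketch below uses scikit-learn as a stand-in for the paper's FHOG-Lab-trained forest, with random data standing in for real features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 34))          # hypothetical 34-dim FHOG-Lab features
y_train = rng.integers(0, 2, 200)        # 1 = target, 0 = background
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

windows = rng.random((10, 34))           # features of candidate detection windows
target_prob = forest.predict_proba(windows)[:, 1]   # target probability map values
roi_mask = target_prob > 0.5             # threshold t separating target from background
```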
Among them, $w(t_i)$ is the probability that the recognition window belongs to the target, yielding the target probability map $Y_w$. A threshold $t$ is set: areas above the threshold have a high probability of belonging to the target, while positions below the threshold are regarded as background. The closing operation of morphological filtering is then used to fill small holes in the threshold-segmented binary target image, connect adjacent objects, smooth the target boundary and enhance the target ROI area, ensuring that the real target area is not missed. Through this simple random forest classification, the approximate center positions of possible targets in the image can be obtained quickly. The target center ROI region can therefore be provided for subsequent target recognition, and the scale recognition of the classifier can be carried out directly within this region, avoiding the large amount of time required to search for targets over the whole image.

In recognition, the background is first separated by the trained random forest to obtain the target ROI area; then LIBLINEAR offline training of a linear SVM is used to generate the basic classifier. A window is set to scan the image sequence, the corresponding scale is computed on the ROI area, the obtained feature vectors are classified with the basic classifier trained in the learning stage, and the target area is marked with a rectangular frame [8]. Since the random forest can be trained on a large number of training samples, the problem of insufficient training samples for the SVM is solved. When training the LIBLINEAR operator, the target samples and the difficult samples misclassified by the random forest are used to complete the offline training of the classifier, finally yielding a robust recognition operator. The specific steps of offline classifier training are as follows:

Step 1: Prepare the samples for training the linear classifier, including the positive sample set and the negative sample set;

Step 2: Extract the 34-channel FHOG-Lab features of the positive and negative training samples, and label the samples;
Step 3: Train the features of the positive and negative samples in the format required by the LIBLINEAR implementation of the linear SVM. Generate the initial model $\varphi_0$ on the initial positive sample set and negative sample set $\varepsilon_0$ by means of Bootstrapping, then run $\varphi_0$ to scan and detect all source negative samples, classifying the misdetected sub-images as difficult negative samples $J(\varphi_0, \varepsilon_0)$. Retrain $\varphi_0$ with all positive samples, negative samples and difficult negative samples to obtain model $\varphi_1$, detect the source negative samples again to obtain the difficult negative samples $J(\varphi_1, \varepsilon_1)$, and repeat this cycle 3 times to reduce the false detection rate, finally saving the training result.

To speed up background removal by the random forest, the image is sampled at 4 × 4 intervals, and the target probability of an unsampled point is taken from the nearest of the 4 neighboring sampled points. This reduces the operation time by a factor of 16 and keeps the random forest classification cheap, so that the recognizer can identify the target quickly.

2.4 Moving Target Stability Tracking Detection

Combining the Camshift algorithm and the Kalman filter, the existing moving target tracking method is improved and stable tracking of the moving target is implemented. The Camshift algorithm obtains the color probability histogram of the target area by counting its color histogram, thereby tracking the target object. Camshift can handle occlusion of the target object during movement and maintain continuous tracking with good results [9]. However, if the target moves too fast and its position exceeds the Camshift search window, the tracking diverges. The idea of adding a Kalman filter to Camshift is to predict the coordinates of the target in subsequent frames from the motion state of the target object. Neither Camshift alone nor the Kalman filter alone achieves accurate tracking in practical applications, so the two are combined to improve the existing tracking method.

Suppose that in the K-th frame the target object is located at coordinate A, and in the (K+1)-th frame it moves to coordinate B. When the Camshift algorithm tracks alone, it first creates a search window at coordinate A, then iteratively searches along a path L, and finally obtains the actual coordinates of the target in the next frame. When Kalman filter prediction is added to the Camshift processing, the information of the moving target in the K-th frame is used to predict its coordinates in the (K+1)-th frame; the predicted point C is closer to the actual location B than A is. The initial search window is established around point C, and the Camshift algorithm then only needs to iterate along a short path G to find the actual coordinate B. A minimal sketch of this prediction-then-refinement loop is given below.
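This sketch is a minimal single-step version of that loop using OpenCV's KalmanFilter and CamShift; histogram construction and window initialization are assumed to have been done elsewhere, and the noise covariance value is illustrative.

```python
import cv2
import numpy as np

# State: x, y, vx, vy; measurement: x, y. The Kalman prediction gives point C,
# which seeds the CamShift search; the CamShift result (point B) is fed back
# as the measurement. The state should be initialized from the first detection.
kalman = cv2.KalmanFilter(4, 2)
kalman.transitionMatrix = np.array(
    [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
kalman.measurementMatrix = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
kalman.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2

def track_step(frame_hsv, hist, track_window):
    px, py = kalman.predict()[:2].ravel()                 # predicted center C
    w, h = track_window[2], track_window[3]
    seeded = (int(px - w / 2), int(py - h / 2), w, h)     # search window at C
    prob = cv2.calcBackProject([frame_hsv], [0], hist, [0, 180], 1)
    crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    ret, new_window = cv2.CamShift(prob, seeded, crit)    # refine to point B
    cx = new_window[0] + new_window[2] / 2
    cy = new_window[1] + new_window[3] / 2
    kalman.correct(np.array([[cx], [cy]], np.float32))    # update with measurement
    return new_window
```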
The improved flow of Camshift fused with the Kalman filter is as follows:

(1) Read the first recognized frame, select (or automatically detect) the target object in the image, and calculate the center coordinate of the target object.
(2) Compute the color distribution histogram of the initial tracking target area, obtain the color probability projection by formula, and establish the initial Kalman filter.
(3) Obtain the actual coordinates of the target center in the current frame, estimate the center position in the subsequent frame with the Kalman prediction equation, create a search window at the predicted coordinates, and iteratively search for the true coordinates of the moving target with the Camshift tracking algorithm until matching succeeds.
(4) The Kalman filter takes the calculated coordinate position as the observation value and updates its parameters to maintain tracking accuracy; subsequent frames are then extracted and the target object continues to be tracked according to the steps above.

Occlusion of the target is an unavoidable problem when dealing with target tracking in engineering applications and directly affects tracking accuracy. If the moving target is occluded, the Kalman filter predicts its coordinates in the current frame from the target position in the previous frame, while the target size calculated by the Camshift tracking algorithm becomes very small. If the moving target is still occluded in the next frame, the Kalman prediction is used as the observation value to update the filter parameters, the predicted center is used as the initial coordinate of the Camshift iteration, and the size tracked by Camshift is corrected to the original size of the moving target.

Stability detection of moving targets is then implemented by the background difference method, specifically the ViBe algorithm. The algorithm has three key aspects: model selection, model initialization, and the model update mechanism.

(1) Pixel background modeling. The basic idea of ViBe background modeling is to model the background of each pixel: a set of samples is stored for each pixel, recording a series of values that the pixel (or its neighbors) has shown in history. The current frame is read and each new pixel is compared with the sample set; if the sampled value is close to the sample set, the pixel very probably belongs to the background, otherwise the observation is judged to be a moving target. The pixel value of a point $y$ is $\eta(y)$, $\eta_i$ records the background sample with index $i$, and the background model is the set of $n$ background sample values:

$\iota(y) = \{\eta_1, \eta_2, \cdots, \eta_n\}$   (8)
A sphere with center $\eta(y)$ and radius $r$ is defined, and the number of sample points in the intersection of the sphere and the background model set is counted, denoted by #. A minimum threshold #min is set in advance and compared with #: if # > #min, the pixel is judged as background, otherwise it is judged as a foreground target. The radius $r$ of the sphere and the threshold #min directly affect the accuracy of model detection.

(2) Model initialization. Many traditional background modeling methods require a long video sequence in the initialization phase, analyzing a long period of data to estimate the temporal distribution. The ViBe modeling algorithm can initialize the background model from a single image: each pixel's model contains $\lambda$ sample values, obtained by sampling the neighboring pixels of the point in one image, with the neighbors selected randomly, since neighboring pixels have similar spatio-temporal distribution characteristics. The selection range of the neighborhood pixel values should not be too large; this reduces the statistical correlation between pixels and also keeps the stored background model sample set from growing too large.

(3) Model update mechanism. The update strategy of the ViBe modeling method combines a conservative update strategy with foreground point counting. Conservative update strategy: foreground points are never used to fill the background model, so a stationary area once judged as moving foreground would always remain foreground. Foreground point counting: the judgment results of each pixel are counted; if a pixel is judged as a foreground point for a number of consecutive frames, it is confirmed as moving target foreground, otherwise it is classified as a background pixel. If the current pixel is judged to be a background point, the probability of updating its own background model is $1/\varsigma$, and the probability of updating the model sample values of its neighborhood is also $1/\varsigma$; the replaced sample values are selected randomly, which guarantees an exponentially decaying lifetime of the sample values.

ViBe background modeling is relatively stable in detection and adapts effectively to interference such as changes in light brightness or camera rotation jitter. The algorithm is computationally simple, has low space complexity, processes faster than other background modeling methods, and consumes few system resources [10]. However, when the ViBe algorithm builds the background model, a moving target present in the first frame is judged as background, and the region it vacates is misidentified as moving foreground, a phenomenon commonly referred to as "ghosting". ViBe's update strategy can resolve the ghosting caused by the first background frame, but the update is slow and it takes a long time to eliminate the ghost area; if other targets move through it meanwhile, the extracted foreground will be wrong. ViBe also cannot handle the false background introduced when moving objects stop, which requires further study and improvement. For the first σ frames of the video image sequence, the moving area of the
foreground object can first be obtained by the inter-frame difference method. During ViBe background modeling, the update probability of pixels within the moving target range is reduced and that of pixels outside the range is increased; this speeds up the elimination of ghost areas. After moving object detection, the extracted binarized foreground region often contains holes or narrow connecting lines, so the image sequence obtained during detection is morphologically processed by closing first and opening later. The opening operation erodes and then dilates the set $\zeta$ with the structural element $\xi$, which breaks small discontinuities and eliminates small protrusions; the closing operation is the opposite, first dilating and then eroding $\zeta$ with $\xi$, which fills small gaps and voids. For a set $\zeta$ and a structural element $\xi$, the opening of $\zeta$ by $\xi$ is written $\zeta \circ \xi$:

$\zeta \circ \xi = (\zeta \ominus \xi) \oplus \xi$   (9)

The closing of $\zeta$ by $\xi$ is written $\zeta \cdot \xi$:

$\zeta \cdot \xi = (\zeta \oplus \xi) \ominus \xi$   (10)
In this way, the morphological processing of the moving target detection image is realized, and the stability tracking and detection of the moving target in the video image is completed.
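The close-then-open post-processing of Eqs. (9) and (10) maps directly onto OpenCV morphology, as in the following sketch; the 5 × 5 elliptical structural element is an illustrative choice.

```python
import cv2
import numpy as np

def clean_foreground(mask: np.ndarray) -> np.ndarray:
    """Close-then-open post-processing of a binary foreground mask.

    Closing (dilate then erode) fills small gaps and holes; opening (erode then
    dilate) breaks thin connections and removes small protrusions."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))  # structural element
    closed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return cv2.morphologyEx(closed, cv2.MORPH_OPEN, kernel)
```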
3 Tracking Detection Performance Test

3.1 Experimental Platform and Experimental Data

The experimental platform is an Intel Core i5 @ 2.4 GHz with 4 GB memory running the Windows 7 operating system; the software environment is Matlab 2012b, Visual Studio 2010 and OpenCV 2.4.8. Before the experiments, the target to be tracked is initialized with a target box in the first frame of the video sequence; the following tests are all based on manually initializing the target area first. To test the tracking detection effect of the designed method, its performance is evaluated on moving target video sequences with relatively complex environments, covering a series of situations such as partial and total occlusion of moving objects, changes in object scale, interference of light intensity, and excessive object speed. A total of 6 videos covering these situations are used for the performance test. Video sequences (a) and (c) track moving vehicles in environments with large changes in light intensity; video sequence (b) tracks a fast car in an environment disturbed by leaves.
The content of video sequence (d) is the tracking of a fast-moving ball. Video sequences (e) and (f) track pedestrians on the road: sequence (e) includes object occlusion and lighting changes, while in sequence (f) the target is occluded but there is basically no obvious change in lighting. The specific data of the experimental video sequences are shown in Table 1.

Table 1. Specific data of experimental video sequences

Experimental video sequence       (1)     (2)     (3)     (4)     (5)     (6)
Name                              Car1    Car2    Car3    Ball    Man1    Man2
Total frames                      542     635     852     185     365     755
Frames containing targets         452     512     632     154     326     701
Illumination change               Yes     No      Yes     No      Yes     No
Occlusion                         No      Yes     No      Yes     Yes     Yes
The performance of the designed method is evaluated with the following parameters: the number of correctly processed frames, the accuracy of stability tracking detection, the number of frames processed per second, the tracking detection error, and the tracking detection time.

3.2 Test Results

The test results for the number of correctly processed frames are shown in Table 2.

Table 2. Correctly processed motion video frame test results

Experimental video sequence    Number of correctly processed frames (frames)
(1)                            448
(2)                            509
(3)                            630
(4)                            152
(5)                            325
(6)                            700
According to the test results in Table 2, for all six experimental video sequences the number of correctly processed frames is high, close in each case to the number of frames containing targets listed in Table 1.
Table 3. Accuracy of video image moving target stability tracking and detection

Experimental video sequence        Accuracy rate of stability tracking detection (%)
(1)                                86.32
(2)                                84.63
(3)                                87.54
(4)                                86.32
(5)                                81.20
(6)                                79.32
The test results of the stability tracking detection accuracy of the design method are shown in Table 3. They show that, even though the six video sequences contain a variety of challenging situations, the stability tracking detection accuracy of the designed method remains high (79.32%–87.54%), indicating good drift suppression. The test results for the number of frames processed per second are shown in Fig. 1.
Fig. 1. Test results of the video frames processed per second (average frames processed per second, plotted on a 500–1000 frame axis, for video sequences (1)–(6))
According to the test results in Fig. 1, the design method processes more than 800 frames per second on every sequence.
The tracking detection error test results of the design method are shown in Table 4. The tracking detection error is the distance between the center coordinate of the tracking window and the actual center coordinate of the target in each frame; the smaller the error, the higher the tracking accuracy.

Table 4. Tracking detection error test results of the design method

Experimental video sequence   Frame number   Real center point of the target   Tracking detection error (mm)
(1)                           Frame 10       (563, 48)                         4.62
                              Frame 20       (525, 46)                         4.20
(2)                           Frame 10       (521, 41)                         4.75
                              Frame 20       (504, 43)                         4.71
(3)                           Frame 10       (574, 35)                         4.62
                              Frame 20       (570, 42)                         3.85
(4)                           Frame 10       (965, 40)                         3.95
                              Frame 20       (658, 41)                         4.20
(5)                           Frame 10       (704, 51)                         4.69
                              Frame 20       (745, 30)                         4.71
(6)                           Frame 10       (685, 52)                         3.52
                              Frame 20       (685, 62)                         3.69
According to the error test results in Table 4, the tracking detection errors of the designed method are consistently low (below 5 mm) across the six experimental video sequences, which indicates good tracking detection performance. The average tracking detection time for the six experimental video sequences is shown in Fig. 2. The results in Fig. 2 show that the average tracking and detection time of the design method stays below 3000 s, that is, its time consumption is low. In summary, because the method in this paper first preprocesses the edges of the video image, the image quality is enhanced, which saves time in the subsequent background-culling step. After generating a rough target ROI map, the ROI area is identified at the corresponding scale, thereby realizing stable tracking of moving objects in the image.
Fig. 2. Average tracking detection time-consuming test results (average tracking and detection time in seconds, on a 0–4000 s axis, versus data volume of 10–50 GB, for video sequences (1)–(6))
4 Conclusion

Moving target tracking and detection in grayscale images is a very active topic at present, with broad application prospects in national defense, the military, health care, industrial intelligence, intelligent control, video surveillance, and other fields, so research on this technology has high practical value. However, due to the complexity of real scenes, there is still a large gap between actual results and the ideal situation. Building on existing results, this work proposes a more effective method. On the basis of preprocessing the video image, the image background is removed using the random forest algorithm, and corresponding scale identification is performed on the ROI area. At the same time, the Camshift algorithm is combined with the Kalman filter algorithm to track moving targets stably, so that moving targets can be tracked and detected more accurately and efficiently in complex environments. The experimental results show that this method correctly processes more frames and achieves lower tracking and detection errors.
Virtual Display Method of Garment Design Details Based on Computer Vision Shu Fang1(B) and Fanghui Zhu2 1 College of Art and Design, Guangdong Industry Polytechnic, Guangzhou 510300, China
[email protected] 2 School of Education, Xi’an Siyuan University, Xi’an 710038, China
Abstract. In order to let customers see the details of clothing design more intuitively on the Internet and select well-fitting clothes effectively, a virtual display method of garment design details based on computer vision is studied. This paper introduces computer vision technology, designs a virtual expression scheme for garment design details, scans garment details with 3D laser scanning technology, and obtains laser point cloud data on which three kinds of processing are performed. Based on the laser point cloud data, 3D reconstruction is carried out to obtain a 3D clothing image, and a display page is designed to realize the virtual display of costume design details and virtual interaction. The experimental results show that the proposed method achieves a good virtual display effect for clothing design details, and the information entropy of the 3D image is greater than that of the 2D image, which proves that the method studied in this paper can show more details of fashion design.

Keywords: Computer Vision · Clothing Design · 3D Laser Scanning Technology · Design Details · Virtual Display
1 Introduction

At present, most online clothing displays are based on text and two-dimensional pictures [1], which can only provide a flat effect, leaving a certain gap with the real object. The information that can be provided to customers is limited, and the lack of human-computer interaction and visibility is not conducive to the application and development of e-commerce. At the same time, traditional demand analysis is generally submitted in written form or obtained through discussion between marketing personnel and customers. Because of gaps between professional fields and differences in demand expectations, the demand is often unclear and changeable: changes in customer interest and value cannot be grasped, the value of existing customer relationships to the company cannot be effectively analyzed, and potential customers or markets cannot be found, so new products or services cannot be provided to valuable customers in a timely and correct manner. It is therefore necessary to conduct online virtual dynamic clothing display with information technology [2]. Reference [3], based on an analysis of the feature elements of the special-shaped structure template model of fully formed double-layer clothing, carries out parametric
design of the double-layer template model after component classification, establishes the topological geometric mapping relationship between the two-dimensional template and the three-dimensional template model, and applies the spring-mass model to realize three-dimensional component modeling of the inner and outer two-dimensional templates. Using digital yarn simulation and texture mapping technology, it calibrates and aligns the process structure feature points of the two-dimensional template after texture mapping, and finally realizes the virtual modeling design and rapid virtual simulation display of the seamless parts of fully formed double-layer knitted clothing. Reference [4] introduces in detail the integration of holographic projection technology and virtual clothing display, in order to promote the in-depth application of holographic projection technology in the clothing industry. Reference [5] developed a personalized virtual clothing display system based on adaptive deformation of clothing. First, a three-dimensional human body reconstruction framework based on body shape parameters is established; then, a hybrid skinning algorithm is used to establish the relationship between the vertices of the human body model and the vertices of the 3D clothing; finally, dual quaternions are interpolated so that human body deformation drives clothing deformation, completing the adaptive adjustment of the clothing. A virtual clothing display system for advanced customization was developed on the Unity platform. However, in the virtual display process the above methods are limited by the mapping relationship between two-dimensional clothing pieces and the corresponding three-dimensional clothing. Facing this situation, this paper studies a virtual display method of clothing design details based on computer vision: 3D laser scanning technology is used to obtain laser point cloud data, three kinds of processing are performed, and on this basis 3D reconstruction is carried out to obtain a 3D image of the clothing and optimize the display effect.
2 Computer Vision Technology

The core problems in computer vision technology include segmentation, 3D reconstruction, and motion analysis. Stereoscopic vision refers to obtaining the three-dimensional geometric information of a target object by referencing and comparing multiple images. In biology, almost all visual systems are composed of two eyes; when the two eyes observe an object from different angles at the same time, they produce a sense of distance and depth. The currently popular 3D stereoscopic movies imitate this principle of biological stereoscopic vision, so that two-dimensional pictures acquire a realistic sense of depth. The essence of a 3D stereoscopic film is that, during shooting, two cameras film the scene simultaneously from different angles; during playback, the images from the two camera positions are superimposed and projected onto the large screen simultaneously. The polarization of light is borrowed from physics, so that each of the audience's eyes sees the picture taken by the corresponding left or right camera, just like the picture information seen by a person's two eyes in a real scene, as if they were there. Therefore, in a computer vision system, to simulate a three-dimensional picture one only needs two cameras capturing two images of the same object from different angles at the same time [6]. Then, according to the 3D reconstruction principle, the 3D image
in the real scene is reconstructed by a computer program from the 2D information, so as to recover the real spatial position information of the object. What customers see in the interface layer are specific clothing display effect pictures, fabric drawings, and attribute information and parameters expressed as text. For the clothing display effect picture, image information is relatively rich in the virtual clothing display process and customers expect to watch realistic detail display results in real time; if this image information were stored directly in the database, it would increase the storage pressure on the database server and lengthen interaction time, becoming the bottleneck of real-time clothing display. Therefore, to effectively ensure the reliability and efficiency of data transmission, this paper stores the images related to clothing silhouettes and styles directly on the web server as files, while the fabric drawings, which occupy relatively little space due to size constraints, are stored in the database server together with attribute information and other parameters. After the designer completes operations such as style definition, outer contour definition, and fabric selection, and then selects fabrics for different regions, the system returns the fabric texture mapping effect map of the selected clothing style; this map is then smoothed to obtain a realistic clothing rendering [7]. To ensure the real-time performance and integrity of system data interaction, the system needs unified and effective management of clothing style image data and attribute data. Therefore, a virtual expression scheme of clothing design details is proposed, as shown in Fig. 1.
Fig. 1. Virtual expression scheme of costume design details (attribute information: size, texture, colour, texture of material; laser point cloud data: pretreatment and 3D reconstruction; display page with interactive technology, leading to the virtual display of clothing design details)
The virtual display of clothing design details based on computer vision technology is mainly divided into three parts: scanning of clothing design details, 3D reconstruction, and virtual display. These are analyzed in detail below.
3 Scanning of Garment Design Details

The purpose of clothing design detail scanning is to collect material on the clothing design details. The scanning method selected here is 3D laser scanning technology [8]. Different scanners have different working principles, but all need to obtain 3D point cloud data on the surface of the scanned object and then process the point cloud data, through denoising, simplification, triangle mesh conversion, and so on, to obtain a 3D mesh model of the object. The key to 3D scanning technology is how to quickly obtain the three-dimensional information of objects. Considering the size, surface fineness, and scanning-site environment of the scanned clothing, the hand-held Artec Eva 3D laser scanner is used, which is fast and well suited to scanning clothing artifacts. In the early stage of scanning, the texture information of the clothing itself can be loaded, which is convenient and fast; an accuracy of 0.1 mm and a resolution of 0.2 mm meet the scanning requirements. The 3D model and the various data obtained from scanning correspond one-to-one with the physical object's scale. The scanning process is fast and captures accurate, high-resolution measurement data without additional equipment for almost any application. The scanning process for costume design details is as follows: open the Artec Studio software and click the "Start" button to begin scanning. Since a costume is a three-dimensional object with a front and a back, moving scanning is performed first, followed by multi-side scanning. During scanning, progress can be viewed intuitively on the computer screen in real time; if a scan is incomplete, additional scanning can be conducted. Once both sides have been scanned, all scans can be aligned to form a whole, and the result is saved to complete the scanning [9]. The data file generated by 3D scanning cannot be used directly as a clothing model in virtual reality; only after processing with 3D auxiliary software can the clothing be used in other software. Models obtained by scanning objects are prone to holes, noise, and repeated points, so hole filling, denoising, and reduction are needed.

(1) Hole filling

During scanning, because of the scanning speed and angle, the data obtained may be incomplete; for example, the texture on the clothing may show gaps and holes, which therefore need to be filled. First, a kd-tree of the point cloud data is constructed and the k-nearest neighbors of every data point are obtained; then the least-squares fitting plane of each data point and its k-nearest neighbors is solved, and the data point and its neighborhood are projected onto this plane. Boundary points are then detected according to a boundary-detection algorithm, the boundary line is drawn in a clockwise (or counterclockwise) direction, and finally the point cloud holes are repaired based on moving least squares [10] (the kd-tree and plane-fitting steps are sketched in code at the end of this section).

(2) Denoising

When the laser scanner is used to obtain sampled point data from the object surface, noise points inevitably arise from the measuring instruments and other environmental factors. The noise points are mainly caused by the following three factors: first, errors caused by surface factors of the
scanned object, such as the errors caused by different roughness, surface materials, ripples, color contrast, and other reflection characteristics; when the surface of the subject is dark or the reflected laser signal is weak, noise is also easily generated under poor lighting conditions. The second is accidental noise, that is, point cloud data errors caused by accidental factors during scanning, which should be deleted or filtered out. The third is error caused by the measurement system itself: for contact measuring equipment, the sensitivity of the equipment and the experience of the operator have a large impact, and the systematic and random errors of the measurement are the main sources of noise points, while the common non-contact three-dimensional laser scanning equipment is affected more by the nature of the object itself. Analysis of the original scanning data shows that if the point cloud is not denoised, the shape of the constructed solid differs greatly from the original research object [11]. Statistics show that 0.1%–5% of the measured point cloud data are noise points that need to be removed. Data denoising is generally discussed separately for ordered and scattered point clouds. For the former, the point cloud after mean filtering is relatively uniform, so the mean filtering method is used for noise removal; the denoising formula is shown in formula (1):

si = (1/n) Σ_{i=1}^{n} ŝi (1)
where si represents the point cloud data after noise removal, ŝi represents the original noisy point cloud data, and n is the window size. For the latter, the scattered point cloud data is first gridded, and noise removal is performed on the basis of the grid. The process is as follows:

Step 1: Determine the minimum outer bounding box of the point cloud data; compare the maximum and minimum values of all points in the x, y, z directions and determine the coordinates of the eight vertices of the bounding box.

Step 2: Calculate the average density of the point cloud in the bounding box, as shown in formula (2):

ρ = (max x − min x)(max y − min y)(max z − min z) / m (2)
where max x, max y and max z represent the maximum values in the x, y and z directions; min x, min y and min z represent the minimum values in the x, y and z directions; and m represents the total number of points in the cloud.

Step 3: Specify the number of points M in each small cube and determine the side length H of the small cube, as shown in formula (3):

H = (ρM)^(1/3) (3)

where M represents the number of points in each small cube.
Step 4: Given the threshold, delete the noise points near the feature point surface.

Step 5: Calculate the center position of each small cube, and then calculate the distance from all points in the cube to the center point, as shown in formula (4):

d = √((xi − x̂)² + (yi − ŷ)² + (zi − ẑ)²) (4)

where d represents the distance.

Step 6: Take the point with the shortest distance as the center of gravity.

Step 7: Calculate the distance from each point in the small cube to the center of gravity and record it as Di.

Step 8: Calculate the mean D̄ and standard deviation A of these distances, as in formulas (5) and (6):

D̄ = (1/M) Σ_{i=1}^{M} Di (5)

A = √((1/M) Σ_{i=1}^{M} (Di − D̄)²) (6)
where M represents the total number of points in the cube.

Step 9: According to different limit errors, delete the points whose distances do not meet the conditions, completing the noise removal for the small cube.

Step 10: Check whether all small cubes have been processed; if so, point cloud denoising is complete, otherwise proceed to the next small cube (this grid-based procedure is sketched in code at the end of this section).

(3) Reduction

Duplicate point cloud data requires reduction. The process is as follows:

Step 1: Create a kd-tree to build a spatial index of the point cloud data, so as to obtain the neighborhood of each data point.

Step 2: From the k-nearest neighbors obtained in Step 1 and principal component analysis, obtain the local fitting plane of each data point, from which the normal vector of each data point can be calculated.

Step 3: Calculate the dot product of each data point's normal vector with the normal vectors of all points in its neighborhood (taking absolute values to avoid inconsistent normal orientations), giving K vector dot products.

Step 4: Average the K values from Step 3 and use the average as the characteristic value of the data point.

Step 5: Sort the characteristic values from Step 4 from small to large and divide the point cloud data into a fixed number of blocks, N.
Step 6: According to a series of preset reduction ratios, apply random reduction to the point cloud data of each block, then consolidate the reduced point clouds of all blocks to complete the reduction.

During point cloud collection for clothing, experimental errors arise from human factors, sunlight, and other influences. Even with preprocessing such as 3D point cloud simplification and denoising, the reconstructed 3D point cloud inevitably has missing points, especially at details such as the collar and cuffs. To ensure that the missing points do not affect the calculation of subsequent parameters, this study performs detailed compensation on the model based on Geomagic Wrap 2014 [12].
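The kd-tree construction, k-nearest-neighbor query, and least-squares plane fitting used both in the hole filling of (1) and in the normal estimation of (3) can be sketched in a few lines of Python; boundary detection, the moving-least-squares repair itself, and the blocked random reduction are omitted. The neighborhood size k is an assumed value.

```python
import numpy as np
from scipy.spatial import cKDTree

def project_to_local_planes(points, k=10):
    """For each point: find its k-nearest neighbors, fit the least-squares
    plane of the neighborhood via SVD, and project the point onto it."""
    tree = cKDTree(points)                 # kd-tree spatial index
    _, idx = tree.query(points, k=k + 1)   # first hit is the point itself
    projected = np.empty_like(points)
    for i, nbrs in enumerate(idx):
        patch = points[nbrs]
        centroid = patch.mean(axis=0)
        # Smallest right-singular vector = normal of the fitting plane.
        _, _, vt = np.linalg.svd(patch - centroid)
        normal = vt[-1]
        projected[i] = points[i] - np.dot(points[i] - centroid, normal) * normal
    return projected
```

The grid-based denoising of Steps 1–10 in (2) (formulas (2)–(6)) can likewise be sketched; the per-cube point count M and the limit-error multiplier are assumed values, not taken from the paper.

```python
import numpy as np
from collections import defaultdict

def grid_denoise(points, M=50, limit=2.0):
    mins = points.min(axis=0)
    m = len(points)
    rho = np.prod(points.max(axis=0) - mins) / m   # formula (2): volume per point
    H = (rho * M) ** (1.0 / 3.0)                   # formula (3): cube side length
    # Assign each point to a small cube of the bounding box (Steps 1 and 3).
    cells = np.floor((points - mins) / H).astype(int)
    groups = defaultdict(list)
    for i, c in enumerate(map(tuple, cells)):
        groups[c].append(i)
    keep = np.ones(m, dtype=bool)
    for sel in groups.values():
        sel = np.asarray(sel)
        cube = points[sel]
        center = cube.mean(axis=0)
        d = np.linalg.norm(cube - center, axis=1)    # formula (4)
        gravity = cube[np.argmin(d)]                 # Step 6: center of gravity
        Di = np.linalg.norm(cube - gravity, axis=1)  # Step 7
        D_bar, A = Di.mean(), Di.std()               # formulas (5) and (6)
        keep[sel[np.abs(Di - D_bar) > limit * A]] = False  # Step 9
    return points[keep]
```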
4 3D Reconstruction of Garment Design Details

In this paper, 3D Studio Max is selected for 3D modeling. 3D Studio Max is among the most widely used modeling packages today, with high-quality graphic output, fast operation, and a large number of special effects. Because it runs on an open platform, it can integrate nearly a thousand third-party plug-ins, greatly enriching the creative means available. 3D Studio Max provides a variety of modeling methods, such as primitive stacking, mesh editing, surface modeling, and graphite modeling, giving users convenient means to create. After continuous version upgrades, more and more functions have been added; in particular, the graphite modeling tools in 3D Studio Max 2010 and later versions further simplify and improve modeling, and the animation functions have also been strengthened. 3D reconstruction of garment design details based on 3D Studio Max proceeds as follows:

Step 1: Extract the garment phenotype parameters.

Step 2: Import the phenotype parameters into Substance Painter for material painting.

Step 3: Triangulate the point cloud. Triangulating the model means dividing its surface into triangles while maintaining the characteristics of the original model; triangulating discrete points means connecting all points into triangular patches according to certain rules. The triangulation of a discrete point cloud in two-dimensional space is shown in Fig. 2. Because the PCL-based greedy projection triangulation algorithm reconstructs meshes quickly and with good quality, it is used here to mesh the point cloud data. The algorithm projects the point cloud onto a two-dimensional plane along the normals, then uses a Delaunay-based space region growing algorithm to select triangles connecting the neighborhood points that meet the expansion criteria, so that the boundary keeps expanding into a mesh surface; the mesh is then restored to three-dimensional space according to the topological relationships of the point cloud, forming a three-dimensional mesh surface. During mesh reconstruction, the parameters to be set are the multiplier mu of the distance from a sample point to its nearest neighbor, the maximum number of neighborhood searches nnn, the maximum triangle side length radius, the maximum and minimum triangle angles after triangulation (max_angle and min_angle), the neighborhood search method, and so on.
Fig. 2. Schematic Diagram of Triangulation of Discrete Point Cloud
Step 4: Texture mapping. Free texture mapping means that the texture camera can be moved and placed freely, with fixed focus, and the texture mapping process is completed by capturing texture images. To complete free texture mapping, the exterior orientation parameters between the texture camera and the binocular system must be found whenever the texture camera's position changes. In the proposed method, the internal parameters of the texture camera are calibrated in advance. When the texture camera moves freely to take pictures, the texture of the measured object and the information of the marked points are obtained at the same time; the marked points reflect the current position and attitude of the texture camera. Using the coordinate transformations of the marked points among the world coordinate system, the texture camera image pixel coordinate system, and the binocular left camera image pixel coordinate system, the coordinate transformation between the freely moving texture camera and the binocular system can be completed. Then, according to the imaging model, the correspondence between the three-dimensional point cloud and the texture pixels is obtained, and the free texture mapping is finally completed [13].

Step 5: Render the effect images. After polygon modeling, material assignment, UV mapping, texture mapping, and other steps, the designer optimizes the details of the virtual clothing model, including light settings and detail baking.

(1) Lighting layout: light is the basis of visual information and visual modeling in the interface, reflecting the shape, texture, and color of the target. The lights in Unity can simulate different real-life light source types, and arranging lights in the scene enhances its realism. In the lighting layout of the Han costumes hall, a directional light simulates sunlight, a point light source simulates the bulbs on the ceiling, and a spotlight simulates a flashlight
to light the exhibits individually; the light positions, quantities, and parameters are adjusted to achieve the desired effect.

(2) Baking lights: the main purpose of baking is to optimize the scene and improve the running speed of the system. Baking technology pre-renders lighting effects into maps, simulating light and shadow. In a baked scene, even if the lights are deleted, strong lighting and shadow effects remain, reducing the pressure that real-time lighting and shadows place on the computer. After setting all objects except lights to static, set Lightmapping to Bake Scene, create a Light Probe Group in the Hierarchy panel, select Baked in Ambient GI in the Lighting panel, check Baked GI, and click Build to bake; after that, a real-time-looking lighting effect is achieved.

Step 6: Color the material. The misss_fast_skin_maya shader in the mental ray renderer is used to simulate the color of the cloth, and the virtual clothing model is then rendered and output.
5 Virtual Display of Costume Design Details

5.1 Display Interface Design Content

When browsing clothing, users can click the clothing introduction button to listen to the relevant text content while watching, or click the auto display button to rotate the model automatically. A zoom function is also provided: sliding the mouse wheel forward enlarges the model, and sliding it backward shrinks it. Clicking the clothing introduction button pops up a text box introducing the garment, accompanied by an automatic voice introduction. Clicking the detail introduction displays the texture pattern and detailed photos of the dress, and clicking the line drawing button marks up the 3D model.

5.2 Real-Time Clothing Animation

Current simulation models can reproduce the dynamic effects of clothing realistically, but their computation is still too heavy for real-time requirements. This complexity comes not only from the formulas in the models but also from collision detection; at present, only small pieces of fabric can be simulated dynamically in real time. To achieve real-time dynamic simulation of clothing, interpolation, geometric modeling, and other techniques can be used to simplify the simulation. Weil was the first to carry out real-time research, using a geometry-based model, and Hindst and Grimsdale among others made further progress in this area. Although geometric models meet real-time requirements in terms of speed, the effect is not ideal because the physical characteristics of clothing are not taken into account; another disadvantage is that they require human interaction. In physics-based models, the simplest and fastest way to improve speed and meet real-time requirements is to treat the surface of the clothes as
a mesh composed of vertices, with each vertex connected to its neighboring vertices by damped springs, forming a "mass-spring" model. This model was widely used for simple and fast cloth simulation before 1997. Here, the hybrid explicit/implicit Euler integration technique proposed by Meyer and Desbrun is used to realize real-time clothing animation, together with a voxel-based collision detection algorithm. The explicit Euler integral is shown in formula (7):

v_i^{t+1} = Σ_{t=1}^{T} (C_i/λ + v_i^t) dt
g_i^{t+1} = Σ_{t=1}^{T} (v_i^{t+1} + g_i^t) dt (7)

where v_i^{t+1} and g_i^{t+1} represent the speed and position at time t + 1 respectively; v_i^t and g_i^t represent the speed and position at time t; t represents time; λ represents the integration step; and C represents an explicit operator. The basic idea of the implicit Euler integration method is to replace the force at time t with the force at time t + dt.
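A minimal numpy sketch of one integration step of such a mass-spring sheet follows, in the spirit of formula (7): per-vertex spring and damping forces are accumulated, the velocity is updated, and the position is then advanced with the new velocity. The stiffness, damping, mass, and time step are assumed values, and collision detection is omitted.

```python
import numpy as np

def euler_step(pos, vel, springs, rest_len, mass=1.0, ks=80.0, kd=0.6,
               dt=1.0 / 60.0):
    """One step of a damped mass-spring cloth sheet. `springs` is an
    (S, 2) int array of vertex index pairs; `rest_len` holds the S rest
    lengths; `pos` and `vel` are (V, 3) arrays."""
    force = np.tile([0.0, -9.8 * mass, 0.0], (len(pos), 1))  # gravity
    i, j = springs[:, 0], springs[:, 1]
    d = pos[j] - pos[i]
    length = np.linalg.norm(d, axis=1, keepdims=True)
    direction = d / np.maximum(length, 1e-9)
    # Relative velocity along the spring axis, used for damping.
    rel = ((vel[j] - vel[i]) * direction).sum(axis=1, keepdims=True)
    f = (ks * (length - rest_len[:, None]) + kd * rel) * direction
    np.add.at(force, i, f)       # spring pulls vertex i toward j
    np.add.at(force, j, -f)      # and vertex j toward i
    vel = vel + (force / mass) * dt   # velocity update, cf. v_i^{t+1}
    pos = pos + vel * dt              # position update, cf. g_i^{t+1}
    return pos, vel
```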
5.3 Virtual Interaction

The user can roam the scene at will to view its internal layout from all directions, click on a dress with the mouse to view it freely or rotate it manually to inspect details, and zoom in and out. Clicking a button displays the text introduction of the dress; the interactive functions initially specified are implemented in C#. The research found that displaying the model dynamically is more reasonable and resource-efficient: when the camera is close to an object, a high-accuracy model is displayed, and when the camera is far away, a low-accuracy model is displayed. When running in the scene, clicking any dress on the wall switches to the perspective of camera 2 and instantiates a high-precision dress model, which rotates automatically while the rest of the background is blurred. Clicking the auto display button on the right stops the automatic display and switches to manual display; a text introduction box enters on the right, introducing the style and history of the dress, accompanied by a voice broadcast. Clicking the detail button shows each high-definition texture picture on the dress, and the line marking button allows marking and painting on the picture at will. Clicking the Close button exits camera 2's view and switches back to the main camera.
6 Method Application Test

3D Studio Max and SoftImage are used to reconstruct the scene for the virtual display of clothing design details, and image information is sampled in combination with laser projection information fusion technology. The number of samples collected from the laser-scanned images of the clothing design detail display is 140; the laser-scanned image resolution is set to 500 × 600 pixels and the similarity coefficient to 0.55, and the test is conducted with these parameter settings.
6.1 Two-Dimensional Clothing Calculation Examples

Several garments are scanned with the hand-held Artec Eva 3D laser scanner to generate the basic data information files. Some clothing examples are shown in Fig. 3.
Fig. 3. Example of clothing
6.2 3D Virtual Image of Clothing

Based on the 3D reconstruction technology and the scanned point cloud data, a 3D mesh model is established, as shown in Fig. 4. Texture mapping, rendering, and material coloring are then performed on the 3D mesh model to obtain the 3D clothing image, as shown in Fig. 5.

6.3 Virtual Display Effect

Information entropy represents the richness of information. The information entropy formula is as follows:

ψ = − Σ_{i=0}^{R−1} φi lg(φi) (8)
where ψ represents information entropy, R represents the maximum gray level, and φi is the probability that a pixel's gray value is i. The greater the information entropy, the more detailed and comprehensive the information displayed in the image.
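Formula (8) transcribes directly into Python; the sketch below assumes an 8-bit grayscale image (R = 256) and, following the formula, uses the base-10 logarithm.

```python
import numpy as np

def info_entropy(img):
    """Information entropy psi of a grayscale image, formula (8)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    phi = hist / hist.sum()          # gray-level probabilities
    phi = phi[phi > 0]               # skip lg(0) terms
    return float(-(phi * np.log10(phi)).sum())
```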
Fig. 4. Three dimensional grid model
(1) Peak signal-to-noise ratio (PSNR):

PSNR = 10 × lg(255² / MSE) (9)
where MSE is the mean square error between images. The larger the PSNR, the higher the resolution and the better the image quality.

(2) MTF: the MTF is calculated from the contrast between the maximum brightness max and the minimum brightness min in a line pair, as shown in formula (10):

MTF = (max − min) / (max + min) (10)

The computed MTF is never greater than 1, and the closer it is to 1, the better.
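Formulas (9) and (10) likewise transcribe directly; both sketches assume 8-bit images and base-10 logarithms.

```python
import numpy as np

def psnr(img, ref):
    """Peak signal-to-noise ratio, formula (9)."""
    mse = np.mean((img.astype(float) - ref.astype(float)) ** 2)
    return float(10.0 * np.log10(255.0 ** 2 / mse))

def mtf(line_pair):
    """Modulation transfer function over a line-pair region, formula (10)."""
    mx, mn = float(line_pair.max()), float(line_pair.min())
    return (mx - mn) / (mx + mn)
```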
The information entropy, peak signal-to-noise ratio, and MTF of the 2D and 3D images of 200 pieces of clothing in 10 categories were counted, with the results shown in Table 1. As Table 1 shows, across the 10 groups of different clothing types the information entropy of the 3D images is at least 8.422, above the maximum value of 8.213 for the 2D images, so the detail information displayed in the 3D images is more detailed and comprehensive. The peak signal-to-noise ratio of the 3D images is at least 16.987, above the 2D maximum of 14.872, that is, the 3D images have higher resolution and better quality. The MTF of the 3D images is at least 0.924, above the 2D maximum of 0.887; being closer to 1, the 3D images have a better display effect.
Fig. 5. Three-dimensional image of clothing: (a) overall; (b) details
To sum up, the information entropy, peak signal-to-noise ratio, and MTF of the 3D images are all greater than those of the 2D images, which proves that the information displayed by the studied method is more comprehensive, rich, and clear.
Table 1. Comparison of information entropy, peak signal-to-noise ratio and MTF

Project       ψ (3D)   ψ (2D)   PSNR (3D)   PSNR (2D)   MTF (3D)   MTF (2D)
Clothing1     9.236    7.235    18.254      10.154      0.954      0.841
Clothing2     8.422    7.142    17.262      12.546      0.935      0.836
Clothing3     9.214    6.325    17.625      10.154      0.924      0.864
Clothing4     8.874    8.213    18.876      14.872      0.947      0.821
Clothing5     8.695    7.123    17.542      12.213      0.933      0.871
Clothing6     8.472    6.233    16.987      10.545      0.925      0.847
Clothing7     8.469    7.014    17.213      9.875       0.947      0.887
Clothing8     9.012    7.320    17.245      11.201      0.966      0.836
Clothing9     9.045    6.685    18.687      10.335      0.970      0.814
Clothing10    8.712    6.755    19.546      10.872      0.957      0.798
7 Conclusion

From the perspective of application innovation, this research strengthens the exploration of the intersection of clothing design and computer vision technology and applies computer vision to clothing design. The whole implementation process was completed and the practicability of the design method evaluated. The main technical achievements of this paper are as follows: (1) Through in-depth analysis of computer vision technology, we introduced it into the display and design of clothing details, integrated the existing clothing into a file, re-developed the design, and completed the entire virtual clothing design process. This display does not make users passively accept information but lets them actively search and consult it, realizing interactive display. (2) By choosing to 3D-scan the garment model, the complex process of generating a virtual garment model from 2D garment design through 3D virtual stitching is effectively avoided. Although much work has been done, 3D clothing display still faces many open problems. Future research will focus on the physical properties of fabrics, attending not only to the appearance of clothing but also to the accurate representation of local structure, showing the drape and texture of the clothing.
References 1. Wang, Y.-X., Liu, Z.-D.: Virtual clothing display platform based on CLO3D and evaluation of fit. J. Fiber Bioeng. Inform. 13(1), 37–49 (2020) 2. Memarian, B., Koulas, C., Fisher, B.: Novel display design for spatial assessment in virtual environments. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 65(1), 1437–1442 (2021)
3. Zhan, B., Li, Y., Dong, Z., et al.: Research on virtual display of fully-formed double-layer knitted clothing parts. J. Text. Res. 43(8), 147–152 (2022) 4. Yong, J., Lin, F., Fan, J.: The integration and application of holographic projection technology and virtual clothing display. Video Eng. 46(4), 211–213, 220 (2022) 5. Lin, J., Chen, M., Shi, Y., et al.: Personalized virtual fashion show for haute couture. J. Zhejiang Univ. (Sci. Edn.) 48(4), 418–426, 434 (2021) 6. Sharma, S., Koehl, L., Bruniaux, P., et al.: Development of an intelligent data-driven system to recommend personalized fashion design solutions. Sensors 21(12), 4239 (2021) 7. Qiao, W., Ma, B., Liu, Q., et al.: Computer vision-based bridge damage detection using deep convolutional networks with expectation maximum attention module. Sensors 21(3), 824 (2021) 8. Wu, Z., Chen, Y., Zhao, B., et al.: Review of weed detection methods based on computer vision. Sensors 21(11), 3647 (2021) 9. Zhu, H., Pang, J., Wen, Z.: Study on outliers suppression denoising method based on Kalman filter and least square fitting. Comput. Simul. 39(7), 366–370 (2022) 10. Amabilino, S., Bratholm, L.A., Bennie, S.J., et al.: Training atomic neural networks using fragment-based data generated in virtual reality. J. Chem. Phys. 153(15), 154105 (2020) 11. Kim, S., Lee, S., Jeong, W.: EMG measurement with textile-based electrodes in different electrode sizes and clothing pressures for smart clothing design optimization. Polymers 12(10), 2406 (2020) 12. Mosleh, S., Mosleh, S., Abtew, M.A., et al.: Modeling and simulation of human body heat transfer system based on air space values in 3D clothing model. Materials 14(21), 6675 (2021)
Reliability Testing Model of Micro Grid Soc Droop Control Based on Convolutional Neural Network Zhening Yan, Chao Song(B) , Zhao Xu, and Yue Wang Dalian University of Science and Technology, Dalian 116000, China [email protected]
Abstract. In order to avoid fatigue operation of the microgrid and ensure the application reliability of its equipment components, a reliability detection model of micro grid SOC droop control based on a convolutional neural network is proposed. The convolutional neural network architecture is constructed; by defining small target parameters, real-time tracking of target samples is realized and the microgrid charging-state operation target is identified. The polarity detection conditions of the capacitor equipment are then improved and, according to the data acquisition and calibration results, the microgrid operation data is matched against the detection template, completing the design of the reliability detection model for micro grid SOC droop control based on the convolutional neural network. Comparative experimental results show that, under the convolutional neural network detection model, the indicator device flashes abnormally when the fatigue curve value reaches 0.18, so the fatigue operation state of the microgrid can be avoided, which plays a prominent role in ensuring the application reliability of micro grid SOC droop control.

Keywords: Convolutional Neural Network · Micro Grid SOC Droop Control · Reliability Testing · Small Target Object · Target Tracking · Capacitance Polarity · Data Calibration · Detection Template
1 Introduction

A convolutional neural network (CNN) is a type of deep feedforward neural network that utilizes convolutional calculations, and it is regarded as one of the prominent algorithms in the field of deep learning. A CNN is capable of representation learning and can classify input information in a translation-invariant manner through its hierarchical structure; because of this property, it is often referred to as a "translation-invariant artificial neural network". The input layer of a CNN can process multi-dimensional data. In a one-dimensional CNN, the input layer receives one- or two-dimensional arrays, where one-dimensional arrays typically represent time or spectrum samples and a two-dimensional array may contain multiple channels; a two-dimensional CNN receives two- or three-dimensional arrays; a three-dimensional CNN receives four-dimensional arrays [1].
Due to the widespread application of CNNs in the field of computer vision, many studies assume the presence of three-dimensional input data, which consist of two-dimensional pixel points and RGB channels on a plane. The hidden layers of a CNN usually consist of convolutional layers, pooling layers, and fully connected layers. In more modern algorithms, complex structures such as inception modules and residual blocks may be used. However, convolutional and pooling layers are common and unique to CNNs. The convolutional layer contains weight coefficients in its convolution kernel, whereas the pooling layer does not. Therefore, the pooling layer is not considered as an independent layer. The application of CNNs in reliability detection of microgrid SOC droop control holds great significance. A microgrid refers to a compact power generation and distribution system consisting of distributed generation, energy storage devices, energy conversion devices, loads, monitoring and protection devices, among others. Its purpose is to enable the flexible and efficient utilization of distributed generation and address challenges related to connecting numerous distributed generation sources in various forms to the grid. The advancement and expansion of microgrids can greatly facilitate the integration of distributed power generation and renewable energy on a large scale, providing reliable and diverse energy supply to loads. Microgrids play a crucial role in enabling active distribution networks and facilitating the transition from traditional power grids to smart grids. With the increasing demands of social development, the complexity of microgrid SOC droop control is growing, necessitating SOC droop control devices that possess traits such as reliability, fast response, low power consumption, lightweight, compact size, and cost-effectiveness. In the context of microgrids, compactness leads to higher integration capability, shorter response times enable faster computing speeds, and higher transmission frequencies accommodate larger amounts of information transfer. This research proposes a convolutional neural network-based reliability detection model for microgrid SOC droop control. The proposed method aims to prevent fatigue operation of microgrids and ensures high reliability in SOC droop control for microgrids.
2 Target Recognition of Micro Grid SOC Droop Control

The construction of the micro grid SOC droop control reliability detection model starts from recognition of the microgrid operation target: with the support of the convolutional neural network architecture, the small-target definition expression is solved so as to achieve accurate tracking of the micro grid SOC droop control operation target.

2.1 Convolutional Neural Network Architecture

Compared with traditional neural networks, convolutional neural networks (CNNs) include a convolutional layer for feature extraction and a downsampling layer to maintain spatial consistency. In CNNs, the convolution layer is typically connected to the input layer and extracts information from the input image. The intermediate layers of the network consist of alternating convolution and downsampling layers. The output layer usually leverages label information and adopts supervised training methods to achieve
network convergence. In addition to their distinct structure, CNNs utilize weight sharing among neurons in the same plane to reduce computational complexity, and different convolution kernels are used to extract diverse features. The original information is transformed according to the network's input requirements, and each single neuron of the connected convolution layer is connected to a local sample area of the input information. By sensing the different response behaviors in each digital sub-area, the underlying features in the neural node structure, such as points, edges, and corners, can be extracted, which approximates the perception mechanism of biological neural networks: computers obtain input data samples through neural nodes, perception proceeds from points to edges, and different textures and shapes are perceived through the motion of edges [2]. Convolution networks use different subunits to generate feature maps corresponding to such features. The complete convolutional neural network architecture is shown in Fig. 1 (input data Z; alternating convolution layers A1–AN, C1–CN and downsampling layers B1–BN, D1–DN; full connection layer E1–EN; output layer F).
Fig. 1. Convolutional neural network architecture
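As an illustration of the Fig. 1 topology, the following is a minimal PyTorch sketch with two convolution/downsampling pairs, a fully connected layer, and an output layer. The channel counts, kernel sizes, the 28 × 28 single-channel input, and the two-class output are assumptions; the paper does not specify them.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, padding=2),   # convolution layers A1..AN
    nn.ReLU(),
    nn.MaxPool2d(2),                              # downsampling layers B1..BN
    nn.Conv2d(16, 32, kernel_size=5, padding=2),  # convolution layers C1..CN
    nn.ReLU(),
    nn.MaxPool2d(2),                              # downsampling layers D1..DN
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 64),                    # full connection layer E1..EN
    nn.ReLU(),
    nn.Linear(64, 2),                             # output layer F
)
# Weight sharing is inherent to nn.Conv2d: one kernel's parameters are
# reused at every spatial position, which keeps the parameter count low
# compared with a fully connected network.
```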
A convolutional neural network is a deep feed-forward neural network that incorporates convolution calculations. It possesses representation learning capabilities and classifies input information through its hierarchical structure. By sharing convolution kernel parameters in the hidden layers and utilizing sparse inter-layer connections, CNNs efficiently capture features while minimizing computation. A convolution layer obtains four types of data information from multiple convolution kernel nodes: when a convolution kernel convolves the data samples in turn, the response value of the bottom convolution kernel reaches its maximum and the corresponding features are extracted. If the features undergo no relative position change, the spatial position relationship is extracted from the data samples by the convolution kernel during convolution, ensuring spatial position invariance. The downsampling layers are connected with the convolution layers in turn, and each convolution layer is connected
with a downsampling layer. Downsampling means sampling at a frequency below twice the maximum frequency of the signal; the nearest-neighbor method is mainly used, and downsampling is usually performed at the receiver side. After digital down-conversion at the receiver, the sampling rate of the signal is still high and the data volume is large, so, under the premise of the Nyquist sampling theorem, downsampling the high-rate signal reduces both the sampling frequency and the amount of computation. Downsampling the feature samples generated by the convolution layer is equivalent to the network node blurring the data samples of the previous layer: when the information parameters are slightly displaced, the eight possible motion directions before downsampling reduce to three, which greatly lowers the network's sensitivity to displacement of the target information parameters [3]. After that, convolution layers and downsampling layers are connected alternately so that more features can be learned through training. Let emin denote the minimum value of the information parameter of the data sample, χmin the convolution coefficient based on the parameter index emin, emax the maximum value of the information parameter of the data sample, χmax the convolution coefficient based on the parameter index emax, α the coding coefficient of the neural node, and δ the hierarchical coefficient of the convolutional neural network architecture. Combining these physical quantities, the connection expression of the convolutional neural network system can be defined as:

E = (1/δ) Σ_{α=1}^{+∞} (e²max − e²min) / |χmax − χmin|² (1)
In the convolutional neural network, each neuron senses only local information of the input data sample, and neurons derived from the same convolution kernel share the same weight parameters. Thus, while the local features of the data samples are extracted, the number of parameters in the training process of the convolutional neural network is greatly reduced.

2.2 Small Target Definition

The definition of small targets can be categorized into relative-size and absolute-size definitions. Relative size is determined from the width and height of the original data sample space: small targets are typically considered to have dimensions less than or equal to one tenth of the width and height of the original space. Convolutional neural networks are a type of multi-layer perceptron in artificial neural networks. Neurons in CNNs are interconnected and responsible for information transmission, resembling the visual cortex in animals: each cortical neuron responds to stimuli within its receptive field, and since the receptive fields of different neurons overlap, they collectively cover the entire data sample space [4]. Currently, small target detection algorithms based on CNNs aim to strike a balance between speed and accuracy by sacrificing a small degree of accuracy.
Let r1, r2, r3 represent three unequal sample accuracy parameters satisfying r1 ≥ 1, r2 ≥ 1 and r3 ≥ 1 simultaneously; let R̂ represent the overlapping characteristics of the data samples, ε the repetition parameter, and wr1, wr2, wr3 the absolute targets based on the accuracy parameters r1, r2 and r3 respectively. With the support of these physical quantities, the solution of the small target transition condition can be expressed as:

q = Σ_{ε=1}^{+∞} (E × R̂)² / ((wr1 + wr2 + wr3) / (wr1 × wr2 × wr3)) (2)
Small target objects perform two key functions within the convolutional neural network. First, they classify based on the combination of different detailed features extracted from the convolutional layer; second, they effectively mitigate the impact of feature position shifts on classification, acting in this regard as a classifier. The presence of small target objects affects various parameters of the model, including the total number of fully connected layers, the number of neurons in each fully connected layer, and the activation function. When the width of the small target remains unchanged, increasing the number of fully connected layers enhances the model's capacity for nonlinear expression; similarly, with the number of layers fixed, increasing the width of the fully connected layers also strengthens this capacity [5]. However, an excessively powerful nonlinear capacity may lead to overfitting and increased computation time. To address this, convolutional neural networks employ dropout: during forward propagation, a neuron ceases to function with a certain probability, which enhances the model's generalization ability. Let γ represent a fitting parameter greater than zero in the convolutional neural network; building on formula (2), the definition expression of a small target based on the convolutional neural network is derived as:

Q = (1 − q − 1/γ²) / (1 + q − 1/γ²) (3)
Batch normalization is commonly applied after the convolutional layer and before the activation layer for small target objects. In neural network training, correlations and interactions occur between connected convolutional layers, whereby changes in upper-layer parameters affect the associated lower-layer convolutional layers. As each convolutional layer produces both linear and nonlinear activation maps, these changes can be magnified with increasing depth. Consequently, adjustments in lower-layer convolutional layer parameters necessitate continuous adaptation by upper convolutional layers, resulting in a slower learning rate for the overall network model.
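A short PyTorch fragment showing the ordering described above, with batch normalization placed after the convolution and before the activation, and dropout applied in the fully connected part; all layer sizes are assumptions.

```python
import torch.nn as nn

conv_block = nn.Sequential(
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),   # normalize after the convolution, before activation
    nn.ReLU(),
)
fc_head = nn.Sequential(
    nn.Flatten(),
    nn.Dropout(p=0.5),    # a neuron ceases to function with probability p
    nn.Linear(32 * 8 * 8, 10),
)
```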
2.3 Target Tracking

Target tracking is used to obtain more reliable data sample features while keeping the parameter quantity and the receptive field unchanged; its essence is to fill the convolution kernel. Each conventional convolution kernel can directly track and process data samples; the branches are then connected and followed by a 1 × 1 convolution kernel that combines features under the same convolution layer feature parameter while reducing channel parameters, forming a pattern similar to neural perception behavior [6]: the closer to the center, the higher the contribution, and vice versa. Finally, the connected characteristic parameter is multiplied by the key value. With the parameter quantity unchanged, more reliable feature parameters of the data samples are generated, and the perceptual information of different convolution layers is added through the receptive field module, so that the feature extraction network obtains more context information. By connecting the prediction layers, high-level and low-level semantic information can be better integrated: compared with low-level semantic information, deeper convolutional layers detect more semantic information, and this semantic information is translation invariant, which is more effective for detecting target categories. The larger a target weight, the more similar the area represented by the corresponding particles is to the area tracking the target. However, after many iterations, the weights of most particles become small, the weights of a few particles become large, and the variance of the weights keeps increasing, making convergence difficult; this is the degradation phenomenon in convolutional neural networks. Let β denote the degradation behavior coefficient of the data sample, i the initial assignment of the particle weight, ϕ the number of iterative transmissions of the data sample in the convolutional neural network, T the single execution time of the iterative transmission instruction, Ẇ the iterative characteristics of the data sample to be processed (the inequality condition Ẇ ≠ 0 always holds), φ the convergence index, and I the translation vector of the data sample. With the support of these physical quantities, the degradation performance intensity can be expressed as:

u = ((1/β) Σ_{i=1}^{+∞} Q^{|ϕ|} / (|T| × Ẇ)) / (φ × (1 − I²)) (4)

On the basis of formula (4), let p1 and p2 denote the operating parameters of two randomly selected micro grid SOC droop controls, with value conditions p1 > 0, p2 > 0 and p1 ≠ p2; let ỹ denote the fluctuation characteristic index of the data sample in the convolutional neural network space and y the fluctuation coefficient. The target tracking expression for micro grid SOC droop control operation data based on the convolutional neural network is:

U = (O × ∫_{−∞}^{+∞} u / |p1 − p2|²) / (y × ỹ) (5)
When degradation occurs, resampling is performed according to the weights of the current particles: particles with too low a weight are eliminated, and more particles are regenerated from the particles with high weights. The tracking method of shape matching represents the target by its shape in the data sample and, similarly to template matching, tracks the target by calculating the difference between the candidate target shape template and the tracking target shape template. Because the target shape needs to be modeled, this method is well suited to tracking the operation data of micro grid SOC droop control.
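The resampling step described above can be sketched as follows (a minimal NumPy illustration, not the authors' implementation; particle count and state dimension are assumed):

```python
import numpy as np

def resample(particles, weights, rng=None):
    """Multinomial resampling: particles with too low a weight are eliminated
    and more particles are regenerated from the high-weight ones."""
    rng = rng or np.random.default_rng(0)
    w = weights / weights.sum()                       # normalize weights
    idx = rng.choice(len(particles), size=len(particles), p=w)
    # after resampling, all surviving particles carry equal weight
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

particles = np.random.randn(100, 2)                   # hypothetical 2-D states
weights = np.random.rand(100)                         # unnormalized weights
particles, weights = resample(particles, weights)
```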
3 Reliability Detection of Micro Grid Soc Droop Control

With the support of the convolutional neural network, the design and application of the micro grid SOC droop control reliability detection model are completed according to the processing flow of capacitor equipment polarity detection, data acquisition and calibration, and detection template matching.

3.1 Polarity Detection of Capacitor Equipment

After the target recognition of micro grid SOC droop control is completed, the electrolytic capacitor equipment is located and the polarity of the electrolytic capacitor is detected. The polarity marking of most electrolytic capacitors is a white fan-shaped area, whose color differs greatly from that of other regions. By processing the operating data samples of electrolytic capacitors through threshold segmentation, it is possible to judge whether electrolytic capacitor elements are defective according to the metal characteristics of the middle circular white area, and then to complete the accurate detection of the polarity of the capacitor equipment according to the polarity relationship between the edge area and the core capacitance. The specific detection principle is shown in Fig. 2.
Fig. 2. Polarity detection principle of capacitor equipment
In the polarity detection device shown in Fig. 2, the display screen shows different values according to the specific charging conditions of the capacitive devices. Generally speaking, the detection result of positively charged capacitive devices is positive, and that of negatively charged capacitive devices is negative. Regardless of the sign, the larger the displayed value, the greater the current charging amount of the micro grid SOC droop control under detection.
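The threshold-segmentation step described above can be sketched with OpenCV (an implementation choice made here for brevity; the input path, central-region geometry and acceptance threshold are all assumptions):

```python
import cv2
import numpy as np

# Hypothetical sketch: isolate the bright (white) polarity marking of an
# electrolytic capacitor and check whether the central circular area contains
# enough bright metallic pixels.
img = cv2.imread("capacitor_sample.png", cv2.IMREAD_GRAYSCALE)  # assumed input
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

h, w = binary.shape
cy, cx, r = h // 2, w // 2, min(h, w) // 4       # assumed central region
yy, xx = np.ogrid[:h, :w]
center_mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2

white_ratio = binary[center_mask].mean() / 255.0  # share of bright pixels
defective = white_ratio < 0.2                     # assumed acceptance threshold
```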
If the data sample is occluded, the modulus of the gradient vector of the information parameter in this area will be very small, the inner product with the corresponding template gradient vector will be a very small value, and its impact on the total can be ignored. If there is confusion in the data sample, the modulus of the gradient vector of the corresponding template in this area will be very small, and the influence of their inner product on the total can likewise be ignored. However, this does not mean that the similarity measure of the formula can accommodate changes in the reliability of micro grid SOC droop control [7], because the modulus of the gradient vector has the following relationship with the reliability parameter: when the total number of data samples is large, the modulus of the gradient vector is small; when the total number of data samples is small, the modulus of the gradient vector is large. Combining with formula (5), the polarity detection expression of the capacitor device is:

$P = \dfrac{f\displaystyle\int_{-\infty}^{+\infty}\left(A^{-1}-\overline{A}\right)^{2}\times\dfrac{s\times\overline{s}^{2}}{U^{2}}}{\sqrt{\iota}\left(S_{\max}^{2}-S_{\min}^{2}\right)}$  (6)
In the formula, $f$ represents the polarity discrimination condition of the electrical signal parameter, $A$ the total storage amount of the data sample, $\overline{A}$ the average output value of the data sample in the unit detection period, $s$ the random value of the similarity index, $\overline{s}$ the supplementary explanation condition of the coefficient $s$, $S_{\min}$ the minimum value of the polarity constraint parameter, and $S_{\max}$ the maximum value of the polarity constraint parameter. Although the template matching algorithm based on the convolutional neural network relies on similarity measures, the similarity measures it considers differ from those of the template matching of micro grid SOC droop control reliability data: it considers the gradient vectors of the data samples in the template, and the obtained matching position is determined by calculating the sum and minimum of the inner products of the gradient vectors, so it has superior stability and reliability.

3.2 Data Acquisition and Calibration

For target detection, the algorithms used are convolutional neural network based methods and reliability modeling methods. Region recommendation is a two-stage method: the candidate region evaluation method is used to generate potential micro grid SOC droop control operation data package files, and a classifier is then used to identify these data files and classify the required target objects. When building the micro grid SOC droop control reliability detection model, traditional image processing methods have made some achievements and effectively promoted the development and application of defect detection systems, but their detection processes are complex [8]. With the breakthroughs of convolutional neural network technology in data processing and data analysis, using the convolutional neural network to collect and calibrate the operation data of micro grid SOC droop control has become the main research direction.
In the manufacturing process of micro grid SOC droop control, various types of defects occur. Previous detection methods usually rely on manual vision or traditional image processing; although these methods have achieved certain results, they still have great shortcomings. With the rise of convolutional neural network technology, various target detection models have achieved good results in the field of reliability detection. Inspired by this, the convolutional neural network is used to detect target samples in the design of micro grid SOC droop control. The electrolytic capacitor is taken as the detection object. For the network structure, a multi-layer network cycle structure of neural network nodes is designed to enhance the feature extraction ability of the network, and good results are achieved in the detection of the electrolytic capacitor.

The reliability detection model for micro grid SOC droop control is fundamentally based on the gradient descent method, aiming to optimize the objective function, which is the square of the discrepancy between the expected and actual output of the entire network [9, 10]. Structurally, the convolutional neural network consists of an input layer, hidden layer(s) and an output layer. The numbers of nodes in the input and output layers can be determined from the size of the input feature vector and the number of classification categories. Unlike the input and output layers, the hidden layer(s) do not connect directly with the input or output; however, changes in their state affect the relationship between input and output, allowing customization as needed. Let $l_1, l_2, \cdots, l_n$ represent the value-taking results of $n$ non-zero micro grid SOC droop control operation data samples, defined as follows:

$\begin{cases} l_1 = \dfrac{\lambda_1}{1+g_1} \\ l_2 = \dfrac{\lambda_2}{1+g_2} \\ \quad\vdots \\ l_n = \dfrac{\lambda_n}{1+g_n} \end{cases}$  (7)

In the formula, $\lambda_1, \lambda_2, \cdots, \lambda_n$ represent the collection coefficients of the $n$ randomly selected data samples, with the inequality condition $\lambda_1 \neq \lambda_2 \neq \cdots \neq \lambda_n$ always true, and $g_1, g_2, \cdots, g_n$ represent the calibration indicators of the $n$ data samples, whose values belong to the numerical range $[1, +\infty)$; the above physical quantities are combined simultaneously. On the basis of formula (7), select a data sample defect value parameter $\hat{d}$, which is required to be not less than the natural number 1. In conjunction with formula (6), the collection and calibration processing expression of the operation data of micro grid SOC droop control is calculated as:

$D = \dfrac{\hat{d} \times P}{l_1^2 + l_2^2 + \cdots + l_n^2}$  (8)
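As a numeric illustration of formulas (7) and (8), the following sketch computes the sample values l_i and the acquisition and calibration expression D (all input values are invented for the example; P stands in for a result of formula (6)):

```python
import numpy as np

# Illustrative values only: collection coefficients (all distinct),
# calibration indicators (each >= 1), defect value parameter (>= 1),
# and an assumed polarity detection result P from formula (6).
lam = np.array([0.8, 1.1, 1.5])
g = np.array([1.2, 2.0, 3.5])
d_hat = 2.0
P = 0.65

l = lam / (1.0 + g)                  # formula (7): l_i = lambda_i / (1 + g_i)
D = d_hat * P / np.sum(l ** 2)       # formula (8)
```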
The data acquisition and calibration principle greatly reduces the number of data samples in the convolutional neural network by using local connection, weight sharing, multi-kernel convolution and pooling, so that the network can become deeper and features can be extracted implicitly. Convolution nodes are mainly used to identify data sample parameters with distortion invariance. So-called distortion invariance means that the originally expressed content of the data is not changed after displacement, scaling and other operations. Figuratively speaking, convolutional neural networks do not have separate steps and processes to extract features like traditional methods; they learn implicitly from the training data sets. In addition, convolutional neural networks can learn in parallel, which is also a highlight.

3.3 Detection Template Matching

Detection template matching aims to classify the data, that is, to obtain a set of usable perceptron weights; the perceptron weights are randomly initialized. The perceptron is used in the hope that the desired output can be classified through the judgment of its weights. For the output of each training sample, the weight of the perceptron can be adjusted by repeatedly calculating the difference between the output and the label; through many rounds of training, correct classification can be achieved. The process is shown in the following formulas.

Template marking coefficient:

$k = \dfrac{\eta \times \left(H^{2} - \left(j_1^2 + j_2^2 + \cdots + j_n^2\right)\right)}{D}$  (9)

where $\eta$ represents the transmission efficiency of the micro grid SOC droop control operation data in the convolutional neural network, whose value belongs to a prescribed numerical range, $H$ represents the secondary value of the average value of the operation data samples, and $j_1, j_2, \cdots, j_n$ respectively represent the detection parameters corresponding to $l_1, l_2, \cdots, l_n$.

Matching characteristics between the operation data of micro grid SOC droop control and the convolutional neural network:

$\varepsilon = \dfrac{\mu \times |\Delta J|}{k}$  (10)
In the formula, $\Delta J$ represents the accumulated amount of data samples in the unit detection interval of the convolutional neural network, and $\mu$ represents the sample matching coefficient. On the basis of formulas (9) and (10), let $V_{\min}$ represent the minimum value of the reliability definition parameter, $V_{\max}$ the maximum value of the reliability definition parameter, $\tilde{b}$ the data sample extraction parameter based on the convolutional neural network, and $\kappa$ the detection coefficient.
The matching expression of the reliability detection template of micro grid SOC droop control operation data based on the convolutional neural network is:

$M = (\varepsilon \times k) \times \displaystyle\int_{-\infty}^{+\infty} \dfrac{\kappa \times \left(V_{\max}^{2} - V_{\min}^{2}\right)}{1 - \tilde{b}^{2}}$  (11)
The reliability detection model of micro grid SOC droop control based on the convolutional neural network is a lightweight algorithm for fast detection of small targets. Unlike many current complex feature extraction framework algorithms, it uses a lightweight backbone network [11, 12]. Through the data acquisition and calibration expressions and the feature fusion results, the detection accuracy is increased and the detection effect on small target parameters of micro grid SOC droop control is further enhanced at the cost of a small loss of speed. The feature fusion expression solves the problem of insufficient information interaction between different convolution layers.
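The perceptron-style weight adjustment described in Section 3.3 can be sketched as follows (a minimal NumPy illustration under assumed data shapes, not the authors' implementation):

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1, rng=None):
    """Perceptron training as described in Sect. 3.3: weights are randomly
    initialized and adjusted by the difference between output and label."""
    rng = rng or np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            out = 1.0 if xi @ w + b > 0 else 0.0
            err = yi - out                 # difference between label and output
            w += lr * err * xi             # adjust weights toward the label
            b += lr * err
    return w, b
```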
4 Case Analysis

To assess the practical significance of the reliability detection model for micro grid SOC droop control based on convolutional neural networks, a series of comparative experiments was designed.

4.1 Environment Construction

An X7R capacitive element is selected as the experimental object (as shown in Fig. 3) and placed in the experimental circuit shown in Fig. 4; the connecting switch is closed, and the numerical changes of the relevant index parameters are recorded.

As shown in Fig. 3, the capacitor material identified as X7R is a temperature-stabilized ceramic capacitor. X7R capacitors exhibit a capacity change of ±15% when exposed to temperatures ranging from −55 °C to 125 °C; note that the capacitance change under these conditions is nonlinear. Moreover, the capacitance of X7R capacitors varies with voltage and frequency conditions, and changes over time at a rate of approximately 5% every 10 years. These capacitors are commonly used in industrial applications with lower requirements, where the acceptable capacity change depends on the voltage variation. In order to avoid the influence of other interference conditions on the experimental results, the indication of the experimental equipment must be returned to its initial state whenever the detection model is replaced.

4.2 Experimental Procedure

The specific process of this experiment is as follows:

(1) Use the reliability detection model of micro grid SOC droop control based on the convolutional neural network to control the STC-12C5A equipment, close the control switch, and record the numerical change of the fatigue value index within the given experimental time;
Fig. 3. X7R capacitance

Fig. 4. Experimental circuit (GND, +VCC, L1, L2, SHT11, STC-12C5A, X7R capacitance, switch, electric quantity sensor)
(2) Reset the indication of the experimental equipment to zero so that it presents the state before the experiment;
(3) Control the STC-12C5A equipment with the detection model based on the small fundamental wave algorithm and with the detection model based on the traditional neural network, repeat the above experimental steps, and record the numerical change of the fatigue value index;
(4) Compare the recorded experimental data and summarize the experimental rules.

The following table records the specific models of the relevant experimental equipment (Table 1).
Table 1. Selection of experimental equipment
Equipment / parameter           Model / value
Micro grid SOC droop control    X7R capacitive element
Main control element            STC-12C5A equipment
Electric quantity sensor        Electric quantity sensor
Motor                           SHT11
Connecting wires                Twisted pair
Rated voltage                   220 V
Rated current                   Up to 35.7 A
Circuit internal resistance     9.7 × 10⁷ Ω
During the experiment, all experimental equipment was directly controlled by the STC-12C5A equipment elements.

4.3 Results and Discussion

The fatigue curve can reflect the fatigue operation state of the micro grid and can be used to describe the application reliability of the selected equipment components. Without considering other interference conditions, the lower the value level of the fatigue curve, the lower the operating fatigue level of the micro grid SOC droop control, that is, the stronger the application reliability of the selected equipment components. Fig. 5 reflects the specific numerical changes of the fatigue curve under the action of the convolutional neural network detection model, the small fundamental wave algorithm detection model and the traditional neural network detection model.

Analysis of Fig. 5 shows that the standard value of the fatigue index first increases and then stabilizes, reaching a maximum of 0.24 during the whole experiment. Under the detection model based on the convolutional neural network, the fatigue curve also first increases and then stabilizes, but its maximum only reaches 0.17, lower than the standard maximum. Under the detection model based on the small fundamental wave algorithm, the trend of the fatigue curve is consistent with the standard curve, and its maximum reaches 0.22, also lower than the standard maximum. Under the traditional neural network detection model, the fatigue curve is likewise consistent with the standard curve, but its average level is high and its maximum reaches 0.25, higher than the standard maximum. The reason is that the designed model transforms the original information according to the network input requirements, and each single neuron of the connected convolution layer is connected with a local sample area of the input information. By sensing the different response behaviors in the digital sub-areas, the underlying features in the neural node structure can be extracted, which is conducive to effectively controlling the value level of the fatigue index and ensuring the application reliability of the equipment components.
Fig. 5. Fatigue curve (fatigue index vs. time in minutes, 0–30 min, comparing the standard value with the detection models based on the traditional neural network, the small fundamental wave algorithm and the convolutional neural network)
In summary, the conclusions of this experiment are: (1) the detection model based on the small fundamental wave algorithm and the detection model based on the traditional neural network have limited ability to control the fatigue index and cannot effectively solve the fatigue operation problem of micro grid SOC droop control; (2) the detection model based on the convolutional neural network can effectively control the value level of the fatigue index, avoid the fatigue operation state of the micro grid, and help ensure the application reliability of the equipment components.
5 Conclusion

This paper studies the reliability detection model of micro grid SOC droop control based on the convolutional neural network, focusing on the application of convolutional neural network technology to the reliability detection of micro grid SOC droop control. The main research results are as follows: (1) Traditional defect detection methods are investigated, and the specific methods used in defect detection are introduced. (2) Micro grid equipment detection based on the convolutional neural network is realized. The proposed polarity detection results are used to expand the data space. Experiments based on the detection template matching standard show that the data expansion improves the classification performance and classification ability of the convolutional neural network.
(3) The reliability of micro grid SOC droop control operation data is determined using the data acquisition and calibration expressions, and the convolutional neural network is then used to correlate the completed data sample parameters, so as to maximize the integrity of the information to be detected.

Much research has been carried out on the reliability detection algorithm of micro grid SOC droop control based on the convolutional neural network. The classification method of micro grid equipment defects and the detection method of basic micro grid equipment have been studied respectively, but there are still deficiencies that need further study and improvement: (1) In terms of data set expansion, further research can expand the data set by strengthening industry-university-research cooperation, or continue to expand it through data expansion methods. For micro grid equipment data, further improvement can be made on the basis of the convolutional neural network structure, and preprocessing operations can be added at the network input to expand the diversity of generated samples. (2) Only experimental research on the theoretical algorithm has been carried out; application research can be further expanded to better transform the research content into productivity. (3) In-depth research has been made on micro grid SOC droop control equipment; the types of micro grid components studied can be further expanded, for example to chip defects, resistance defects and inductance defects, as future research directions, so as to further expand the application value of the convolutional neural network in building reliability detection models for micro grid equipment.

Acknowledgement. 1. Research on Energy Storage Control of DC Microgrid under the Background of "Dual Carbon". 2. Source: Undergraduate Innovation and Entrepreneurship Training Program of Dalian University of Science and Technology. 3. Power Distribution Strategy and Simulation Analysis of Multi-group Hybrid Energy Storage System in DC Microgrid. 4. Source: Basic Scientific Research Project of the Education Department of Liaoning Province in 2021 (Supported Project), Item No.: KYZ2141. 5. Application research of fractional-order gradient descent method in neural network control. 6. Source: The Education Department of Liaoning Province, Item No.: L2020010.
References

1. Li, K., Annapandi, P., Banumathi, R., et al.: An efficient optimal power flow management based microgrid in hybrid renewable energy system using hybrid technique. Trans. Inst. Meas. Control 43(1), 248–264 (2021)
2. Pascual, J., Arcos-Aviles, D., Ursua, A., et al.: Energy management for an electro-thermal renewable-based residential microgrid with energy balance forecasting and demand side management. Appl. Energy 295(1), 1–15 (2021)
3. Dey, B., Basak, S., Pal, A.: Demand-side management based optimal scheduling of distributed generators for clean and economic operation of a microgrid system. Int. J. Energy Res. 46(7), 8817–8837 (2022)
4. Luo, X., Shi, W., Jiang, Y., et al.: Distributed peer-to-peer energy trading based on game theory in a community microgrid considering ownership complexity of distributed energy resources. J. Clean. Prod. 351(1), 1–12 (2022)
5. Zhao, C., Sun, W., Wang, J., et al.: Distributed robust secondary voltage control for islanded microgrid with nonuniform time delays. Electr. Eng. 13(6), 625–2635 (2021)
6. Neves, A.C., González, I., Karoumi, R., et al.: The influence of frequency content on the performance of artificial neural network-based damage detection systems tested on numerical and experimental bridge data. Struct. Health Monit. 20(3), 1331–1347 (2021)
7. Mahmoudabadbozchelou, M., Caggioni, M., Shahsavari, S., et al.: Data-driven physics-informed constitutive metamodeling of complex fluids: a multifidelity neural network (MFNN) framework. J. Rheol. 65(2), 179–198 (2021)
8. Yilmaz, C., Koyuncu, I.: Thermoeconomic modeling and artificial neural network optimization of Afyon geothermal power plant. Renew. Energy 163(6), 1166–1181 (2021)
9. Zhang, T.T., Fang, Y.Q., Han, L.: Automatic modulation recognition with deep residual network. Comput. Simulat. (1), 178–180, 379 (2021)
10. Konduru, H., Rangaraju, P., Amer, O.: Reliability of miniature concrete prism test in assessing alkali-silica reactivity of moderately reactive aggregates. Transp. Res. Rec. 2674(4), 23–29 (2020)
11. Naderipour, M., Zarandi, M., Bastani, S.: A type-2 fuzzy community detection model in large-scale social networks considering two-layer graphs. Eng. Appl. Artif. Intell. 90(4), 1–21 (2020)
12. Lawal, O.M., Zhao, H.: YOLOFig detection model development using deep learning. IET Image Proc. 15(13), 3071–3079 (2021)
Pedestrian Detection in Surveillance Video Based on Time Series Model

Hui Liu1(B) and Liyi Xie2

1 College of Information Engineering, Fuyang Normal University, Fuyang 236041, China
[email protected]
2 Shandong Sport University, Jinan 250014, China
Abstract. To solve the problem of low detection accuracy caused by pedestrian occlusion in complex scenes, a pedestrian detection method for surveillance video based on a time series model is proposed. The gray values of pixels at the same location are regarded as a time series, and a mixed Gaussian model is used to recognize the pedestrian foreground. The threshold segmentation method is used to segment the image. The threshold-segmented image is projected vertically onto the X axis, the trough between overlapping pedestrians is used as the segmentation point, and windows with bimodal features are projected vertically to segment the region of interest. Feature points of the real image dataset are marked, an enhanced feature point detection network model is built, and the descriptor detection results are obtained. The time-domain and frequency-domain information is represented as a symbol sequence, and targets are clustered using the equal-length overlapping time window segmentation method, so that the location center of gravity does not change and the specified convergence degree is achieved. Data fusion features are balanced to determine the pedestrian detection results in the surveillance video. The experimental results show that this method can detect all pedestrians, the maximum accumulated error of target recognition is 21%, and the maximum average accuracy of target matching is 90%, which proves that the detection effect is good.

Keywords: Time Series Model · Monitoring Video · Pedestrian Detection · Region of Interest
1 Introduction

Pedestrian detection is the first step in a large number of applications, such as intelligent video surveillance, driver assistance systems, human-computer interaction, military applications and intelligent digital management. Due to differences in light, color, scale, posture and dress, pedestrian recognition is a challenging problem. Pedestrian detection in images has a long history, and interest in it has grown greatly over the past decade. We have now entered a video surveillance society, and video surveillance can be seen everywhere in daily life, so interest in pedestrian detection algorithms for video surveillance keeps increasing. The Internet of Things
technology is introduced into surveillance, and an intelligent monitoring system integrating identification, positioning, tracking, monitoring, early warning and management is established. In practical application, such a system can accurately and comprehensively identify changes of people, objects and environment within the monitoring range and conduct intelligent analysis to identify intrusion behavior. At present, most public security video monitoring systems and special video monitoring systems are used in daily operations and public security business, which makes it difficult to adapt to new policing needs [1]. In addition, the narrow scope of monitoring, the low degree of intelligence, and the incomplete auxiliary forensics system are also the main problems that leave the monitoring system unable to meet the new demand for security. The video surveillance system uses computer-based automatic detection, tracking and recognition to obtain useful information from a large number of surveillance videos, and to understand and analyze it while relying on people as little as possible. With the development of video surveillance technology, the single-camera problem should be solved first, that is, multiple cameras should replace a single camera so as to enable large-scale surveillance. However, in such a system, how to judge the consistency of moving objects is a new difficulty.

Among current research methods, reference [2] proposes a multi-target tracking algorithm combining target detection and feature matching: whether there is an obstacle is determined according to the feature difference between the target itself and the current frame, and tracking and detection then proceed from the feature information remaining after occlusion. Reference [3] proposes a real-time tracking system for moving objects based on binocular vision: the binocular stereo matching method is used to determine whether there is occlusion, occlusion is then confirmed through the matching error, and gray correlation matching is finally used for follow-up tracking. Reference [4] proposes an end-to-end anomalous behavior detection network, which takes video packets as input and outputs anomaly scores: after the spatiotemporal encoder extracts the spatiotemporal features of the video packets, an attention mechanism based on hidden vectors weights the packet-level features, and a packet-level pool finally maps the video packet scores to achieve behavior tracking. However, these three methods are vulnerable to the impact of dynamic targets, resulting in poor detection results. Therefore, a pedestrian detection method for surveillance video based on a time series model is proposed.

In the first section of this paper, based on the time series, the mixed Gaussian model is used to recognize the pedestrian foreground. The second section marks the feature points of the real image data set, establishes an enhanced feature point detection network model, and realizes region-of-interest division based on Gaussian background modeling. In the third section, the time-domain and frequency-domain information is expressed as a time series, and targets are clustered using the equal-length overlapping time window segmentation method, so that the positioning center of gravity remains unchanged and the specified convergence degree is reached.
The fourth section balances the data fusion features to realize image feature balance processing based on data fusion, and the last section obtains the pedestrian detection results in the surveillance video, achieving pedestrian detection in surveillance video.
2 Pedestrian Foreground Recognition Based on Time Series Model

A pedestrian foreground recognition method based on the time series model is proposed to handle background brightness changes and repetitive motion in video. For a video image, the gray value of a pixel at the same position is regarded as a time series, and the probability observation value of the pixel at time t can be expressed as:

$G(I_t) = \sum_{i=1}^{n} \omega_{i,t} \times \rho\left(t_i, s_{i,t}, \sigma_{i,t}^{2}\right)$  (1)
In formula (1), $\omega_{i,t}$ is the weight of the i-th Gaussian distribution at time t, $\rho$ is the Gaussian probability density function, $t_i$ is a time series, $s_{i,t}$ is the expected value of the i-th Gaussian distribution at time t, and $\sigma_{i,t}^{2}$ is the standard deviation of the i-th Gaussian distribution at time t. Formula (2) is used to mine the surveillance video pedestrian information, which can be described as:

$x_n = \beta_0 + \sum_{i=1}^{n} v_i t_i + \sum_{j=0}^{n} v_j t_j$  (2)
In formula (2), $\beta_0$ represents the dimension level information, $v_i$ the information mining speed, $v_j$ the monitoring operation and maintenance management speed, $t_i$ the information mining scalar time series, $t_j$ the operation and maintenance management scalar time series, and $n$ the number of mining operations [5]. The association rule mining algorithm is used to feature-mine the surveillance video pedestrian information and analyze the abnormal data mined from it; mining abnormal data with association rules locates the surveillance video pedestrians. The time series model of surveillance video pedestrian information collection constructed on this basis is:

$s(t) = \sum_{n=1}^{t} h x_n g_n(t)$  (3)
In formula (3), $a_{mn}$ represents the potentially useful information amplitude and $g_{mn}$ the multi-layer conjugate authentication coefficient. According to the above mixed Gaussian model and time series model, whether a pixel is background can be judged. The process is as follows: first initialize the Gaussian functions, then analyze each new pixel. If the pixel observation value lies within 2.5 standard deviations of a Gaussian function in the mixed Gaussian model, the pixel matches that Gaussian function; if there is no match, the Gaussian function with the lowest probability is replaced with a new one. If there is a match, the weights of all Gaussian functions are updated [6]. Normalization is then conducted on the newly generated weight values, and the parameters of the matched Gaussian function are updated to:

$s_{i,t} = (1-\alpha)s_{t-1} + \alpha(t_i)$  (4)
$\sigma_{i,t}^{2} = (1-\alpha)\sigma_{t-1}^{2} + \alpha(t_i - s_t)^{T}(t_i - s_t)$  (5)
In the above formulas, $\alpha$ represents the learning parameter. Set the background threshold $f_0$ and take the first $j$ Gaussian functions whose weights sum to more than $f_0$ as the background model:

$B = \arg\min_{j}\left(\sum_{i=1}^{j} f_i > f_0\right)$  (6)
In formula (6), $f_i$ represents the number of Gaussian functions included in a mixed Gaussian model [7]. After the background model is determined, the pixels in the image can be classified: if an observed pixel matches one of the first $j$ Gaussian functions, the pixel is considered background; otherwise it is foreground.
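As an illustration, the per-pixel mixture-of-Gaussians foreground/background decision described above can be sketched with OpenCV's built-in MOG2 subtractor (an implementation choice made here for brevity, not the paper's own update rule; the video path is assumed):

```python
import cv2

# Sketch of mixed-Gaussian pedestrian foreground recognition. MOG2 maintains
# a per-pixel Gaussian mixture and classifies pixels that do not match the
# background Gaussians as foreground.
cap = cv2.VideoCapture("surveillance.mp4")        # assumed input video
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = subtractor.apply(gray)                 # 255 = foreground pixel
    fg_ratio = (mask == 255).mean()               # share of foreground pixels
cap.release()
```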
3 Monitoring Video Pedestrian Detection

3.1 Region of Interest Division Based on Gaussian Background Modeling

The luminance vertical projection curves of two independent pedestrians are independent of each other, while the luminance vertical projection curves of mutually occluding pedestrians adhere to each other [8]. The heads of adherent pedestrians are located at the two peaks of the brightness projection curve, and the adherent part is located at the valley between the two peaks. The specific steps of the direction projection method based on Gaussian background modeling for segmenting the region of interest are as follows:

Step 1: Divide the video sequence image into foreground.

Step 2: Perform threshold segmentation on the foreground target. At this time the foreground contains multiple pedestrians occluding each other, so the threshold should be set adaptively; the corresponding segmentation threshold is:

$u = \lambda \times m_{\max} + (1-\lambda) \cdot m$  (7)

In formula (7), $\lambda$ represents the weighting coefficient, $m_{\max}$ the image gray maximum, and $m$ the image gray mean.

Step 3: Vertically project the thresholded image onto the X axis and, taking the trough of the adherent pedestrians as the segmentation point, vertically project the windows with bimodal features to obtain the gray-level vertical projection curves of three cases. In the first, the gray-scale area of the human body shows a convex peak without adhesion [9]. In the second, the midpoints of the ascending and descending curves on both sides of the convex peaks in the projection curve are determined quantitatively and taken as the starting and ending points of a brightness band, yielding a series of brightness bands perpendicular to the X axis, with the possible human body areas contained in the bands. In the third, the vertical projection curves adhere and need to be handled separately.
Step 4: Horizontally project the brightness bands obtained from the vertical projection onto the Y axis. The starting and ending points of each brightness band are selected in the same way as for the vertical projection.

Step 5: Put the brightness bands obtained from the vertical projection and the horizontal projection into the corresponding positions in the original image at the same time. The original image can then be divided into many high-brightness rectangular areas.

The above method is suitable for determining the interest regions of independent, unobstructed objects; pedestrians can be detected by feeding the regions of interest directly into the trained convolutional neural network. However, this method cannot accurately deal with pedestrians occluding each other, and the detection rate of the system is low [10–12]. Therefore, the direction projection method based on Gaussian background modeling is used to determine the interest regions of occluded pedestrians. Two people who occlude and adhere to each other are included in two different rectangular brightness boxes, that is, they are divided into two regions of interest.

3.2 Construction of Enhanced Feature Point Detection Network Model

Segmenting occluded and non-occluded regions of interest, outputting multi-view data fusion features, and balancing the pedestrian detection network together achieve pedestrian detection in video. First, the input multi-view images are matched to form a complete image, and then the target detection network is used to train the fused image to improve the accuracy of detecting occluded and distant small pedestrians.

3.2.1 Real Image Dataset Feature Point Annotation

The workflow of the multi-view data fusion model of self-supervised learning is: image acquisition, self-supervised feature point and descriptor extraction, feature matching, and finally multi-view image fusion. In multi-view data fusion, it is difficult to use manual annotation to extract feature points from data sets. For traditional detection and segmentation annotation, given an image, the semantic truth value can be determined by marking a rectangular box or the outline of the object; for the feature point detection task, however, it is difficult to decide manually which pixel should serve as a feature point. Therefore, a basic dataset containing only simple geometric shapes and a self-collected dataset are used to label the dataset automatically. The specific process is as follows.

The model is pre-trained on a simple geometric shape data set, composed of images whose feature points are easy to determine, such as line segments, polygons and cubes. The true values of the feature points can be obtained by applying scale-invariant feature transformation to the basic data set. Because the feature points of basic geometric shape images such as line segments and triangles are subsets of the real image feature points, a primary feature point detection network is obtained by training the feature point detection network on the labeled simple geometric shape data set. Compared with traditional algorithms such as scale-invariant feature transformation, the primary feature point detection network trained on simple geometric shapes has certain advantages in accuracy, but when extracting feature points from real image data sets some feature points are missed and the detection accuracy is low. Therefore, a new model is obtained by using homography adaptive transformation together with primary feature point detection network training to improve the accuracy of feature point extraction from real images. The input image is processed by multiple composite geometric transformations, and the hyperparameter is set to 80 frames, that is, one frame is the original image without composite geometric transformation, and the remaining 79 frames are images formed from the original image through randomly generated composite simple geometric transformations. The generated primary feature point detection network extracts the pseudo feature points of the real image data set, maps the feature points of the 79 transformed frames back to the original image, and accumulates them to form the new source image feature points, thus completing the feature point annotation of the real image data set.

3.2.2 Construction of Enhanced Feature Point Detection Network Model

In the composite simple geometric transformation, the 79 frames of transformed images are formed by known transformation matrices, so 79 image pairs of known pose transformation between the source image and its corresponding frames are obtained. In this way, the true value of the mapping relationship between the original image and the transformed image is obtained. The final self-collected data set contains feature point and feature point descriptor truth values, which are used for joint training of the feature point detection and descriptor detection branches of the feature point detection network. To realize this joint training of the feature point detection sub-network and the description sub-network in the primary feature point detection network, the loss function values of the two detection sub-networks are weighted and added to obtain a unified loss function. In order to fuse information from different perspectives, the corresponding relationship between different perspectives must be found; adaptive homography transformation is used to solve the correspondence matrix of different perspectives. The composite simple geometric transformation matrices learned through self-supervision are not all useful and need to be selected. In order to select composite simple geometric transformation matrices with good performance, a truncated normal distribution is used to sample translation, scaling, in-plane rotation and symmetric perspective transformation within a predetermined range. Based on this, the constructed enhanced feature point detection network model is shown in Fig. 1. After obtaining the true value of the mapping relationship between the original image and the real image of the data set according to Fig. 1, the automatic annotation of the real data set is completed, realizing the automatic annotation of real image data sets that are difficult to label manually. The enhanced feature point detection network is used to train the previously acquired automatically labeled image dataset to improve the accuracy of feature point extraction.

Multilevel encoder: in order to balance real-time performance and accuracy, the enhanced feature point detection network is designed with two branches that handle different tasks.
The upper branch extracts the deep feature points of the original image through an asymmetric encoding and decoding network; the feature descriptor of the original single-view image is generated, and the surface feature description of the original image is extracted through a multi-channel, low-level encoder network.

Fig. 1. Enhanced feature point detection network model (input image fed to a feature point retrieval branch and a descriptor generation branch, followed by upsampling)

Feature point detection: in the feature point detection part of the network, the feature points of the image are obtained through a deep, few-channel, asymmetric encoding and decoding network.

Fusion network: because the feature maps of the network do not have the same channel and size, the features extracted by the descriptor generation network are shallow and contain a lot of location information, while the feature point detection network obtains deep feature points after multi-layer encoding, including information such as arms and faces. In order to fuse features at different levels, the fusion network first realizes a simple fusion of feature maps at different levels through a Concatenate operation. In order to balance features of different sizes, a BatchNorm operation is used after the Concatenate. The connected features are globally pooled and passed through a 1 × 1 convolution to get a new weight, so that a new feature selection and combination is made for the connected features. So far, the descriptor detection results are obtained.

3.3 Target Clustering Based on Time Series Segmentation

The method of piecewise symbolic linear representation segments the original time series and expresses the time-domain and frequency-domain information as symbol sequences. This method can not only ensure that data patterns with long duration are completely separated, but also maintain the dependence of the original time series data on temporal order. The specific implementation of the equal-length overlapping time window segmentation method is as follows. Let the sample set containing $N$ multidimensional time series be marked as $T = \{T^1, T^2, \cdots, T^n, \cdots, T^N\}$, in which a single sample of length $T$ can be represented as $T^n = \{t_0^n, t_1^n, \cdots, t_i^n, \cdots, t_T^n\}$. A sliding window with window size $d$ and step length $l$ is applied to each sample $T^n$ in the sample set. The division of the sample is represented as follows:

$T_l^n = f(T^n) = \begin{bmatrix} T_1^n \\ T_2^n \\ \vdots \\ T_m^n \end{bmatrix} = \begin{bmatrix} T_1^n : T_{1+m}^n \\ T_{1+l}^n : T_{1+l+m}^n \\ \vdots \\ x_{1+(i-1)l}^n : x_{1+(i-1)l+m}^n \\ \vdots \\ x_{1+(m-1)l}^n : X_T^n \end{bmatrix}$  (8)
In formula (8), $m$ represents the total number of time series after a slice. The autoencoder can retain important features in the data while compressing high-dimensional data; multiple autoencoders are used to extract features from different time series fragments. Figure 2 is a schematic diagram of the autoencoder training process.
Fig. 2. Schematic diagram of the autoencoder training process
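The equal-length overlapping window division of formula (8) can be sketched as follows (a minimal NumPy illustration; the sample series, window size d and step length l are assumed values):

```python
import numpy as np

def segment(series, d, l):
    """Equal-length overlapping time-window segmentation of formula (8):
    sliding window of size d moved with step length l."""
    return np.array([series[s:s + d]
                     for s in range(0, len(series) - d + 1, l)])

T_n = np.sin(np.linspace(0, 10, 100))     # a hypothetical single sample
windows = segment(T_n, d=20, l=5)          # shape: (17, 20)
```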
Normal time series samples are segmented, and nonlinear features are extracted from the data fragments at the same location of all samples. Based on this principle, the surveillance video pedestrian target clustering steps are as follows: first, preprocess each sample; then smooth and enhance it; and then use the super-segmentation method to complete surveillance video pedestrian feature extraction. For image segmentation processing, the interval between the target point and the hyperplane is calculated with the formula:

$\gamma = \dfrac{1}{\|a\|}\left(a^{t} x_n + b\right)$  (9)
In formula (9), $a$ represents the normal vector, $b$ the determination threshold, and $t$ the split time. According to this formula, the maximum conversion value of the calculation interval is $\frac{1}{\|a\|}$; judging from the maximum value of the norm equivalent, it can further be converted to the minimum value of the normal vector to ensure that the classification spacing reaches the maximum. In the extracted visual images, similar methods are used to integrate images with similar pixels to form a visual image library. The data of each cluster subset is averaged using the standard function of the sum of squares of errors, and then classified and optimized using iterative methods. The square error function is as follows:

$e = \sum_{c=1}^{k} |N - O_c|^2$  (10)
In formula (10), $N$ represents the sample data, $O_c$ the data cluster center, $k$ the square error and the number of partitions, and $c$ the number of clusters. In a given sample, $N$ initial centers are randomly and uniformly generated. The distance between each data sample point and the $K$ cluster centers is calculated, and each point is assigned to the closest center, being temporarily classified into that category. The mean value of each class of samples is calculated, and the center of gravity is repositioned. Clustering is completed when the center of gravity no longer changes or the specified convergence degree is reached.
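The clustering procedure just described is essentially k-means; a minimal NumPy sketch (assumed data shapes, not the authors' implementation) is:

```python
import numpy as np

def kmeans(X, k, iters=100, rng=None):
    """k-means: assign each point to its nearest center, recompute centers
    as class means, and stop when the centers of gravity no longer change."""
    rng = rng or np.random.default_rng(0)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([X[labels == c].mean(axis=0)
                                if np.any(labels == c) else centers[c]
                                for c in range(k)])
        if np.allclose(new_centers, centers):   # center of gravity unchanged
            break
        centers = new_centers
    return centers, labels

X = np.random.randn(200, 2)                     # hypothetical feature vectors
centers, labels = kmeans(X, k=3)
```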
Pedestrian Detection in Surveillance Video Based on Time Series Model
113
Fig. 3. Feature-balanced network structure
subsequent process. After the binary processing, the thresholding method is used to determine whether the pixel is located on the detection target. Moving objects are prone to smear, and the median filter can effectively preserve the edge characteristics of the image without paying attention to details. Median filtering is a nonlinear window filtering algorithm, which can retain image details while removing noise. The basic principle of median filtering is: divide any pixel according to the pixels of adjacent areas, and replace the central pixel value with the median value. The median filtering has a good effect on noise processing. The interference caused by large discrete points between adjacent regions can be effectively eliminated by replacing pixel values with intermediate points. Compared with mean filtering, this method has better filtering effect and can retain image details better. 3.5 Surveillance Video Pedestrian Detection Before detecting a target surveillance video pedestrian, the surveillance operator must divide the area of interest within the surveillance scene, which is usually a pre-set rectangular area. Let the known delimited region of interest be rectangular We , and the coordinates in the upper left corner of We be (x1 , y1 ), and those in the lower right corner of We be (x2 , y2 ). If the monitored pedestrian mass center of the video is a rectangle, and the external rectangle area of the video pedestrian exceeds the set threshold, the pedestrian is considered passing by and giving an alarm, otherwise the alarm will not be issued. The judgment formula is: true, if AreaEe (11) t= false, else In formula (11), true means video surveillance captured pedestrians; false indicates video surveillance missed pedestrians; and AreaRe indicates a rectangular area. In the
114
H. Liu and L. Xie
monitoring system, an alarm is raised when a pedestrian enters a predetermined area of interest, and not when the intended range is not reached. An alarm can be given during the entire detection process when entering the predefined area of interest rectangle through monitoring the pedestrian’s center of mass, otherwise it will not be given.
4 Experiment 4.1 Experimental Environment The experimental platform is ECS, the operating system is Ubuntu 16.04, the graphics card model is GeForce GTX 2080Ti, the video memory is 11 GB, the memory is 16 GB, the Cuda version is 10.0.130, and the OpenCV version is 3.2.00. 4.2 Experimental Data Set The training and testing data sets used in this experiment are all from PASCAL VOC data sets. VOC2007train, valid and VOC2012 train, valid data sets are used for training. In order to verify the effectiveness of the method, the VOC2007 test data set is used for verification. The total training data is 22136 pictures, including 6496 pictures of pedestrians, and the total validation data is 4952 pictures, including 2097 pictures of pedestrians. 4.3 Experimental Parameter Setting Only the pedestrian category is trained. The default size of the input image is 416 × 416, the number of input channels is 3, the set number of iterations is 50200, the batchsize is 64, and the learning rate is 0.001. When the number of iterations reaches 40000, the learning rate is updated to 0.01, and the processed dataset is trained in the same performance server. Under the same experimental environment and experimental parameters, the network is trained. 4.4 Experimental Evaluation Index In order to evaluate the detection effect numerically, FA and FB are two indicators are used, in which FA indicates the accumulation degree of target identification error, and FB indicates the measurement of the average accuracy of target matching. The calculation formula is as follows: P1 + P2 FA = 1 − Bt
t
et,i
ti i
FB = i
(12)
Dt
(13)
In formula (12), $P_1$ represents the omission rate, $P_2$ the false alarm rate, and $B_t$ the actual number of targets at time t. In formula (13), $e_{t,i}$ represents the matching error of the i-th target at time t, and $D_t$ the number of successfully matched targets. The smaller $F_A$ is and the larger $F_B$ is, the better the detection effect.

4.5 Subject Determination

Three different infrared image test sets were used for the infrared human body test experiments. Test set 1 comes from video monitoring of traffic pedestrians; it is a multi-person test set that includes pedestrian adhesion (occlusion). Test set 2 comes from video monitoring of traffic footpaths and is a 2-person test set. Test set 3, also derived from video monitoring of traffic footpaths, is a 1-person test set. The subjects are shown in Fig. 4.
Fig. 4. Experimental subjects: (a) test set 1; (b) test set 2; (c) test set 3
The test sets shown in Fig. 4 serve as the standard against which the rationality of the surveillance video pedestrian detection method based on the time series model is analyzed.

4.6 Experimental Results and Analysis

The designed method, the reference [2] method, the reference [3] method and the reference [4] method were used for comparative testing. The test results are shown in Fig. 5.
Fig. 5. Comparative analysis of the detection results of the three methods
As can be seen from Fig. 5, the reference [2] method uses feature matching but lacks noise processing, so multiple targets are sometimes taken as part of the background image. Neither the reference [3] method nor the reference [4] method identified all of the occluded targets. The designed method, however, can identify all pedestrians through multi-target background recognition and noise processing.
To further verify the reliability of the surveillance video pedestrian detection method based on the time series model, the calculation results of $F_A$ and $F_B$ for the three methods were compared, as shown in Fig. 6.

Fig. 6. Comparative analysis of the calculation results of $F_A$ and $F_B$ for the three methods: (a) $F_A$; (b) $F_B$ ($F_A$/% and $F_B$/% plotted against detection times, for feature matching, binocular stereo matching and the time series model)
According to Fig. 6, the detection method based on feature matching reaches a maximum $F_A$ of 42% and an $F_B$ of 62%; the detection method based on binocular stereo matching reaches an $F_A$ of 38% and an $F_B$ of 61%; and the detection method based on the time series model reaches an $F_A$ of 19% and an $F_B$ of 90%. According to these results, the detection method based on the time series model is more effective.
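For reference, the two evaluation indices of formulas (12) and (13) can be computed as follows (a minimal Python sketch under the reconstruction above; the input conventions are assumptions, not the authors' code):

```python
def f_a(p1, p2, b_t):
    """Formula (12): F_A from omission rate p1, false alarm rate p2 and the
    actual number of targets b_t at time t."""
    return 1.0 - (p1 + p2) / b_t

def f_b(match_errors, matched_counts):
    """Formula (13): per-time matching errors e_{t,i} accumulated over all
    targets and times, divided by the total of matched targets D_t."""
    total_err = sum(e for errs in match_errors for e in errs)
    return total_err / sum(matched_counts)

# Illustrative values only
print(f_a(p1=0.05, p2=0.06, b_t=1.0))
print(f_b(match_errors=[[0.9, 0.85], [0.95]], matched_counts=[2, 1]))
```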
5 Conclusion

The pedestrian detection method in surveillance video based on the time series model is studied. First, foreground segmentation is performed to identify the pedestrian foreground. Then the region of interest is segmented and the relevant features are extracted. Finally, an enhanced feature point detection network model is constructed, and pedestrian detection is performed by representing the time-domain and frequency-domain information as symbol sequences. The comparison experiments verify the judgment effect of the method and increase the authenticity of the research. Although target detection has been greatly improved, some defects remain, and future work will focus on the following: although the coverage of the image can be reduced by using segmentation methods, the existence of non-relatively-moving objects means that corresponding recognition algorithms must be used for segmentation and correction, which not only wastes a lot of time but also reduces the segmentation accuracy. Future research will focus on how to reduce this error and further improve the segmentation accuracy.

Acknowledgement. 1. 2018 Anhui Provincial University Natural Science Research Key Project: Research and Application of Intelligent Pedestrian Detection and Tracking Method Based on Digital Video (KJ2018A0669)
2. 2017 Anhui Provincial University Natural Science Research Key Project: Research on Key Technologies of High-Performance Computer Fault-tolerant Systems in Heterogeneous Environments (KJ2017A837)
References
1. Shen, X., et al.: The study for scene recognition of surveillance video based on semi-supervised feature fusion. Comput. Simulat. 38(1), 394–399 (2021)
2. Ye, L., Li, W., Zheng, L., et al.: Multiple object tracking algorithm based on detection and feature matching. J. Huaqiao Univ. Nat. Sci. 42(5), 661–669 (2021)
3. Zhang, J., Ji, F.: Moving target real-time tracking system based on binocular vision. China Comput. Commun. 34(5), 122–124 (2022)
4. Xiao, J., Shen, M., Jiang, M., et al.: Abnormal behavior detection algorithm with video-bag attention mechanism in surveillance video. Acta Automatica Sinica 48(12), 2951–2959 (2022)
5. Jian, Y., Ji, J.: Research on pedestrian identification model in optical sensor monitoring system. Laser J. 41(03), 82–85 (2020)
6. Zhang, B., Zhao, W., Duan, P., et al.: Surveillance video re-identification with robustness to occlusion. J. Signal Process. 38(06), 1202–1212 (2022)
7. You, F., Liang, J., Cao, S., et al.: Dense pedestrian crowd trajectory extraction and motion semantic information perception based on multi-object tracking. J. Trans. Syst. Eng. Inform. Technol. 21(06), 42–54+95 (2021)
8. Liu, J., Li, X., Ye, L., et al.: Pedestrian detection algorithm based on improved RetinaNet. Sci. Technol. Eng. 22(10), 4019–4025 (2022)
9. Qi, P., Wang, H., Zhang, J., et al.: Crowded pedestrian detection algorithm based on improved FCOS. CAAI Trans. Intell. Syst. 16(04), 811–818 (2021)
10. Liu, S., He, T., Dai, J.: A survey of CRF algorithm based knowledge extraction of elementary mathematics in Chinese. Mobile Netw. Appl. (2021)
11. Liu, S., He, T., Dai, J., et al.: Fuzzy detection aided real-time and robust visual tracking under complex environments. IEEE Trans. Fuzzy Syst. 29(1), 90–102 (2021)
12. Gao, P., Li, J., Liu, S.: An introduction to key technology in artificial intelligence and big data driven e-Learning and e-Education. Mobile Netw. Appl. (2021)
Computer Vision Based Method for Identifying Grouting Defects of Prefabricated Building Sleeves Shunbin Wang(B) and Lin Wu ZaoZhuang Vocational College of Science and Technology, Zaozhuang 277599, China [email protected]
Abstract. In order to accurately detect the grouting defects of construction sleeves, a method for identifying the grouting defects of prefabricated construction sleeves based on computer vision is proposed. The 3D target detection algorithm is used for feature point detection, and a corner detection model of the assembled building is constructed to obtain feature corner detection results and match the feature corners. The constraint relationship is extracted from the data in the image sequence of the grouting defects of the prefabricated construction sleeve, and the parameters of the grouting defects of the construction sleeve are calibrated. Finally, defect identification is completed through computer vision sleeve grouting scanning. The experimental results show that the computer vision based method for identifying the grouting defects of prefabricated construction sleeves has strong recognition ability and can complete defect identification well. Keywords: Computer Vision · Fabricated Buildings · Construction Sleeve · Sleeve Grouting · Identification of Grouting Defects
1 Introduction
The modernization of the construction industry is the development trend of the future construction industry, and it represents a fundamental change of the building production mode from extensive to intensive production. Compared with the traditional cast-in-place building model, which causes heavy environmental pollution and has a long production cycle, the prefabricated component model of prefabricated building has the advantages of fast construction speed, reliable quality, energy saving, environmental protection, and labor saving [1]. The key issue for a prefabricated structure is how to ensure the connection performance between prefabricated components and the overall performance of the structure. These properties are closely related to the connections between prefabricated building components: the working performance and durability of the node connections directly affect the reliability of the connection and the safety of the overall structure. At present, the most commonly used connection method for the main reinforcement of prefabricated components in prefabricated concrete structures is the reinforcement sleeve
grouting connection, that is, the connecting reinforcement is inserted into a metal sleeve and high-strength grouting material is then poured in. The stress is transferred through the bond between the grouting material and the sleeve wall and the bond between the grouting material and the reinforcement [2]. The node connection is the key to the fabricated concrete structure and is an important guarantee that its overall performance is equivalent to that of a cast-in-place structure. Therefore, the quality of the grouting connection of the steel sleeve is very important. In China, because steel bar sleeve grouting connection has developed quickly over a short time, on-site personnel are insufficiently trained, and the precision of factory production still needs improvement, there are still some problems with grouting quality. Moreover, in actual engineering, under the influence of various factors, the injected grouting material may flow back, coagulate, or harden under the action of its own fluidity and gravity, or the air inside the sleeve may not be effectively discharged, resulting in a hollow at the end or in the middle of the sleeve, which reduces the effective anchorage length of the steel bar in the sleeve. Such grouting defects degrade the joint connection performance, which seriously affects the overall performance of the prefabricated building and creates greater safety hazards. Therefore, related research is both necessary and urgent [3, 4].
In recent years, with the continuous development of computers, big data networks, aerospace and other high technology, image rendering technology has been widely used in many research fields. Image rendering uses image processing technology to operate on and process an actual scene captured by shooting, so as to form a scene information picture that meets the work requirements and standards [5, 6]. At present, image rendering technology faces great operational challenges. Because image capturing equipment is expensive and complex, common camera hardware cannot meet the standard requirements, and the captured images have poor pixel quality. For some scenes that require comprehensive and detailed image rendering, high-end equipment has to be used, which increases the complexity of the shooting process. When there are obvious light changes, noise effects and size changes in the real scene, traditional methods suffer errors of varying degrees. Therefore, much current research on image mosaic technology concerns how to further improve image accuracy and simplify the image matching process, and factors such as image acquisition during the mosaic process have also received attention.
To sum up, this paper uses computer vision to identify the grouting defects of prefabricated construction sleeves. Computer vision can identify, track and measure targets using cameras and computers, and is an effective image processing method. After the detection image obtained by computer vision is transmitted to the computer, it can effectively improve the information processing ability and speed. The accuracy and effectiveness of defect identification can be effectively improved by three-dimensional scanning of the sleeve grouting defects.
2 Feature Recognition of Grouting Defects in Prefabricated Building Sleeves Based on Computer Vision
2.1 Feature Point Detection
There are a large number of key feature points in a prefabricated building that need to be displayed externally, but their distribution cannot be shown directly through the 3D model, so each key feature point in the prefabricated building needs to be detected before display [7, 8]. The 3D target detection algorithm of computer vision can detect and identify key feature points in a targeted, in-depth manner; for key feature points deeply hidden inside the prefabricated building, the 3D target detection algorithm can even track them in real time. The 3D object detection algorithm first extracts and identifies the three-view photopigment features of the prefabricated building, and then searches for the detection points of the corresponding positions and features in the 3D model. The features of different photopigments are composed of different depth map codes. The arrangement of the codes is fully applicable in the exhibition space of prefabricated buildings, while the mapping of the externally displayed photopigments is difficult for the human eye to distinguish. Therefore, the 3D object detection algorithm should define the coding structure through computer technology to facilitate the identification of all key feature points in the prefabricated building. The 3D object detection process is shown in Fig. 1 below:
Fig. 1. 3D object detection process
As can be seen from the above figure, 3D coordinate points are established, and target detection is realized through these 3D coordinate points. The key feature points identified by the 3D target detection algorithm enter the candidate area frame and wait for verification. The multi-feature energy loss function is used to convert the photopigment features of the key feature points; contents such as pigment characteristics, semantic information and point cloud density are added to the computational model to obtain the feature extraction frame and pose information of the key feature points [9, 10].
The expression of the multi-feature energy loss function is shown in formula (1):

    L(y, f(x)) = max(0, −f(x))    (1)
wherein, x and y represent the horizontal and vertical coordinates of a key feature point in the 3D model respectively; f(x) represents the model boundary of the prefabricated building; L represents the characteristic energy value of the key feature points of the prefabricated building [11]. After the feature extraction of key feature points and the confirmation of pose information are completed, all detection information is transmitted to the dedicated network of the prefabricated building display platform. The computer vision information preprocessing system redefines the specification of the 3D object detection algorithm according to the amount of information transmitted, so that the detection target of the algorithm is associated with the monocular camera detection target of computer vision, and the detected key feature points are highlighted in the prefabricated building display platform.
The key feature points of the prefabricated building in the display platform are in a static state, while the actual prefabricated building being displayed is in a moving state, so the movement trajectory of the feature points in the display platform must be planned. This trajectory planning is the preprocessing process of the key feature points. On the basis of computer vision, a motion data sequence with defects is established, and key feature points, data preprocessing packages and motion compression frames are added to the motion data sequence. Then, the motion data sequences at different positions are segmented and identified to ensure that each motion data sequence is continuously discovered and processed in the prefabricated building display platform [12]. In order to restrict the display of the motion data sequences to the prefabricated building display platform, a low-dimensional local linear model must also be established. There are a large number of lost marker points in the model, and each lost marker point has a reconstruction function to guide the movement of key feature points. The steps by which the lost marker points guide the motion of the key feature points are shown in Fig. 2 below:
(1) PCA calculation is carried out for the set of lost marker points, and a representative marker point is selected as the "main marker point" among the many lost marker points. The PCA calculation matrix equation is as follows:

    Z = (a1, a2, ..., am; b1, b2, ..., bm)    (2)
In the formula, Z represents the "main marker point"; a and b represent lost marker points of different categories.
(2) Segment and disperse the marker point group other than the main marker point into a low-dimensional linear model.
(3) A feature extractor is installed in the linear model; it reflects the mapping relationship between the marker points other than the main marker point and the motion data, and transmits this mapping relationship to the main marker point to enrich its feature recognition range.
(4) A large amount of motion data is then transmitted to the main marker point, and the main marker point establishes a new model according to the feature positions of the motion data's feature points, so that the motion data moves in an infinite cycle in the new model as specified.
Fig. 2. Lost marker points guide the movement process of key feature points
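As a concrete illustration of step (1), the following minimal numpy sketch performs PCA over a set of lost-marker trajectories and selects the trajectory with the strongest projection on the first principal component as the "main marker point". The data layout (one row per lost marker) and the selection rule are illustrative assumptions, not the paper's exact procedure.

    import numpy as np

    # Rows: lost marker points; columns: concatenated trajectory samples.
    # This layout is assumed purely for illustration.
    rng = np.random.default_rng(0)
    Z = rng.random((12, 90))            # 12 lost markers, 30 frames of (x, y, z)

    Zc = Z - Z.mean(axis=0)             # center the marker set
    U, S, Vt = np.linalg.svd(Zc, full_matrices=False)   # PCA via SVD

    # Score each marker by its projection on the first principal component
    # and take the strongest as the "main marker point" of step (1).
    scores = np.abs(Zc @ Vt[0])
    print("main marker point index:", int(np.argmax(scores)))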
When the key feature points can circulate infinitely in the prefabricated building display platform, the basis for real-time display of the 3D prefabricated building is established [13].
2.2 Feature Corner Detection and Extraction
To detect the characteristic corners of the prefabricated building, a circular template is taken as the object template, with the center of the template as the core. The corner detection model of the prefabricated building is shown in Fig. 3:
Fig. 3. Prefabricated building corner detection model
Observing Fig. 3, it can be seen that the detection moves within the scope of the circular base template, observes the photosensitive area of the template, and records the changes between the prefabricated building area and the original data [14]. The area distribution of the prefabricated building is shown in Fig. 4 below:
Fig. 4. Area distribution of prefabricated buildings
According to Fig. 4, the prefabricated building area contacted by the template is largest at the edge of the image, smaller near a corner, and smallest at the core corner. By detecting the area of the prefabricated building region, the corner position can be identified and the change and contrast of the corner can be monitored, so that more accurate monitoring results are finally obtained.
2.3 Prefabricated Building Feature Corner Extraction and Matching
For parallel movement of the template within the image range, the pixel change of the photosensitive area and the change of brightness can be calculated and expressed by the following formula:

    c(a, b) = { 1, if |Ia − Ib| ≤ th
              { 0, if |Ia − Ib| > th    (3)

In formula (3), b is the location of the core feature corner of the two-dimensional image, and a is any point in the image except the core feature corner; Ia represents the brightness at point a, and Ib represents the brightness at the core point b; th represents the brightness threshold governing the total number of generated corners. From this formula, the brightness value of the characteristic corner can be calculated, laying the foundation for an accurate homography matrix operation.

    R = { g − n(a), if n(a) ≤ g
        { 0,        otherwise    (4)

As shown in formula (4), in order to find as many special points as possible, R denotes the corner response, n(a) is the number of template pixels judged similar to the core point under formula (3), and g is a geometric threshold that eliminates the influence of noise. Positions are generated so that all possible feature corners can be found with the greatest probability after the operation [15]. After the feature corners are extracted, the rest of each image loses its use value; left unprocessed, it would occupy storage space and clutter the memory with jumbled data. Therefore, this paper uses the gray detection method to recalculate the redundant feature corners and re-match the related images or feature corners. After this second screening, the unqualified feature points can be directly excluded, and the reserved feature corners can be used as standby data. It should be specially pointed out that unreal matching corner points will still exist among these matched feature points, so the focus images used for matching must be chosen carefully.
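The template test of formula (3) and the response of formula (4) can be implemented directly. The following Python sketch is a minimal, unoptimized rendering of the two formulas; the template radius, brightness threshold th, and geometric threshold g are illustrative values, not parameters given in the paper.

    import numpy as np

    def corner_response(img, radius=3, th=27, g=None):
        # Circular template for formulas (3)-(4); g defaults to half the
        # template area, a common geometric threshold choice (assumed).
        ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        mask = (ys ** 2 + xs ** 2) <= radius ** 2
        if g is None:
            g = mask.sum() // 2
        h, w = img.shape
        R = np.zeros((h, w))
        for y in range(radius, h - radius):
            for x in range(radius, w - radius):
                patch = img[y - radius:y + radius + 1, x - radius:x + radius + 1]
                # formula (3): c(a, b) = 1 iff |Ia - Ib| <= th
                similar = np.abs(patch.astype(int) - int(img[y, x])) <= th
                n = np.count_nonzero(similar & mask)
                # formula (4): R = g - n(a) if n(a) <= g, else 0
                R[y, x] = g - n if n <= g else 0.0
        return R

    demo = (np.random.rand(32, 32) * 255).astype(np.uint8)
    print(corner_response(demo).max())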
3 Recognition of Grouting Defects in Prefabricated Building Sleeves Based on Computer Vision
3.1 Calibration of Grouting Defects of Prefabricated Building Sleeves
The grouting defects of prefabricated building sleeves are calibrated based on computer vision. In the ideal case, conventional approaches use a standard reference object to constrain the image of the prefabricated building sleeve grouting defects, so as to determine the parameters of the defect model. Such parameter matrix calibration can achieve high calibration accuracy, but it captures many parameters that must be fed into a system of functional equations, which increases the computational load of the calibration equipment and easily causes unstable operation.
Under natural visual conditions, when optical parameters such as focal length and magnification change, it is difficult to select a standard reference object. The method proposed in this paper does not need to establish a standard 3D coordinate system for a reference object; instead, it directly uses the motion of the prefabricated building sleeve grouting defect to obtain an image sequence of the defect, extracts the constraint relationship from the data in this image sequence, and calculates the calibration parameters of the grouting defect. The image sequence of the prefabricated building sleeve grouting defect is shown in Fig. 5:
Fig. 5. Image sequence of prefabricated building sleeve grouting defects
The calibration method for prefabricated building sleeve grouting defects applied in this paper also incorporates the chessboard calibration method. First, a chessboard is used as the calibration plate, and the moving prefabricated building sleeve grouting defects are photographed and scanned from different angles to detect all corner points in each picture. Without considering angle changes, matrix interactivity is used to obtain the internal parameters of the prefabricated building sleeve grouting defect through the linear parameter equations, and the least squares method is then used to calculate the radial coefficients of the parameters. Finally, the error range is narrowed by using the projection error principle of the prefabricated building sleeve grouting defect, and the calibration information is optimized. The simulation template of the chessboard calibration method is shown in Fig. 6.
Fig. 6. Simulation template of chessboard calibration method
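The chessboard procedure just described can be sketched with OpenCV's standard calibration calls, which likewise detect the corner points in multiple views, solve for the internal parameters and radial distortion coefficients by least squares, and report a reprojection error that can be used to narrow the error range. The file names and board size below are assumptions for illustration.

    import glob
    import cv2
    import numpy as np

    pattern = (9, 6)                    # inner corners per row/column (assumed)
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

    obj_pts, img_pts, size = [], [], None
    for name in glob.glob("sleeve_calib_*.png"):        # hypothetical file names
        gray = cv2.imread(name, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
            size = gray.shape[::-1]

    # Intrinsic matrix K and distortion coefficients (radial terms included),
    # solved by least squares; rms is the reprojection error used to optimize
    # the calibration information.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    print("reprojection error:", rms)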
3.2 Three-Light-Band Laser Stereo Vision Scanning Based on Computer Vision
The three-light-band laser stereo vision scanning method based on computer vision used in this paper is mainly composed of multiple three-light-band laser stereo vision sensors. The angle deviation method is used to capture and calculate the animation data in three-dimensional space, and a simplified three-dimensional scanning image is used to carry out 3D point tracing on the grouting defect image of the prefabricated construction sleeve. Scanning the sleeve grouting defects of prefabricated buildings with the three-light-band laser stereo vision method of computer vision yields more accurate scanning results, provides effective support for the identification of sleeve grouting defects, and helps improve the accuracy of defect identification.
In projecting the grouting defects of the prefabricated building sleeve with three-light-band laser stereo vision scanning, the parameter matrix projection method is mainly used in the early stage: the parameter sequence is arranged as a 3 × 4 matrix and the parameters are determined. The influencing factors of the internal environment are placed in the positioning system of the world coordinate system of the 3D laser-scanned assembled building sleeve grouting defect, and the parameters of this system are called the external parameters of the stereo scan matrix. Combining the characteristics of the internal and external parameters, and using the parameter equations of the relevant 3D parameters, the equations are expressed in linear form to produce different types of line segments, so that the specific coordinate positions in space in the world coordinate system can be better obtained. This paper also adds coordinate constraints to the precise coordinate system of grouting defects in prefabricated building sleeves, and directly uses the computer vision three-light-band laser stereo scan of the defects to obtain more accurate coordinate positions. For the stereo vision scanning and projection of the scanning site of the grouting defects, the position relationship between the two devices is solved by a matrix algorithm; in the solving process, the main left and right coordinates are expressed in the form of a 3 × 3 grid projection matrix.
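The 3 × 4 parameter matrix described above combines the internal parameters with the external (world-to-camera) parameters. A minimal numpy sketch with placeholder values:

    import numpy as np

    K = np.array([[800.0, 0.0, 320.0],      # internal parameters (placeholder)
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R = np.eye(3)                           # external parameters: rotation
    t = np.array([[0.1], [0.0], [2.0]])     # external parameters: translation

    P = K @ np.hstack([R, t])               # the 3 x 4 projection matrix

    Xw = np.array([0.2, -0.1, 1.5, 1.0])    # homogeneous world point
    x = P @ Xw
    print(P.shape, (x[0] / x[2], x[1] / x[2]))   # projected pixel coordinates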
The limit (epipolar) geometry of stereo vision is an important stereo vision relationship in the computer vision based stereo scanning of 3D prefabricated building sleeve grouting defects. The epipolar geometry relationship for the stereo vision scanning of prefabricated building sleeve grouting defects is shown in Fig. 7:
Fig. 7. Limit geometric relationship diagram of defect stereo vision scanning
According to the definition of epipolar geometry, in the process of scanning grouting defects of prefabricated building sleeves, the search for image matching points does not need to be carried out over the whole 3D image space of the defect. Instead, the geometric relationship is used to find the epipolar point of the image in the 3D image and the conjugate ray; the matrix parameters of the ray are defined so that the geometric common focus on the scanned features is found along the geometric linear relationship, and the deformed light is used to judge the three-light-band laser stereo vision scanning relationship. In the working state, the sleeve grouting defect of the three-light-band laser assembled building moves from top to bottom together with the laser. The binocular stereo vision scanning system is controlled and fixed by the camera, so that 3D scanning technology can be comprehensively applied on the basis of the mechanical structure.
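The epipolar constraint described above can be sketched with OpenCV: the fundamental matrix estimated from matched points yields, for each point in the left image, the line in the right image along which the matching point must lie, so the search reduces from the whole image to a single line. The synthetic geometry below is a placeholder.

    import cv2
    import numpy as np

    # Synthetic 3D points projected into two views (placeholder geometry);
    # in practice the correspondences come from the corner matching of Sect. 2.3.
    rng = np.random.default_rng(1)
    X = rng.random((20, 3)) + [0, 0, 4]         # points in front of the cameras
    pts_l = (X[:, :2] / X[:, 2:3] * 400 + 200).astype(np.float32)
    Xr = X - [0.2, 0.0, 0.0]                    # right camera shifted along x
    pts_r = (Xr[:, :2] / Xr[:, 2:3] * 400 + 200).astype(np.float32)

    F, inliers = cv2.findFundamentalMat(pts_l, pts_r, cv2.FM_RANSAC)

    # Epipolar line l' = F x in the right image for each left point: the
    # matching-point search reduces to a 1-D search along l'.
    lines = cv2.computeCorrespondEpilines(pts_l.reshape(-1, 1, 2), 1, F)
    a, b, c = lines[0].ravel()                  # line a*x + b*y + c = 0
    print(a, b, c)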
4 Data Processing of Grouting Defects in 3D Prefabricated Building Sleeves
Data capture of grouting defects in prefabricated building sleeves is an important way to synthesize the scanned motion data. Under normal circumstances, the original data produced by the capture equipment is scattered because of noise, other environmental factors, and missing data, so this paper focuses on post-processing and collecting the scattered 3D grouting defect data of assembled building sleeves obtained by the capture device.
In collecting the 3D prefabricated building sleeve grouting defect data, noise reduction is carried out first. Because the grouting defects of prefabricated building sleeves are fixed to a certain degree, the acquired 3D defect data contains a certain error, and all data with such errors belong to noise data, so the noise reduction processing of the data in this paper
is an inevitable step. In the noise data space, stereo matching is carried out for the projections retained at specific time points during collection of the prefabricated building sleeve grouting defect data. The matched data is called temporal pulse noise data; it often appears in the overlapping motion database and is marked or overlapped. In most cases this data leads to the loss of acquired defect data, making it impossible to collect the three-light-band laser 3D fabricated building sleeve grouting defect data.
After the data source of the prefabricated building sleeve grouting defect is obtained, it is first preprocessed, and template data is then used to plan the preprocessed data; the planned data forms a matching state. The data in the matching state is put into the prefabricated building sleeve grouting defect data topology. If the matching succeeds, it is displayed in the 3D simulation-driven graphics. If the matching does not succeed, the data is tracked, the topological structure of the unmatched data is checked, the frame number of the data is replaced, and finally the prefabricated building sleeve grouting defect data is centrally processed.
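The paper does not spell out the noise-reduction operator; one simple, commonly used choice consistent with the description of isolated "temporal pulse noise" is a temporal median filter over each captured coordinate sequence, sketched below as an illustrative assumption.

    import numpy as np

    def median_denoise(track, win=5):
        # Temporal median filter: isolated pulse-noise samples are replaced
        # by the local median of the surrounding window.
        half = win // 2
        padded = np.pad(track, half, mode="edge")
        return np.array([np.median(padded[i:i + win]) for i in range(len(track))])

    t = np.linspace(0.0, 1.0, 50)
    track = np.sin(2 * np.pi * t)
    track[[7, 23, 41]] += 3.0                  # inject temporal pulse noise
    clean = median_denoise(track)
    print(float(np.abs(clean - np.sin(2 * np.pi * t)).max()))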
5 Identification of Grouting Defects
Applying computer vision to image mosaic in a computer system involves three main steps, and a precise homography matrix algorithm is needed as the computational basis. In the process of image transformation and mosaic, the linear motion change of the feature points must be estimated. Assuming the two images are A(x, y) and B(x′, y′), the mosaic transformation can be expressed as:

        ( x′ )     ( h0  h1  h2 ) ( x )
    k = ( y′ )  =  ( h3  h4  h5 ) ( y )    (5)
        ( w′ )     ( h6  h7  h8 ) ( w )

Here, k represents the scale element of the image; X = (x, y, w) is the homogeneous coordinate of an image feature point; and h parameterizes the overall transformation between the two images. There are 8 independent parameters in total, and as long as 4 pairs of matching points are selected, the matching can be performed and the value of h can be estimated. However, this algorithm may also produce operational errors due to uncertainty in the positions of the feature points. Therefore, when selecting the matching points, it is necessary to check whether any of the four points are repeated; if so, the uncertainty of the solution of the linear equations increases accordingly. In order to obtain more accurate results, the selected matching points must be nonlinearly optimized.
The feature points obtained through the previous process may still contain uncertain data or missing feature points. This paper uses computer vision to randomly extract different feature points for filtering. This estimation algorithm is the most widely used in computer systems; its computational power is very high, and it can still efficiently complete the matching even when the data matching failure rate is high.
At the same time, it can also filter the data set on a large scale while the probability of the operation result remains almost unchanged. It is precisely the application of computer vision that enables the feature corners to maintain a high level of accuracy and integrity. While computer vision ensures the authenticity and accuracy of the feature points, it cannot reduce the interference of noise factors on feature point extraction, and noise interference often leads to inaccurate homography matrix results. This paper therefore relies on computer vision to ensure the accuracy of the homography matrix's evaluation of the image matching feature points. Under the same environmental background, KLT can track dynamic objects with very high accuracy and efficiency. When the movement positions of key points in the stitched images change greatly, the KLT tracking algorithm can be used to stitch the matched images more accurately into one full panorama image.
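Sections 2.3 and 5 together describe a feature-match-then-homography pipeline. The following OpenCV sketch is one plausible rendering, not the paper's exact implementation: ORB features (an assumption; the paper does not name a detector) are matched, the 8-parameter homography of formula (5) is estimated from the correspondences with RANSAC filtering out the unreal matches, and one image is warped onto the other. File names are placeholders.

    import cv2
    import numpy as np

    img_a = cv2.imread("view_a.png", cv2.IMREAD_GRAYSCALE)   # placeholder inputs
    img_b = cv2.imread("view_b.png", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(1000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Formula (5): the 3 x 3 homography H has 8 independent parameters and
    # needs at least 4 point pairs; RANSAC rejects the unreal matches.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    h, w = img_b.shape
    panorama = cv2.warpPerspective(img_a, H, (2 * w, h))
    panorama[:, :w] = np.maximum(panorama[:, :w], img_b)    # crude overlap blend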
6 Experimental Studies
In order to verify the effectiveness of the proposed computer vision based method for identifying the grouting defects of prefabricated building sleeves, comparative experiments are designed in which the proposed method is compared with the traditional SURF-based identification method and the edge fusion based identification method for grouting defects of prefabricated building sleeves, and the model effect is analyzed by comparing the results. The experimental image sample is shown in Fig. 8 below:
Fig. 8. Tracking shot scene sample (annotated components: prefab stairs, laminated boards)
According to Fig. 8, about 72 relatively accurate matching feature points are obtained in this paper. The test site is shown in Fig. 9 below:
Fig. 9. Test site
The computer vision based method proposed in this paper, the traditional SURF-based method, and the edge fusion based method for identifying grouting defects in prefabricated building sleeves are used to splice the above images simultaneously; the resulting feature point offset parameters are shown in Tables 1, 2 and 3. The data show that the offset of the image feature points under the proposed computer vision based method is much more stable than under the traditional SURF-based and edge fusion based methods. Because this model regards the feature points as dynamic observation targets and uses the KLT tracking method to calculate their dynamic changes, a more accurate homography matrix result is finally obtained, so that the image stitching accuracy is effectively improved. The distribution of feature points in the tracking algorithm is in a stable, symmetrical state, which enables the homography matrix to obtain more accurate position data in the operation, thus improving the accuracy of the image mosaic and the integrity of the panoramic image.
To verify how strongly the proposed method is affected by external factors, this paper also compares the feature point matching degrees of the three models for the same splicing task under the influence of noise, size differences and image deformation. The comparison results are shown in Fig. 10:
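The KLT tracking step referred to above can be sketched with OpenCV's pyramidal Lucas-Kanade tracker; the per-point offsets it returns correspond in kind to the feature point offset parameters reported in Tables 1, 2 and 3. Frame names and parameter values are placeholders.

    import cv2

    prev = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)  # placeholder frames
    curr = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)

    p0 = cv2.goodFeaturesToTrack(prev, maxCorners=72,
                                 qualityLevel=0.01, minDistance=7)

    # Pyramidal Lucas-Kanade (KLT) tracking of the feature points
    p1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                               winSize=(21, 21), maxLevel=3)

    ok = status.ravel() == 1
    offsets = (p1 - p0)[ok].reshape(-1, 2)                  # per-point (dx, dy)
    print(offsets.mean(axis=0))                             # mean feature point offset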
[Figure: bar chart of feature point matching degree (%) for the SURF-based, edge fusion based, and computer vision based identification methods under light and shadow, noise, and scale variations.]
Fig. 10. Experimental results of feature point matching degree
According to the above experimental data, the KLT algorithm achieves a higher matching degree and faster processing speed than the traditional splicing algorithms. It deals well with overlapping image regions, image size differences, deformation and rotation, light and shadow changes, and noise interference, offering fast image mosaic and fusion with high efficiency and completeness. When the image sequence is disordered, it can also automatically arrange the sequence for a more complete mosaic. For natural scene image mosaic, this splicing algorithm can properly handle the changes and transitions of light, shadow and color, so that the final product presents a naturally coordinated appearance. The image mosaic effect obtained by the computer vision based method for identifying the grouting defects of prefabricated building sleeves is shown in Fig. 11 below:
Fig. 11. Image stitching effect obtained by the computer vision-based method for identifying grouting defects in prefabricated building sleeves: (a) before splicing; (b) after splicing
From the experimental results, the computer vision based method proposed in this paper is more accurate and efficient than the traditional splicing models. However, a certain degree of error remains under external environmental interference, and some indistinguishable blur remains in the stitched image. Nevertheless, for current image stitching applications, computer vision offers strong stitching capability and can be used in a wider range of fields in the future. The results obtained after defect identification are shown in Fig. 12 below:
[Figure: identified components annotated in the image, including beam side plate, feed port, corner keels (inside and outer), floor panel, K board, floor side keel, end beam, middle beam, beam bottom plate, beam bottom keel, exterior and interior wall panels, single support, and bracing.]
Fig. 12. Defect identification experimental results
In order to further verify the defect recognition performance of the method in this paper, taking defect recognition accuracy as the index, the computer vision based, SURF-based, and edge fusion based methods for identifying grouting defects of prefabricated construction sleeves are each applied to defect recognition; the higher the accuracy, the stronger the defect identification performance of the method. The defect identification accuracy results of the three methods are shown in Table 4.

Table 1. Feature point offset parameters of the SURF-based identification method for grouting defects in prefabricated building sleeves

  x         y
  −0.7065   −0.2876
  0.2544    −0.2461
  0.1123    0.0536
  0.1196    0.3217
  0.2236    −0.0415
From the defect identification accuracy results shown in Table 4, it can be seen that the identification accuracy of the proposed method is 99.62%, while the identification accuracies of the two traditional methods are 82.34% and 86.41% respectively. This shows that the proposed method can accurately identify the grouting defects of prefabricated construction sleeves (Tables 1, 2 and 3).
Table 2. Feature point offset parameters of the edge fusion based identification method for grouting defects in prefabricated building sleeves

  x         y
  −0.2541   −0.2806
  0.2683    −0.2982
  0.1007    0.0985
  0.1276    0.2817
  0.2924    −0.0405
Table 3. Feature point offset parameters of the computer vision based identification method for grouting defects in prefabricated building sleeves

  x        y
  0.0527   −0.0549
  0.3635   −0.2354
  0.2105   0.2265
  0.0842   −0.3314
  0.0731   0.2179
Table 4. Results of defect identification accuracy

  Method                                         Defect identification accuracy/%
  Recognition method based on computer vision    99.62
  SURF-based recognition method                  82.34
  Recognition method based on edge fusion        86.41
7 Conclusion
In order to improve the construction quality of prefabricated buildings, a method for identifying sleeve grouting defects in prefabricated buildings based on computer vision is proposed, and its performance is verified both theoretically and experimentally. The proposed method achieves high accuracy when identifying sleeve grouting defects: compared with the SURF-based and edge fusion based recognition methods, its defect recognition accuracy is significantly improved, reaching 99.62%. This shows that the proposed computer vision based defect identification method can better meet the requirements of grouting defect identification for prefabricated construction sleeves. In future research work, the efficiency of defect identification will be
further tested and studied to improve the practical application reliability of the defect identification method.
References
1. Dan, D., Dan, Q.: Automatic recognition of surface cracks in bridges based on 2D-APES and mobile machine vision. Measurement 168(6), 108429 (2021)
2. Lin, Z., Lai, Y., Pan, T., et al.: A new method for automatic detection of defects in selective laser melting based on machine vision. Materials 14(15), 4175 (2021)
3. Wang, S., Wang, C., Li, W., et al.: Study on the operational efficiency of prefabricated building industry bases in Western China based on the DEA model. Arab. J. Geosci. 14(6), 446 (2021)
4. Zhao, Y., Chen, D.: A facial expression recognition method using improved capsule network model. Sci. Program. 2020(2), 1–12 (2020)
5. Zhang, X., Gong, W., Xu, X.: Magnetic ring multi-defect stereo detection system based on multi-camera vision technology. Sensors 20(2), 392 (2020)
6. Xu, W.: Simulation study on stability evaluation of spatial structure of building interior environment. Comput. Simulat. 37(02), 455–458+462 (2020)
7. Wen, Y., Fu, K., Li, Y., et al.: A sliding window method to identify defects in 3D printing lattice structure based on the difference principle. Meas. Sci. Technol. 32(6), 65–78 (2021)
8. Yi, M., Deyuan, Z., Xuan, Z., et al.: Research on defect detection of sleeve grouting in a precast column based on BP neural network. Structural Engineers 38(03), 24–32 (2022)
9. Sun, H., Gao, Y., Zheng, X., et al.: Failure mechanism of precast defective concrete based on image statistics. J. Build. Mater. 24(06), 1154–1162 (2021)
10. Du, Y., Du, J.: Research on recognition of sleeve grouting quality defects based on piezoelectric wave method. Build. Struct. 51(09), 49–55 (2021)
11. Wang, J., Xiao, Q., Shen, Q., et al.: Analysis on the axial behavior of CFRP wrapped circular CFST stub columns with initial concrete imperfection. Prog. Steel Build. Struct. 23(06), 44–53+70 (2021)
12. Yang, C., Shi, C., Su, S., et al.: Detection and analysis of grouting compactness based on frequency spectrum analysis and wavelet packet entropy technology. Build. Struct. 51(16), 110–115 (2021)
13. Jiang, M., Wei, X., Sun, Z., et al.: Study on non-destructive testing methods for composite slabs and wall panels in precast concrete structure. Build. Struct. 52(S1), 2144–2149 (2022)
14. Xiao, S., Li, X., Xu, Q.: Research progress of quality detection and defect repair for sleeve grouting in assembled monolithic concrete structure. Build. Struct. 51(05), 104–116 (2021)
15. Zhao, J., Shang, J., et al.: Comprehensive analysis and judgment of hole defects in steel plate shear wall with ultrasonic testing. Build. Struct. 52(03), 94–98 (2022)
Stability Detection of Building Bearing Structure Based on Bim and Computer Vision Lin Wu(B) and Shunbin Wang ZaoZhuang Vocational College of Science and Technology, Zaozhuang 277599, China [email protected]
Abstract. The stability of a building's load-bearing structure is closely related to the building's service life, and an unstable load-bearing structure poses a great potential safety hazard. A stability detection method for building load-bearing structures based on BIM and computer vision is designed. A computer vision system for image acquisition of building load-bearing structural components is designed, composed of a CCD camera, an image acquisition card, a light source and a UAV. The Revit secondary development technology is applied to establish the overall structural information model, and the influencing factors of structural stability are introduced into the information model. A stability detection network model is built on the YOLOv4 network architecture to implement the stability detection of building load-bearing structures. The test results show that the accuracy rate of the designed method is higher than 0.920 and the recall rate is higher than 97% under different confidence levels, and the test times for new and old buildings are 106.85 min and 75.41 min respectively. Keywords: BIM · Computer Vision · Structural Information Model · Building Load-Bearing Structure · Stability Test
1 Introduction
During the long-term use of engineering structures, under the action of internal or external, artificial or natural factors, material aging and structural damage occur as time goes by; this is an irreversible objective law. Such damage accumulation leads to deterioration of structural performance and reduction of bearing capacity and durability. If the law and degree of such damage can be scientifically evaluated and effective treatment measures taken in time, the process of structural damage can be delayed and the service life of the structure extended [1]. Therefore, the reliability evaluation and reinforcement technology of existing structures has become a hot issue in the engineering field. Many engineering technicians and research groups have turned their attention to this field and placed the identification and reinforcement of engineering structures in a very prominent position. Among these topics, the stability detection of building load-bearing structures is an important research subject.
Structural reliability refers to the ability of a structure to complete its predetermined functions within a specified time and under specified conditions. The specified time refers to the design service life of the structure, and the specified conditions refer to normal design, normal construction, and normal use and maintenance. The intended functions include safety, applicability and durability [2]. The purpose of engineering structure design is to use the performance of the structure to complete the predetermined functions of the design; its main role is to transmit and resist various human or natural actions through the structure, so that the building can complete its predetermined functions within the specified time and under the specified conditions. Reference [3] proposes a building stability detection method based on edge feature fusion, which uses an encoder network to fuse the RGB feature map and edge feature map of the building, calculates the weights of the fused feature map, and completes stability detection by weighting with them. Reference [4] proposes a building stability detection method based on a convolutional neural network, which improves the feature extraction network in the Mask-RCNN model and designs a multi-scale group convolutional neural network with an attention mechanism; the convolutional neural network is used for feature training to complete stability detection. Although the above methods can complete stability detection, the acquisition accuracy of the building data is low, resulting in poor detection performance. Therefore, a building load-bearing structure stability detection method based on BIM and computer vision is proposed. The method first uses computer vision technology to collect images of the building load-bearing structural components, then builds the BIM model of the load-bearing structure, and finally uses the YOLOv4 network architecture to complete the stability detection.
2 Stability Testing of Building Load-Bearing Structures
2.1 Image Collection of Building Load-Bearing Structural Components Based on Computer Vision
Images of the building load-bearing structure are collected based on computer vision technology. A computer vision system for image acquisition of building load-bearing structural components is designed, consisting of a CCD camera, an image acquisition card, a light source, and an unmanned aerial vehicle.
First, a complete CCD camera integrating image acquisition, transmission and display is designed. The KAI-01050 image sensor is selected as the CCD image sensor: its resolution and frame rate meet the design requirements of the camera, and both the single-pixel size and the overall image size are relatively large, so the image quality is higher and the image is sharper. Normal operation of the CCD sensor requires complex external horizontal and vertical drive signals, and the collected image is an analog signal; a digital image is obtained only after A/D conversion through a correlated double sampling chip. If discrete devices were used to realize the driving and analog-to-digital conversion functions, the image acquisition circuit would be very complex, which would not only increase the difficulty of software and hardware debugging but also make it hard to meet the size requirements of
the image acquisition circuit board. Therefore, the camera selects the AD9920A chip from Analog Devices as the analog front-end image acquisition chip; its main parameters are shown in Table 1.
Table 1. AD9920A chip main parameters

  Serial No   Project                                  Parameter
  1           Data bit width/bits                      15
  2           Sampling frequency/MSPS                  52.5
  3           Is the horizontal signal programmable    Yes
  4           Is the vertical signal programmable      Yes
The sampling rate of the AD9920A chip can reach the data rate of the selected KAI-01050 image sensor, so the AD9920A is used to generate the various driving signals required by the CCD. The sensor used in this design has a resolution of 1024 × 1024 over four channels and a frame rate of 120 fps, giving a data rate close to 1 Gbps. The USB 3.0 interface can reach an actual transmission rate of 3.2 Gbps, which meets the design requirements; it also supports hot swapping, is easy to use, and is available on all mainstream PCs. Therefore, USB 3.0 is chosen as the data transmission channel. The overall structure of the designed CCD camera is shown in Fig. 1.
The image acquisition module configures the registers inside the AD9920A chip through the C8051F921 single-chip microcomputer to generate the horizontal and vertical driving signals that drive the CCD. The horizontal signals need level conversion to drive the KAI-01050 CCD sensor to collect images. The collected analog image data is sent back to the AD9920A for correlated double sampling, and the resulting 12-bit digital image data is transmitted to the main controller FPGA of the data transmission module together with the line (HD), field (VD) and data (DCLK) synchronization signals. The C8051F921 microcontroller in this module is also responsible for receiving the parameters sent by the upper computer, such as the amplification gain of the analog signals, the exposure time, the number of row blanking pixels, and the sampling points of the correlated double sampling. After receiving the parameters, the microcontroller configures the AD9920A registers to change the working state of the CCD accordingly.
The main controller of the data transmission module is a Cyclone II FPGA from Altera, and the USB 3.0 transmission uses the CYUSB3014 chip from Cypress. After receiving the digital image from the image acquisition module, the FPGA first performs data buffering and data bit width transformation through a FIFO (First In, First Out) buffer, and then sends the data to the CYUSB3014 chip according to the USB 3.0 timing requirements. After receiving the data, the chip uploads it to the host computer in bulk transfer mode according to the USB 3.0 protocol [5, 6].
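The quoted sensor data rate can be verified with one line of arithmetic: at 1024 × 1024 resolution, 120 fps and 8-bit pixels (the width used on the FPGA transmission path), the raw rate is about 1.0 Gbps, comfortably below the roughly 3.2 Gbps effective rate of USB 3.0.

    # Raw sensor data rate: width x height x frame rate x bits per pixel
    bits_per_second = 1024 * 1024 * 120 * 8
    print(bits_per_second / 1e9)     # ~1.01 Gbps, "close to 1 Gbps" as stated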
Fig. 1. General structure of the designed CCD camera
The image display program of the upper computer module uses MFC and multithreaded programming: it receives image data with the USB 3.0 driver functions provided by Cypress, writes the data to the upper computer's two image buffers through a ping-pong operation, and displays the image data through the DrawDib function. The image display interface of the upper computer can set parameters to control the camera in different working states.
System workflow: after the system is powered on, the single-chip microcomputer configures the internal registers of the AD9920A through the SPI bus to generate the horizontal, vertical and exposure signals that drive the CCD. The analog image collected by the CCD is converted into a digital signal by the AD9920A's correlated double sampling and sent to the FPGA of the data transmission system. The FPGA buffers the image data through the FIFO, converts the 8-bit image data into 32-bit, and then sends the data to the USB 3.0 chip through a state machine. After receiving the data, the USB 3.0 chip forwards it to the USB 3.0 interface through its internal DMA channel and sends it to the upper computer via bulk transfer. The upper computer uses the driver interface functions provided by Cypress to receive the data and displays the images through the application program.
The design of the image acquisition card is shown in Fig. 2.
Fig. 2. Design of frame grabber
The designed acquisition card is mainly composed of the image acquisition interface, the core controller, external memory, and the data transmission bus interface. The Camera Link protocol has gradually become a general image transmission protocol; the Camera Link interface standard is based on the standard interface defined by that protocol and uses LVDS signaling, which has great advantages in transmission rate, so the Camera Link interface is used as the image acquisition interface.
The image acquisition card has high requirements for real-time performance. During image acquisition, the frame and line synchronization signals must be captured accurately to obtain complete and effective images, and to ensure image continuity, image data buffering and corresponding logic control are also required for accurate transmission to the PC. In addition, the controller needs to set parameters and trigger the camera. This series of actions imposes strict timing requirements, so an FPGA is used as the core controller. Because the image data rate is very high and the data volume is large, the internal resources of the FPGA are insufficient to cache large batches of image data, so the relatively advanced and mature DDRII SDRAM is used as the external cache memory, with the FPGA controlling the data flow. To meet the requirements of real-time processing of high-resolution images, the image acquisition card must have a high bus transmission rate. As a new-generation serial I/O bus standard, the PCIe bus has an effective data transmission bandwidth of 250 MB/s for an x1 channel and 1 GB/s for an x4 channel, much higher than the PCI bus. Therefore, the x4-channel PCIe bus interface is selected as the bus transmission interface of the image acquisition card, which can meet the requirements of high image resolution and real-time image processing.
First, the PC correctly configures the PCIe interface through the underlying driver so that it works normally. The PC then sends camera setting instructions to the FPGA through the PCIe bus, and the FPGA forwards the instructions to the camera as serial port data using the prepared serial port program; the camera receives the instructions, applies the settings, and outputs images. The acquisition card receives the image data through the Camera Link interface, extracts the effective image according to the line and field synchronization signals, and caches the image to the DDR2. At the same time, the number of image lines is counted inside the FPGA to determine when a frame has been completely transferred. When a frame is complete, the PC is notified to initiate a command to receive the image data, and the FPGA reads the data from the DDR2 and sends it to the PC through the PCIe bus. Since the PCIe bus transmits data faster than the capture card receives images from the Camera Link interface, and the DDR2 can read and write at the same time, the camera can output images without interruption while data is being sent to the PC, thus ensuring image continuity.
The light source is an LED light source, and its preparation equipment is shown in Table 2.
Table 2. LED light source preparation equipment

  Serial No   Equipment name                Specification and model
  1           LED test fixture              3028
  2           LED test power supply         LED260E
  3           High precision spectrometer   FR-23
  4           Integrating sphere            G5000
  5           Dispenser                     ME32
  6           Special oven                  DF453
  7           Coater                        V100
  8           Electronic balance            MN342S
Preparation plan: the die bonding and wire bonding processes in sample preparation are completed on the enterprise sample line, so the preparation process here mainly starts from the preparation of the fluorescent glue. The specific process is as follows:
(1) First, add silica gel into the rubber cup: weigh type A and type B silica gel at a ratio of 1:10, adding glue A first and then glue B. Next, add the phosphor material in the predetermined amount. Finally, add anti-settling starch in an amount equal to 0.12% of the amount of silica gel.
(2) Homogenize and vacuum the prepared mixture of phosphor, silica gel and anti-settling starch: stir the mixture at 1000 rpm, 800 rpm and 1000 rpm for 2 min each, then apply vacuum to eliminate bubbles and mix the phosphor, silica gel and anti-settling starch evenly.
(3) Pour the prepared fluorescent glue into the syringe of the dispensing machine, and evenly dispense a fixed amount of fluorescent glue into the LED brackets that have been die-bonded and wire-welded, so that the fluorescent glue is flush with or slightly below the cup of the bracket.
(4) Place the dispensed LED brackets in an oven for baking, with baking conditions of 90 min at 80 °C, 90 min at 120 °C, and 180 min at 150 °C.
(5) After baking, test with the testing instruments.
(6) The selected drone is the DJI Mavic 2, which carries the other acquisition equipment.
2.2 Build the BIM Model of the Building's Load-Bearing Structure
The BIM model of the building's load-bearing structure is built based on BIM technology. Specifically, the Revit secondary development technology is used to establish the overall structural information model; the attributes of the influencing factors are customized by editing shared parameters, completing the parameter setting function for the influencing factors of structural stability, and these influencing factors are imported into the information model [7]. The information model of the influencing factors of concrete structure stability is constructed at the material level, the component level and the structure level.
First, material-level information is added to the information model of the influencing factors of concrete structure stability. Damage related to material properties includes freeze-thaw damage, water permeability, chloride ion corrosion, carbonation, reinforcement corrosion, sulfate corrosion, alkali-aggregate reaction, and concrete cracking caused by volume instability. Next, carbonation, steel corrosion, and concrete shrinkage and creep are added to the building information model. Taking carbonation as an example, the carbonation mechanism and the reaction process of cement during hydration are treated as the LOD100 level and established in the information model. LOD200 mainly stores the information of the carbonation depth prediction model, in which the cement dosage, cement type, water-cement ratio and aggregate have an important influence on the carbonation depth. LOD300 stores the evaluation indexes of carbonation and the influence of carbonation on structural stability in the structural stability information model. There are many influencing factors of reinforcement corrosion, including carbonation, chloride ion erosion, the thickness of the protective layer, the reinforcement material, the reinforcement corrosion rate, and environmental factors. The function of creating rebar in concrete components is realized through secondary development of Revit, and factors such as the reinforcement corrosion rate are then added to the created rebar properties, which is equivalent to considering the influence of reinforcement corrosion in the model. LOD100 stores relevant information from the perspective of material damage, such as the decrease of bond force caused by insufficient thickness of the concrete protective layer, while LOD200 stores data at its level from the perspective of the electrochemical corrosion process of the reinforcement. The shrinkage and creep of concrete need to be monitored over the whole life cycle of the building, a feature that gives full play to the advantages of the building information model. A 3D model must be established according to the design criteria of standard
By testing, model analysis over the building's whole life cycle can be realized.

Secondly, component-level information is added to the model. Theoretical research on concrete corrosion cracking is mainly conducted from the perspectives of material damage and crack extension; current research focuses on the critical corrosion rate for corrosion cracking, and layered analysis of corrosion cracking is conducted through the reinforcement corrosion rate [8]. Research on the bond between corroded reinforcement and concrete mainly analyzes the change of bond strength between rebar and concrete, as well as the deformation caused by stress transmission between them; bond performance is evaluated through the change of bond strength, which serves as the performance evaluation index. The change of member bearing capacity is studied through factors such as rebar diameter, minimum concrete cover thickness, and rebar corrosion rate; the impact on bearing capacity is considered, and other mechanical properties of the member are analyzed. The relevant parameters are input into the BIM model as the evaluation index of member bearing capacity in the BIM database.

Thirdly, structure-level information is added to the model. Research on concrete stability at the structure level covers two aspects: the stability design of proposed concrete structures, and the stability evaluation and life prediction of in-service concrete structures. Stability is evaluated according to design service life, environmental categories, damage parameters and reliability theory, and these indicators are added to the building information model and integrated into the BIM database by level.
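For illustration only, a minimal Python sketch of how such influencing-factor records might be organized by LOD level; all field names and numbers here are hypothetical assumptions, not the authors' actual Revit shared-parameter schema:

```python
from dataclasses import dataclass, field

@dataclass
class InfluencingFactor:
    """Hypothetical record layout (not the authors' Revit schema) for a
    stability-influencing factor stored at several LOD levels."""
    name: str                  # e.g. "carbonation", "rebar_corrosion"
    level: str                 # "material" | "component" | "structure"
    lod100: dict = field(default_factory=dict)  # mechanism description
    lod200: dict = field(default_factory=dict)  # prediction-model inputs
    lod300: dict = field(default_factory=dict)  # evaluation indices

carbonation = InfluencingFactor(
    name="carbonation",
    level="material",
    lod100={"mechanism": "CO2 reacting with cement hydration products"},
    lod200={"cement_dosage": 320, "cement_type": "assumed",
            "water_cement_ratio": 0.45},   # factors named in the text
    lod300={"carbonation_depth_mm": 12.5,  # illustrative numbers only
            "stability_impact": "medium"},
)
```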
2.3 Stability Testing

A stability detection network model is built on the YOLOv4 architecture to realize stability detection of building load-bearing structures, with the data in the BIM model of the load-bearing structure serving as the network input. The model consists of three parts: the backbone feature extraction network CSPDarknet53, the enhanced feature extraction networks (SPP and PANet), and the YOLO Head. The input can take several resolutions. Through a series of convolution, normalization and activation operations, the input is continuously downsampled: the scale of the feature layers shrinks while the number of channels grows, yielding richer semantic information. In the backbone, the last three feature layers are selected, compressed 8, 16 and 32 times relative to the original data, and used to detect small-, medium- and large-scale features respectively. After the feature layers are acquired, the last one is passed, after three convolutions, into the SPP structure, which applies max pooling with kernels of different sizes and stacks the pooled results; after three more convolutions, the result enters PANet. PANet includes an up-sampling and a down-sampling pass. In the up-sampling pass, the lower-layer network is upsampled by a factor of two and stacked with the upper feature layer to fuse features; in the down-sampling pass, the upper-layer network is downsampled by a factor of two and stacked with the lower feature layer [9]. After PANet, the YOLO Head uses the extracted features to predict the results; the three feature layers correspond to three YOLO Heads.

The CIoU loss takes three geometric factors into account: overlapping area, center-point distance, and aspect ratio. During bounding-box regression it converges faster and performs better, making the regression of the target box more stable, so CIoU is used as the loss function of the stability detection network. The CIoU value is calculated as:

CIoU = A − C²(D, D^gt) / B² + φβ (1)

In Formula (1), A represents the IoU loss value, defined as the difference between 1 and IoU; this term reflects the degree of overlap between the predicted box and the real box well and is scale-invariant, but when the predicted box does not intersect the real box it provides no gradient. B represents the diagonal length of the smallest enclosing region containing both the predicted box and the real box; C²(D, D^gt) is the squared Euclidean distance between D and D^gt; D is the center of the predicted box and D^gt the center of the ground-truth box; φ is the trade-off weight and β the aspect-ratio consistency term. φ is calculated as:

φ = β / (1 − A + β) (2)

β is calculated as:

β = (4 / π²) · (arctan(f / g) − arctan(f′ / g′))² (3)

In Formula (3), f and g represent the width and height of the real box, and f′ and g′ the width and height of the predicted box. The CIoU loss is then:

L_CIoU = 1 − CIoU (4)
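For illustration, a minimal NumPy sketch of the CIoU loss of Eqs. (1)-(4). It maps the paper's symbols A, B, C(D, D^gt), φ and β onto the corresponding quantities but assumes the sign conventions of the standard CIoU formulation (IoU term minus center-distance and aspect-ratio penalties); it is a sketch, not the authors' implementation:

```python
import numpy as np

def ciou_loss(box_p, box_g, eps=1e-9):
    """CIoU loss of Eqs. (1)-(4) for boxes in (x1, y1, x2, y2) format."""
    # IoU (the paper's A term is defined from this value)
    xi1, yi1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    xi2, yi2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter + eps)

    # C^2(D, D_gt): squared distance between the two box centers
    dx = (box_p[0] + box_p[2] - box_g[0] - box_g[2]) / 2
    dy = (box_p[1] + box_p[3] - box_g[1] - box_g[3]) / 2
    center_dist2 = dx * dx + dy * dy

    # B^2: squared diagonal of the smallest box enclosing both boxes
    cw = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    ch = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    diag2 = cw * cw + ch * ch + eps

    # beta (Eq. 3): aspect-ratio consistency of real vs. predicted box
    f, g = box_g[2] - box_g[0], box_g[3] - box_g[1]     # real w, h
    fp, gp = box_p[2] - box_p[0], box_p[3] - box_p[1]   # predicted w, h
    beta = 4 / np.pi ** 2 * (np.arctan(f / g) - np.arctan(fp / gp)) ** 2

    # phi (Eq. 2): trade-off weight for the aspect-ratio term
    phi = beta / (1 - iou + beta + eps)

    ciou = iou - center_dist2 / diag2 - phi * beta
    return 1 - ciou                                     # Eq. (4)
```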
The method used for parameter optimization is the momentum method, which augments the basic step of mini-batch stochastic gradient descent with a term related to the previous step, so that the update direction of the parameters is determined not only by the current gradient but also by the previously accumulated descent direction. This method updates parameters using accumulated historical gradient information: during accumulation, the gradient in direction
A is continually cancelled out while the gradient in direction B is reinforced, which accelerates the descent of SGD in the correct direction and reduces oscillation. The momentum method also avoids the zero-gradient problem at saddle points and local minima, helping to find a better solution. Label smoothing is mainly used in multi-class tasks: noise is added to the one-hot labels, reducing the weight of the true class when computing the loss and lowering the upper limit of the prediction target to a value such as 0.9, which is used in place of 1 during loss calculation. This prevents the model from overfitting during training and makes the network more robust. In object detection, the learning rate strongly affects training performance. A large learning rate speeds up learning and helps escape oscillation around local optima, but easily leads to non-convergence and low accuracy; a small learning rate helps the model converge and refine, but makes it hard to jump out of local optima and slows convergence [10]. A stepwise (hierarchical) learning-rate decay schedule is therefore selected.
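A minimal sketch of the two training tricks just described, the momentum update and label smoothing; the hyperparameter values are illustrative assumptions:

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.001, mu=0.9):
    """One momentum update: the velocity accumulates past gradients, so
    oscillating components cancel and the consistent direction speeds up.
    lr and mu are illustrative values."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: cap the true-class target at 1 - eps (0.9 here)
    and spread eps uniformly over the classes, which curbs overfitting.
    one_hot is a NumPy array of one-hot class labels."""
    n_classes = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / n_classes
```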
3 Experimental Tests

To test the performance of the proposed stability detection method for building load-bearing structures, experiments are carried out as follows: first, the stability detection network model is trained; then precision and recall are set as the experimental indicators; finally, the proposed method is compared with the methods of reference [3] (Method 1) and reference [4] (Method 2).

3.1 Model Training

Transfer learning is first used to train the stability detection network: training starts from pre-prepared pre-training weights and follows a freeze-unfreeze schedule. The hyperparameters are set as follows: (1) first, the first 249 layers of the network (the backbone feature extraction network) are frozen for training, using the YOLOv4 pre-trained weight file, with 50 frozen epochs, a batch size of 8, and an initial learning rate of 0.001; (2) after these 50 epochs, the first 249 layers are unfrozen, the batch size is reduced to 2, the learning rate to 0.0001, and up to 100 further epochs are trained. During training, early stopping is set to end training automatically when the validation loss has not decreased for 6 epochs, at which point the model has essentially converged. Adam is selected as the optimization algorithm with its recommended parameter values: a decay rate of 0.999 and a momentum coefficient of 0.9. The programming language is Python, edited in VS Code; the network is built with the Keras framework on Windows 10, with GPU acceleration. The computer configuration is an Intel(R) Core(TM) i5-10500 CPU @ 3.10 GHz, 16 GB of RAM, and an 8 GB NVIDIA GeForce RTX 2060 SUPER. TensorBoard is used to monitor the training-set and validation-set loss and the learning rate in each epoch, yielding the loss and learning-rate curves, which gradually flatten. At epoch 116, since the validation loss no longer decreases over several runs, the network is considered converged; at this point the training loss is 2.246, the validation loss is 2.728, and training ends.
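A condensed Keras sketch of the freeze-unfreeze schedule described above. `build_yolov4`, `train_gen`, `val_gen` and the weight file name are hypothetical placeholders (the authors' code is not published); the layer count, epoch counts, learning rates and early-stopping patience follow the text, and the CIoU loss is assumed available as a Keras-compatible loss function:

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

# build_yolov4(), train_gen and val_gen are hypothetical placeholders.
model = build_yolov4()
model.load_weights("yolov4_pretrained.h5", by_name=True, skip_mismatch=True)

early_stop = EarlyStopping(monitor="val_loss", patience=6,
                           restore_best_weights=True)

# Stage 1: freeze the first 249 layers (the backbone) for 50 epochs;
# batch size 8 is assumed to be configured inside train_gen.
for layer in model.layers[:249]:
    layer.trainable = False
model.compile(optimizer=Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999),
              loss=ciou_loss)  # CIoU loss of Sect. 2.3, assumed available
model.fit(train_gen, validation_data=val_gen, epochs=50)

# Stage 2: unfreeze, lower the learning rate, train up to 100 more epochs
# (batch size 2 in the generator) with early stopping on validation loss.
for layer in model.layers[:249]:
    layer.trainable = True
model.compile(optimizer=Adam(learning_rate=1e-4), loss=ciou_loss)
model.fit(train_gen, validation_data=val_gen, epochs=100,
          callbacks=[early_stop])
```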
3.2 Evaluation Indicators

To evaluate the performance of the trained model, the precision-recall (P-R) curve, AP (Average Precision), mAP (mean Average Precision), F1 score and similar indicators are commonly used as evaluation criteria. When drawing the P-R curve and computing the mAP value, precision and recall are the two key quantities. To calculate them, the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) must first be obtained: TP means positive samples correctly identified as positive, TN means negative samples correctly identified as negative, FP means negative samples incorrectly identified as positive, and FN means positive samples incorrectly identified as negative. On this basis, precision and recall are calculated as:

Precision = TP / (TP + FP) (5)

Recall = TP / (TP + FN) = TP / GT (6)
In Formula (6), GT refers to the number of ground-truth labels. Precision is the ratio of correctly identified positive samples to all samples identified as positive; recall is the ratio of correctly identified positive samples to all ground-truth labels. In this process, how the network judges TP, FP and FN is critical. For a given category, the detector first traverses all GT in the data, then filters all detection boxes of that category by the initially set confidence threshold. The remaining detection boxes are sorted by confidence, and the box with the highest confidence is checked against the CIoU threshold with its GT: if the threshold is met, that box is marked TP and any later boxes matched to the same GT are marked FP; if the CIoU requirement is not met, the box is marked FP directly. It is worth noting that mAP is the final evaluation index of the detection model, computed on the final detection results after all operations are complete; NMS is performed only at prediction time, not during training, because a large number of positive and negative samples are needed for learning during training. In this study, the confidence threshold is set to 0.01 when the
detector scans a class, the CIoU threshold is set to 0.5, and the confidence threshold for non-maximum suppression is set to 0.6. Once the numbers of TP, FP and FN are obtained, precision and recall under different confidence levels can be calculated from the formulas. Precision and recall trend in opposite directions; a good detector keeps precision high as recall increases.
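A sketch of the TP/FP assignment rule and of Eqs. (5)-(6). The greedy highest-confidence-first matching mirrors the description above, with the CIoU check simplified to plain IoU for brevity; the detection dictionary keys are assumptions:

```python
def iou(a, b):
    """Plain IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def assign_tp_fp(detections, gts, overlap_thresh=0.5):
    """Greedy matching: detections of one class, sorted by confidence;
    the first box to claim a GT is TP, later boxes on the same GT (or
    boxes below the overlap threshold) are FP."""
    matched, tp, fp = set(), 0, 0
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        best_iou, best_gt = 0.0, None
        for i, gt in enumerate(gts):
            o = iou(det["box"], gt)
            if o > best_iou:
                best_iou, best_gt = o, i
        if best_iou >= overlap_thresh and best_gt not in matched:
            matched.add(best_gt)
            tp += 1
        else:
            fp += 1
    return tp, fp

def precision_recall(tp, fp, n_gt):
    """Eqs. (5)-(6): FN is implicit, since TP + FN equals the GT count."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return precision, recall
```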
3.3 Test Results

In the tests, Method 1 and Method 2 are run alongside the design method so that the three methods can be compared.

Table 3. Accuracy test results

Confidence | Precision (Design method) | Precision (Method 1) | Precision (Method 2)
0.10 | 0.942 | 0.814 | 0.796
0.15 | 0.940 | 0.813 | 0.794
0.20 | 0.935 | 0.805 | 0.791
0.25 | 0.932 | 0.802 | 0.790
0.30 | 0.930 | 0.796 | 0.781
0.35 | 0.928 | 0.794 | 0.780
0.40 | 0.925 | 0.793 | 0.778
The results show that the precision of the design method stays above 0.920 as the confidence increases from 0.10 to 0.40, while the highest precision of Method 1 and Method 2 is only 0.814 and 0.796 respectively, indicating that the design method is more accurate than the two comparison methods. The recall rates of the three methods under different confidence levels are then tested; the results are shown in Fig. 3. The recall data show that the recall of all three methods rises as the confidence increases, and the recall of the design method is well above that of Method 1 and Method 2, indicating better detection performance. Finally, the stability detection time of the three methods is tested; the results are shown in Fig. 4.
[Figure: recall (%) plotted against confidence (0.10-0.40) for the design method, Method 1 and Method 2.]
Fig. 3. Recall test data
[Figure: stability detection time (min) for old and new buildings under the design method, Method 1 and Method 2.]
Fig. 4. Stability detection time test results
The above results show that, for all three methods, stability testing takes longer on new buildings than on old buildings. For both new and old buildings, the detection time of the design method is lower than that of the two comparison methods, at 106.85 min for new buildings and 75.41 min for old buildings.
4 Conclusion

Stability detection of building load-bearing structures is safety-critical work that demands attention. This paper designs a stability detection method for building load-bearing structures based on BIM and computer vision: images of the load-bearing structure under construction are collected with computer vision technology, the BIM model is built, and a stability detection model is then constructed on the YOLOv4 architecture to obtain the stability detection results.
The experimental results show that the proposed method shortens detection time while maintaining detection accuracy, indicating that the proposed computer-vision-based detection method can effectively improve the reliability of stability detection for building load-bearing structures.
References

1. Sgobba, S., Santillana, I.A., Arnau, G., et al.: Examination and assessment of large forged structural components for the precompression structure of the ITER central solenoid. IEEE Trans. Appl. Supercond. 30(9), 1–4 (2020)
2. Guo, W., Yang, W., Ma, Z., et al.: Stability criterion of overburden structure above goaf under building load and its application. J. China Coal Soc. 47(6), 2207–2217 (2022)
3. He, X., Qiu, F., Cheng, X., et al.: High-resolution image building target detection based on edge feature fusion. Comput. Sci. 48(9), 140–145 (2021)
4. Zhao, R., Wang, J., Lin, S.: Small building detection algorithm based on convolutional neural network. Syst. Eng. Electron. 43(11), 3098–3106 (2021)
5. Wang, Y., Zhang, Z.: Design of overall stability analysis system for GSCAD building using BIM. Tech. Autom. Appl. 41(11), 175–179 (2022)
6. Lan, Q., Zhang, X., Li, Y., et al.: Output feedback disturbance rejection control for building structure systems subject to seismic excitations. Measure. Contr. 53(10), 1616–1624 (2020)
7. Di, W., Zhe, Z.: Simulation of building space dimension deviation identification based on grid multi density. Comput. Simul. 39(2), 447–451 (2022)
8. Li, Y., Yue, X., Zhao, L., et al.: Effect of north wall internal surface structure on heat storage-release performance and thermal environment of Chinese solar greenhouse. J. Building Phys. 45(4), 507–527 (2022)
9. Gu, Y., Li, P., Meng, X., et al.: Re-measurement and stability analysis of super-large structure plane control network. Architect. Technol. 53(7), 847–850 (2022)
10. Martinez, J., Albeaino, G., Gheisari, M., et al.: UAS point cloud accuracy assessment using structure from motion-based photogrammetry and PPK georeferencing technique for building surveying applications. J. Comput. Civ. Eng. 35(1), 1–15 (2021)
Intelligent Integration of Diversified Retirement Information Based on Feature Weighting

Ye Wang(B) and Yuliang Zhang

Wuxi Vocational Institute of Commerce, Wuxi 214153, China
[email protected]
Abstract. When intelligently integrating retirement information, the existing data processing and analysis architecture can no longer meet current requirements for storing and processing massive text data, which reduces the efficiency and accuracy of intelligent integration of diversified retirement information. A feature-weighted intelligent integration method for diversified retirement information is therefore proposed. A packet capture mechanism is used to collect samples of diversified retirement information, and the initial retirement information is preprocessed through filtering, normalization and other steps. The weights of the diversified retirement information are computed and allocated intelligently; features such as mutual information and information gain are extracted; a feature weighting algorithm determines the type of each piece of retirement information; and the intelligent integration of diversified retirement information is completed. Performance tests show that, compared with traditional integration methods, the integrity coefficient of the integration results obtained by the optimized design increases by 2.4% and the information redundancy coefficient is effectively controlled. Applied to retirement information retrieval, the method effectively improves retrieval speed.

Keywords: Feature Weighting · Diversified Information · Retirement Information · Intelligent Integration
1 Introduction

Retirement means that, according to the relevant state regulations, a worker quits working because of old age, or because of work-related or illness-related disability causing a complete loss of working capacity. On March 12, 2021, the announced outline of the "14th Five-Year Plan" and the 2035 long-term goals clearly stated that the statutory retirement age should be gradually delayed under the principles of "small-step adjustment, flexible implementation, classified advancement, and overall consideration". In April 2021, the Ministry of Human Resources and Social Security and the Ministry of Finance issued the "Notice on Adjusting the Basic Pension for Retirees in 2021", with an overall adjustment of 4.5% of the monthly per capita basic pension of 2020 retirees. In order to
improve the efficiency of retirement work, the relevant information of retirees must be verified in advance [1]. Today's society is in an era of digitization, intelligence and networking; information technology is increasingly integrated into daily life, productivity has undergone a qualitative change, and the digital storage and management of retirement information has been realized. For this purpose, an intelligent integration method for retirement information is designed.

Information integration means, following the development trend of information technology and under the leadership of certain organizations, realizing the serialization, sharing and transfer of information resources, thereby optimizing their allocation, broadening their application fields, and maximizing the value of information. Viewed from the elements an information system covers, there are many motivations for information integration, which can be examined from different perspectives. Under the influence of various advanced technologies, the public's access to retirement information has become more diversified and richer, and the use of retirement information resources has also increased. Because retirement information resources are complex in content and large in number, it is difficult to use them efficiently within a short time. Only by scientifically integrating retirement information resources, clarifying the subject content and information items, and summarizing and sorting them well can their utilization efficiency be improved. Carrying out integration with advanced technologies such as big data, the Internet of Things and the Internet, together with related equipment, and actively building retirement information resources with technological attributes, will help improve the collection, arrangement, protection and utilization of these resources. At the same time, after the integration is completed, modern equipment should be purchased, a retirement information resource library built, the intelligent management of retirement information promoted, and the effect of resource sharing continuously demonstrated.

At present, the more mature intelligent information integration methods include integration based on support vector machines, on wavelet decomposition, and on deep neural networks. The support vector machine method takes information as the processing object and, based on the VC dimension theory of statistical learning and the structural risk minimization principle, seeks the best compromise between model complexity and learning ability from limited sample information, determines how data are integrated, and completes the integration task. The wavelet decomposition method decomposes the initially collected information to make the integrated signal results more sufficient. The deep neural network method uses a deep network for feature extraction and integrates information at the feature level.
With the continuous development of information technology, diversified retirement information data sets have reached the TB and PB scale, or even larger. The existing data processing and analysis architecture can no longer meet the current requirements of massive text data
storage and processing, so the efficiency and accuracy of intelligent integration of diversified retirement information decline. Feature weighting assigns a weight to each feature in the vector space, the size of which depends on the feature item's ability to represent the text. By weighting the feature items effectively, items that represent the text well receive higher weights and items with little ability to distinguish categories receive smaller weights; this not only improves the distribution of the original feature set in the vector space but also suppresses the influence of noisy feature items. Feature weighting algorithms include the Boolean weight, entropy weight, word frequency weight, inverse document frequency, and the TF-IDF function.

On this basis, an intelligent integration method for diversified retirement information based on feature weighting is studied. Feature weighting is applied to the optimization design of the integration method; diversified retirement information samples are collected with a packet capture mechanism and preprocessed; the type of retirement information is determined by a feature weighting algorithm; and the intelligent integration of diversified retirement information is completed. The integrity coefficient of the integration results obtained by the designed method improves by 2.4%, the information redundancy coefficient is effectively controlled, and, applied to retirement information retrieval, the integration effect and the retrieval performance are effectively improved.
2 Intelligent Integration Method of Diversified Retirement Information

The core processing flow of the optimally designed intelligent integration method for diversified retirement information is shown in Fig. 1. According to the stage of integration and the integration approach, the integration level is divided into three levels: information-level, feature-level, and decision-level integration. At the information level, the observations of diversified retirement information and the observed characteristic values of the target are processed directly, and the target's state is estimated, predicted and judged from these characteristic values. As the lowest level, information-level integration requires that the information to be processed come from the same sensor. Given the huge amount of information at the bottom level and its discontinuous, jumpy character, information-level integration urgently requires strong error prevention and correction capabilities [2]; at the same time, these harsh conditions inevitably reduce the anti-interference ability of the whole system. Therefore, to ensure that multiple sensors observe the target at the same time, the information they process must be the original observations. The main advantage of information-level integration is its high accuracy; as the lowest level, it provides the intermediate results needed by higher-level integration to improve overall operational efficiency.
Fig. 1. Flow chart of intelligent integration of diversified retirement information
Feature-level integration is the intermediate level. Data integration at this level is usually divided into integration of the characteristic values of observed events and integration of their characteristic states. It requires each sensor in the system to observe a target and extract the corresponding eigenvalues according to the target's own characteristics; the eigenvalues are then sent to the intermediate data-integration stage for further fusion, and the target is judged from the properties of its eigenvalues. Decision-level integration requires that the target eigenvalues to be processed be intermediate decision results, that is, data already properly processed by the system's sensors. Here the integration results must be further correlated and combined at a higher level to obtain the final result of this level. Because consolidation happens many times at this level and the indirect result data from the level above are large and complex, the accuracy of the consolidated results can be low; on the other hand, because this level relies on low-level integration results as its input data, it reduces the requirements on mutual traffic and on the underlying receivers, achieving strong processing capacity and the ability to integrate in complex environments. The above three levels are combined to complete the integration and processing of diversified retirement information.
2.1 Collect Diversified Retirement Information

In the environment of the retirement information registration and management system, packet capture technology is used to collect samples of diversified retirement information. A packet capture mechanism consists of three main parts: the bottom-level packet capture mechanism of a specific operating system, the top-level user program interface, and the packet filtering mechanism. The implementation of the underlying capture mechanism may differ between operating systems, but the form is similar. Usually a packet travels through the network card, the device driver layer, the data link layer, the network layer and the transport layer, and is finally processed by the application [3]. The user program only needs to call the corresponding processing function to obtain the required packets, because the capture mechanism provides a unified interface, filters the captured packets according to the user's requirements, and passes only the packets that meet the conditions to the user program. The collection process of diversified retirement information is shown in Fig. 2.
Fig. 2. Flowchart of Diversified Retirement Information Collection
First, specify the network device interface to capture on. In Linux, eth0 denotes the first network card and lo the host's loopback interface; either can be chosen. Then initialize pcap and tell it which device to capture from; packets can be captured from several devices by using multiple file handles. Next, set and
compile the filter rules. If only specific transport packets are wanted, such as HTTP packets, a rule set must be created and used to filter out the required packets before entering the main capture loop. In this phase, each time pcap receives a packet it calls another user-defined function, which can do whatever is desired, such as analyzing the packet and printing the result, saving the result to a file, or nothing at all; pcap keeps working until all the desired packets have been received. Finally, when the required data have been captured, the capture needs to be closed and the results returned [4].
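As an equivalent sketch (not the system used in the paper), the same capture flow, choosing an interface, setting a BPF filter rule, and looping with a per-packet callback, can be expressed with scapy, a Python wrapper over the same libpcap mechanism; the interface name, port and packet count are illustrative assumptions:

```python
from scapy.all import sniff  # scapy drives the libpcap mechanism above

def handle(pkt):
    """Per-packet callback: parse, print or store the packet here."""
    print(pkt.summary())

# The BPF filter keeps only HTTP-like traffic, mirroring the rule-set
# step; iface, port and count are illustrative assumptions.
sniff(iface="eth0", filter="tcp port 80", prn=handle, count=100, store=False)
```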
Since the information in the retirement information registration and management system is updated as retirees complete their procedures, the diversified retirement information must be updated in real time during collection. The update process can be expressed as:

x_retire(t) = x_retire(t − 1) + x_add − x_complete (1)
In the formula, x_retire(t − 1) and x_retire(t) represent the information samples at the previous time and the updated information at the current time, respectively, while x_add and x_complete correspond to newly added retirement information and completed retirement information. The collection of the initial diversified retirement information is completed according to the above process.

2.2 Preprocessing of Diversified Retirement Information

During capture, to ensure that the buffer is not flooded by invalid data, coarse filtering is performed first, then a second filtering pass, and finally packet analysis. When filtering is complete, the packets must be disassembled, because the header is the most critical part of a packet; the principle of decomposition is to extract the data of each field of the packet structure according to the data structure specified by the protocol. The Berkeley Packet Filter is used to filter the initial diversified retirement information samples; the processing principle is shown in Fig. 3.
Fig. 3. Schematic diagram of filtering diversified retirement information
After receiving a packet, the network card driver usually submits it to the system's protocol stack. When a process uses BPF for network monitoring, the driver first calls BPF and copies the data to the BPF filter, which decides according to user-defined rules whether to accept the packet [5]. The driver then judges whether the packet is addressed to the local machine: if not, it returns from the interrupt and continues receiving; if so, it submits the packet to the system's protocol stack and returns. After the initial filtering is completed, the collected retirement information is quantified with a probabilistic reasoning model. The quantitative representation of retirement information is:

χ(T, I) = lg[ P_i(1 − Q_i) / (Q_i(1 − P_i)) ] (2)

In the formula, T and I represent the collected retirement information samples and the urgency of retirement, respectively. The variables P_i and Q_i are calculated as:

P_i = M_i / M, Q_i = (N_text,i − M_i) / (N_text − M) (3)

In Formula (3), M_i and M represent the number of pieces of information containing the feature item and the number with high urgency, respectively, while N_text,i and N_text correspond to the number of pieces of retirement information containing the feature item and the total number collected [6]. In addition, the retirement information is processed by word segmentation and stop-word removal, and finally the processing results are put into Formula (4) to normalize all the diversified information:

χ_normalization = (χ_max − χ) / (χ_max − χ_min) (4)

In the above formula, χ_max and χ_min are the maximum and minimum values of the diversified retirement information, and χ and χ_normalization are the information samples before and after normalization. Through the above process, the preprocessing of all collected diversified retirement information is completed.
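A minimal sketch of the preprocessing of Eqs. (2)-(4); note that Eq. (4) as written maps the maximum to 0 and the minimum to 1, which is kept as stated:

```python
import numpy as np

def urgency_weight(m_i, m, n_text_i, n_text):
    """Eqs. (2)-(3): probabilistic-reasoning weight of one feature item,
    computed from the counts defined in the text."""
    p_i = m_i / m
    q_i = (n_text_i - m_i) / (n_text - m)
    return np.log10(p_i * (1 - q_i) / (q_i * (1 - p_i)))

def normalize(x):
    """Eq. (4) exactly as written: (max - x) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x.max() - x) / (x.max() - x.min())
```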
2.3 Intelligent Calculation and Allocation of Diversified Retirement Information Weights

The weight of diversified retirement information is computed from three aspects: the Boolean weight, the word frequency weight, and the word frequency-inverse document frequency (TF-IDF) weight. The Boolean weight uses the mutual information between rows and columns as its measure and is defined as:

ω_b(y_j) = Σ_{x_i∈X} lg[ P(x_i, y_j) / (P(x_i)P(y_j)) ] (5)

In the formula, P(·) is the probability function, x_i and y_j represent the samples in row i and column j of the joint probability matrix, and the result ω_b(y_j)
represents the Boolean weight of the diversified retirement information χ_i. The word frequency in the word frequency weight is the number of times the feature item occurs in the text; it measures the feature item's contribution to distinguishing the category [7]. Therefore, the more often a feature item appears in the information set, the larger its word frequency weight. The calculation formula is:

ω_c = TF_i (6)
In the formula, TF_i is the number of occurrences of the target word in the retirement information set. The TF-IDF weight is computed in two parts: the word frequency and the inverse document frequency. The inverse document frequency measures the rarity of a feature word: the number of documents containing the item is counted and its logarithm taken [8]. If a feature word exists only in individual information sets, its concentration is higher and so is its contribution to the document category. The TF-IDF weight is calculated as:

ω_w = TF_i × log₂(N_tui / num_i) (7)

where N_tui and num_i represent the number of training retirement information items and the number of items in which the feature word appears, respectively. For convenience of calculation, normalization is usually required, and the TF-IDF weight can be converted into:

ω_w = [TF_i × log₂(N_tui / num_i + 0.01)] / sqrt( Σ [TF_i × log₂(N_tui / num_i + 0.01)]² ) (8)

In the actual weight calculation, the influence of the position and length of the feature word on the weight can also be considered, and the weight updated on that basis. The final updated weight of the diversified retirement information is:

ω = α · ρ · (ω_b × ω_c × ω_w) (9)
where α and ρ represent the length and position parameters of the feature word, respectively. Substituting the results of Formulas (5), (6) and (8) into Formula (9) yields the weight of the diversified retirement information. On this basis, linear regression theory is used to allocate the weights: a multiple linear regression model is built on the training data, and the gap between the estimated scores of the diversified retirement information and the actual relevance scores is minimized by the least-
squares method, obtaining a set of model coefficients as the weight of each member system [9]. Building the model requires that the training data contain valid information scores and relevance evaluations. In this linear model, the score provided by the system is the estimated score of the document; given the relevance evaluation, whether a piece of information is relevant to a query is known exactly. This strategy trains a fusion model by linearly combining the estimated scores of information from multiple retrieval systems, ensuring the overall relevance of the diversified retirement information.
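A small sketch of the weight calculations of Eqs. (6)-(9), assuming the +0.01 smoothing term sits inside the logarithm as in the common normalized TF-IDF form; α and ρ default to 1 since their values are not given in the text:

```python
import math

def tfidf_weight(tf_i, n_tui, num_i):
    """Eq. (7): word frequency times inverse document frequency."""
    return tf_i * math.log2(n_tui / num_i)

def tfidf_weight_normalized(tf, df, n_tui):
    """Eq. (8): normalized form over one information set; tf and df are
    parallel lists of term and document frequencies."""
    raw = [t * math.log2(n_tui / d + 0.01) for t, d in zip(tf, df)]
    norm = math.sqrt(sum(r * r for r in raw))
    return [r / norm for r in raw]

def combined_weight(w_b, w_c, w_w, alpha=1.0, rho=1.0):
    """Eq. (9): fuse the Boolean, TF and TF-IDF weights; alpha and rho
    (length and position parameters) default to 1 as assumed values."""
    return alpha * rho * (w_b * w_c * w_w)
```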
2.4 Extracting Diversified Retirement Information Features

The idea of feature selection is to compute an evaluation value for every feature word with an evaluation function and to filter the feature items by a set threshold, thereby reducing the feature dimension. The extraction process of diversified retirement information features is shown in Fig. 4.

Fig. 4. Flow chart of feature extraction of diversified retirement information
The extracted features include mutual information, information gain, the GINI index, the χ² statistic and other feature vectors. The information gain is the proportion of information the feature item contributes to the text; the information gain method is based on the feature item's information entropy and is defined as the amount of information the feature item can provide for the whole classification. It is calculated as:

τ_IG = φ₁ + φ₂

φ₁ = P(t) Σ_{i=1}^{M} P(C_i|t) log[ P(C_i|t) / P(C_i) ]

φ₂ = P(t̄) Σ_{i=1}^{M} P(C_i|t̄) log[ P(C_i|t̄) / P(C_i) ] (10)
In the formula, t is the feature item, and P(C_i), P(C_i|t) and P(C_i|t̄) are the probability of category i and the probabilities that the text belongs to C_i when t does or does not appear in it. The χ² statistic is a common feature selection method that selects the items strongly associated with each class. The χ² statistic feature is extracted as:

τ_χ² = N_tui × (λ_A λ_D − λ_C λ_B)² / [ (λ_A + λ_C)(λ_B + λ_D)(λ_A + λ_B)(λ_C + λ_D) ] (11)
In the formula, λ_A, λ_B, λ_C and λ_D represent, respectively, the number of texts in the retirement information set in which item t and category c co-occur, those containing t but not c, those of category c not containing t, and those containing neither. If t and c are independent, the χ² statistic is 0. For each class in the labeled dataset, the χ² statistic between each item and that class is calculated. In addition, the GINI index measures impurity: its main idea is to take all attribute values in the N records as candidate split points and compute the GINI index of each candidate, the best split point being the one that produces the smallest GINI value [10]. The original intention of the GINI index is to measure the impurity of an attribute in text classification; the smaller the impurity, the better the attribute. The GINI feature index is extracted as:

τ_Gini = 1 − Σ_i [P(i|t)]² (12)

The extraction results of other diversified retirement information features, such as the χ² statistic and mutual information, are obtained by the above methods.
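For illustration, minimal implementations of the selection measures of Eqs. (10)-(12); argument names are assumptions, and the sum over classes in Eq. (12) is made explicit:

```python
import math

def info_gain(p_t, p_c, p_c_given_t, p_c_given_not_t):
    """Eq. (10): information gain of term t, from the class priors p_c
    and the conditional class distributions with and without t (lists)."""
    phi1 = p_t * sum(pct * math.log(pct / pc)
                     for pct, pc in zip(p_c_given_t, p_c) if pct > 0)
    phi2 = (1 - p_t) * sum(pcn * math.log(pcn / pc)
                           for pcn, pc in zip(p_c_given_not_t, p_c)
                           if pcn > 0)
    return phi1 + phi2

def chi_square(n_tui, la, lb, lc, ld):
    """Eq. (11): chi-square statistic of term t against class c, from
    the contingency counts la (t and c), lb (t without c), lc (c without
    t) and ld (neither); 0 when t and c are independent."""
    num = n_tui * (la * ld - lc * lb) ** 2
    den = (la + lc) * (lb + ld) * (la + lb) * (lc + ld)
    return num / den

def gini_index(class_probs_given_t):
    """Eq. (12): impurity of a term; smaller means a purer attribute."""
    return 1.0 - sum(p * p for p in class_probs_given_t)
```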
2.5 Determine Retirement Information Types Using the Feature Weighting Algorithm

The feature weighting algorithm weights the calculated retirement information weights with the extracted retirement information features and, with the support of an information classifier, determines the type of each piece of retirement information. The core of the algorithm is:

γ = τ_com · ω (13)
In the formula, τ_com is the fusion of the extracted features of the diversified retirement information and ω is the retirement information weight, whose value is determined by Formula (9). The weighting result γ is input into the classifier. Any information sample in the retirement information is randomly selected as the classification center, and Formula (14) measures the similarity between random retirement information and that sample.
s = Σ_{k=1}^{n} (γ_ik × γ_jk) / [ sqrt(Σ_{k=1}^{n} γ_ik²) × sqrt(Σ_{k=1}^{n} γ_jk²) ] (14)
Substituting the result of Formula (13) into Formula (14) yields the similarity measurement. Retirement information whose similarity exceeds the threshold s₀ is placed in the same category as the classification center. Through this process, the classification of all collected diversified retirement information is realized.

2.6 Realize the Intelligent Integration of Diversified Retirement Information

According to the feature extraction and weight assignment results, the data objects form a cluster tree in which each node is a cluster of data objects. Hierarchical clustering methods are divided into agglomerative and divisive ones, according to whether the cluster tree is formed bottom-up or top-down. In the agglomerative case, each data object is first regarded as a cluster, and similar clusters are merged successively until all objects lie in one cluster or a termination condition is met. Each data object in the diversified retirement information set is treated as a cluster, the similarity between any two clusters is calculated with Formula (14), the two most similar clusters are selected, and they are merged into a new cluster. The update result can be expressed as:
Z_update = Z_i ∪ Z_j (15)
These operations are performed in a loop until everything is merged into one cluster or a termination condition is met. The divisive hierarchical clustering method is the opposite: it starts by placing all data objects in one cluster and gradually splits it into smaller clusters until each data object is a separate cluster or a termination condition is reached. Finally, the intelligently integrated diversified retirement information is output in the form of a directory topology, completing the intelligent integration of diversified retirement information.
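A compact sketch of Sects. 2.5-2.6: Eq. (14) similarity plus bottom-up merging per Eq. (15). Representing a cluster by its mean vector and the threshold value s0 = 0.8 are assumptions, since the text does not specify them:

```python
import numpy as np

def cos_sim(gi, gj):
    """Eq. (14): cosine similarity of two weighted feature vectors."""
    return gi @ gj / (np.linalg.norm(gi) * np.linalg.norm(gj))

def agglomerate(vectors, s0=0.8):
    """Bottom-up clustering of Sect. 2.6: every record starts as its own
    cluster; the most similar pair is merged (Eq. 15) until no pair
    exceeds the threshold s0."""
    clusters = [[np.asarray(v, dtype=float)] for v in vectors]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # cluster similarity taken as similarity of mean vectors
                s = cos_sim(np.mean(clusters[i], axis=0),
                            np.mean(clusters[j], axis=0))
                if s > best:
                    best, pair = s, (i, j)
        if best < s0:                       # termination condition
            break
        i, j = pair
        clusters[i] += clusters.pop(j)      # Z_update = Z_i U Z_j
    return clusters
```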
3 Experimental Analysis of the Information Integration Performance Test

To test the integration performance of the optimally designed feature-weighting-based intelligent integration method for diversified retirement information, comparative experiments are set up, and the integration results are applied to retirement information retrieval. Comparison with traditional methods demonstrates the performance advantages of the optimized design.
3.1 Prepare a Sample of Retirement Information

This experiment involves multiple data sets containing the information of retirees in different regions, all drawn from regional personnel resource management systems. Part of the data set preparation is shown in Table 1.

Table 1. Retirement information sample preparation

Retirement area | Number of retirees | Retirement information volume/GB
A1 | 365 | 15.6
A2 | 577 | 27.2
A3 | 283 | 13.8
A4 | 496 | 26.5
A5 | 434 | 25.8
A6 | 502 | 26.4
A7 | 378 | 18.9
The initially prepared retirement information includes each retiree's name, age, length of service, social security payments, retirement time, pension amount, and so on. The corresponding information is extracted in the management system environment and given the necessary preprocessing: only the plain text of each web page is extracted, anchor text that would introduce noise is deleted, Chinese documents are converted to BIG5 encoding, and stop words, single-character words and words occurring fewer than 4 times are removed.

3.2 Configure the Experimental Environment

The experimental program is written in C# in the Visual Studio 2005 development environment. The test machine is a Dell server with a Xeon® CPU E5405 @ 2.00 GHz and 4 GB of RAM, running Windows Server 2003.

3.3 Describe the Experimental Process of the Integration Performance Test

In the configured environment, the feature-weighting-based intelligent integration method for diversified retirement information is converted into a running program, the prepared diversified retirement information samples are input into the integration program, and the intelligent integration result is obtained, as shown in Fig. 5.
Fig. 5. Results of intelligent integration of diversified retirement information
To highlight the performance advantages of the optimized design, the traditional information integration method based on support vector machines and the method based on wavelet decomposition are set as comparison methods and developed in the same experimental environment. The same information samples are processed, and the output integration results are shown in Fig. 6. Following these methods, the intelligent integration methods for diversified retirement information are implemented, and the integrated data are extracted and counted.

3.4 Set the Information Integration Performance Test Indicators

To verify the integration and application performance of the feature-weighting-based intelligent integration method, the integrity coefficient of the integrated data, the redundancy coefficient, and the retrieval delay of retirement information are set as the quantitative test indicators of the experiment. The integrity coefficient is calculated as:

ζ_complete = (N_integration / N_sample) × 100% (16)
In the formula, N_integration and N_sample represent the amount of integrated retirement information and of prepared retirement information, respectively. The redundancy coefficient is:

ζ_redundancy = (N_redundancy / (2 N_sample)) × 100% (17)
The variable N_redundancy is the amount of redundant information in the retirement information integration result. In addition, the retrieval delay of retirement information reflects the application performance of the integration method:

t_retrieval = t_out − t_input (18)

In Formula (18), t_input and t_out correspond to the start time and output time of the retirement information retrieval task.
Fig. 6. Results of intelligent integration of diversified retirement information
By calculation, the larger the integrity coefficient of the integrated data and the smaller the redundancy coefficient, the better the intelligent integration performance of the diversified retirement information; and the smaller the retrieval delay, the faster the retrieval and the better the application performance of the intelligent integration method.
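A trivial sketch of the three indicators of Eqs. (16)-(18); argument names are assumptions:

```python
def integration_metrics(n_integration, n_sample, n_redundancy,
                        t_input, t_out):
    """Eqs. (16)-(18): integrity and redundancy coefficients (percent)
    and the retrieval delay of one integration run."""
    zeta_complete = n_integration / n_sample * 100
    zeta_redundancy = n_redundancy / (2 * n_sample) * 100
    t_retrieval = t_out - t_input
    return zeta_complete, zeta_redundancy, t_retrieval
```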
3.5 Analysis of Information Integration Performance Test Results

Through the statistics and analysis of the relevant data, the comparative test results for the integrity coefficient of the data integrated by the intelligent integration methods are obtained, as shown in Table 2.

Table 2. Retirement information integration integrity coefficient test data

Number of experiments | SVM-based integration output/GB | Wavelet-decomposition-based integration output/GB | Feature-weighting-based integration output/GB
1 | 15.2 | 15.1 | 15.5
2 | 26.4 | 26.6 | 27.1
3 | 13.4 | 13.3 | 13.6
4 | 25.7 | 25.7 | 26.3
5 | 25.0 | 25.1 | 25.5
6 | 25.5 | 25.3 | 26.2
7 | 18.1 | 18.2 | 18.8
Substituting the data in Table 2 into Formula (16), the average integrity coefficient of the two traditional methods is 96.8%, while that of the optimized design is 99.2%. In addition, calculation with Formula (17) yields the redundancy coefficient test results shown in Fig. 7. Figure 7 shows intuitively that, compared with the traditional integration methods, the optimized design yields a lower information redundancy coefficient, that is, better integration performance. Applying the intelligent integration results to retirement information retrieval and calculating with Formula (18) gives the application performance results shown in Fig. 8, from which it can be seen that the optimized feature-weighting-based integration method has shorter retrieval delay and faster retrieval speed.
Fig. 7. The test results of the redundancy coefficient of intelligent integration of retirement information
Fig. 8. Application performance test results of retirement information integration method
4 Conclusion

With the growing number of retirees and the gradual increase of retirement information resources, traditional information management and integration can no longer meet the needs of retirement information work. Feature weighting changes the way diversified retirement information is integrated and creates a better sharing platform for it. The experimental results show that the integrity coefficient of the integration results obtained by the designed method improves by 2.4%, the information redundancy coefficient is effectively controlled, and the information retrieval speed is effectively improved.
References

1. Cai, Y., Li, Z., Yu, X.: The model design of government affairs archives integration and utilization based on enterprise user portrait. Archives Sci. Study 2, 125–131 (2021)
2. Wan, R., Li, T.: Research on the integration technology of government information resources in standard geographical name and address. Bull. Surv. Mapp. 4, 136–140 (2021)
3. Huang, S.: Multimedia network information integration and management technology in the era of big data. J. Beijing Inst. Graph. Commun. 29(8), 146–148 (2021)
4. Li, L., Dong, J., Zuo, D., et al.: SLA-aware and energy-efficient VM consolidation in cloud data centers using host state 3rd-order Markov chain model. Chin. J. Electron. 29(6), 1207–1217 (2020)
5. Aoih, A., Dp, B.: Why and when does financial information affect retirement planning intentions and which consumers are more likely to act on them? J. Bus. Res. 117, 411–431 (2020)
6. Palmer, J.C., Chung, Y., Park, Y., et al.: Affectivity and riskiness of retirement investment decisions. Pers. Rev. 49(9), 2093–2110 (2020)
7. Srivastava, G., Lin, C.W., Pirouz, M., et al.: A pre-large weighted-fusion system of sensed high-utility patterns. IEEE Sensors J. PP(99), 1–1 (2020)
8. Huang, G., Qu, H.: Data visualization and data fusion on the visual performance of illustration. J. Intell. Fuzzy Syst. 39(6), 8795–8803 (2020)
9. Bahrami, S., Dornaika, F., Bosaghzadeh, A.: Joint auto-weighted graph fusion and scalable semi-supervised learning. Inform. Fusion 66(1), 213–228 (2021)
10. Yin, C., Wang, P., Wang, C., et al.: Multi area resource information search integration simulation for industrial product design. Comput. Simul. 36(6), 438–441, 454 (2019)
Recognition Method of Abnormal Behavior in Electric Power Violation Monitoring Video Based on Computer Vision

Mancheng Yi(B), Zhiguo An, Jianxin Liu, Sifan Yu, Weirong Huang, and Zheng Peng

Guangzhou Power Supply Bureau of Guangdong Power Grid Co., Ltd., Guangzhou 510000, China
[email protected]
Abstract. To improve the accuracy of abnormal behavior recognition in the monitoring of electric power violations, a computer-vision-based method for recognizing abnormal behavior in power violation surveillance video is proposed. Surveillance images of power violations are collected by sensors, and the video images are preprocessed with mathematical morphology and neighborhood average filtering. The static target detection and background difference methods of computer vision are used to separate the background and the moving foreground in the video frame sequence; the staff in the video images are located and their movement tracks followed. FAST corners and the SIFT algorithm are fused to extract corner and texture features of the staff's actions in the monitoring images, and these features are input into a long short-term memory recurrent neural network to recognize abnormal behavior in the power violation surveillance video. The results show that the Kappa coefficient between this method's output and the measured results remains above 80%, proving that the method improves the accuracy of abnormal behavior recognition.

Keywords: Computer Vision · Abnormal Behavior of Electric Power Violation · Surveillance Video · Identification Method
1 Introduction
The operating environment of electric power is complex, facing the challenges of high-voltage operation. To ensure the safety of personnel during power operation, the industry requires real-time monitoring of operators and operational processes. In the past, human supervision was commonly used; it not only consumed a great deal of labor but, owing to fatigue, laxity, and other personal factors of supervisors, often led to inadequate oversight, and human supervision could not cover the whole site at once. With the development of high-definition video monitoring, high-definition cameras are widely used at power construction sites for all-round, large-scale and long-term supervision. At present, security personnel at power operation sites
often use high-resolution cameras to capture on-site information, after which supervisors inspect the returned videos to determine whether an operator has engaged in illegal activities. However, the observation range of the human eye is limited, and long-term observation leads to visual fatigue. Using computer vision technology to detect risks in the returned images can aid security monitoring: when risk information is detected, an alarm is issued, and security monitoring personnel then conduct inspections based on the video information, greatly improving their work efficiency and adding a "dual guarantee" for risk detection at the power operation site. Reference [1] proposes a video anomaly detection method that considers crowd density: a generative adversarial network framework is designed, and, according to the scene density and the behavioral object, a video anomaly detection model based on the generative adversarial network is constructed for both individual and group anomalies, which are detected with reconstruction- and prediction-based methods. Reference [2] designed a method for detecting abnormal behavior through feature extraction based on the characteristics of different application scenarios: the characteristics and categories of abnormal human behavior are analyzed, and, from the perspective of identification and detection, a method for identifying abnormal behavior is designed to supplement existing identification and detection. These traditional target detection methods can detect the target only by extracting its simple features. However, power operation usually takes place in an open and complex outdoor environment: there are many buildings, infrastructure and tools in the monitoring image, many people coming and going, frequent occlusion by obstacles, and different light intensities produce different colors. Such methods therefore struggle to extract the more complex and comprehensive features needed to identify the target. With the rapid development of computer vision, better applications have appeared in the image recognition field. Computer vision is a multidisciplinary scientific field that studies how computers can gain a high-level understanding of digital images or videos; from an engineering perspective, it seeks to automate tasks that the human visual system can perform. Against this background, a method of abnormal behavior recognition based on computer vision in power violation monitoring video is studied.
2 Research on the Identification Method of Electric Power Violations 2.1 Key Technologies Computer vision is the automatic extraction, analysis, and understanding of useful information from a single image or sequence of images. It involves developing theoretical and algorithmic foundations to achieve automatic visual understanding. As a scientific discipline, computer vision is related to the theory behind artificial systems that extract information from images. Image data can take various forms, such as video clips, views from multiple cameras, or multidimensional data from medical scanners. As a technical discipline, computer vision attempts to apply its theories and models to the construction
of computer vision systems [3]. Applications range from industrial image processing systems, through artificial intelligence research, to computers or robots that can understand the surrounding world. Typical application areas include:

- automatic inspection, e.g. in manufacturing;
- assisting humans in identification tasks, such as species identification systems;
- process control, such as industrial robots;
- event detection, such as visual surveillance or people counting;
- interaction, e.g. as the input of a human-computer interaction device;
- facility or environment modelling, such as medical image analysis or terrain modelling;
- navigation, such as autonomous vehicles or mobile robots;
- organizing information, such as databases for indexing images and image sequences.

The organization of computer vision systems largely depends on their application. Some systems are stand-alone applications that solve specific measurement or detection problems, while others are subsystems of larger systems that also include, for example, mechanical actuator control, planning, databases, and human-machine interfaces. The specific implementation of a computer vision system also depends on whether its functions are predefined or whether parts of it can be learned or modified during operation. Many functions are unique to particular applications; nevertheless, most computer vision systems share some typical functions.
(1) Image acquisition: digital images are generated by one or more image sensors which, besides various types of light-sensitive cameras, include range sensors, tomography equipment, radar, ultrasonic cameras, etc. Depending on the type of sensor, the obtained image data is a regular 2D image, a 3D volume, or an image sequence. Pixel values typically correspond to light intensity in one or more spectral bands (gray or color images), but can also be related to other physical measurements, such as depth, the absorption or reflection of sound or electromagnetic waves, or nuclear magnetic resonance [4].
(2) Pre-processing: before applying image processing methods to extract specific information, it is usually necessary to process the data so that it meets the assumptions implicit in the method, for example: resampling to ensure a correct image coordinate system; noise reduction so that sensor noise does not introduce false information; contrast enhancement so that relevant information can be recognized; and scale-space processing so that image structures are enhanced at locally appropriate scales.
(3) Feature extraction: image features of different complexity levels are extracted from the image data, typically lines, edges and ridges; localized interest points such as corners or blobs; and more complex features related to texture, shape, or motion. Detection/segmentation: at some point in the processing it is decided which pixels or regions of the image are relevant for subsequent processing.
This includes the segmentation of one or more image regions containing specific objects of interest, and the division of one or more videos into a series of salient masks per frame while maintaining temporal semantic continuity.
(4) Identification: in this step, the input is usually a small set of data from which identification results are obtained, for example verifying that the data meets model- and application-specific assumptions, estimating application-specific parameters such as the pose or size of an object, and classifying detected objects into different categories.

2.2 Video Image Preprocessing of Power Violation Monitoring
Digital image processing encompasses processes and technologies that apply digital techniques such as noise reduction, image enhancement, image restoration, and image segmentation. With the rapid development of information technology, computer processing capability has improved significantly, mathematics has advanced rapidly, and the demand for applications in all fields of production and life keeps increasing. Images are the most intuitive way to obtain information and remain a focus of research to this day. These factors have driven the rapid development of digital image processing technology [5], which is committed to helping people understand the world more accurately through images.

2.2.1 Mathematical Morphology Image Processing
Morphology originally refers to an important branch of biology that studies the structure of animals and plants, with the aim of obtaining their topology and structural information. Mathematical morphology is a mathematical processing method based on set theory and topology. As an image processing tool, it processes images to extract valuable information describing the shape of a region, such as region edges and skeletons. The idea is to probe the image data with structuring elements of specific shapes, maintain the foreground shape, fill holes in the target area, and extract or eliminate false boundaries caused by noise, thereby improving detection accuracy. Its four main operations are dilation, erosion, opening, and closing [6]. These operations can also be combined into various image processing algorithms, such as image segmentation, edge detection, and image enhancement, to process and analyze the shape or structure of the target area in the image.
(1) Dilation: dilation connects the separated parts of an object in the image to make the object more complete; it expands or coarsens the target. The calculation formula is as follows:

F1 = A ⊕ C = {D | (C)D ∩ A ≠ ∅}
(1)
where

C = [1 0 1; 0 0 0; 1 1 1]    (2)
In the formula, F1 represents the dilated power violation monitoring video image; A represents the pixel set of the original video image; C represents the structuring element; (C)D represents the structuring element translated by D; and ⊕ represents the dilation operator.
(2) Erosion: the erosion operation erodes the edges of the binary image. The operation formula is as follows:

F2 = A ⊖ C = {D | (C)D ⊆ A}
(3)
In the formula, F2 represents the power violation monitoring video image after erosion; ⊖ represents the erosion operator; and (C)D represents the structuring element C translated by D over the plane. Formula (3) indicates that the erosion of A by C consists of all points D such that C, translated by D, is contained in A. The erosion operation removes the boundary points of the target area, making the target boundary shrink from the outside inward. The area of the target region therefore becomes smaller after erosion, small targets and isolated noise points are effectively removed, and protrusions around the target are eliminated.
(3) Opening: the opening operation first erodes and then dilates the image, which smooths the image contour, breaks narrow necks and eliminates thin protrusions; after this operation, the position and shape of the image remain unchanged [7]. The calculation formula is as follows:

F3 = A ◦ E = (A ⊖ E) ⊕ E
(4)
where

E = [0 0 1; 1 1 0; 1 0 0]    (5)
In formula (4), F3 represents the power violation monitoring video image after opening; E represents the structuring element; and ◦ represents the opening operator.
(4) Closing: the closing operation first dilates and then erodes the image. It also smooths the image contour but, unlike opening, it can bridge narrow, intermittent and elongated gaps, eliminate small holes, and fill cracks on the contour line, while leaving the position and shape of the image unchanged [5]. The operation formula is as follows:

F4 = A • E = (A ⊕ E) ⊖ E
(6)
In the formula, F4 represents the power violation monitoring video image after the closing operation, and • represents the closing operator.
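For concreteness, the four operations above map directly onto standard morphology routines. The following is a minimal sketch in Python with OpenCV; the synthetic input mask and the 3 × 3 structuring element are placeholder assumptions rather than values from the paper.

```python
import cv2
import numpy as np

# Placeholder binary mask standing in for a thresholded video frame.
frame = np.zeros((256, 256), dtype=np.uint8)
cv2.rectangle(frame, (60, 60), (180, 180), 255, -1)

# 3x3 structuring element, playing the role of C and E in formulas (2) and (5).
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

dilated = cv2.dilate(frame, kernel)                        # formula (1)
eroded = cv2.erode(frame, kernel)                          # formula (3)
opened = cv2.morphologyEx(frame, cv2.MORPH_OPEN, kernel)   # formula (4): erode, then dilate
closed = cv2.morphologyEx(frame, cv2.MORPH_CLOSE, kernel)  # formula (6): dilate, then erode
```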
2.2.2 Image Noise Processing
Image noise is unnecessary or excessive interference information in an image and seriously degrades image quality. Generally, image noise can be removed by filtering; common methods include neighborhood average filtering, median filtering, and Gaussian filtering. The neighborhood average filtering method, also called mean filtering, traverses each pixel together with its surrounding neighborhood and replaces the pixel in the output image with the neighborhood average, thereby smoothing the image. The simplest neighborhood average method takes the same value for all template coefficients, for example a coefficient of 1, which is also called the box template. The two commonly used templates, of size 3 × 3 and 5 × 5, are:

(1/9) · [1 1 1; 1 1 1; 1 1 1]    (7)

(1/25) · [1 1 1 1 1; 1 1 1 1 1; 1 1 1 1 1; 1 1 1 1 1; 1 1 1 1 1]    (8)
The calculation formula of the proposed method is as follows:

A(i, j) = (1/N) · Σ_{(i,j)∈R} A′(i, j)    (9)
In formula (9), A(i, j) represents the value at pixel point (i, j) after mean filtering of the power violation surveillance video image; A′(i, j) represents the original pixel value at pixel point (i, j); R represents the set of pixels covered by the template; and N represents the number of pixels in R. A minimal sketch of this box filtering is given below.
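The box templates of formulas (7) and (8) correspond to a normalized box filter. The sketch below assumes OpenCV; the file name is a placeholder.

```python
import cv2

# Hypothetical grayscale frame from the surveillance video.
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# 3x3 box template of formula (7): all coefficients 1, normalized by N = 9,
# i.e. exactly the neighborhood average of formula (9).
mean3 = cv2.blur(gray, (3, 3))

# 5x5 template of formula (8), normalized by N = 25.
mean5 = cv2.blur(gray, (5, 5))
```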
2.3 Positioning and Tracking of Operators
The positioning and tracking of operators in the electric power violation monitoring video image is the research basis for the analysis and detection of abnormal behavior in the monitoring video. Moving target positioning refers to the use of digital image processing technology to separate the background and the moving foreground in the video frame sequence, so as to extract the moving operators in the image. Tracking is a process based on extracted target features, used to follow and recover motion paths, so as to detect anomalies in specific subsequent behaviors. The positioning and tracking of
operators will directly affect the subsequent analysis and detection results of abnormal behaviors of target personnel. Therefore, it is essential to do a good job in the basic work of positioning and tracking. 2.3.1 Target Positioning of Operators Operator localization is the extraction of moving target characters from video images. Moving target location is the basis of target recognition, tracking and understanding of target behavior, so the accuracy of target location seriously affects the next step of video processing. The operator positioning can be divided into two types according to the state of the background: one is static target detection, in which case the camera is static relative to its field of view; The other is dynamic target detection. In this case, the camera needs to follow the target movement to obtain a larger field of view. The field of view of the video collected in this paper is static, so the method of static object detection is adopted. This article mainly uses the background difference method in static object detection methods to locate the operator’s target. Background difference method is a very common operator location method. Its basic principle is to obtain the foreground target by subtracting the pixel brightness of the current video frame and the background image at the same position. The specific steps are: Step 1: Establish a background model as the background image according to the current scene; Step 2: Update the background image according to the changes of the background model, and differentiate the current video frame from the background image; Step 3: Perform threshold processing for the differential image to extract moving targets. With the current frame video image representing St (i, j) and the background frame representing Qt (i, j) , the differential image Tt (i, j) can be represented as: Tt (i, j) = |St (i, j) − Qt (i, j)|
(10)
In formula (10), (i, j) represents the spatial coordinates of the pixels. Thresholding the differential image yields the foreground target Ut(i, j), expressed as:

Ut(i, j) = { 1 (target), if Tt(i, j) ≥ W; 0 (background), if Tt(i, j) < W }    (11)

In formula (11), W is the segmentation threshold: when the pixel Tt(i, j) of the difference image reaches the threshold, the point is marked as a foreground pixel; otherwise it is marked as a background pixel. The foreground target obtained from the above formula is the target positioning result for the operator, and this result is used for real-time tracking of the operator's motion. A small sketch of this background-difference pipeline is given below.
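As a concrete illustration of formulas (10)–(11), the following sketch assumes OpenCV and a pre-built background image; the threshold value and file names are assumptions.

```python
import cv2

W = 30  # segmentation threshold of formula (11); the value is an assumption

background = cv2.imread("background.png", cv2.IMREAD_GRAYSCALE)  # Q_t
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)            # S_t

diff = cv2.absdiff(frame, background)                         # T_t = |S_t - Q_t|, formula (10)
_, foreground = cv2.threshold(diff, W, 1, cv2.THRESH_BINARY)  # U_t of formula (11)
```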
2.3.2 Motion Tracking of Operators
The operator's motion tracking is to track and locate the moving target in the sequence images in real time after the moving person has been detected, and to obtain the position
coordinate, motion speed, direction and other motion information of the moving person in real time through tracking, laying the foundation for the subsequent analysis of moving target behavior. At present, many scientists at home and abroad are conducting extensive research on moving target tracking, and many tracking algorithms have been implemented successfully. The existing algorithms mainly follow two lines of analysis: the first detects the moving target in the sequence images and then locates and identifies it to achieve tracking; the second first establishes a target template from acquired prior information about the target, and then matches the template in the sequence images to find the moving target and track it in real time.

Much research has been done on target tracking methods. According to the image edge, contour, shape, texture and region of the tracked moving target and the similarity measure used (Euclidean distance, etc.), they can be roughly grouped into the following categories: tracking methods based on active contours, on regions, on features, and on models. The quality of a target tracking method lies in whether it can locate the target position in the sequence images accurately and in real time; the effective representation of the tracked target and the definition of the similarity measure are the keys that determine the accuracy and stability of the tracking algorithm.

The region-based target tracking method works as follows: the connected region of the moving target detected in the video sequence images is taken as the detection unit, and features of each target region are extracted; the extracted features can be color, area, centroid, shape, etc. The similarity of the features of the moving regions in two adjacent frames is then matched, and the position of the moving target is determined from this feature similarity, thereby achieving tracking. When calculating the matching degree of the target region in two adjacent frames, multiple features of the target can also be combined to obtain the best match. The key of the region-based tracking algorithm is to segment the moving target region accurately; only then are the extracted motion features valid, the accuracy of subsequent matching improved, and the tracking effect enhanced. When there are few moving objects in the power violation monitoring video, the region-based method has high tracking accuracy and tracks moving objects stably. When there are many moving objects, however, the target region is easily occluded, which degrades the tracking effect, reduces accuracy, and easily leads to target loss. Nevertheless, the method is simple and easy to implement, and selecting suitable motion features for different tracking objects improves the tracking of moving objects.
2.4 Staff Motion Feature Extraction Staff motion feature extraction refers to extracting effective feature data from video sequences to describe human motion state, which is the premise of describing and understanding human behavior in video sequences. Therefore, the rationality of feature selection will directly affect the effect of human behavior recognition. At present,
there are four main kinds of features used in human behavior analysis and understanding: appearance shape features, motion features, spatio-temporal features, and hybrid features combining motion and shape features. Among them, appearance shape features and motion features are the two most commonly used, and spatio-temporal features have also been widely applied in human behavior recognition. Appearance shape features are easy to obtain, relatively stable, and insensitive to texture changes; motion features are suitable for behavior description and recognition at long distance or low visibility. With the continued study of human behavior analysis and understanding in video sequences, high-level feature information in video image sequences has been widely studied and can describe human behavior more accurately. From the perspective of obtaining human behavior feature data, the difficulty of accurately estimating human behavior can be avoided by extracting the moving human body area in each frame of the video sequence, and then using the features of the moving area or contour sequence to describe and identify human behavior [8]. Corners, as important local image features with rotational invariance, do not change with changes in lighting, and extracting key features reduces computational complexity and shortens processing time without losing image information. This section proposes an image feature extraction algorithm based on FAST corners, combining the FAST and SIFT algorithms. The system diagram of this algorithm is shown in Fig. 1.
Fig. 1. Image feature extraction based on FAST corners
For the FAST corner features, the feature descriptor is constructed with the SIFT algorithm. The specific implementation process is as follows:
Step 1: In order to achieve rotation invariance, within the neighborhood of each FAST feature point, rotate the initial coordinate system by the main direction a of that point and establish a new coordinate system.
Step 2: Take the feature point as the center and a 16 × 16 pixel template window as the neighborhood of the feature descriptor. Divide this neighborhood into 16 sub-regions of 4 × 4 pixels, build a gradient histogram over 8 directions in each sub-region, and accumulate the amplitude of each direction in the region.
Step 3: Arrange the 8-direction gradient histograms of the 4 × 4 sub-regions in order of position. Since there are 4 × 4 sub-regions, there are 4 × 4 × 8 = 128 values in total, which form a 128-dimensional feature vector.
Step 4: Weight the feature vector with a Gaussian function of variance 6 and then normalize it. This suppresses the influence of illumination changes, so that the feature vectors change little under varying lighting, reducing sensitivity to brightness changes.
Step 5: To improve the discriminability of the FAST corner features, normalize the feature vector once more. A sketch of this FAST + SIFT descriptor extraction is given below.
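In practice the fused detector–descriptor pair can be assembled from standard routines. The sketch below uses OpenCV's FAST detector and SIFT descriptor; the FAST threshold and file name are assumptions.

```python
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input frame

# Detect FAST corners, then describe each of them with a 128-dimensional
# SIFT vector, mirroring the fused FAST + SIFT scheme of steps 1-5 above.
fast = cv2.FastFeatureDetector_create(threshold=25)
keypoints = fast.detect(gray, None)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.compute(gray, keypoints)  # descriptors: N x 128
```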
The information represented by a single kind of feature is limited, so the LBP algorithm is used here to extract texture features as well. The LBP algorithm was originally designed to describe texture. The basic LBP operator works on the 3 × 3 neighborhood of a central pixel, using the gray value of the central pixel as a threshold: the gray values of the 8 neighboring pixels are compared with it, and a neighbor is marked 1 if its gray value is not smaller than that of the central pixel, and 0 otherwise. This yields an 8-bit binary number; weighting each position by a power of two and summing converts it to a decimal number, which is the LBP value of the neighborhood. Equation (12) defines the basic LBP algorithm:

Z(L, b) = Σ_{i=0}^{L−1} f(h_i − h) · 2^i    (12)

where

f(h_i − h) = { 1, if h_i − h ≥ 0; 0, if h_i − h < 0 }    (13)
In formulas (12) and (13), L is the number of neighborhood pixels (the number of sampling points), b represents the radius of the neighborhood, h represents the gray value of the central pixel, and h_i is the gray value of the i-th pixel in the neighborhood. A minimal implementation of this operator is sketched below.
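The basic 3 × 3 operator of formulas (12)–(13) can be written directly in a few lines; the fixed neighbor ordering below is one possible convention, not one prescribed by the paper.

```python
import numpy as np

def lbp_value(patch):
    """Basic LBP of formulas (12)-(13) for a 3x3 grayscale patch."""
    center = patch[1, 1]
    # The 8 neighbors, visited clockwise from the top-left corner.
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    # f(h_i - h) of formula (13), weighted by 2**i as in formula (12).
    return sum((1 if h_i >= center else 0) << i for i, h_i in enumerate(neighbors))

print(lbp_value(np.array([[5, 9, 1], [4, 6, 7], [8, 3, 6]])))
```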
2.5 Identification of Abnormal Behavior
The extracted feature data can be classified for behavior recognition: a behavioral feature library is first built by learning typical behaviors, and motion features extracted from real-time behavior are then compared and matched against it. There are two key issues: one is to establish a meaningful feature sequence database by analyzing typical
samples; the other is to find an appropriate matching method to compare the similarity between the feature sequence to be detected and the feature sequence database. Many methods for behavior recognition exist; here, a long short-term memory (LSTM) neural network is used for recognition [9]. In a traditional neural network, nodes in the same hidden layer are not connected, so the network cannot retain data from different times and cannot capture the relationship between consecutive video frames. A recurrent neural network (RNN) can store information from different times through feedback connections. However, as Sepp Hochreiter observed, RNNs suffer from the long-term dependency problem: when learning sequence data they exhibit vanishing and exploding gradients and cannot master nonlinear relationships across long time spans. The long short-term memory network (LSTM) is a special type of recurrent neural network that effectively alleviates the vanishing- and exploding-gradient problems of the RNN; its structure is carefully designed to handle long-term dependencies. The LSTM structure includes a forget gate, an input gate and an output gate; unlike the direct nonlinear transformation of input data in an RNN, the LSTM's gating units control the transmission of information.
Forget gate: the most important element of the LSTM is the cell state, which learns the dependence of each frame image like a chain so that information can be passed forward. The mathematical formula is described as follows:

Yt = ψ(v · [βt−1, αt] + k)
(14)
In the formula, Yt represents the result of the forget gate operation; ψ is the sigmoid function; v is the weight matrix of the forget gate connection; βt−1 is the output of the hidden layer at the previous time step; αt is the output of the input layer at the current time step; and k is the offset of the forget gate.
Input gate: there are two tasks in the input gate. First, a sigmoid layer determines what information needs to be updated while a tanh layer generates the candidate content to be added; second, the cell state is updated. The mathematical formula is described as follows:

λt = Yt · λt−1 + Ot · λ̂t
(15)
In the formula, Ot is the activation of the input gate at time t; λ̂t is the temporary state holding the new candidate values; λt represents the new cell state; and λt−1 represents the old cell state.
Output gate: the output gate is the last gate through which information flows. After passing through the forget gate and the input gate, information reaches the output gate, whose main function is to produce the final value. Two more steps are needed: first, determine which part of the cell state to output; second, pass the cell state through the tanh function (mapping values to between −1 and 1) and multiply it by the sigmoid gate output to obtain the hidden output. The mathematical expression reads as follows:
βt = υt · tanh(λt)    (16)
In the formula, υt represents the output gate activation at time t. The basic process of abnormal behavior identification based on the LSTM is as follows: (1) the behavior features serve as the input of the LSTM at time t; (2) the input gate, output gate and forget gate control the flow of information, processing the previous state together with the input at time t to produce the output and cell state at time t, which are passed to the next LSTM unit up to the last layer; (3) the output layer neurons process the information learned by the last LSTM layer; (4) the loss function of the output layer is computed and the network is updated by gradient descent until the stopping condition is met. Based on the above research, a computer vision based method for identifying abnormal behavior in power violation monitoring videos is completed; a minimal sketch of such a classifier is given below.
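The following PyTorch sketch shows one way to wire per-frame feature vectors into an LSTM classifier as in steps (1)–(4); the feature dimension, hidden size, clip length and two-class setup are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BehaviorLSTM(nn.Module):
    """Classify a clip of per-frame feature vectors as normal or abnormal."""
    def __init__(self, feat_dim=130, hidden=64, classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, classes)

    def forward(self, x):            # x: (batch, frames, feat_dim)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])   # classify from the last time step

model = BehaviorLSTM()
clips = torch.randn(8, 16, 130)      # 8 clips of 16 frames each
labels = torch.randint(0, 2, (8,))   # 0 = normal, 1 = abnormal
loss = nn.CrossEntropyLoss()(model(clips), labels)
loss.backward()                      # gradient-descent update, step (4)
```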
3 Method Test
3.1 Video Image Acquisition for Action Monitoring of Power Maintenance Personnel
Images of maintenance personnel's work sites previously collected by the on-site video monitoring system are taken as the test data. A total of 1200 video images were obtained, of which 600 were abnormal and the remaining 600 normal. Some of the images are shown in Fig. 2.
Fig. 2. Example of the action monitoring video image of the electric power maintenance personnel
To test the feasibility of the algorithm, the experimental platform is a personal computer with a 2.53 GHz CPU and an NVIDIA GeForce GTX 1070 graphics card, with 4 GB of memory and 1 GB of video memory. The algorithm was developed in MATLAB R2016a. All processed images are stored in an HDFS database with an image size of 256 × 256.
3.2 Positioning of Operators
As an example, the background difference method is applied to a behavioral motion image to achieve target localization, as shown in Fig. 3.
Fig. 3. Target positioning
3.3 Feature Vectors
Two features are extracted for each action monitoring video image of the power maintenance personnel, and some of the results are shown in Table 1.

Table 1. Feature vector extraction table

Image | Corner/° | Texture/dpi
1     | 1.2623   | 15.8754
2     | 2.154    | 14.5365
3     | 0.231    | 16.8754
4     | 1.2645   | 14.7452
5     | 2.1225   | 11.8765
6     | 0.0862   | 13.8645
7     | 1.0861   | 11.7452
8     | 0.5521   | 12.8422
9     | 0.4553   | 13.8645
10    | 0.4222   | 15.5122
11    | 1.7452   | 13.5312
12    | 1.3011   | 14.4222
13    | 1.8952   | 13.8654
14    | 1.0252   | 15.2102
3.4 Image Recognition Results
For the detection results, the Kappa coefficient between the recognition results and the measured results is calculated to judge the accuracy of the method. The results are shown in Fig. 4, which plots the Kappa coefficient (0.5–1.0) against the number of images (100–600), separately for abnormal and normal behavior images.
Fig. 4. The Kappa coefficient
As can be seen from Fig. 4, the Kappa coefficient between the recognition results of the studied method and the measured results decreases as the number of video images grows, but it remains above 80%, supporting the accuracy of the identification method. A sketch of how such a Kappa coefficient can be computed is given below.
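Cohen's Kappa measures agreement between two labelings beyond chance; a standard implementation is available in scikit-learn. The labels below are hypothetical placeholders.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels: 1 = abnormal behavior, 0 = normal behavior.
measured = [1, 0, 1, 1, 0, 0, 1, 0]   # ground truth from manual inspection
predicted = [1, 0, 1, 1, 0, 0, 1, 1]  # output of the recognition method

kappa = cohen_kappa_score(measured, predicted)
print(f"Kappa = {kappa:.3f}")  # values above 0.8 indicate strong agreement
```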
4 Conclusion
Whether construction behavior in the power industry is standardized bears directly on the personal safety of workers and is crucial for the development of the energy industry. Behavioral analysis of employees is an important research direction in modern deep learning. Currently, video surveillance is widely deployed in maintenance projects in the energy industry, and analyzing these videos yields relevant information about the work of on-site staff. Targeting illegal operations by electric power maintenance personnel, this paper identifies illegal behavior at power maintenance sites from the perspective of computer vision. The results are as follows: 1) data augmentation and image enhancement can increase the number of pictures and improve the training effect to a certain extent; 2) the proposed algorithm can effectively identify the types of illegal operations in power operation scenarios; 3) the power operation environment is complex, and occlusion by people and varying distances often affect the detection effect. In the future, the applicability of the model can be improved by continuing to expand the training samples and refine the training models.
Acknowledgement. Science and Technology Project of China Southern Power Grid Co., Ltd. (GZHKJXM20200058).
References
1. Shen, X., Li, C., Li, H.: Overview on video abnormal behavior detection of GAN via human density. Comput. Eng. Appl. 58(7), 21–30 (2022)
2. Xiaoping, Z., Jiahui, J., Li, W., et al.: Overview of video based human abnormal behavior recognition and detection methods. Contr. Decision 37(1), 14–27 (2022)
3. Daradkeh, Y.I., Tvoroshenko, I., Gorokhovatskyi, V., et al.: Development of effective methods for structural image recognition using the principles of data granulation and apparatus of fuzzy logic. IEEE Access 9, 13417–13428 (2021)
4. Liu, S., Wang, S., Liu, X., Gandomi, A.H., Daneshmand, M., Muhammad, K., de Albuquerque, V.H.C.: Human memory update strategy: a multi-layer template update mechanism for remote visual monitoring. IEEE Trans. Multimed. 23, 2188–2198 (2021)
5. Liu, S., Liu, D., Muhammad, K., Ding, W.: Effective template update mechanism in visual tracking with background clutter. Neurocomputing 458, 615–625 (2021)
6. Liu, S., Wang, S., Liu, X., et al.: Fuzzy detection aided real-time and robust visual tracking under complex environments. IEEE Trans. Fuzzy Syst. 29(1), 90–102 (2021)
7. Gao, P., Zhao, D., Chen, X.: Multi-dimensional data modelling of video image action recognition and motion capture in deep learning framework. IET Image Proc. 14(7), 1257–1264 (2020)
8. Shatalin, R.A., Fidelman, V.R., Ovchinnikov, P.E.: Incremental learning of an abnormal behavior detection algorithm based on principal components. Comput. Opt. 44(3), 476–481 (2020)
9. Zerkouk, M., Chikhaoui, B.: Spatio-temporal abnormal behavior prediction in elderly persons using deep learning models. Sensors 20(8), 2359 (2020)
Damage Identification Method of Building Structure Based on Computer Vision Hongyue Zhang(B) , Xiaolu Deng, Guoliang Zhang, Xiuyi Wang, Longshuai Liu, and Hongbing Wang China Construction Third Engineering Bureau Group Co., Ltd., Zhanjiang 524000, China [email protected]
Abstract. Under the influence of load, earthquake, settlement and other factors, the building structure will be damaged to different degrees. If the building structure damage is not found and handled in time, it will lead to the risk of building components falling off or even collapsing. Therefore, a method of building structure damage identification based on computer vision is proposed. According to different damage types and mechanisms of building structures, the identification criteria of building structure damage are set up to provide reference for damage identification. The computer vision technology is used to collect the building structure image, and through geometric registration, light correction, graying, noise reduction and other steps, complete the preprocessing of the initial building structure image, improve the image denoising effect, and avoid the impact of noise, light and other factors on the recognition results. The image features of the building structure are extracted from the three aspects of color, texture and geometric shape, and compared with the set recognition standards. The damage type of the building structure is determined by similarity measurement, and the identification results including the damage parameters of the building structure are output. Compared with the traditional identification methods, the identification accuracy of the optimized design building damage identification method is improved, and the parameter identification error is smaller, that is, the identification performance is better. Keywords: Computer Vision · Building Structure · Structural Damage · Damage Identification
1 Introduction
Construction refers to artificially constructed assets, which belong to the category of fixed assets and comprise two categories: houses and structures. A house is an engineering building in which people live, work, study, produce, operate, entertain, store goods and carry out other social activities. Structures, by contrast, are engineering buildings other than houses, such as walls, roads, dams, wells, tunnels, water towers, bridges and chimneys [1]. In house construction, the system of components that bears the various loads and functions is the building structure. Under
the influence of factors such as load, earthquake, temperature change, and foundation settlement, building structures suffer damage to varying degrees. If the damage is not detected and dealt with in a timely manner, it can lead to building parts falling off or even to collapse. To ensure the safety of building use, a method for identifying damage to building structures is proposed.

At present, structural damage identification mainly uses monitored acceleration, identifying structural damage and modifying the mechanical model on the basis of dynamic inversion theory. Since structural modal parameters are functions of the structural physical parameters and boundary conditions, and are related only to the characteristics of the structure itself, damage identification methods based on modal parameters have been widely studied. Other methods include the modal assurance criterion method, the curvature mode method, the stiffness method, the flexibility method, the residual force vector method, and the modal strain energy method. However, many studies have shown that the natural frequency of a structure is strongly affected by environmental effects; for example, the change of natural frequency caused by temperature can even exceed that caused by structural damage. Moreover, a parameter such as frequency, which reflects the overall performance of the structure, may not be sensitive to early minor local damage. Such methods have therefore proven difficult to apply in practice, which has become an important bottleneck restricting the application of structural health diagnosis theory in civil engineering.

Aiming at the low recognition accuracy of existing methods, this paper proposes a method of building structure damage identification based on computer vision. Computer vision technology transforms the identification of building structure damage into image recognition, using computers to process, analyze and understand images so as to identify targets and objects of different patterns. This reduces the difficulty of building structure damage identification and improves identification performance. At the same time, the method completes the preprocessing of the initial building structure image through geometric registration, light correction, graying, noise reduction and other steps, which improves the image processing effect and is conducive to better recognition.
2 Design of Damage Identification Methods for Building Structures
2.1 Setting Building Structural Damage Identification Standards
The damage of building structures includes cracks and deformation, of which cracks are the most common. At their initial stage, cracks have little to no impact on structure operation and traffic. If timely maintenance and remediation are not undertaken at this stage, however, the cracks gradually deepen over time, affecting the stability of the overall structure and threatening the safety of vehicles and pedestrians [2]. Relevant departments therefore need effective detection and maintenance programs, taking safety measures in the period before cracks deteriorate. Cracks can be divided into four categories according to their shapes and trends; the specific damage types and characteristics of building structures are shown in Table 1.
Table 1. Description of the types of cracking damage in building structures

Damage type number | Types of structural cracking damage | Cracking damage characteristics
1 | Lateral cracks      | The cracks are parallel to the ground, mainly distributed in the bottom interlayer, bottom plate and roof. Such cracks are divided into road-surface temperature shrinkage cracks and base-layer reflection cracks; normally, most of the cracks are equally spaced
2 | Longitudinal cracks | Such cracks are perpendicular to the ground and are mainly distributed at both ends and corners of the building
3 | Block cracks        | Transverse and longitudinal cracks are staggered so that the pavement splits into many large blocks, which are also interlaced with each other
4 | Mesh cracks         | This type of crack is a regional block formed by multiple cracks with different strikes, generally showing an irregular grid of smaller blocks
The reason why cracks and damage in building structures can be identified from images is that cracks differ markedly from their backgrounds. When computer technology is used to recognize and classify cracks, the quality of the recognition results depends largely on the selection and extraction of crack feature vectors. From an analysis of the crack characteristics of building structures, cracking damage can be summarized as follows: compared with the image background of the building, a crack appears clearly darker, i.e., the pixel gray values of the target area are lower than those of the background; there are obvious edge contours, i.e., the gray values around the crack are small while those of the background are large, producing a very large gray-level difference between the background and the pixels adjacent to the crack; the crack appears as a whole in the image, although local breaks and internal discontinuities are allowed; and building structure cracks are linear targets with different lengths, widths and trends [3]. In the same way, the image characteristics corresponding to deformation damage of the building structure can be obtained, and the summarized characteristics can be stored as quantitative feature variables, which serve as the identification standard for building structure damage.
2.2 Using Computer Vision to Collect Building Structure Images With the support of cameras and light sources, the building structure images are collected according to the process shown in Fig. 1.
(Flowchart: start → start the image acquisition device → load the driver → obtain computer vision information → set the visual image working parameters → acquire and transmit building images in real time → once acquisition is terminated, verify that the collected data is saved normally → turn off the computer vision acquisition device → end.)
Fig. 1. Flowchart of computer vision acquisition
The process of camera imaging is to convert the 3D scene into a 2D scene and lose the depth information. Therefore, before obtaining the 3D scene information, it is necessary to determine the corresponding relationship between the 3D information and the 2D information, which is determined by the camera model. The process of solving the model parameters is to calibrate each camera [4]. In the measurement process based on computer vision, camera calibration is an indispensable part. The process of camera calibration is to obtain the parameters of each camera model, including internal parameters, external parameters and distortion parameters: the internal parameters are some imaging parameter information of the camera itself, which determines the projection relationship from 3D space to 2D image; External parameters refer to the orientation of the camera in the natural scene, which determines the relative position relationship between the camera coordinate system and the world coordinate system; The distortion
parameters arise in the transformation from the camera coordinate system to the image physical coordinate system. The quality of the camera calibration results therefore directly affects the accuracy of image measurement of building structures. In machine vision, the projection relationship between three-dimensional space and the two-dimensional image is described by the camera model, expressed mathematically mainly as linear or nonlinear models. The linear model is also called the pinhole model: the object is projected onto the imaging plane through the optical center of the camera. Many cameras, however, contain lenses; although a lens collects more light and makes the image clearer, it distorts the image [5]. If the measurement accuracy requirements are not too high, the pinhole model can be used for calibration. The calibration principle of the camera is shown in Fig. 2.
Fig. 2. Schematic diagram of camera calibration
Figure 2 contains three parts: the world coordinate system, the camera coordinate system and the image coordinate system. Assuming (x_build, y_build, z_build) is the coordinate of any point of the building structure, the corresponding imaging result can be expressed as:

x_p = f · x_build / z_build ,  y_p = f · y_build / z_build    (1)

In formula (1), f represents the focal length of the camera. The working parameters of the camera are set according to the actual imaging requirements of the building structure. Using formula (1), the imaging results of all building structure nodes within the imaging range are obtained, and, following the spatial relationship of the building structure, the imaging points are connected to obtain the image of the building structure. A small sketch of this projection is given below.
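Formula (1) is the standard pinhole projection and can be checked numerically; the node coordinates and focal length below are placeholder assumptions.

```python
import numpy as np

def project(points_xyz, f):
    """Pinhole projection of formula (1): camera-frame points (x, y, z)
    to image-plane coordinates (x_p, y_p), with focal length f."""
    points_xyz = np.asarray(points_xyz, dtype=float)
    return f * points_xyz[:, :2] / points_xyz[:, 2:3]

# Hypothetical structure nodes in the camera frame (same length unit as f).
nodes = [(0.5, 0.2, 4.0), (0.5, 1.8, 4.0), (2.0, 1.8, 5.5)]
print(project(nodes, f=0.035))
```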
2.3 Preprocessing of the Initial Image of the Building Structure
When collecting images of a building structure, image quality is degraded by the limitations of actual conditions and by uncontrollable factors such as random
noise. At the same time, analysis and processing require a clear, simplified image of the target object with interference factors eliminated. Therefore, the image should be preprocessed before detection and analysis.

2.3.1 Image Geometric Registration
The method used for image geometric registration is pixel displacement calibration, which converts pixel displacements in the image into real displacements of the structure: a reference object of known actual size, together with its pixel size in the image, provides the proportional relationship through which the integer-pixel and sub-pixel displacements computed in the image correspond to actual displacements [6]. The correspondence between the actual displacement and the pixel displacement in the image can be expressed as:

w_actual = (J_actual / J_image) · w_image
(2)
In formula (2), Jactual and Jimage represent the actual size of the building structure and the pixel size in the image respectively, wactual and wimage correspond to the actual displacement and pixel displacement. In the actual pixel displacement calibration process, determine the calibration direction of the displacement, and move w pixels in the corresponding direction. The calculation formula of w is as follows: w = wactual − wimage
(3)
This completes the geometric registration of the initial image.

2.3.2 Image Light Correction
Since pictures taken by the camera may be unevenly lit, and the YCbCr color space is used in the system, light compensation is necessary. The method used here extracts the 5% of pixels with the highest brightness in the picture and linearly amplifies them so that their average brightness reaches 255; the brightness of the whole picture is then linearly amplified by the obtained coefficient, i.e., the RGB values of the picture's pixels are adjusted.

2.3.3 Grayscale Image
A grayscale image retains only the brightness information of the building structure surface, and converting the color image to grayscale greatly reduces the computer's processing workload. The color information has little impact on the identification of cracks; since the goal is mainly to obtain the morphological characteristics of cracks, crack identification in the building structure is based on grayscale images [7]. The grayscale processing of the initial image can be expressed as:

fgray(x, y) = ωR · R(x, y) + ωG · G(x, y) + ωB · B(x, y)
(4)
In formula (4), R(x, y), G(x, y) and B(x, y) respectively represent the red, green and blue color components of the initial image, and ωR, ωG and ωB are the corresponding weight values. Thus, the grayscale processing of the building structure image is completed.

2.3.4 Image Noise Reduction
Gaussian filtering is selected for filtering and denoising the building structure images. Gaussian filtering effectively limits blurring at the filtered position and achieves a good filtering effect. The continuous two-dimensional Gaussian function is discretized to obtain the Gaussian template M; the template is moved over the building structure image from its starting position, and each pixel is filtered through it. The filtering result at pixel (i, j) can then be expressed as:

fn(x, y) = (1 / (2πδ²)) · exp(−((x − γ − 1)² + (y − γ − 1)²) / (2δ²))    (5)

In formula (5), δ is the standard deviation parameter of the Gaussian function, and γ is the length of the Gaussian template. The processing results of all pixel points in the initial image of the building structure are obtained in this way.

2.3.5 Image Segmentation and Fusion
Threshold segmentation is currently the simplest image segmentation method; its core is the selection of an appropriate threshold, usually based on the image histogram. Images processed by threshold segmentation are intuitive, the method is easy to implement and fast to compute, and the segmentation effect is particularly clear when the target and background have distinct gray levels [8]. Threshold segmentation is divided into global and local variants: if every pixel in an image uses the same threshold during segmentation, the method is global threshold segmentation; if different thresholds are used, it is the local threshold method. The iterative threshold segmentation method is adopted for segmenting the building structure image, i.e., a segmentation method that computes the optimal threshold of the image by iterative approximation. The method is highly adaptive and is an effective way of determining the threshold. The initial segmentation threshold is defined as:

η0 = (g_min + g_max) / 2
(6)
In formula (6), g_min and g_max correspond to the minimum and maximum grayscale values of the image. Using the threshold of formula (6), the image is divided into two parts, background and target, and the average grayscale of each part is obtained, denoted g_0 and g_b. On this basis, formula (7) is used to update the segmentation threshold:

η_new = (g_0 + g_b) / 2
(7)
In the process of building structure image segmentation, the segmentation threshold is updated iteratively in real time, and it is judged whether the current threshold is consistent with the target threshold. If it is, the current threshold is taken as the optimal threshold, and the output is the optimal segmentation of the image. A minimal sketch of this iterative thresholding is given below.
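Formulas (6)–(7) define a classical iterative optimal-threshold scheme; the sketch below is one straightforward realization, with the convergence tolerance chosen arbitrarily.

```python
import numpy as np

def iterative_threshold(gray, tol=0.5):
    """Iterative optimal threshold of formulas (6)-(7) for a 2-D gray image."""
    eta = 0.5 * (float(gray.min()) + float(gray.max()))       # formula (6)
    while True:
        target = gray[gray > eta]        # pixels above the current threshold
        background = gray[gray <= eta]   # pixels at or below it
        eta_new = 0.5 * (target.mean() + background.mean())   # formula (7)
        if abs(eta_new - eta) < tol:     # threshold has converged
            return eta_new
        eta = eta_new
```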
2.4 Extracting Building Structure Image Features
Image features refer to the underlying features of the image itself, mainly including the visual features of the image surface, such as color features, texture features and shape features, and the spatial relationship features of the image content. The color feature is a global feature describing the surface color attributes of the scene corresponding to the image or image region. The texture feature refers to the pattern formed by texture primitives arranged and combined according to certain rules, and is closely related to the spatial variation of image brightness. The shape feature refers to the contour feature of the image object or the region-of-interest feature of the image: the contour feature describes the outer boundary of the object, while the region feature describes the inner region within that boundary. The spatial relationship feature refers to the relative spatial position or direction between multiple target objects segmented independently from the image [9]. The extraction process of building structure image features is shown in Fig. 3.

The color features of building structure images are expressed as color moments, a simple and effective color feature. For a given feature of the i-th pixel in the image, the first three color moments are defined as:

A1 = (1/Nq) · Σ_{i=1}^{Nq} f(q, i)
A2 = [ (1/Nq) · Σ_{i=1}^{Nq} (f(q, i) − A1)² ]^{1/2}
A3 = [ (1/Nq) · Σ_{i=1}^{Nq} (f(q, i) − A1)³ ]^{1/3}    (8)
In formula (8), f(q, i) is the value of the i-th pixel of image q, Nq is the number of pixels contained in the image, and the results A1, A2 and A3 are the first, second and third moments respectively. In the extracted color features of the building structure image, the first moment represents the average intensity of each color component, the second moment reflects the color variance, that is, the non-uniformity, of the area to be measured, and the third moment defines the skewness of the color component, that is, the color asymmetry. A small sketch of these moments is given below.
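The three moments of formula (8) can be computed per channel in a few lines; this sketch assumes a NumPy array channel.

```python
import numpy as np

def color_moments(channel):
    """First three color moments of formula (8) for one color channel."""
    v = channel.astype(float).ravel()
    a1 = v.mean()                          # first moment: average intensity
    a2 = np.sqrt(((v - a1) ** 2).mean())   # second moment: non-uniformity
    a3 = np.cbrt(((v - a1) ** 3).mean())   # third moment: color asymmetry
    return a1, a2, a3
```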
(Fig. 3 flowchart: the image is passed through a Gaussian function to build the image scale space, a Gaussian difference image is computed, spatial extrema are detected, feature point locations are determined, edge responses are eliminated, and multi-feature fusion is performed.)

Fig. 3. Flow chart of feature extraction of building structure image
that is, the color asymmetry. Texture is one of the important kinds of information in an image. It is usually defined as a local property of the image, reflecting macroscopic rules in the distribution of image gray levels. An image can be regarded as a combination of different texture areas: each target area of the objects in the image has its own texture features, and these different texture areas combine to form the image. Among the many methods proposed for analyzing texture, the texture features based on the gray-level co-occurrence matrix are the most direct. The gray-level co-occurrence matrix method takes the gray value and position of pixels as its object of study and reflects comprehensive information about pixel gray levels, such as direction, adjacent interval and range of variation; it is the basis for analyzing local texture patterns of images and their arrangement. Each entry of the gray-level co-occurrence matrix represents the probability of occurrence of a pixel pair. These entries are generally not used directly in texture analysis; instead, secondary statistics of the matrix are used to define texture statistics. The texture statistic defined from the gray-level co-occurrence matrix can be expressed as:

\chi = \sum_{x}\sum_{y} (x - y)^2 f(x, y)   (9)
In formula (9), x and y are the horizontal and vertical components of the image pixel positions. In addition, the geometric features of the building structure are extracted as:

\lambda = \frac{4\pi S}{L^2}, \quad \sigma = \frac{S}{S_{ex}}   (10)
In formula (10), S is the area of the connected domain of the building texture, S_ex is the area of its minimum circumscribed rectangle, and L is the perimeter of the connected domain. The feature extraction results λ and σ given by formula (10) are the circularity and rectangularity geometric features, respectively. Finally, the extracted building structure image features are fused using the fusion principle shown in Fig. 4.
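A compact sketch of the three feature computations in Eqs. (8)-(10) is given below; the function names and the assumption that the co-occurrence matrix is already normalized are illustrative choices, not part of the paper.

```python
import numpy as np

def color_moments(channel):
    """First three color moments of one channel, Eq. (8)."""
    f = channel.astype(float).ravel()
    a1 = f.mean()                              # mean intensity
    a2 = (((f - a1) ** 2).mean()) ** 0.5       # non-uniformity
    a3 = np.cbrt(((f - a1) ** 3).mean())       # asymmetry; cbrt keeps the sign
    return a1, a2, a3

def glcm_statistic(glcm):
    """Texture statistic chi over a normalized co-occurrence matrix, Eq. (9)."""
    x, y = np.indices(glcm.shape)
    return float(((x - y) ** 2 * glcm).sum())

def shape_features(area, perimeter, bbox_area):
    """Circularity lambda and rectangularity sigma, Eq. (10)."""
    return 4.0 * np.pi * area / perimeter ** 2, area / bbox_area
```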
Fig. 4. Principle diagram of image feature fusion of building structure (building structure image → image preprocessing → feature extraction classifier → color, texture and shape characteristics of the building structure → calculation of feature weight values → fusion results of building structure features)
The final result of comprehensive feature extraction from the building structure image is obtained and recorded as τ_com.

2.5 Realizing Damage Identification of Building Structures

The similarity measurement method is used to match the extracted building structure features against the set damage standard features. A similarity measure comprehensively evaluates how alike two things are: the closer two things are, the more similar, and the farther apart, the less similar [10]. The similarity measure here uses the Euclidean distance, which represents the distance between points, as its metric. When the dimensions of the components of the feature vector are inconsistent, each component usually needs to be standardized first. The measurement method is simple and therefore widely used. The Euclidean distance between the extracted feature and the set standard feature is calculated as:
d = \sqrt{\left( \tau_{com} - \tau_{set} \right)^2}   (11)
In formula (11), τ_set is the set damage standard feature of the building structure. After similarity measurement, if the calculated d is lower than the similarity threshold d_0, the building structure corresponding to the current image is considered damaged, with a damage type consistent with that of τ_set; otherwise the current building structure is judged to be undamaged. The damage parameters of damaged building structures are then calculated [11-13]. For crack damage, the damage parameters are identified as:

l_x = (x_e - x_s) \times \varsigma,\quad l_y = (y_e - y_s) \times \varsigma,\quad S_{damage} = l_x \times l_y   (12)

In formula (12), (x_s, y_s) and (x_e, y_e) are the position coordinates of the start and end points of the crack in the building structure image, ς is the proportionality coefficient between the image coordinate system and the world coordinate system, and the results l_x, l_y and S_damage correspond to the length, width and area of the crack in the building structure. In addition, the parameter describing deformation damage of the building structure can be expressed as:

h = \sum_{i=1}^{n} \left( \left| x_{i,actual} - x_{i,u} \right| + \left| y_{i,actual} - y_{i,u} \right| \right)   (13)
In formula (13), x_{i,actual}, y_{i,actual} and x_{i,u}, y_{i,u} represent the current and initial position coordinates of node i in the building structure, respectively. Finally, the identification results, including whether the building structure is damaged, the damage type and the damage parameters, are output in visual form.
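To make Eqs. (11)-(13) concrete, here is a minimal sketch; the threshold argument and the coordinate containers are assumed for illustration only.

```python
def is_damaged(tau_com, tau_set, d0):
    """Euclidean matching of Eq. (11): a small distance means the damage template fits."""
    return abs(tau_com - tau_set) < d0

def crack_parameters(start, end, varsigma):
    """Crack length, width and area from image coordinates, Eq. (12)."""
    lx = (end[0] - start[0]) * varsigma
    ly = (end[1] - start[1]) * varsigma
    return lx, ly, lx * ly

def deformation(actual, initial):
    """Total node displacement h of Eq. (13); inputs are lists of (x, y) pairs."""
    return sum(abs(xa - xu) + abs(ya - yu)
               for (xa, ya), (xu, yu) in zip(actual, initial))
```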
3 Recognition Performance Experimental Analysis

To test the recognition performance of the computer vision-based damage identification method for building structures, a performance test experiment is designed as a comparative test; through comparison with a traditional identification method, it shows the value of the optimally designed identification method for the damage management and maintenance of building structures.

3.1 Configuring Computer Vision Equipment

To meet the operating requirements of computer vision in the optimally designed damage identification method for building structures, a Nikon D7000 camera is selected as the image acquisition equipment, and the camera's internal
parameters, such as sensitive area, resolution, aperture range and focal length, can be determined. Considering the actual environment inside buildings, the corresponding auxiliary tools include a fill light, tripod, laser rangefinder pen, ruler, etc. A Sony ICX625 CCD sensor is used inside the camera, enabling it to obtain high-quality, high-sensitivity, low-noise images. The camera's data interface is IEEE 802.3, so it can be connected directly to the main test computer with a network cable, and no image acquisition card conversion is required. The lens used with the camera is a COMPUTAR M0814-MP, which features a compact design and a deformation rate below 1.0%, can capture the full resolution of a megapixel camera, and has adjustment screws to lock the focal length and aperture; high-contrast, high-definition images are rendered across the entire screen range.

3.2 Prepare Building Structural Damage Identification Samples

All buildings in a certain area are selected as research objects. Before the experiment, a field survey of the building structures is carried out; the actual data of the buildings are recorded and compared with their design data to determine whether the current building structures are damaged and what the damage parameters are. These serve as the comparison standard for judging the accuracy of the computer vision-based building structure damage identification method. The original images of building structure damage were taken by several inspectors using the configured computer vision equipment, ensuring that cracks appeared roughly in the middle of the image, with no other shooting conditions imposed. The resolution of the original images is 1280 × 1280; some initially acquired building structure image samples are shown in Fig. 5. Following this method, all sample preparations for the damage identification performance test are obtained; the number of samples is 8,000.

3.3 Setting the Test Index for Damage Identification Performance of Building Structures

The test covers two aspects: the accuracy of damage type identification and the accuracy of damage parameters. The test index reflecting the accuracy of damage type identification is set as λ:

\lambda = \frac{Num_{dis-cor}}{Num_{sample}} \times 100\%   (14)
The variables Numdis−cor and Numsample are the number of correctly identified samples output by the identification method and the total number of samples prepared for the experiment, respectively. In addition, the quantitative test indicators of damage parameter accuracy are the identification error of crack length, the identification error of crack width and the identification error of structural deformation. The test results of the above indicators can be expressed as:
Fig. 5. Schematic diagram of samples for damage identification of building structures: (a) image sample 1; (b) image sample 2; (c) image sample 3; (d) image sample 4
\vartheta_x = \left| l_{x-dis} - l_{x-actual} \right|,\quad \vartheta_y = \left| l_{y-dis} - l_{y-actual} \right|,\quad \vartheta_{deformation} = \left| Q_{dis} - Q_{actual} \right|   (15)
In formula (15), l_{x-dis}, l_{x-actual}, l_{y-dis} and l_{y-actual} respectively represent the identified and actual values of the length and width of cracks in the building structure, while Q_dis and Q_actual correspond to the identified results and actual data of the building structure deformation. A higher value of λ and smaller values of ϑ_x, ϑ_y and ϑ_deformation indicate better damage identification performance of the corresponding method.

3.4 Identification Performance Experimental Test Process and Result Analysis

The building structure image samples obtained by computer vision are input into the running program of the damage identification method, and the corresponding damage identification results are obtained. Figure 6 shows the damage identification results of sample No. 1; the damage identification results of all building structure samples can be obtained in the same way. To show the advantages of the optimized design method in damage identification performance, the traditional curvature mode-based damage identification method for building structures is set as the experimental comparison method, and its identification outputs are obtained with the support of relevant data. The output results of the two identification methods are compared with
Fig. 6. Damage identification results of building structures (sample No. 1: damage type: cracking damage; crack length: 25 mm; crack width: 4.5 mm)
the set damage standards, and the test results reflecting the identification accuracy of building structure damage types are obtained, as shown in Fig. 7.
Fig. 7. Test results of building structure damage type identification performance (x-axis: experimental groups 1-5; y-axis: correct number of damage type identifications per piece, scale 40-160; series: curvature modal identification method vs. computer vision recognition method)
It can be seen intuitively from Fig. 7 that, compared with the traditional identification method, the optimally designed method identifies more samples correctly. Substituting the data in Fig. 7 into Eq. (14), the damage type identification accuracy rates λ of the two methods are calculated as 91.4% and 99.6%, respectively. In addition, the identification results for the damage parameters of the building structures are shown in Table 2.
Table 2. Building structure damage parameter identification data sheet (each cell lists crack length/mm, crack width/mm and deformation/mm)

Sample No | Actual damage data | Output of traditional curvature mode-based method | Output of computer vision-based method
1 | 45 / 2.9 / 25 | 45 / 2.6 / 14 | 45 / 2.9 / 16
2 | 53 / 3.0 / 18 | 51 / 3.0 / 10 | 53 / 3.0 / 11
3 | 92 / 1.4 / 26 | 90 / 1.1 / 30 | 91 / 1.4 / 30
4 | 76 / 1.7 / 31 | 75 / 1.3 / 25 | 76 / 1.6 / 26
5 | 33 / 3.2 / 11 | 31 / 2.8 / 15 | 32 / 3.0 / 16
6 | 59 / 4.5 / 16 | 56 / 4.2 / 23 | 59 / 4.5 / 25
Substituting the data in Table 2 into Eq. (15), the average identification errors of crack length, crack width and building structure deformation are calculated as 1.67 mm, 0.28 mm and 1.67 mm respectively for the traditional identification method, and 0.5 mm, 0.05 mm and 0.33 mm respectively for the optimized design method. In summary, compared with the traditional identification method, the optimally designed computer vision-based damage identification method for building structures achieves higher damage type identification accuracy and smaller parameter identification errors, i.e., better identification performance. This is because the method in this paper completes the preprocessing of the initial building structure image through geometric registration, light correction, graying, noise reduction and other steps, improving the image processing effect, reducing the impact of noise, lighting and other factors, and thus improving the accuracy of damage identification.
4 Conclusion Building structure damage identification is one of the important research topics in the inverse mechanics problem. Building structure damage identification is of great significance for extending the service life of buildings. In this context, this paper proposes a method of building structure damage identification based on computer vision. The main innovations of this method are as follows: (1) Through geometric registration, light correction, graying, noise reduction and other steps, the preliminary processing of the initial building structure image is completed, which improves the image quality and provides a basis for the later damage identification. (2) Use computer vision technology to collect building structure images, which can reduce the difficulty of building structure damage identification and improve the identification performance of building structure damage. (3) The experimental results show that the recognition error of this method is low, which verifies its application value.
References
1. Liang, Z., Hua, J., Chen, H., et al.: A computer vision based structural damage identification method for temporary structure during construction. J. Graph. 43(04), 608-615 (2022)
2. Hou, Y., Wang, X., Hou, T., et al.: Research progress of damage detection methods of ancient timber structures. Adhesion 49(01), 123-126+130 (2022)
3. Zhao, J., Lin, W.: Modeling and simulation of geometric profile calibration error identification of combined. Comput. Simul. 39(6), 308-312 (2022)
4. Tu, M., Li, H.: Research on damage detection of fabricated building steel structure based on thermal image enhancement. Comput. Simul. 38(12), 420-423+444 (2021)
5. Cui, H., Du, H., Zhao, F., et al.: Damage identification in a plate structure based on a cross-direction strain measurement method. Measurement 158(9), 107714 (2020)
6. Zeng, T., Barooah, P.: Identification of network dynamics and disturbance for a multizone building. IEEE Trans. Control Syst. Technol. 28(5), 1-8 (2020)
7. Miao, B., Wang, M., Yang, S., et al.: An optimized damage identification method of beam using wavelet and neural network. Engineering 12(10), 748-765 (2020)
8. Erdogan, M., Yilmaz, A.: Detection of building damage caused by Van Earthquake using image and Digital Surface Model (DSM) difference. Int. J. Remote Sens. 40(9-10), 3772-3786 (2019)
9. Guo, Q., Feng, L., Zhang, R., et al.: Study of damage identification for bridges based on deep belief network. Adv. Struct. Eng. 23(8), 1562-1572 (2020)
10. Meoni, A., D'Alessandro, A., Kruse, R., et al.: Strain field reconstruction and damage identification in masonry walls under in-plane loading using dense sensor networks of smart bricks: experiments and simulations. Eng. Struct. 239, 112199 (2021)
11. Weng, S., Zhu, H.: Damage identification of civil structures based on finite element model updating. Eng. Mech. 38(03), 1-16 (2021)
12. Weng, X., Liu, C., Miao, J., et al.: Research on damage location of high-rise building based on dynamic testing. J. Qingdao Univ. Technol. 42(04), 1-8 (2021)
13. Hou, Y., Hu, W., Yang, J.: Damage identification and location of ancient timber structure based on auto-correlation function peak change rate. Struct. Eng. 36(3), 48-55 (2020)
Automatic Focus Fusion Method of Concrete Crack Image Based on Deep Learning

Chuang Wang(B), Jiawei Pang, Xiaolu Deng, Yangjie Xia, Ruiyang Li, and Caihui Wu

China Construction Third Engineering Bureau Group Co., Ltd., Zhanjiang 524000, China [email protected]

Abstract. The algorithm used in traditional image auto-focus fusion methods is prone to falling into local iteration, which degrades the quality of the fused image. A deep learning-based auto-focus fusion method for concrete crack images is therefore designed. Features are extracted from the digital image to obtain and match the template feature set of the concrete crack image, a filter function is used to reduce noise, and the image auto-focus fusion algorithm is optimized to improve the quality of the fused image after transformation. To verify the effectiveness of the designed method, comparative experiments compare its results with those of traditional methods. In terms of fusion focusing results, output signal-to-noise ratio and image histogram, the results of the designed method are better than those of the traditional methods: the image quality and output signal-to-noise ratio are higher, and the histogram distribution is more uniform, indicating better image quality.

Keywords: Deep learning · Artificial intelligence · Image autofocus · Image fusion
1 Introduction

Auto-focus image fusion is an important part of image fusion technology, and its key is image processing. When imaging a scene, the limited focusing range of the optical system usually prevents the imaging system from producing clear images of objects at different distances at the same time [1]. When the imaging system is focused on one object, that object forms a clear image on the image plane, while objects located elsewhere appear with varying levels of blur. The principles of optical lens imaging allow an imaging system to keep increasing its resolution, but the impact of a fixed focus range on the overall imaging result cannot be avoided; that is, it is difficult to obtain clear images of all objects in the same scene with the imaging system alone [2]. To reflect the information of the scene more realistically and comprehensively, clear images of all objects in the scene are desired. The way to deal with this problem is to focus on different objects within the scene to obtain multiple auto-focus images, then fuse these images so that their respective clear areas are combined, yielding a fused image in which all objects in the scene are clear; this is auto-focus image fusion.

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2024 Published by Springer Nature Switzerland AG 2024. All Rights Reserved B. Wang et al. (Eds.): ICMTEL 2023, LNICST 533, pp. 200-211, 2024. https://doi.org/10.1007/978-3-031-50574-4_14
Reference [3] proposes a light-field all-in-focus image fusion method based on edge-enhancement guided filtering, which obtains the all-in-focus light-field image, denoises through guided filtering, classifies image characteristics using fuzzy clustering, and fuses the all-in-focus light-field image through edge enhancement. This method improves the fusion effect, but its fusion efficiency is poor. Reference [4] proposes a multi-focus image fusion method using the energy of the Laplacian and a convolutional neural network: multi-focus images are obtained from the sensor, a convolutional neural network builds a multi-focus image fusion trainer for fusion training, and the Laplacian method enriches the features to improve the multi-focus fusion effect. However, this method is affected by light intensity and has poor reliability. To solve the above problems, this paper proposes a concrete crack image auto-focus fusion method based on deep learning. Auto-focus image fusion technology enables objects at different imaging distances to be presented clearly in a single image. The deep learning-based artificial intelligence image auto-focus fusion method lays a good foundation for feature extraction, image recognition and other processing, thereby effectively improving the utilization of image information and the reliability of the system for target detection and recognition.
2 Automatic Focusing Fusion Method of Concrete Crack Image Based on Deep Learning

2.1 Concrete Crack Image Feature Extraction and Matching

To realize automatic focusing and fusion of concrete crack images, a remote concrete crack image filtering and detection model is constructed, and feature extraction and matching of the concrete crack image are carried out under information enhancement. The edge contour features of the local pixel array of the concrete crack image are determined. The feature quantity is described by p* = {X^(CS2), θ, ρ}, where X^(CS2) is the fuzzy blocking feature quantity of the concrete crack image, θ is the edge feature quantity, and ρ is the contour feature quantity. In the spatial coordinate system, the pixel data set of the concrete crack image is:

M = \left\{ \theta^e, \rho^e \right\} = EFA(\theta, \rho)   (1)

Here EFA denotes factor analysis and h is the pixel set of the concrete crack image; pixel feature structuring and binarization of the concrete crack image are performed using the forward diffusion method [5, 6]. The separation guide line is (θ^e, ρ^e), and the vector features of the fuzzy blocks of the concrete crack image are obtained through local quantitative analysis as follows:

EX^{(CS2)} = \left\{ x \mid x \in [0, h] \right\},\quad EY^{(CS2)} = \rho^e \cos\theta^e,\quad EZ^{(CS2)} = \rho^e \sin\theta^e   (2)
Here X^(CS2), Y^(CS2) and Z^(CS2) are the fuzzy block feature matching vectors at a three-dimensional point, x is a pixel on the X^(CS2) plane, the three-dimensional scattered points of the concrete crack image are used for information fusion, and ρ^e − R is the displacement center [7, 8], where R is the displacement distance. The sparse method is used to segment the features of the concrete crack image, yielding the edge length of the concrete crack image:

Team(z) = \arg\max_{k=1,2,\cdots,R} \left( y_k \cdot z + e_k \right)   (3)
Here z is a pixel of the concrete crack image on the Z^(CS2) plane of the spatial rectangular coordinate system and k indexes the pixels. The local structure feature decomposition method is then used to obtain the template feature set of concrete crack images [9, 10]. An analytical model with the same characteristics as the concrete crack image is established, and the gradient descent method is used to track the pixels. Under super-resolution visual imaging, the filter function of the concrete crack image is:

sim(x_c, x_d) = \frac{\sum_{m=1}^{p} y_{cm} \times y_{dm}}{\sqrt{\sum_{m=1}^{p} y_{cm}^2} \times \sqrt{\sum_{m=1}^{p} y_{dm}^2}}   (4)
Here x_c and x_d are the abscissas of concrete crack image pixels under super-resolution visual imaging, y_c and y_d are the corresponding ordinates, and m indexes the amplitude-frequency characteristic of the filter function. In sub-blocks G_{a,b} of size M × N with 2 × 2 sub-blocks, automatic focusing and information fusion of the image are performed by guided filter detection. Figure 1 shows the block diagram of the digital image auto-focus fusion route in this paper. As the analysis of Fig. 1 shows, the concrete crack image is first extracted by filtering, the image features are then compared, and the image edge scale function is obtained through pixel feature reconstruction; after removing image clutter and performing guided filtering detection, automatic focus fusion of the concrete crack image is achieved. According to the sparse prior visual feature extraction method, the feature segmentation model of the concrete crack image is established, and the iterative reconstruction model of the concrete crack image is created by combining its features with information enhancement technology. The specific grayscale pixels are:

pixel\_C = \max \sum_{i=1}^{S} (Q - P)   (5)
Here P is the concrete crack image feature and Q is the artificial intelligence feature after three-dimensional reconstruction.
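As an aside, the similarity measure of Eq. (4) is essentially a cosine similarity between two feature vectors; a minimal sketch, with illustrative function and argument names, follows.

```python
import numpy as np

def pixel_similarity(yc, yd):
    """Cosine-style similarity of two feature vectors, Eq. (4)."""
    yc, yd = np.asarray(yc, dtype=float), np.asarray(yd, dtype=float)
    denom = np.sqrt((yc ** 2).sum()) * np.sqrt((yd ** 2).sum())
    return float((yc * yd).sum() / denom) if denom else 0.0
```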
Fig. 1. Road map of concrete crack image auto-focus fusion (start → image filtering extraction → image feature matching → pixel feature reconstruction → if reconstruction succeeds, image edge scale; otherwise repeat → guided removal of image filtering → guided filter detection → focus fusion → end)
Based on this, a distributed combination model of low-resolution and high-resolution templates of the concrete crack image is established, and grayscale feature decomposition of the concrete crack image is performed through this scheme. The fuzzy relationship degree of the spatial block fusion of the concrete crack image is K(x_0, y_0); with K(x_0, y_0) as the center, remote fusion is performed, and the grayscale marginal set of concrete crack images is acquired as:

pixel\_D = \max \sum_{i=1}^{S} (R - K)   (6)
If pixel_C < pixel_D, the template matching coefficients of the concrete crack image are obtained through the difference fusion and scheduling method as:

\beta = \frac{\theta}{\pi},\quad s = 1 - \frac{\lambda_2}{\lambda_1},\quad v = \lambda_1 + \lambda_2   (7)
Here λ_1 and λ_2 are the detection results for the long and short edges, respectively. Based on edge contour testing, the template matching output is obtained as:

I_{GSM} = \sum_{i=1}^{N} \left( \beta(C_i + v_i \mid s_i) - \beta(v_i) \right)   (8)
Here N is a set of natural numbers and i is the index of the test point. Using adaptive feature detection and parameter estimation algorithms, guided filtering is applied to the concrete crack images, giving the filtering function:

\hat{x}(r/r) = \sum_{j}^{\delta} \hat{x}_i(r/r)\, u_j(k)   (9)
Here r is the guide value of the filter function, k is the matching coefficient of concrete crack image points, δ is the scale of the eigendecomposition of the matrix, u_j(k) is the pixel intensity, and j is the energy coefficient. Based on the combination of digital feature matching and communication tracking methods, guided filtering and noise reduction of the concrete crack image are performed.

2.2 Introduce Deep Learning Neural Network

In the pulse coupled neural network (PCNN) proposed by Eckhorn, a pulse coupled neuron N_ij is composed of three parts: the receiving domain, the modulation part and the pulse generator. Figure 2 shows the structure of a pulse coupled neuron. There are two channels in the receiving domain: one, channel F, receives the feedback input F_ij; the other, channel L, receives the link input L_ij. The external stimulation signal is transmitted to neuron N_ij through channel F, while the output signals of neighboring neurons are transmitted to neuron N_ij through channel L. A pulse coupled neuron can usually be described by the following equations:

F_{ij}(n) = \exp(-1/\alpha_F)\, F_{ij}(n-1) + S_{ij} + V_F \sum_{kl} M_{ijkl} Y_{kl}(n-1)

L_{ij}(n) = \exp(-1/\alpha_L)\, L_{ij}(n-1) + V_L \sum_{kl} W_{ijkl} Y_{kl}(n-1)

U_{ij}(n) = F_{ij}(n)\left( 1 + \beta L_{ij}(n) \right)   (10)

\theta_{ij}(n) = \begin{cases} \exp(-1/\alpha_T)\, \theta_{ij}(n-1), & \text{if } Y_{ij}(n-1) = 0 \\ V_T, & \text{otherwise} \end{cases}

Y_{ij}(n) = \begin{cases} 1, & \text{if } U_{ij}(n) > \theta_{ij}(n) \\ 0, & \text{otherwise} \end{cases}
Fig. 2. Structure of pulse coupled neurons
dynamic threshold B of neurons, and the output Y is obtained by comparison. If there is significant internal activity U , neurons will produce pulse signals. αF , αL and αT are the attenuation ordinary courtes of PCNN neurons, and VF , VL and VT are amplification factors. M and W are the weights of the feedback part and the link part of the receiving domain, respectively. The constant β is the connection strength between neurons. In the system put in this paper, PCNN is used to process image directly. The structure of PCNN is single-layer, similar to a two-dimensional matrix. One neuron is connected with other surrounding neurons. The structure and function of all neurons in PCNN are the same. The number of neurons in PCNN is consistent with the number of pixels of the input image. Moreover, the pixels of the input image correspond to neurons one by one. For neuron Nij , the input of channel F is the gray price of pixel Sij relevant; to the neuron. Fij (n) = Sij
(11)
Generally, the output of an ordinary neuron in PCNN can be expressed by formula (10). However, in this chapter, the output of neuronal hunger is indicate by under; formula (12). Uij (n), if Uij (n) > θij (n) (12) Yij (n) = 0, otherwise The output of PCNN used in this paper is also an image, that is, image a, with O size of P × Q picture element. The picture element Oij value is the sum of the outputs of its
206
C. Wang et al.
corresponding neuron Nij , as shown in formula (13):
Oij = Yij (n)
(13)
n
We use formulas (10) to (12) to describe the pulse coupled neuron and PCNN used in multi focus image fusion in this paper. Before describing the execution steps of PCNN used in this paper, we need to introduce the symbols of variables and constants used. External stimulus (feedback input) F is a two-dimensional matrix with size P × Q, as shown in formula (11). The value of each element in F is the gray value of the pixel of its corresponding input image. L is the link input matrix with size P × Q.
L = exp −1 αL ∗ L + VL ∗ work, where work = (Yconv2K). Y is the output matrix, which records the output of all current neurons. K is the kernel matrix, and its size is (2 × r + 1) × (2 × r + 1). is the threshold matrix. U is the matrix recording the internal activity of each neuron. O is the output matrix of PCNN. F, L, work, Y , , U and O are two-dimensional matrices with the same size Represents dot product, ‘conv2‘ represents two-dimensional convolution. The specific execution steps of PCNN for multi focus image fusion proposed in this paper are as follows: 1) Initialize the relevant parameters and matrices. ⎧ ⎨F = S L=U =Y =0 ⎩ =1
(14)
The initialization of K is different from other array. The value of the elementary substance at the center of the K matrix is 1. The value of other elements in the K matrix
is 1 d , and d is the range from the element to the pivot, and d ≤ r. 2) Select the number of iterations np of PCNN. Initialize n, n = 1. L = exp −1 αL ∗ L + VL ∗ work U = F ∗ (1 + β ∗ L)
(15)
3) If n > np , goto step 4), else go back to step 2). 4) Matrix O is used as the output of PCNN. 2.3 Optimized Image Auto Focus Fusion Algorithm In this paper, (x, y) is used to represent the coordinates of any pixel, giA (x, y) and fNA (x, y), giB (x, y) and fNB (x, y)(i = 1, 2, · · · , N ) are used to represent the high-frequency and lowfrequency subband coefficients of the original figure A and B after NWF transformation and decomposition respectively; giF (x, y) and fNF (x, y) represent the combined NWF transform high-frequency and low-frequency incidentally factor The high-frequency take actor of the original image after NWF transform represent the detail communication of the original presentation. The more detail communication, the higher the definition.
Automatic Focus Fusion Method of Concrete Crack Image
207
The concept of contrast is applied to multi-sensor image fusion, and the image contrast C is defined as: C=
LP − LB × LB LH
(16)
where, LP is the part gray level of the image. LB is the indigenous background gray level of the image (equivalent to the low-frequency component product after NWF transformation); LH = LP − LB . The high-frequency components after the equivalent NWF transformation and the measurement accuracy of image comparison are as follows: (x, y) =
N
i=1 (m,n)∈N (x,y)
giI (x, y)2 fNI (x, y)
(17)
where: I = A, B; N (x, y) is a window defined with pixel (x, y) as the center, which is taken as 10 in this paper × 10 (pixels). The larger C I (x, y), the greater the contrast and the higher the definition of the local area where pixel (x, y) is located in the original image I. According to the contrast measurement of the image, the clear and blurred regions of the pixels can be approximately determined. The fusion process is to combine the NWF transform coefficients of the pixels belonging to the clear region in the multi focus image. The high-frequency subband coefficients of multifocal images are combined as follows: A gi (x, y), C A (x, y) ≥ C B (x, y) F (18) gi (x, y) = gNB (x, y), C A (x, y) < C B (x, y) The low-frequency subband fusion process of multi focus image is similar, that is: A f (x, y), C A (x, y) ≥ C B (x, y) (19) fNF (x, y) = NB fN (x, y), C A (x, y) < C B (x, y) In multi focus image, considering the correlation of adjacent pixels, the possibility of the following situations is generally small: a NWF coefficient after combination comes from the coefficient after image A transformation, while most of the coefficients after combination of adjacent pixels come from the coefficient after image B transformation; And vice versa. Therefore, consistency test should be conducted on the combined NWF transform coefficients to ensure that most of the combined NWF coefficients of a pixel and its adjacent pixels come from the same image. After consistency test, the combined NWF coefficients can be inversely transformed to have the blend image.
3 Experiment 3.1 Experimental Parameters In order to verify the application performance of this method in digital image dynamic focusing fusion, experimental tests and analysis are carried out. Relevant parameters are shown in Table 1. By setting the parameters above, take two groups of image for experiments.
208
C. Wang et al. Table 1. Experimental Data
Serial number
Parameter
Size
1
Image window size
25 * 25 pixels
2
Image filtering coefficient
0.26
3
Image edgeretention
0.8
4
Guided filterscale
4
5
Numbero fiterations
200times
3.2 Comparison of Fusion Focusing Results The imitate tentative of digital image focusing fusion method is completed by using PC equipped with MATLABR2018b software. The computer configuration of the simulation experiment PC is Intel (R) Pentium CPU of 2.42 GHz, the memory RAM of the computer is 4.00 GB, and the operating system of the PC is the 64 bit system of windows 8.1. In this paper, the VITOVision database is selected as the experimental data set to obtain the concrete crack image fusion focusing results and comparison, as shown in Fig. 3.
Fig. 3. Automatic focus fusion result of concrete longitudinal crack image
The concrete crack image fusion focusing results and comparison are shown in Fig. 4.
Automatic Focus Fusion Method of Concrete Crack Image
(a)Original drawing
(c)Paper method
209
(b)BP filtering
(d) Wavelet analysis
Fig. 4. Concrete transverse crack image auto focus fusion results
Based on the analysis of the above simulation results, it can be concluded that this method can effectively achieve the automatic focusing combination of concrete crack images and increase the digital imaging quality. 3.3 Output SNR Comparison The conclusion of comparing the signal-to-noise ratio of the output is shown in Table 2. Table 2. Signal noise ratio comparison test Iterations/time Paper method Edge enhancement guided filtering/dB Laplace method/dB 50
32.12
12.33
23.36
100
45.45
15.43
29.11
150
52.34
19.21
35.46
200
54.34
21.15
42.12
According to the comparison test results of signal-to-noise ratio, the maximum signal-to-noise ratio of the edge enhanced guided filtering method and the Laplace’s method is 21.15 dB and 42.12 dB, respectively, while the maximum signal-to-noise ratio of this method can reach 54.34 dB, indicating that the imaging quality of this method is better.
210
C. Wang et al.
3.4 Image Histogram Comparison
Number of pixels
Brightness
Fig. 5. Image histograms under different methods (x-axis: brightness; y-axis: number of pixels): (a) original image; (b) BP filtering; (c) method of this paper; (d) wavelet analysis
4 Conclusion 3D information features of joint concrete crack images, the concrete crack image processing is implement through filtering and testing methods. The article proposes an automatic focusing fusion method of aconcrete crack images based on depth learning. The feature matching of concrete crack images under information execution enhancement
Automatic Focus Fusion Method of Concrete Crack Image
211
ability, characteristic segmentation model of remote geophysical remote sensing concrete crack images is built, and the matching with features of concrete crack presentation is carried out under the message enhancement technique, The spatial matching function of concrete crack image under remote geophysical exploration and remote sensing is obtained. In shape distribution blocks of concrete crack image pixel clusters, combined with shape features, the focus fusion model of concrete crack image is obtained in the multi-scale input image to achieve automatic focus fusion of concrete crack image. The experimental results show that: (1) The solution in the article can be well implemented the automatic focus fusion of remote geophysical remote sensing digital images, and improve the quality of digital imaging. (2) The signal-to-noise ratio of the method in this article can reach 54.34 dB for automatic focus fusion of far digital images, effectively improving the image quality. (3) After processing by this method, the image histogram distribution is relatively uniform, indicating that the image processing effect of this method is the best, which is neither bright nor dark. Although the methods in the article include made good progress in image fusion, the image fusion process is relatively complex and the fusion time is too long. Therefore, how to shorten the integration time will be my research direction.
References
1. Zhou, H., Zhao, L.H., Liu, H.: Research on image restoration methods for global optimization of damaged areas. Comput. Simul. 37(09), 469-473 (2020)
2. Mao, Y.P., Yu, L., Guan, Z.J.: Multi-focus image fusion based on fractional differential. Comput. Sci. 46(S2), 315-319 (2019)
3. Wu, Y.C., Wang, Y.M., Wang, A.H.: Light field all-in-focus image fusion based on edge enhanced guided filtering. J. Electron. Inf. Technol. 42(09), 2293-2301 (2020)
4. Zhai, H., Zhuang, Y.: Multi-focus image fusion method using energy of Laplacian and convolutional neural network. J. Harbin Inst. Technol. 52(05), 137-147 (2020)
5. Zhao, D.D., Ji, Y.Q.: Multi-focus image fusion combining regional variance and EAV. Chinese J. Liq. Cryst. Displays 34(03), 278-282 (2019)
6. Zeng, Z.X., Liu, J.: Microscopic image segmentation method of C. elegans based on deep learning. J. Comput. Appl. 40(05), 1453-1459 (2020)
7. Cao, J., Chen, H., Zhang, J.W.: Research on multi-focus image fusion algorithm based on super resolution. Comput. Eng. Appl. 56(03), 180-186 (2020)
8. Chen, Q.J., Wang, Z.B., Chai, Y.Z.: Multi-focus image fusion method based on improved VGG network. J. Appl. Opt. 41(03), 500-507 (2020)
9. Xie, Y.X., Wu, Y.C., Wang, Y.M.: Light field all-in-focus image fusion based on wavelet domain sharpness evaluation. J. Beijing Univ. Aeronaut. Astronaut. 45(09), 1848-1854 (2019)
10. Liu, B., Han, G.L., Luo, H.Y.: Twin convolution neural network image fusion algorithm based on multi-scale details. Liq. Cryst. Disp. 36(09), 1283-1293 (2021)
Teaching Effect Evaluation Method of College Music Course Based on Deep Learning Lin Han1(B) and Yi Liao2 1 Nanchang JiaoTong Institute, Nanchang 330100, China
[email protected] 2 Music Department, Xi’an Shiyou University, Xi’an 710065, China
Abstract. Teaching effect evaluation is the key to deepening teaching reform and improving the teaching quality of music courses. To raise the teaching level of music courses and promote the rapid popularization of music teaching in colleges and universities, a method for evaluating the effect of music teaching is studied. According to the application principles of the deep learning algorithm, an artificial neural network is constructed, and the relevant learning indexes are combined to solve the Softmax regression condition. On this basis, the evaluation system model is refined and an evaluation network based on deep learning is constructed. The characteristics of music course teaching evaluation and the scope of the teaching objects are determined. Through statistics of the target elements, the evaluation mode is verified and assessed, completing the design of a deep learning-based evaluation method for the music teaching effect. The experimental results show that, under the deep learning algorithm, the cognitive load level of music teaching is raised to an efficiency value of 3.5; the method can play an obvious role in promoting the development of music teaching in colleges and universities.

Keywords: Deep Learning · Music Course · Teaching Effect · Artificial Neural Network · Softmax Regression · Target Element
1 Introduction College education refers to professional education and vocational education on the basis of completing secondary education, and is the main social activity for cultivating senior professional talents and professional personnel. Higher education is one of the important parts of the education system that are interconnected. It usually includes various educational institutions with high-level learning and training, teaching, research and social services as their main tasks and activities [1]. Colleges and universities are the cradle of cultivating talents with all-round development in morality, intelligence, physique and labor. Under the requirements of the new era of educational reform, the implementation of public music education has been strengthened so as to balance the aesthetic education with other moral, intellectual and physical education courses. Compared with other © ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2024 Published by Springer Nature Switzerland AG 2024. All Rights Reserved B. Wang et al. (Eds.): ICMTEL 2023, LNICST 533, pp. 212–223, 2024. https://doi.org/10.1007/978-3-031-50574-4_15
developed countries, the development level of domestic public music education is still in the primary stage, and the public music education in colleges and universities is in a slow state of development, which does not match the development of other professional disciplines [2]. However, with the popularization of quality education in the new era, public music education, which aims at improving the overall quality of college students, is bound to mushroom. The reform of public music education in domestic colleges and universities will also be accelerated. Therefore, it is necessary for the administrative department of education to formulate a scientific and systematic evaluation method of teaching effect of public music course in colleges and universities. Therefore, it is of great significance to study the status quo of college public music education. At present, there are many methods for evaluating the effectiveness of courses in English, physical education and other subjects. Due to the particularity of music course evaluation factors, the teaching effect evaluation of music subject is less. In order to further improve the teaching level of music courses, this paper proposes a method for evaluating the teaching effect of music courses in colleges and universities based on deep learning. Firstly, the artificial neural network is constructed using deep learning algorithm. On this basis, a deep learning music course evaluation network is constructed. The target elements of music course teaching are counted, and the evaluation of music teaching effect is completed.
2 Teaching Effect Evaluation Network Based on Deep Learning To evaluate the effect of music teaching in colleges and universities, we can improve the evaluation network with the support of deep learning algorithm. This chapter will focus on the above contents. 2.1 Artificial Neural Networks In the field of deep learning algorithms, artificial neural networks are computational models that mimic the animal’s central nervous system, especially the brain, enabling machines to learn and recognize information like the human brain. Neural networks are calculated by a large number of interconnected neurons. Often used to model complex relationships between inputs and outputs, or to explore intrinsic patterns of data. First of all, take the supervision goal of music course teaching in colleges and universities as an example, for a labeled data sample set (q(α), w(α)). Deep learning algorithms use this hypothetical model to fit sample data by building a complex nonlinear hypothetical model RE,χ (q, w) with parameters E, χ . You can start with the simplest single neuron to describe the neural network model architecture of the deep learning algorithm. Figure 1 is the simplest network model that contains only one neuron.
214
L. Han and Y. Liao
Fig. 1. Single neuron architecture for deep learning algorithms
This single neuron is a computational unit. Its input is a data sample of the music teaching effect to be evaluated, and its activation function condition is:

\delta(\alpha) = \frac{R_{E,\chi}(q, w)}{1 + \exp(\beta\alpha)}   (1)
In formula (1), α represents a randomly selected data sample of the teaching effect of music courses in colleges and universities, and β represents the activation condition of the data sample. Because the number of levels of music teaching effect data samples in colleges and universities cannot be just one, the neural network of the deep learning algorithm cannot contain only a single neuron. A complex neural network connects multiple single neurons, using the output of one neuron as the input of the next, as shown in Fig. 2.
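Formula (1) behaves like a scaled logistic activation; the following one-line sketch makes it concrete, with r_hyp standing in for the hypothesis value R_{E,χ}(q, w) (an illustrative name, not the paper's notation).

```python
import math

def neuron_activation(alpha, beta, r_hyp):
    """Single-neuron activation of Eq. (1)."""
    return r_hyp / (1.0 + math.exp(beta * alpha))
```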
Fig. 2. Complex neural networks for deep learning algorithms
In the deep learning view, the neuron nodes of each layer form a linear one-dimensional arrangement, and all neuron nodes within a layer are fully connected. In complex neural networks, the neuron nodes between layers are not fully connected; exploiting the local spatial correlation between layers, the adjacent neuron nodes of each layer are connected only to nearby neuron nodes of the layer above, i.e., local connections. This yields the neural network structure needed to evaluate the effect of music teaching in colleges and universities. Because professional music education and public music education target different audiences, teachers choose different teaching content and methods in the course of teaching [3]. Music education courses for non-music majors in universities are generally general elective courses; most lectures take the form of music appreciation, with teachers selecting composers or representative works of art for appreciation and study. Elective courses generally do not involve much professional content: the classroom is mainly for lecturing, with little practice.
216
L. Han and Y. Liao
the SoRmax regression condition expression based on the deep learning algorithm can be defined as: +∞ r1 × r2 × · · · × rε t=1 (3) ϕ = (u − 1) +∞ 2 2 Yε − Y t=1
After the deep learning algorithm processing, we get the classification process of the teaching effect data of college music course. The domain multilayer perceptual feedforward neural network is similar. Data indices to be identified are used as input samples, and the final classification results are propagated forward layer by layer to output layer. Music can broaden the vision of college students and cultivate noble moral character. Music education is that teachers guide and inspire students with infectious music language, play the role of moral and aesthetic education, and integrate art and ideology together. Through music education to stimulate students’ interest and fighting spirit, cultivate students’ noble moral sentiment, and establish correct values. 2.3 Perfect Evaluation System The music education system in colleges and universities is mainly divided into two parts. They are professional music education for students majoring in music and public music education for non-music students. Professional music education, as the name suggests, refers to the education of music majors in major music schools and normal colleges. Public music education in colleges and universities is generally divided into a broad sense and a narrow sense [6]. It includes not only public music education courses offered in colleges and universities, but also various forms of public music education such as art clubs, online music education, and extracurricular literary and art organizations. Both are music education, and both have the same positive role in educating people. However, due to the difference in the absorptive capacity of the teaching objects for the course, there are also great differences between the two in terms of teaching content, methods, and curriculum settings. The layout of the complete teaching effect evaluation system of music courses in colleges and universities is shown in Fig. 3. The development of music education and teaching in ordinary colleges and universities is generally classroom teaching, and there are also a few general courses that choose online teaching, but follow the basic principles of teaching. In order to achieve the balance of theory and skills, the unity of culture and art, and the perfect combination of moral education and aesthetic education, the main subjects of music education are teachers and students. Through teachers’ explanations and demonstrations, relevant professional knowledge is imparted to students to enrich students’ cultural knowledge and professional quality. The teaching process always takes music education as the theme, and completes teaching tasks through different teaching methods and teaching practices.
Teaching Effect Evaluation Method of College Music Course
217
The evaluation system of music course teaching effect in colleges and univers ities
Teaching management
Music education
Course teaching model
Generalized music knowledge
Quality-oriented cultural education
Music knowledge in narrow sense
Principles of deep learning algorithms
Fig. 3. Evaluation system layout form
3 Teaching Effect Evaluation Mode and Audit Evaluation With the support of the deep learning algorithm model, the realization of the teaching effect evaluation method of music courses in colleges and universities also needs to define the evaluation characteristic performance behavior. And while determining the scope of the evaluation object, analyze the manifestations of its target elements. 3.1 Evaluation Characteristics According to the connotation of accreditation assessment, the biggest feature of accreditation assessment is the purpose of assessment, that is, whether the school has the qualification to run a school. According to the evaluation purpose, the accreditation evaluation mode requires the education department or professional evaluation agency to directly evaluate the teaching quality of the school according to the unified minimum evaluation standard. Focus on investigating the school’s school positioning, teaching resources and other investment. If the school input meets the evaluation criteria, it will be passed, otherwise it will be rejected [7]. Grading evaluation is to evaluate the “level” of colleges and universities. It is developed on the basis of certification, and is divided into different levels on the basis of meeting minimum standards. Through observation and analysis of a series of relevant indicators, colleges and universities are divided into four grades: excellent, good, qualified and unqualified according to the evaluation results. Let I denote the pass rate of the assessment of the teaching effect data of college music courses. U represents the unit cumulative amount of data to be evaluated. λ represents the coefficient of qualification for running a university. f represents the audit standard judgment parameter. With the support of the above physical quantities, formula (3) is simultaneously established. The characteristic solution expression of deep learningbased music course teaching effect evaluation in colleges and universities can be defined as:
218
L. Han and Y. Liao
P=f ×
I ϕ − U λ+1
2
(4)
According to the connotation of grading evaluation, the purpose of grading evaluation is to evaluate the educational level of colleges and universities. The graded evaluation model requires education departments or professional evaluation agencies to classify colleges and universities according to the purpose of evaluation. Formulate corresponding evaluation standards according to different categories, directly evaluate the teaching effect of music courses in colleges and universities, and focus on the output of college students’ learning outcomes. Compared with the certification evaluation model, the subject of graded evaluation is also the education department or professional institution. But it can also be implemented by a consortium of universities or by individual universities themselves. The grading evaluation must also refer to the evaluation indicators, but it is not only to judge whether the college meets the minimum standards, but to give an accurate grade judgment. The graded assessment is also broad, but it focuses on process and output, and measures the quality of teaching through educational output. The graded assessment will publish most of the specific data in the assessment results, which is convenient for comparison between universities. 3.2 Evaluation Object and Scope By reviewing whether the talent training goals of colleges and universities meet the needs of social and personal development, whether the teaching resources such as teachers are sufficient, whether the undergraduate teaching quality assurance system is perfect, whether students and employers are satisfied, etc., it covers all links and elements that affect the quality of talent training. In a nutshell, the audit evaluation focuses on the quality of personnel training and focuses on the construction of the quality assurance system. It can be seen from this that the concept of outcome-oriented education has many similarities with the basic concept of review and evaluation. But the most prominent feature should be that both emphasize the degree of conformity and achievement of talent training goals, which can also be expressed as the degree of conformity and achievement of students’ learning outcomes. To this end, with student learning outcomes as a link, the combination of outcome-oriented education concepts and audit evaluation is shown in Fig. 4. Let κ denote the selection coefficient of the teaching effect evaluation object of college music courses, and the inequality condition of κ = 0 is always established. s represents the evaluation range metric. ι represents a randomly selected teaching effect evaluation authority parameter. dι represents a standard value for dividing the evaluation range based on the coefficient ι. hˆ represents the characteristics of teaching effect evaluation of college music courses based on deep learning. g1 and g2 represent two randomly selected evaluation benchmark values, Combined with the above physical quantities, the definition formula of the evaluation object and scope of the teaching effect of music courses in colleges and universities is deduced as:
$$A = \frac{\kappa}{2} \sum_{s=1}^{+\infty} \frac{d_{\iota} \cdot P}{\hat{h} \cdot \sqrt{g_{1}^{2} + g_{2}^{2}}} \tag{5}$$
The scope of audit evaluation is not a uniform standard; each school can interpret the scope of assessment according to its own actual situation. This again shows that audit assessment emphasizes "measuring yourself with your own ruler". In the existing teaching of music courses in colleges and universities, audit evaluation focuses on three aspects: how well the quality assurance system is built, how well it operates, and how well it works. Specifically, these are the degree of conformity between quality standards and quality objectives, the degree of conformity between the quality assurance system and the quality standards, and the degree of conformity between the operating effect of the quality assurance system and the quality expectations. In a nutshell, it is the degree of conformity and achievement of personnel training objectives.

3.3 Target Element

The general goal of music teaching training in colleges and universities is the basis for formulating professional training goals, and the professional training goal is the general outline specific to the training of a certain profession. It determines how colleges and universities build the professional knowledge structure, set up the curriculum system and carry out teaching activities. The formulation of professional training goals should not only reflect all aspects of talent training, but also clearly express the goals that students can achieve in terms of knowledge, ability and quality (graduation requirements), as well as the expected development of students after entering society. Generally speaking, professional training objectives should be clearly stated in the professional training plan. Let j represent the value vector of the teaching goals of college music courses based on deep learning, and j̄ a supplementary explanation condition for the coefficient j. l̃ represents the feasibility evaluation index of the teaching effect data, and l̇ represents
the feasibility audit feature of the teaching effect data; μ represents the audit evaluation coefficient. The solution expression of the target elements of the teaching effect evaluation of college music courses based on deep learning is:

$$K = \frac{A}{2} - \mu \sqrt{\frac{\tilde{l}^{2}\,\dot{l}}{\left\| j - \bar{j} \right\|^{2}}} \tag{6}$$

If the quality standard of professional training builds a structural framework for professional training in terms of knowledge, ability and quality, the teaching link is the basic structural unit that realizes this framework. The quality standard of the teaching link is essentially a decomposition of the quality standard of professional training; in other words, the collection (merging and strengthening) of the quality standards of each teaching link is the quality standard of professional training [10]. Therefore, the quality standard of professional training is the basis for formulating the quality standard of each teaching link. The school should formulate quality standards (or basic requirements) for the courses and main practical links (experiments, practical training, graduation design, etc.) in the training plan according to the quality standards for professional training. The quality standards of each teaching link, in turn, support the formulation of the quality standards for professional training. The specific requirements for the training of students in terms of knowledge, ability and quality in the professional training quality standards should be fully reflected in the quality standards of the teaching links.
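For illustration, a minimal NumPy sketch of formula (6) as reconstructed above follows; the function name and all input values are assumptions, and treating j − j̄ under a squared vector norm is one plausible reading of the formula.

```python
import numpy as np

def target_element(A, mu, l_tilde, l_dot, j, j_bar):
    """Evaluate formula (6): K = A/2 - mu * sqrt(l_tilde**2 * l_dot / ||j - j_bar||**2)."""
    j = np.asarray(j, dtype=float)
    j_bar = np.asarray(j_bar, dtype=float)
    gap = np.sum((j - j_bar) ** 2)  # squared norm of the goal-vector difference
    return A / 2 - mu * np.sqrt(l_tilde ** 2 * l_dot / gap)

# Illustrative values only:
print(target_element(A=1.8, mu=0.3, l_tilde=0.9, l_dot=0.7,
                     j=[0.6, 0.8, 0.7], j_bar=[0.5, 0.6, 0.9]))
```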
4 Case Analysis

In order to verify the ability of the deep learning-based teaching effect evaluation method to promote the popularization of music course teaching in colleges and universities, the following experiment was designed. First, students' cognitive level of music course teaching was recorded without the evaluation method. Then the deep learning-based teaching effect evaluation method was applied to intervene in students' learning behavior, and the students' cognitive level of music course teaching was recorded again. Finally, the two groups of experimental results were compared and the experimental rules summarized. The equipment used in this experiment is shown in Table 1. In this experiment, the subjects could choose two time periods of the day in which to complete an additional efficiency test. The purpose was to allow the subjects to understand, more accurately and intuitively through EEG signals, their cognitive load input and efficiency at different times of the day. Three subjects are taken as an example; see Fig. 5 for the initial cognitive load-efficiency mapping relationship for the teaching of music courses in colleges and universities. The three subjects chose 9 am–10 am and 3 pm–4 pm as their test periods. In this section, scatter plots are used to present the experimental results. Analysis of Fig. 5 shows that the three students have a relatively high distribution level of cognitive load scatter when completing the efficiency test in the morning, and a lower distribution of cognitive load scatter points when completing it in the afternoon.
Table 1. Experimental equipment selection

Serial number   Device name                      Model
1               Harmonic wave analyzer           FR300R mass spectrometer
2               Brain wave analysis equipment    PLC controller
3               Monitoring host                  TP-IPC 4U rack-mounted industrial computer
4               Control equipment                610E-E
5               Network control system           Windows 10
6               Processor                        Intel i7
[Figure 5 is a scatter plot of efficiency value (0–7) against test time (9 am, 10 am, 3 pm, 4 pm) for subjects one, two and three.]

Fig. 5. Initial cognitive load-efficiency mapping relationship
Using the deep learning-based teaching effect evaluation method of music courses in colleges and universities, the learning ability of the three subjects was assessed with the test times kept unchanged. The changes in the cognitive load-efficiency mapping curves of the three subjects were recorded, as shown in Fig. 6. Analysis of Fig. 6 shows that, compared with the students' initial cognitive load level in music course teaching, the learning ability of the three subjects improved significantly after the deep learning-based teaching effect evaluation method was applied, with a unit increase of 3.5 efficiency values. This means that the method has a certain promoting effect on improving the teaching level of music courses. To sum up, the conclusion of this experiment is that the deep learning-based teaching effect evaluation method of college music courses can significantly improve the teaching level of music courses and meet the practical need for the rapid popularization of college music teaching. The features extracted from EEG data are used to establish the corresponding relationship between cognitive load and the distribution of efficiency values, and finally to determine the students' music learning ability. In the design of
[Figure 6 is a scatter plot of efficiency value (0–8) against test time (9 am, 10 am, 3 pm, 4 pm) for subjects one, two and three after the intervention.]

Fig. 6. Experimental results of cognitive load-efficiency mapping relationship
the scheme, the general process of the whole scheme is first introduced, and then the experimental paradigm is designed in detail, including the delineation method of the moving time window and how to calculate the efficiency value within the time window. Finally, the cognitive load-efficiency distribution mapping relationship of the three subjects was analyzed. Finding the individual optimal cognitive load level for each subject from the perspectives of the mean and stability of the distribution of efficiency values lays the foundation for the subsequent application of the model to teaching evaluation.
5 Conclusion

In view of the problems exposed in the evaluation of music teaching in colleges and universities, it is urgent to explore how to construct a quality assurance system with colleges and universities as the main body. Based on the deep learning algorithm and the audit evaluation model, this paper studies the construction of a college music teaching assurance system and draws the following conclusions.
(1) The idea of deep learning is a brand-new educational idea with results at its core. Its biggest revolution in thinking lies in the shift of emphasis from resources to students' learning achievements. It emphasizes student-centeredness, continuous improvement, students' learning output, and the conformity and attainment of talent training goals.
(2) Audit evaluation is a new evaluation model that takes the quality of personnel training as its core, attaches importance to continuous improvement, and emphasizes measuring oneself with one's own ruler rather than horizontal comparison. It mainly inspects the validity and continuity of the quality assurance system for university music course teaching. Audit evaluation emphasizes that schools are the main body responsible for the quality of talent training, and its aim is to promote teaching reform and improve the quality of talent training. The key point of the evaluation
is the process of personnel training, and teaching quality is evaluated indirectly through the evaluation of the internal quality assurance system.
(3) The quality guarantee system of music teaching in colleges and universities is a complex system consisting mainly of target guarantee, resource guarantee, process guarantee and management guarantee. These four elements each include several sub-elements, which together constitute the elements of the undergraduate teaching quality assurance system. In addition, the outline of quality assurance standards, the framework of the quality assurance structure and the basic process of quality assurance are the core and pillars of this complex system.
(4) Based on the deep learning algorithm and the audit evaluation model, this paper constructs the quality assurance system model of music teaching in universities from three aspects: according to the needs of social and personal development and the orientation of running a school, colleges and universities should establish quality objectives and quality standards, carry out the construction of music teaching resources and teaching process management, and, through an effective internal quality assurance system, continuously improve the quality of talent training in all aspects and all factors.

Acknowledgements. 1. Science and technology research project of Jiangxi Provincial Department of Education: The Design and Implementation of the Application of J2EE Hierarchical Thought in College Music Remote Assistant Teaching System (Project No.: GJJ2203112). 2. School-level projects of NanChang JiaoTong Institute: Research on the innovative path of VR technology into college music classroom teaching (Project No.: XJJG2022-23).
References

1. Yu, H.: Online teaching quality evaluation based on emotion recognition and improved AprioriTid algorithm. J. Intell. Fuzzy Syst. 40(4), 7037–7047 (2021)
2. Qianna, S.: Evaluation model of classroom teaching quality based on improved RVM algorithm and knowledge recommendation. J. Intell. Fuzzy Syst. 40(2), 2457–2467 (2021)
3. Gao, P., Li, J., Liu, S.: An introduction to key technology in artificial intelligence and big data driven e-learning and e-education. Mob. Netw. Appl. 26(5), 2123–2126 (2021)
4. O'Donoghue, J.: Stories from the lab: development and feedback from an online education and public engagement activity with schools. J. Chem. Educ. 97(9), 3271–3277 (2020)
5. Tissenbaum, M., Ottenbreit-Leftwich, A.: A vision of K–12 computer science education for 2030. Commun. ACM 63(5), 42–44 (2020)
6. Yang, K.H., Ma, Q.: To improve the quality of distance education and the simulation of effective resources under the Big Data. Comput. Simul. 34(4), 212–215+334 (2017)
7. Fang, C.: Intelligent online English teaching system based on SVM algorithm and complex network. J. Intell. Fuzzy Syst. 40(2), 2709–2719 (2021)
8. Yuan, T.: Algorithm of classroom teaching quality evaluation based on Markov chain. Complexity 2021(21), 1–12 (2021)
Recognition of Running Gait of Track and Field Athletes Based on Convolutional Neural Network

Qiusheng Lin1(B) and Jin Wang2

1 Guangzhou Huali College, Guangzhou 511325, China
[email protected]
2 State Grid Shanghai Municipal Electric Power Company, Shanghai 200120, China
Abstract. With the continuous development of competitive sports, higher requirements have been put forward for the athletic level and technical movements of track and field athletes. Running gait is a key factor affecting athletes' competitive level and technical actions. Therefore, a recognition method for the running gait of track and field athletes based on a convolutional neural network is proposed. According to the type of noise inside the running gait image, the least mean square filtering algorithm is selected as the preprocessing method and used to remove noise from the running gait image. On this basis, the LBP feature, Hu invariant moment feature and Haar-like feature are extracted as running gait characteristics, and a convolutional neural network model is designed. With the designed convolutional neural network model as a tool, the running gait recognition program of track and field athletes is formulated, and the calculation formulas of the relevant parameters are determined in order to obtain accurate recognition results. The experimental data show that the maximum accuracy of running gait recognition obtained by the proposed method is 94%, which fully proves that the proposed method performs better in running gait recognition.

Keywords: Gait Recognition · Athletes · Convolution Neural Network · Running Gait · Feature Extraction · Recognition Accuracy
1 Introduction

Athletics refers to the general term for all-round sports consisting of walking, running, jumping, throwing and other events, as well as combined events. Track and field sports have a long history, originating from the basic survival and life activities of human beings. The earliest track and field competition was held in the ancient Greek village of Olympia in 776 BC. Since then, track and field has become one of the official events [1]. By 648 BC, the Olympic Games had added jumping, javelin throwing, discus throwing and other events. In 1894, a modern Olympic Games organization was established in Paris, France.
In 1896, the first modern Olympic Games was held in Athens, Greece, and twelve events, including walking, running, jumping and throwing, were listed as the main events. The successful holding of the 1st Olympic Games marked the establishment of the modern track and field sports system. Humans acquired the ability to walk upright through the alternating support of their feet. The foot is the most heavily loaded tissue organ: during landing, support and push-off, the pressure on the foot transfers from the heel to the forefoot and finally to the big toe. According to research data, the support reaction force on an adult's foot can reach 1.5 times body weight during walking and 3–4 times body weight during running. Therefore, whether the force on the foot is appropriate plays an important role in the rationality of its activity. Generally speaking, most current research analyzes foot movement through gait analysis. The plantar pressure distribution reflects whole-body posture control and foot structure and function. As a mainstream foot research method, plantar pressure analysis has been applied in many fields, such as sports biomechanics, rehabilitation medicine, shoemaking and orthopaedic surgery. Plantar pressure analysis not only deepens our understanding of normal walking gait, but is also an important reference in competitive sports, national fitness and other fields, and can provide a normal baseline for research on the foot pressure of pathological feet. Because athletes train with large exercise volumes at high intensity over long periods, problems readily appear in their plantar pressure performance, such as pain caused by excessive local force and potential injury risk caused by excessive foot eversion. At present, most research directions at home and abroad target athletic competition and injury rehabilitation, and few analyze the everyday gait characteristics and plantar pressure of track and field athletes. In the domestic literature on the gait characteristics and plantar pressure of track and field athletes, the experiments often have certain deficiencies: the selected subjects cannot accurately represent the research group or form only a small sample of a few individuals; the indicators measured by the equipment are not consistent with international research standards; and there is no in-depth study of the impact of plantar pressure and gait characteristics on sports injuries [2]. Therefore, this study takes track and field athletes as the research object. Through measurement and analysis of the plantar pressure distribution and gait characteristics, the movement mode of the feet and even the lower limbs of track and field athletes can be evaluated more accurately and in more detail, the relationships between pressure, gait and sports injury and the risk factors of lower limb sports injury can be identified, and personalized correction plans can be proposed accordingly, guiding track and field athletes to form a correct gait and providing adequate protection for their feet. Amid the vigorous development of competitive sports, the competitive level and technical movements of track and field athletes around the world are improving day by day.
At present, it is believed that in order to achieve good sports results, it is necessary to emphasize running economy, increase stride frequency while maintaining a larger stride length, and maintain a stable body weight during running. Therefore, improving the scientific soundness and rationality of athletes' running gait is the key to enhancing their competitive level. Existing methods cannot accurately
identify the running gait of athletes because of their outdated technical means, so research on a convolutional neural network-based recognition method for the running gait of track and field athletes is proposed.
2 Research on the Recognition Method of Running Gait of Track and Field Athletes

2.1 Running Gait Image Preprocessing

The images of the running gait of track and field athletes are obtained through cameras or mobile phones equipped with CMOS devices. Owing to shooting conditions, device defects, recording equipment, image content and other factors, the image data fed to the recognition method will inevitably suffer noise interference at various levels. The direct visual effects of image noise usually include destruction of image structure information, boundary pollution, loss of detail and increased complexity of the pixel value distribution. These effects cause great trouble for the subsequent running gait feature extraction, increase the computational complexity, and thereby lead to a sharp decline in recognition accuracy. Choosing an appropriate image preprocessing method for running gait recognition helps improve data quality, reduces the difficulty of subsequent processing, and greatly improves the efficiency of running gait recognition. In the running gait image, noise mainly arises during image acquisition or transmission, and is affected by the performance of the imaging sensor, the environmental conditions during acquisition, the quality of the sensor components and other factors. For example, when using a charge-coupled device (CCD) camera to capture an image, the light level and sensor temperature are the main factors determining the amount of noise in the resulting image. Image data can also be polluted in transmission, usually by interference in the transmission channel; for example, images transmitted over a radio network are vulnerable to noise caused by lightning or other weather factors. Noise is generally classified in three ways: additive versus multiplicative noise, external versus internal noise, and stationary versus non-stationary noise. In image denoising, noise is mainly distinguished by its characteristics, usually divided into spatial characteristics and frequency characteristics. The spatial characteristics are defined by parameters describing whether the noise is correlated with the image; the frequency characteristics refer to the frequency content of the noise in the Fourier domain (that is, frequency relative to the electromagnetic spectrum) [3]. Apart from spatially periodic noise, it is usually assumed that the noise is independent of the spatial coordinates and uncorrelated with the image itself (that is, the pixel values are uncorrelated with the values of the noise components). The common noise models in running gait images are Gaussian noise, Poisson noise, and salt-and-pepper noise. Among them, Gaussian noise is the noise category whose distribution conforms to the Gaussian distribution. The Gaussian distribution was first used to describe
the probability distribution of random fluctuations in physical processes, and is named after the German physicist Carl Friedrich Gauss. In image noise, the Gaussian distribution is mainly used to describe the distribution of resistance changes caused by voltage variation. When the image acquisition device undergoes electrical variation that conforms to the Gaussian distribution, the image is considered to be possibly polluted by Gaussian noise. The probability density function of the Gaussian random variable Z is given by the following formula:

$$P(Z) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(Z - \bar{Z})^{2} / 2\sigma^{2}} \tag{1}$$
In formula (1), P(Z) represents the probability density function of the Gaussian random variable Z; Z represents the gray value of the running gait image; σ represents the standard deviation of the gray value; Z̄ represents the average gray value of the running gait image. Poisson noise is also a kind of noise interference caused by electronic devices, and can be simulated by the Poisson model. In electronics, shot noise originates from the discrete nature of charge. Shot noise also exists in the photon counting process of optical devices and is related to the particle nature of light: it exists because light and current are composed of discrete particle streams [4]. The magnitude of shot noise increases with the square root of the expected number of events, such as the intensity of the current or light. However, because the strength of the signal itself increases faster, the relative proportion of shot noise decreases and the signal-to-noise ratio increases. Therefore, shot noise is most noticeable at low current or low light intensity. Poisson noise is governed by the Poisson distribution model, with the specific formula:

$$F(\alpha; \lambda) = \frac{\lambda^{\alpha} e^{-\lambda}}{\alpha!} \tag{2}$$
In formula (2), F(α; λ) represents the Poisson noise distribution model; α represents the auxiliary parameter; λ represents the average number of random events per unit time; e represents the base of the natural logarithm. According to the type of internal noise in the running gait image, the least mean square filtering algorithm is selected to remove the noise, improve the signal-to-noise ratio of the running gait image, and facilitate the subsequent running gait feature extraction. The minimum mean square error filtering algorithm assumes that both image and noise are random variables. The goal is to find an estimate f̂(x, y) of the polluted image f(x, y) and to find the optimal filtering template by minimizing the mean square error between them. The error measure formula is:

$$E^{2} = H\!\left\{ \left[ f(x, y) - \hat{f}(x, y) \right]^{2} \right\} \tag{3}$$

In formula (3), E² represents the error; H{·} represents the expected value operator.
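To make the noise models of formulas (1)–(3) concrete, the following sketch adds Gaussian and Poisson noise to a synthetic image and computes the empirical mean square error of formula (3); it uses only NumPy, and the synthetic image and noise parameters are illustrative assumptions rather than the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a running gait image: a bright bar on a dark background.
clean = np.zeros((64, 64))
clean[:, 28:36] = 200.0

# Gaussian noise, formula (1): zero-mean with standard deviation sigma.
sigma = 15.0
gaussian_noisy = clean + rng.normal(0.0, sigma, clean.shape)

# Poisson noise, formula (2): pixel intensities treated as event rates lambda.
poisson_noisy = rng.poisson(clean).astype(float)

# Error measure, formula (3): expected squared difference, estimated by the mean.
def mse(f, f_hat):
    return np.mean((f - f_hat) ** 2)

print("Gaussian-noise MSE:", mse(clean, gaussian_noisy))
print("Poisson-noise MSE:", mse(clean, poisson_noisy))
```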
Based on the calculation result of formula (3), the optimal filter template is determined as:

$$\zeta(x, y) = \varepsilon_{0} \left[ \frac{1}{\beta(x, y)} \cdot \frac{|\beta(x, y)|^{2}}{|\beta(x, y)|^{2} + \dfrac{\chi(x, y)}{\delta(x, y)\left[P(Z) + F(\alpha; \lambda)\right]}} \right] \tag{4}$$
In formula (4), ζ(x, y) represents the optimal filter template; ε₀ represents the scale factor with value range [0, 1]; β(x, y) represents the degradation function; χ(x, y) represents the power spectrum of the noise; δ(x, y) represents the power spectrum of the undegraded image. The optimal filter template ζ(x, y) determined by formula (4) is applied to remove the noise of the running gait image:

$$g(x, y) = \frac{f(x, y) + E^{2}}{9 * \zeta(x, y)} \tag{5}$$
In formula (5), g(x, y) represents the running gait image after noise removal. The above process completes the preprocessing of the running gait image: it removes the noise, improves the overall signal-to-noise ratio and clarity of the image, and lays a solid foundation for the subsequent running gait feature extraction.

2.2 Running Gait Feature Extraction

Based on the preprocessed running gait image g(x, y), the relevant parameters of running gait are defined and the running gait features extracted, providing a basis for the subsequent design of the convolutional neural network model. In track and field teaching, the running cycle is generally divided into a support phase and a flight phase. The support phase lasts from the moment the swinging leg touches the ground to the moment the supporting leg leaves the ground; the flight phase lasts from the moment the support leg leaves the ground to the moment of landing. The parameters involved in the running gait of track and field athletes are as follows.
First, time parameters. A gait cycle is the time from one heel strike to the next heel strike of the same leg. The normal gait cycle can be divided into two phases: the support phase and the swing phase. Support phase: from heel strike to toe-off, the time the foot is in contact with the support surface, accounting for about 60% of the gait cycle. Swing phase: from toe-off to heel strike, the time the foot is off the support surface, accounting for about 40% of the gait cycle.
Second, spatial parameters. The stride feature of gait is the spatial feature of foot landing, including step length, stride length, step width and step angle.
Step length: the linear distance from one foot's landing point to the other's [5]. Step length is related to height: the taller the person, the greater the step length. Stride length: the linear distance between two successive landings of the heel of the same leg. A normal person's stride length is twice the step length, about 100–160 cm. Step width: the width between the travel lines of the two feet. Step angle: the angle between the line from the middle of the heel to the second toe and the line of travel, which is generally small. Step frequency: the number of steps taken per minute when walking.
Third, plantar pressure zones. The plantar partition is shown in Fig. 1.
[Figure 1 is a diagram of the plantar surface divided into ten numbered regions.]

Fig. 1. Schematic diagram of the plantar partition
As shown in Fig. 1, 1 represents the first toe, 2 represents the second to fifth toes, 3 represents the first phalanx, 4 represents the second phalanx, 5 represents the third phalanx, 6 represents the fourth phalanx, 7 represents the fifth phalanx, 8 represents the midfoot, 9 represents the inner heel, and 10 represents the outer heel. It can be seen from the above analysis that many parameters relate to running gait, and extracting running gait features directly from all of them would involve an infeasible amount of computation. Therefore, based on the running gait image g(x, y), the LBP feature, the Hu invariant moment feature and the Haar-like feature are extracted as the running gait features. The LBP (local binary pattern) feature describes the local texture of the image. The original LBP operator is defined over a 3 × 3 window whose threshold is the center pixel value: when a surrounding pixel value is larger than the center, that point is marked as 1, otherwise 0. The eight comparisons in the 3 × 3 template yield eight LBP comparison points, which are encoded as an 8-bit binary number; converting this binary number to decimal gives the LBP code, i.e., the LBP value of the window's center pixel [6]. This value further describes the regional texture information for image recognition. The formula is:

$$LBP(x_i, y_i) = \sum_{p=1}^{8} 2^{p}\, \mathrm{sgn}\!\left(R_p - R_i\right) \tag{6}$$

In formula (6), LBP(x_i, y_i) represents the LBP value corresponding to the central pixel (x_i, y_i); p represents a random constant, which takes integer values; sgn(·) represents a symbolic function; R_p and R_i represent the pixel values corresponding to pixels (x_p, y_p) and (x_i, y_i). The calculation rule for the symbol function sgn(·) is:

$$\mathrm{sgn}\!\left(R_p - R_i\right) = \begin{cases} 1, & R_p - R_i \geq 0 \\ 0, & R_p - R_i < 0 \end{cases} \tag{7}$$

The Hu invariant moment is a feature characterizing the shape of an image profile. The (a + b)-order moment of a two-dimensional random variable is defined as:

$$G_{ab} = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x^{a} y^{b} g(x, y)\, dx\, dy \tag{8}$$

In formula (8), g(x, y) is the running gait image function, a piecewise bounded function. When g(x, y) undergoes translation, rotation or scaling, the (a + b)-order moment changes accordingly; a and b represent the feature orders of the image. To obtain invariance, the central moment is defined as follows:

$$\eta_{ab} = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \left(x - \hat{x}\right)^{a} \left(y - \hat{y}\right)^{b} g(x, y)\, dx\, dy \tag{9}$$

In formula (9), η_ab represents the central moment of the running gait image; (x̂, ŷ) represents the central point of the running gait image. Scale invariance is obtained by normalization, and on this basis multiple combined Hu invariant moments are generated that are invariant to scale, translation and rotation. The Haar-like feature was first proposed by Papageorgiou et al.; it mainly reflects changes in image gray level and is a feature descriptor commonly used for face description in computer vision research. Haar-like features can be calculated from templates of different sizes and composition modes, mainly including linear features, edge features, point (center) features and diagonal features [7]. Generally, the Haar-like feature is calculated via the integral image, with the formula:

$$D = R(1) + R(2) - \left(R(3) + R(4)\right) \tag{10}$$

In formula (10), D represents an image block for Haar-like features with four vertices 1, 2, 3 and 4; R(j) stores the sum of all pixels in the region above and to the left of vertex j, with j taking values 1 to 4. The above process completes the extraction of the running gait features, mainly including the LBP features, Hu invariant moment features and Haar-like features of the running gait image, providing data support for the final running gait recognition.
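As an illustration of the LBP operator of formulas (6)–(7), the sketch below computes the LBP code of a single 3 × 3 window in pure NumPy; the clockwise bit ordering and bit weights 2⁰–2⁷ are one common convention, and the window values are assumptions, so this is an example rather than the authors' implementation.

```python
import numpy as np

def lbp_code(window):
    """LBP value of the center pixel of a 3x3 window (formulas (6)-(7)).

    The eight neighbors are compared with the center; each comparison
    contributes one bit, and the 8-bit pattern is read as a decimal code.
    """
    center = window[1, 1]
    # Clockwise neighbor order starting at the top-left (one common convention).
    neighbors = [window[0, 0], window[0, 1], window[0, 2], window[1, 2],
                 window[2, 2], window[2, 1], window[2, 0], window[1, 0]]
    bits = [1 if n >= center else 0 for n in neighbors]  # sgn rule of formula (7)
    return sum(b << p for p, b in enumerate(bits))       # weight each bit by 2**p

window = np.array([[52, 60, 49],
                   [57, 55, 61],
                   [58, 50, 63]])
print(lbp_code(window))  # decimal LBP code of the center pixel
```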
2.3 Convolutional Neural Network Model Design

Based on the extracted running gait features, a convolutional neural network model is designed to support the subsequent realization of running gait recognition. The design of the convolutional neural network model includes two parts: building the CNN layer by layer and training the CNN. The construction process of the network is as follows. The first layer is the input of the network. The CNN can independently learn the characteristics of a two-dimensional image; the original image is a depth image and can be used directly as the network input. Each convolution layer convolves its input with n × n filters plus an offset to obtain feature maps. Each down-sampling layer is obtained by the following processing: summing four pixels per neighborhood, weighting (by a convolution kernel element), adding an offset, and then generating a feature map approximately four times smaller through a sigmoid activation function. According to the requirements of running gait recognition, the convolutional neural network model is designed with 5 network layers; the numbers of filters are 3 and 6, respectively, and the filter size is 5 × 5 = 25. The construction of each layer is explained below, as shown in Fig. 2.
Fig. 2. Process diagram of convolutional neural network model design
As shown in Fig. 2, C1 is a convolutional layer; the first layer mainly extracts the low-level features of the image. Three 5 × 5 filters are convolved with the original image to obtain three feature maps, each of size 116 × 92. C1 has 78 trainable parameters (each filter has 5 × 5 = 25 unit parameters and one bias parameter; with 3 filters in total, this gives (5 × 5 + 1) × 3 = 78 trainable parameters). S2 is a down-sampling layer. Down-sampling reduces the amount of information to be processed while retaining useful information, according to the principle of local image correlation. Down-sampling mainly includes the following methods: average value, maximum value, or some linear combination. In this paper the linear combination method is used: the four pixels in each neighborhood are added, multiplied by a trainable parameter, and a trainable offset is added. Therefore, the three feature maps on layer S2 have size 58 × 46, with 3 × (1 + 1) = 6 training parameters and 12 × 27 × 21 = 6804 connections. The down-sampling layer also needs an activation function, introduced to improve the nonlinear characteristics of the network and the decision function; in layer S2, the sigmoid activation function is selected. C3 is also a convolutional layer: convolving layer S2 with six 5 × 5 filters yields six feature maps, each of size 54 × 42. By the same calculation as for layer C1, layer C3 has (5 × 5 + 1) × 6 trainable parameters. S4 is a down-sampling layer, performed in the same way as S2, resulting in six feature maps of size 27 × 21, with N/m training parameters and 12 × 27 × 21 = 6804 connections. This layer's activation function is again the sigmoid function. F5 is a fully connected layer, with each output unit connected to all input units [8, 9]. The last layer also requires an activation function, for which the ReLU function is selected. The resulting feature maps are then arranged as a column vector to obtain the final eigenvector. If the number of network layers exceeds 5, the additional layers use only convolution layers plus the final fully connected layer, because a large part of the useless information has already been removed in the earlier layers and further down-sampling would lose too much information. If the network structure exceeds 5 layers, the number of filters increases as a geometric sequence with common ratio 2. The above process completes the design of the convolutional neural network model and introduces its composition in detail, providing tool support for the follow-up research.
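A minimal PyTorch sketch of the five-layer structure just described is given below. A single-channel 120 × 96 input is assumed, since that reproduces the stated feature-map sizes; plain average pooling stands in for the paper's trainable sum-weight-and-bias subsampling, and the class count of 10 is a placeholder.

```python
import torch
import torch.nn as nn

class GaitCNN(nn.Module):
    """Sketch of the five-layer network described above, assuming a
    single-channel 120 x 96 input (which yields the stated map sizes)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.c1 = nn.Conv2d(1, 3, kernel_size=5)       # C1: 3 maps of 116 x 92
        self.s2 = nn.AvgPool2d(2)                      # S2: 3 maps of 58 x 46
        self.c3 = nn.Conv2d(3, 6, kernel_size=5)       # C3: 6 maps of 54 x 42
        self.s4 = nn.AvgPool2d(2)                      # S4: 6 maps of 27 x 21
        self.f5 = nn.Linear(6 * 27 * 21, num_classes)  # F5: fully connected

    def forward(self, x):
        x = torch.sigmoid(self.s2(self.c1(x)))  # sigmoid after S2, as in the text
        x = torch.sigmoid(self.s4(self.c3(x)))  # sigmoid after S4
        x = torch.flatten(x, 1)
        return torch.relu(self.f5(x))           # ReLU at the last layer, per the text

logits = GaitCNN()(torch.randn(1, 1, 120, 96))
print(logits.shape)  # torch.Size([1, 10])
```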
2.4 Development of the Running Gait Recognition Procedure for Track and Field Athletes

Using the convolutional neural network model designed above as a tool, the procedure for recognizing the running gait of track and field athletes is formulated, and the calculation formulas of the relevant parameters of the convolutional neural network model are determined, so as to obtain accurate recognition results. The whole process of running gait recognition is as follows: collect sample images and label them; input training samples of each type of running gait into the CNN to train the model and adjust the network until convergence; change the output layer into a softmax classifier; input test images; and verify the recognition accuracy of the results. The running gait recognition procedure is shown in Fig. 3.

[Figure 3 is a flow diagram: training and test running gait images undergo running gait feature extraction; the training gait features drive CNN training and softmax classifier training, and the resulting softmax classification model, applied to the test gait features, yields the running gait recognition results.]

Fig. 3. Running gait recognition program of track and field athletes

As shown in Fig. 3, the convolutional neural network model is the key tool for running gait recognition, and its performance is directly related to the accuracy of the recognition results. Therefore, the calculation formulas for the relevant parameters of the convolutional neural network model need to be determined before the program can be executed.
One is the gradient calculation of the convolutional layers. Assuming that each convolutional layer l is followed by a down-sampling layer l + 1, to obtain the weight update for each neuron of layer l it is first necessary to find the sensitivity ϑ of each neuron node of layer l. To do so, the sensitivities ϑ^{l+1} of the nodes in the next layer are summed, multiplied by the weights corresponding to these connections, and then multiplied by the derivative of the activation function q evaluated at the input u of the neuron node in the current layer l, which yields the sensitivity ϑ^l of each neuron node in layer l. However, because of down-sampling, one pixel's sensitivity in the sampling layer corresponds to a whole block of pixels (the sampling window) in the convolutional layer's output map, so each node of a map in layer l connects to only one node of the corresponding map in layer l + 1. To compute the sensitivities of layer l efficiently, the sensitivity map of the down-sampling layer is up-sampled so that its size matches that of the convolutional layer's map; the derivative of the activation of layer l's map is then multiplied element by element with the up-sampled sensitivity map from layer l + 1. The weights of the down-sampled map all take the same constant value υ_j^{l+1}, so the result of the previous step only needs to be multiplied by υ_j^{l+1} to complete the calculation of layer l's sensitivity ϑ. The same computation is repeated for each feature map j in the convolutional layer; matching it with the corresponding down-sampling layer map yields:

$$\vartheta_j^{l} = \upsilon_j^{l+1}\, q'\!\left(u_j^{l}\right) \circ \mathrm{up}\!\left(\vartheta_j^{l+1}\right) \tag{11}$$

In formula (11), ϑ_j^l represents the sensitivity of feature map j in layer l; up(·) represents the up-sampling operation.
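A small NumPy sketch of formula (11) follows: the down-sampling layer's sensitivity map is Kronecker-up-sampled to the convolutional map's size and combined element-wise with the activation derivative. The 2 × 2 window, the use of the sigmoid derivative, and all array values are illustrative assumptions.

```python
import numpy as np

def up(sens, window=2):
    """up(.): replicate each sensitivity value over its sampling window."""
    return np.kron(sens, np.ones((window, window)))

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Illustrative data: a 4x4 convolutional-map input u_j^l and the 2x2
# sensitivity map of the following down-sampling layer.
u_l = np.arange(16, dtype=float).reshape(4, 4) / 10.0
sens_next = np.array([[0.2, -0.1],
                      [0.05, 0.3]])
upsilon = 0.7  # shared constant weight of the down-sampling map

# Formula (11): elementwise product of the activation derivative with the
# up-sampled next-layer sensitivities, scaled by the shared weight.
q_prime = sigmoid(u_l) * (1.0 - sigmoid(u_l))  # derivative of the sigmoid
sens_l = upsilon * q_prime * up(sens_next)
print(sens_l)
```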
The gradient of each convolutional layer, i.e., its sensitivity, is calculated by formula (11). The second is the selection of the activation function. The activation function is introduced to improve the nonlinear characteristics of the network and the decision function without affecting the receptive field of the convolution layer; without it, the whole network would be linear. The activation function is an important part of a neural network: by transforming the network input, an appropriate output can be obtained. In general, the activation function injects nonlinear factors into a neural network whose linear expression ability is poor, so that data can be separated under nonlinear conditions and can also be sparsely expressed and thus processed more efficiently [10–12]. According to the requirements of running gait recognition, the sigmoid activation function and the ReLU function are selected. With these activation functions, the training of the convolutional neural network is several times faster. Taking a four-layer convolutional neural network as an example and comparing the number of iterations required for the training error rate to reach 25%, the network using the sigmoid and ReLU activation functions reaches a 25% training error rate in six iterations. The third is the setting of the softmax classifier. Softmax regression can be understood as a multi-class classifier, usable in both supervised and unsupervised machine learning. Softmax is a generalization of the logistic regression model to multi-class problems, in which the class label can take more than two values. In general logistic regression, assume that the training sample set is {(x¹, y¹), · · · , (x^m, y^m)}, where m is the number of samples and the input feature is x^i. Since logistic regression is specific to two-class problems, the class label y^i ∈ {0, 1}. The hypothesis function is as follows:
$$k(x) = \frac{1}{1 + \exp\left(-\psi^{T} x\right)} \tag{12}$$
In formula (12), k(x) represents the classification function; ψ represents the classification parameter. The parameter ψ is trained to minimize the cost function:

$$J(\psi) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{i} \log k\!\left(x^{i}\right) + \left(1 - y^{i}\right) \log\!\left(1 - k\!\left(x^{i}\right)\right) \right] \tag{13}$$
In formula (13), J(ψ) represents the cost function of the parameter ψ. For convenience, the symbol ψ is also used to represent all the parameters. When implementing softmax regression, it is convenient to represent ψ as a k × (n + 1) matrix obtained by listing ψ1, ψ2, · · · , ψk by rows, as follows:
$$\psi = \begin{bmatrix} -\psi_{1}^{T} \\ -\psi_{2}^{T} \\ \vdots \\ -\psi_{k}^{T} \end{bmatrix} \tag{14}$$

Substituting the convolution layer gradient, activation function and softmax classifier determined above into the established running gait recognition program and executing the program yields the running gait recognition results for track and field athletes, assisting the improvement of their competitive ability [13–15].
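The sketch below illustrates the hypothesis function (12), the cost (13), and a softmax parameter matrix shaped as in (14), using NumPy; the toy data sizes and random values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def k(x, psi):
    """Hypothesis function of formula (12)."""
    return 1.0 / (1.0 + np.exp(-psi @ x))

def cost(X, y, psi):
    """Logistic cost of formula (13); X holds one sample per column."""
    preds = np.array([k(X[:, i], psi) for i in range(X.shape[1])])
    return -np.mean(y * np.log(preds) + (1 - y) * np.log(1 - preds))

n, m = 3, 8                       # feature dimension and sample count (toy sizes)
X = rng.normal(size=(n + 1, m))   # n features plus a bias row, matching the (n+1) layout
X[0] = 1.0
y = rng.integers(0, 2, size=m).astype(float)
psi = rng.normal(scale=0.1, size=n + 1)
print("cost:", cost(X, y, psi))

# Formula (14): for k classes, the parameters form a k x (n+1) matrix.
k_classes = 4
Psi = rng.normal(scale=0.1, size=(k_classes, n + 1))
scores = Psi @ X                                  # class scores for each sample
softmax = np.exp(scores) / np.exp(scores).sum(0)  # column-wise softmax
print("class probabilities shape:", softmax.shape)
```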
3 Experiment and Result Analysis

3.1 Selection of Experimental Objects

In order to verify the application performance of the proposed method, 100 track and field athletes were selected as experimental objects, and images were collected from continuous monitoring video of the athletes during running, with one frame extracted every 10 s. A total of 5000 images were extracted, of which 3000 were used as the training set and 2000 as the experimental set. They were randomly divided into 10 experimental groups, as shown in Table 1.

Table 1. Experimental group settings

Experimental group number   Average stride/cm   Average step frequency/steps per minute
1                           100                 180
2                           112                 175
3                           105                 164
4                           101                 150
5                           120                 145
6                           135                 168
7                           144                 120
8                           150                 114
9                           160                 152
10                          139                 150
As shown in Table 1, the average stride and average step frequency of the 10 experimental groups differ considerably, which meets the requirements of the application performance test of the proposed method.
3.2 Convolutional Neural Network Model Training

The proposed method designs a convolutional neural network model, which must be trained before the experiment to ensure its accurate operation and thereby obtain reliable experimental conclusions. The training process of the convolutional neural network includes four steps, divided into two stages.
Phase I: forward propagation. (1) Take a sample from the sample set and input it into the network. (2) Calculate the corresponding actual output. In this stage, the input information is transformed layer by layer and transmitted to the output layer; the computation performed by the network is in effect the dot product of the input with each layer's weight matrix, plus biases, giving the final output.
Phase II: back propagation. (1) Calculate the difference between the actual output and the expected output. (2) Propagate and adjust the weight matrices in the direction that minimizes the error.
The training of the network thus includes forward propagation and back propagation. Forward propagation mainly performs feature extraction and classification; back propagation feeds the error back and updates the weights. After an image is input, the neurons on all layers are first initialized. Convolution and sampling are used to extract and map image features; multiple rounds of convolution and sampling can be applied, and this multi-level extraction draws useful information out of the image. After feature extraction, the extracted features are fed to the fully connected layers, which contain multiple hidden layers; after the transformation and calculation of the data in the hidden layers, the result is fed to the output layer, which performs further calculations to obtain the test result. The test result is compared with the expected result: if they are consistent, the classification result is output; if not, the weights and biases are back-propagated, passed back from the output layer through the fully connected layers and the convolution-sampling layers until each layer obtains its own gradient, after which the weights are updated and a new round of training begins. The training process of the convolutional neural network model is shown in Fig. 4. The training of the convolutional neural network model is thus completed, ensuring the reliability of the proposed method.
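A minimal training loop matching the two-phase process above is sketched below; it assumes the GaitCNN sketch from Sect. 2.3 is in scope and uses randomly generated stand-in data, so the batch size, learning rate and epoch count are all illustrative assumptions.

```python
import torch
import torch.nn as nn

model = GaitCNN(num_classes=10)    # architecture sketch from Sect. 2.3
criterion = nn.CrossEntropyLoss()  # applies softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Stand-in data: 32 random "gait images" with random labels.
images = torch.randn(32, 1, 120, 96)
labels = torch.randint(0, 10, (32,))

for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(images)            # Phase I: forward propagation
    loss = criterion(outputs, labels)  # difference from the expected output
    loss.backward()                    # Phase II: back propagation of the error
    optimizer.step()                   # weight update toward smaller error
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```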
[Figure 4 is a flow chart: training set input, convolution and sampling, forward propagation through the fully connected and output layers, a check against expectations, back propagation when expectations are not met, and output of the results when they are.]

Fig. 4. Flow chart of convolutional neural network model training
4 Analysis of the Experimental Results

Based on the experimental conditions set above and the trained convolutional neural network model, the running gait recognition experiment for track and field athletes was conducted; the recognition accuracy data are shown in Table 2.

Table 2. Running gait recognition accuracy of track and field athletes

Experimental group number   Proposed method   Minimum limit
1                           85%               75%
2                           89%               65%
3                           90%               60%
4                           91%               50%
5                           94%               45%
6                           89%               62%
7                           88%               59%
8                           78%               64%
9                           80%               70%
10                          85%               63%
As the data in Table 2 show, compared with the given minimum limit value, the running gait recognition accuracy obtained by applying the
proposed method is consistently higher, with a maximum value of 94%, which fully proves the feasibility and effectiveness of the proposed method.
5 Conclusion

This study introduced a convolutional neural network model and proposed a new method for recognizing the running gait of track and field athletes. The experimental data show that the proposed method greatly improves the recognition accuracy of running gait, providing more effective method support for measuring and improving the athletic ability of track and field athletes, as well as a reference for related research.
References

1. Qiu, Y., Gao, Z.: Recognition of abnormal gait active image sequences based on low rank decomposition. Comput. Simul. 38(6), 415–418 (2021)
2. Liu, S., Liu, D., Muhammad, K., Ding, W.: Effective template update mechanism in visual tracking with background clutter. Neurocomputing 458, 615–625 (2021)
3. Ferguson, E.L.: Multitask convolutional neural network for acoustic localization of a transiting broadband source using a hydrophone array. J. Acoust. Soc. Am. 150(1), 248–256 (2021)
4. Dong, S., Jin, Y., Bak, S.J., et al.: Explainable convolutional neural network to investigate the age-related changes in multi-order functional connectivity. Electronics 10(23), 3020 (2021)
5. Liu, S., Zhu, C.: Jamming recognition based on feature fusion and convolutional neural network. J. Beijing Inst. Technol. 31(2), 169–177 (2022)
6. Liu, S., Wang, S., Liu, X., et al.: Fuzzy detection aided real-time and robust visual tracking under complex environments. IEEE Trans. Fuzzy Syst. 29(1), 90–102 (2021)
7. Halász, M., Gerak, J., Bakonyi, P., et al.: Study on the compression effect of clothing on the physiological response of the athlete. Materials 15(1), 169 (2022)
8. Zhu, Z., Yao, C.: Application of attitude tracking algorithm for face recognition based on OpenCV in the intelligent door lock. Comput. Commun. 154(5), 390–397 (2020)
9. Lai, X., Rau, P.: Has facial recognition technology been misused? A user perception model of facial recognition scenarios. Comput. Hum. Behav. 124(8), 106894 (2021)
10. Liu, S., et al.: Human memory update strategy: a multi-layer template update mechanism for remote visual monitoring. IEEE Trans. Multimedia 23, 2188–2198 (2021)
11. Luo, W., Ning, B.: High-dynamic dance motion recognition method based on video visual analysis. Sci. Program. 2022, 1–9 (2022)
12. Tha, B., Sk, B., Mt, B., et al.: Comparing subject-to-subject transfer learning methods in surface electromyogram-based motion recognition with shallow and deep classifiers. Neurocomputing 489, 599–612 (2022)
13. Sattar, N.Y., Kausar, Z., Usama, S.A., et al.: FNIRS-based upper limb motion intention recognition using an artificial neural network for transhumeral amputees. Sensors 22(3), 726–733 (2022)
14. Zhang, K., Zhao, D., Liu, W.: Online vehicle trajectory compression algorithm based on motion pattern recognition. IET Intel. Transp. Syst. 16(8), 998–1010 (2022)
15. Muhammad, U., Yu, Z., Komulainen, J.: Self-supervised 2D face presentation attack detection via temporal sequence sampling. Pattern Recogn. Lett. 156(4), 15–22 (2022)
Research on Action Recognition Method of Traditional National Physical Education Based on Deep Convolution Neural Network

Liuyu Bai1, Wenbao Xu1(B), Zhi Xie1, and Yanuo Hu2

1 Air Force Engineering University, Xi'an 710051, China
[email protected]
2 Xianyang City Qindu District Teachers Training School, Xianyang 712046, China
Abstract. With the continuous development of machine vision and image processing technology, more and more attention has been paid to human action recognition in physical education teaching. In order to improve the performance of action recognition in traditional national physical education teaching, an action recognition method based on a deep convolutional neural network is proposed. The lengths of the elbow and shoulder joints are calculated using the distance between the camera and the action image, and physical education teaching actions are monitored according to their range characteristics. A deep convolutional neural network is introduced to predict the state variables of physical education teaching actions and obtain the coordinate data of all related nodes. Based on deep convolutional neural network theory, the action postures in traditional national physical education teaching in colleges and universities are transformed and processed, and the teaching action features are extracted from the probability values of the pixels of the teaching action images. By detecting the extreme points of teaching actions in scale space, the extreme points of the action range are located, and the probability that the action range falls outside the exercise area is used to recognize traditional national physical education teaching actions in colleges and universities. The experimental results show that the method can successfully recognize traditional national physical education teaching behavior, with good performance in the precision of action feature extraction, the recognition rate and the recognition speed, remedying the low precision of existing sports action recognition.

Keywords: Deep Convolutional Neural Network · Physical Education · Action Recognition · Feature Extraction · Action Monitoring
1 Introduction

With the rapid development of artificial intelligence technology in computer vision, mobile Internet, big data analysis and other fields, and combined with the new characteristics of deep learning and cross-border integration, it has gradually become a new focus of
international competition. At the same time, artificial intelligence technology is used in computer vision to segment, classify and recognize objects, and its application in virtual reality and human-computer interaction, especially in traditional national physical education in colleges and universities, has become a hot topic in industry and academia [1]. Through traditional sports teaching in colleges and universities, students' sports training and competition videos can be analyzed and evaluated, and students' normative sports movements and physical training can be analyzed in a targeted way. At the same time, at sports meetings, students' movements and positions can be detected, tracked and analyzed, promoting the improvement of their technical level. Basketball is the most basic item in traditional college physical education. Its basic movements include dribbling, shooting and lay-ups; dribbling is the most basic action in basketball, and shooting is the key to scoring in the whole game. The accuracy of the basic movements has a great influence on the match score [2]. With the development of basketball, combining human posture estimation algorithms with action recognition algorithms plays a vital role in helping to improve the scoring rate. Human posture estimation detects and estimates the position, orientation and scale information of the target human body from an image, translates it into digital form, and outputs the current human posture. Action recognition then takes the pose estimation results as input to judge whether a person's movement is standard and how to improve its standardization. In domestic research, He Bingqian et al. [3] proposed a network structure based on the batch normalization transform and the GoogLeNet network model to solve the problems of complex feature extraction and low recognition rate. They applied the normalization used in image classification to the field of action recognition, improving the training algorithm and normalizing the network input of video action training samples. The method takes RGB images as the input of a spatial network and the optical flow field as the input of a temporal network, and then obtains the final recognition result by fusing the spatio-temporal networks. Experiments on the UCF101 and HMDB51 datasets show accuracies of 93.50% and 68.32%, respectively, demonstrating that the improved network architecture has high recognition accuracy for human actions in video. Liu Guoping et al. [4], in order to classify and recognize dumbbell actions, added an inertial sensor module to a dumbbell to collect motion signals during dumbbell training. After signal normalization, filtering, and segmentation based on the initial static vector period, five kinds of dumbbell action feature vectors were extracted; an improved ReliefF feature selection algorithm was used to select the optimal feature vector, and a support vector machine based on a balanced decision tree was used to recognize the different dumbbell actions. Tests with a dumbbell action recognition system independently developed by the laboratory show that the system can recognize dumbbell actions within a single action cycle, with a recognition rate of up to 90%, laying the foundation for more individualized dumbbell training guidance.
In foreign research, Ding W et al. [5] proposed a skeleton-based square grid that transforms dynamic skeletons into three-dimensional mesh structures so that CNNs can be applied to the data. To enhance the ability of deep features to capture the correlations within the 3D mesh data, a joint-based square mesh and a rigid-body-based square mesh sequence are used to construct a dual-stream 3D CNN. Three datasets — NTU RGB+D, Kinetics Motion, and the SBU Kinect interaction dataset — are used to verify the effectiveness of the model in motion recognition, and the experimental results show that the proposed method outperforms existing methods. Gao P et al. [6] proposed a multi-dimensional data model based on a deep learning framework for video motion recognition and motion capture, aiming to improve the accuracy of small-scale human motion recognition and the computational efficiency on large-scale video datasets. First, the moving foreground of the target is extracted with a Gaussian mixture model and the human body is recognized with gradient histograms. In the second layer, dense trajectory features and deep learning features are fused by integrating a global coding algorithm with a convolutional neural network; within the deep learning features, video features and video RGB features are fused. Simulations on a large-scale real dataset and a small-scale gesture dataset show that the algorithm has high recognition accuracy for both, with an average classification accuracy of 85.79% on the human behavior dataset of the computer vision and learning laboratory, while running at about 20 frames per second.

At home, with the rapid improvement of computing technology, action recognition in traditional sports teaching has gradually become a new subject in the field of computer simulation and has great research value for traditional national sports teaching in colleges and universities. At present, many research institutions in developed countries are carrying out related research and development, mainly for sports with large movement amplitudes such as boxing, table tennis and skiing. Action recognition in college traditional national sports teaching can bring great convenience to students' training: students can observe the execution details of standard movements from different angles during training, and the differences between their actual movements and the standard ones can be compared to better assess how well an action is completed. In the actual process of movement recognition in college traditional sports teaching, however, a key issue is how to effectively identify the key points of sports movements. The optimal tracking method for the 3D visual motion amplitude is used to obtain a time-series model of the motion amplitude and to give the probability of the 3D visual motion amplitude. On this basis, action recognition in the teaching of traditional national sports in colleges and universities is the fundamental way to solve the above problems, and it has attracted the attention of many experts and scholars. Against this background, and in order to improve the accuracy of sports action feature recognition, this paper designs an action recognition method for traditional national physical education teaching in colleges and universities based on a deep convolutional neural network.
According to the length and distance between the elbow joint and the shoulder joint, and combined with the characteristics of physical education teaching actions, the teaching actions are monitored. The action state variables are predicted with a deep convolutional neural network, and the actions in the physical education teaching process are identified from the acquired joint coordinate data, thereby promoting the development of computer vision for sports recognition.
2 Design of the Action Recognition Method in National Traditional Physical Education Teaching in Colleges and Universities

2.1 Monitoring Physical Education Actions

Before recognizing the teaching actions of college traditional national sports, these actions must be monitored to obtain the course of each action. From the two-dimensional data of the students' motion joint feature points in traditional physical education teaching, the plane distance between the camera and the physical education action image is obtained. The calculation formula is:

$u_{ij} = \frac{\psi \times f^*}{dis \cdot X^*}\sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2 + (Z_i - Z_j)^2}$  (1)

Here $\psi$ represents the scale factor, $dis$ the distance between the center of the motion image and the center of the camera, and $X^*$ the camera coordinates. $(X_i, Y_i, Z_i)$ represents the coordinates of the physical education action image and $(X_j, Y_j, Z_j)$ the coordinates of the action image after the transformation. $f^*$ represents the focal length of the camera:

$f^* = \frac{d_0}{S^* \times d_{ij}}\left(\left|X_i - X_j\right| + \left|Y_i - Y_j\right|\right)$  (2)

Here $d_0$ represents the camera lens diameter, $d_{ij}$ the spacing between monitoring points, $\left|X_i - X_j\right|$ and $\left|Y_i - Y_j\right|$ the image coordinates of the monitoring point, and $S^*$ the area of the camera lens section. Suppose $\lambda_{\max}$ represents the maximum threshold of the physical education monitoring points and $\lambda_{\min}$ the minimum threshold. The physical education teaching action can then be binarized as:

$\lambda_0 = \frac{(\lambda_{\max} - \lambda_{\min}) \times N + \partial(y)}{N^*}$  (3)

Among them, $N$ represents the limit threshold of the monitoring points, $N^*$ the number of sports teaching actions, and $\partial(y)$ the monitoring points of physical education. When monitoring the movements of physical education teaching, the binarization result should be taken into account. The length between the elbow and the shoulder is calculated as:

$L^* = f\sqrt{(X_1 - X_2)^2 + (Y_1 - Y_2)^2 + (Z_1 - Z_2)^2}$  (4)

$(X_1, Y_1, Z_1)$ represents the elbow coordinates of the student's motion and $(X_2, Y_2, Z_2)$ the coordinates of the shoulder joint during exercise.
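As a concrete illustration of Eqs. (3) and (4), the following minimal sketch computes the elbow-shoulder length and binarizes a frame between the two thresholds. The function names and the choice of a cut point midway between $\lambda_{\min}$ and $\lambda_{\max}$ are illustrative assumptions, not prescribed by the paper.

```python
import numpy as np

def joint_length(elbow, shoulder):
    """Euclidean length between elbow (X1, Y1, Z1) and shoulder (X2, Y2, Z2), cf. Eq. (4)."""
    return float(np.linalg.norm(np.asarray(elbow, float) - np.asarray(shoulder, float)))

def binarize_frame(frame, lam_min, lam_max):
    """Binarize a grayscale frame between the min/max monitoring thresholds, cf. Eq. (3)."""
    threshold = 0.5 * (lam_min + lam_max)   # illustrative cut point between the thresholds
    return (frame >= threshold).astype(np.uint8)

# Example: upper-arm length from two 3-D joint coordinates, then a binarized frame.
print(joint_length((0.10, 0.25, 1.30), (0.05, 0.50, 1.32)))
mask = binarize_frame(np.random.randint(0, 256, size=(48, 64)), lam_min=60, lam_max=180)
```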
Suppose $k$ represents the range of an action in physical education teaching. If the movement range $J_n$ is detected within the display range, the prediction errors $\Delta p$, $\Delta v$, $\Delta a$ of the state variables of the movement range can be calculated, and the range of the teaching action is predicted as:

$k(\Delta p, \Delta v, \Delta a) = \frac{\partial(p) \cdot J_n}{\{R_k\}(\Delta p, \Delta v, \Delta a)}$  (5)

Among them, $\{R_k\}$ represents the three-dimensional movement data of the $k$-th movement and $\partial(p)$ the candidate monitoring points of the teaching movements. After obtaining the display scope of the teaching movement, the range characteristics of the movement are extracted in this area:

$F(i, j) = \sum_{i=1}^{n} i \times k$  (6)

Among them, $k$ represents the range of movement of the physical education action area. If the students' discrete motion space is expressed in RGB, the colors in the teaching action image can be classified as:

$P(y|x) = \exp\left(\frac{e_i(x, y)}{RGB \cdot h(e)}\right)$  (7)

Among them, $e_i(x, y)$ represents the color distribution of the students' skin at the current time and $h(e)$ the color distribution function of the action images. According to the color classification result, the teaching action is monitored as:

$\sigma(x, y) = \frac{G^* \cdot P(y|x)}{G_0(x, y)}$  (8)

Among them, $G^*$ is the observation value of the movement characteristics and $G_0(x, y)$ the normalized observation error between movement features when the movement parameter is $x$. In summary, the length between the elbow joint and the shoulder joint is calculated from the distance between the camera and the action image; binarization of the teaching action yields its range; and the teaching action is monitored according to its range characteristics.
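A minimal sketch of the color-based monitoring step of Eqs. (7)-(8), under the assumption that the skin-color distribution is approximated by a quantized RGB histogram learned from labelled skin pixels; the function names and histogram resolution are illustrative, not part of the original method.

```python
import numpy as np

def color_likelihood(skin_hist, frame_rgb):
    """Per-pixel likelihood that a pixel shows skin color, cf. Eq. (7).
    skin_hist: 8x8x8 normalized RGB histogram (32-level quantization per channel)."""
    bins = (np.asarray(frame_rgb) // 32).astype(int)
    return skin_hist[bins[..., 0], bins[..., 1], bins[..., 2]]

def monitoring_observation(likelihood, g_star, g0):
    """Observation value sigma(x, y) of the motion features, cf. Eq. (8)."""
    return g_star * likelihood / np.maximum(g0, 1e-8)   # guard against division by zero

hist = np.random.dirichlet(np.ones(512)).reshape(8, 8, 8)   # stand-in skin-color model
frame = np.random.randint(0, 256, size=(48, 64, 3))
sigma = monitoring_observation(color_likelihood(hist, frame), 1.0, np.ones((48, 64)))
```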
2.2 Constructing the Action Model of Physical Education Teaching

Based on the monitored actions, the physical education action data are collected:

$\hat{\Sigma}_{k+1} = \frac{1}{\varphi}\Sigma_k$  (9)

In the above formula, $\varphi$ is the structural parameter of the teaching action and $\Sigma_k$ the covariance of the physical education action in frame $k$. After acquiring the movement data, a deep convolutional neural network [7] is introduced:

$B_j = k_y\left(\sum_{i} x_i \tau_{hij} + \zeta_j\right)$  (10)

In the equation, $B_j$ represents the output state vector of the hidden layer, $x_i$ the input state vector, $\tau_{hij}$ the connection weight from the input layer to the hidden layer, $\zeta_j$ the threshold of the hidden-layer neurons, and $k_y$ the excitation function. The output state vector $Z_k$ is obtained from $B_j$:

$Z_k = k_y\left(\sum_{j=1}^{J} B_j \tau_{ojk} + \phi_k\right)$  (11)

In the equation, $\tau_{ojk}$ represents the connection weights to the output layer and $\phi_k$ the threshold of the output layer. Using this network structure, the state variables of the physical education action are predicted:

$Z_k = \frac{V_k g_{k-1} + \varsigma^* M_k}{J}$  (12)

In the formula, $Z_k$ is the dynamic information of the teaching action, $g_{k-1}$ the dynamic vector of physical education, $J$ the transformation matrix, $\varsigma^*$ the transfer matrix, $M_k$ the identification factor, and $V_k$ the prediction error of the state variable.
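The layer equations (10)-(11) are weighted sums passed through an excitation function; a minimal NumPy sketch of this forward pass is given below. The dense (fully connected) form, the tanh excitation and the array shapes are illustrative assumptions.

```python
import numpy as np

def forward(x, tau_h, zeta, tau_o, phi, k_y=np.tanh):
    """Hidden output B_j and network output Z_k, cf. Eqs. (10)-(11)."""
    b = k_y(x @ tau_h + zeta)   # Eq. (10): weighted inputs plus threshold, through k_y
    z = k_y(b @ tau_o + phi)    # Eq. (11): hidden vector propagated to the output layer
    return b, z

rng = np.random.default_rng(0)
x = rng.normal(size=16)                                   # input state vector
b, z = forward(x, rng.normal(size=(16, 8)), np.zeros(8),
               rng.normal(size=(8, 4)), np.zeros(4))
```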
It is assumed that the gray values of the action image follow a normal distribution with expectation $\mu$ and variance $\varepsilon^*$, so that the Gaussian distribution of the motion image pixels is univariate. In an action image frame of physical education teaching, the target and background probability models composed of all pixels can then be expressed as:

$p(X|s_i) = \frac{1}{\sqrt{2\pi}\,\varepsilon^*}\exp\left(-\frac{(X - \mu)^2}{\varepsilon^*}\right)$  (13)

$p(X|s_i)$ is a univariate normal distribution and $s_i$ the eigenvalue of the gray information of the action image. All pixels of a frame are used to form normally distributed target and background models, whose mean and variance are updated as [8]:

$\mu^1_{k+1} = \frac{k - 1}{k}\mu^1_k + \frac{1}{k}\mu_k$  (14)

$\varepsilon^2_{k+1} = \frac{k - 1}{k}\varepsilon^2_k + \frac{1}{k}\varepsilon_k$  (15)

According to the actions in traditional college physical education teaching, the action characteristics are calculated as:

$\mu_{k+1} = (1 - \vartheta)\mu^*_k + \vartheta\mu_k$  (16)

$\varepsilon_{k+1} = (1 - \vartheta)\varepsilon^*_k + \vartheta\varepsilon_k$  (17)

Among them, $\vartheta$ is the covariance of the movement characteristics, $\mu_k$ the teaching action image of frame $k$, $\mu^*_k$ the movement characteristics, $\varepsilon_k$ the characteristic vector, and $\varepsilon^*_k$ the range characteristics of the teaching movements. From these movement characteristics, the coordinate data of all related nodes of the action are obtained, and the physical education action model is built:

$Z_i(k + 1) = \frac{1}{\vartheta}\Sigma(k + 1)$  (18)

Among them, $\Sigma(k + 1)$ represents the covariance matrix of movement recognition. In summary, the state variables of the action are predicted by collecting the action data and introducing a deep convolutional neural network; the target and background probability models composed of all pixels yield the coordinate data of all related nodes, and the physical education teaching action model is constructed.
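The updates (14)-(17) amount to a running mean/variance followed by an exponential blend; one way to sketch them, with the function names and toy data assumed purely for illustration:

```python
import numpy as np

def running_update(mu, eps, mu_k, eps_k, k):
    """Frame-k running update of mean and variance, cf. Eqs. (14)-(15)."""
    return (k - 1) / k * mu + mu_k / k, (k - 1) / k * eps + eps_k / k

def blend(mu_star, mu_k, eps_star, eps_k, theta):
    """Exponential blend of model and current-frame features, cf. Eqs. (16)-(17)."""
    return (1 - theta) * mu_star + theta * mu_k, (1 - theta) * eps_star + theta * eps_k

mu, eps = 0.0, 1.0
for k, frame_mean in enumerate(np.random.normal(120, 10, size=50), start=1):
    mu, eps = running_update(mu, eps, frame_mean, 10.0, k)
mu_model, eps_model = blend(mu, 118.0, eps, 9.0, theta=0.05)
```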
2.3 Extracting Movement Characteristics of Physical Education Based on a Deep Convolutional Neural Network

This paper uses deep convolutional neural network theory to extract the action features in traditional national physical education teaching. The extracted action feature points are classified and processed in combination with the convolution layers of the network [9], and the posture of traditional college sports teaching is transformed as:

$U_{(i,j)}(g, h) = |V(i, j) \times W(g, h)|$  (19)

In the above formula, $U_{(i,j)}(g, h)$ represents the filtered result of processing the action image of traditional national physical education teaching, $V(i, j)$ the action image itself, and $W(g, h)$ the filter combination result. During feature extraction, the network computes the characteristic parameters of each movement posture of the students:

$Z_{ij} = \frac{Z_{ij} - \min Z_{ij}}{\Lambda^*}$  (20)
Among them, $\Lambda^*$ represents the movement image of physical education and $\min Z_{ij}$ the characteristic parameters of each student's movements in traditional college sports teaching. Using the characteristic parameters of each action posture, the $(m + n)$-order moment matrix of the teaching action image $I(x, y)$ is defined as:

$\chi_{mn} = \sum_{x}\sum_{y} I(x, y)(x - \bar{x})^m(y - \bar{y})^n$  (21)

Among them, $(\bar{x}, \bar{y})$ represents the center coordinate of the action image. From this moment matrix of order $m + n$, the pixel information of the teaching action image is acquired. The pixels of the resulting image are:

$f(d_B) = \frac{\gamma_a}{8}$  (22)

$f(d_F) = 1 - f(d_B)$  (23)

Among them, $f(d_B)$ represents the characteristic distribution function of the pixel value $H_B$ of the action image, $\gamma_a$ the pixel information of the action image, and $f(d_F)$ the characteristic distribution function of the pixel value $H_F$. When $f(d_B) \ge f(d_F)$, the image features are regarded as foreground pixels; when $f(d_B) < f(d_F)$, they are used as background pixels. The feature point $(i, j)$ in the coordinate system of the action image is obtained by iterative processing [10].
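Equation (21) is the standard central image moment; a short sketch of its computation on a toy silhouette follows (the helper name and test image are illustrative):

```python
import numpy as np

def central_moment(image, m, n):
    """Central image moment chi_mn of order m+n, cf. Eq. (21)."""
    img = np.asarray(image, dtype=float)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    total = img.sum()
    x_bar = (xs * img).sum() / total        # centroid (x̄, ȳ) of Eq. (21)
    y_bar = (ys * img).sum() / total
    return float((img * (xs - x_bar) ** m * (ys - y_bar) ** n).sum())

silhouette = np.zeros((32, 32)); silhouette[8:24, 10:20] = 1.0   # toy action mask
print(central_moment(silhouette, 2, 0), central_moment(silhouette, 0, 2))
```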
The pixel probability of the feature point at time $t$ is:

$f\left(a^t_{ij}\right) = \sum_{s=g,h} f_s\left(a^t_{ij} \mid \alpha^t_{ij,s}\right)$  (24)

In the above formula, $a^t_{ij}$ represents the pixel value of the action image of traditional national physical education teaching at time $t$, and $f_s\left(a^t_{ij} \mid \alpha^t_{ij,s}\right)$ the actual pixel probability of the action image at any time $t$. At time $t$, the mixture model of the action feature pixel $(i, j)$ is expressed as:

$f_b\left(a^t_{ij}\right) = \sum_{k \in M} \omega^t_{ij,k}\,\eta\left(a^t_{ij}, \delta^t_{ij,k}, R^t_{ij,k}\right)$  (25)

In the above formula, $k$ indexes the recognition models of the teaching images, $\omega^t_{ij,k}$ represents the weight of the characteristic vectors, $\delta^t_{ij,k}$ the characteristic vector value of the teaching action, and $R^t_{ij,k}$ the covariance matrix of the movement characteristics. Through the weight analysis of the mixture model of the action feature pixel $(i, j)$, the fitness value of the action image is obtained and a suitable movement characteristic distribution model is sought. At any time $t$, the probability value of the action pixel $(i, j)$ is then expressed as:

$Q(y, e, \kappa) = \frac{1}{\sqrt{|\kappa|}}\exp\left(-\frac{1}{2}(y - e)^T \kappa^{-1}(y - e)\right)$  (26)

Among them, $y$ represents the characteristic vector of the coordinates $(i, j)$, $e$ the probability value of the coordinates, and $\kappa$ the vector matrix of the teaching actions. After obtaining the probability value of pixel $(i, j)$, the motion image is compared with the standard motion image, and the characteristics of traditional college sports teaching are expressed as:

$f_u(y) = h\sum_{i=1}^{n} Q(y, e, \kappa)\,\delta[\varepsilon(x_i) - \xi]$  (27)

Among them, $h$ is the normalized processing result of the action image, $\xi$ the characteristic value of the teaching action image, and $\varepsilon(x_i)$ the central pixel of the action image. In summary, based on deep convolutional neural network theory, the action postures are transformed and processed; the pixel information of the action images is obtained from the characteristic parameters of each posture; and the movement characteristics of physical education teaching are extracted through the probability values of the image pixels.
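A minimal sketch of the per-pixel mixture model of Eq. (25) with scalar Gaussian components; the two-component toy parameters are purely illustrative:

```python
import numpy as np

def gaussian(a, delta, r):
    """Component density eta(a, delta, R) for a scalar pixel value, cf. Eq. (25)."""
    return np.exp(-0.5 * (a - delta) ** 2 / r) / np.sqrt(2 * np.pi * r)

def mixture_probability(a, weights, means, variances):
    """Mixture probability f_b(a_ij^t) under M weighted components, cf. Eq. (25)."""
    return sum(w * gaussian(a, d, r) for w, d, r in zip(weights, means, variances))

# Two-component model for one pixel; the weights sum to 1.
print(mixture_probability(128.0, [0.7, 0.3], [120.0, 200.0], [25.0, 100.0]))
```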
2.4 Designing the Recognition Algorithm of Physical Education Teaching

According to the movement characteristics of physical education teaching, the extrema of the movement in the scale space are detected:

$W(x, y, \varphi) = I(x, y)\,\frac{G(x, y, \varphi) - (x, y)}{L(x, y, \varphi)}$  (28)

Among them, $I(x, y)$ represents the initial coordinates of the action image, $L(x, y, \varphi)$ the convolution of the teaching action, $G(x, y, \varphi)$ the variable Gaussian function of the movement in the scale space, $(x, y)$ the spatial coordinates, and $\varphi$ the scale coordinate. In the scale space, the extreme points of the range of the teaching action are located according to the extreme points of the action:

$\mu \le (\kappa^*) = \frac{d_i \cdot C(w) \times d_{\min}}{D_{a,b} \times \kappa^*}$  (29)

$D_{a,b}$ represents the Euclidean distance between the movement-amplitude feature points $a$ and $b$, $\kappa^*$ the distance threshold between the range features, $d_i$ the second-closest distance to the range of motion, $d_{\min}$ the closest distance, and $C(w)$ the scale of the movement. After locating the extreme points of the range of movement, the probability that the range of the teaching action lies outside the display area is calculated:

$E(\Delta x, \Delta y) = \sum_{\Delta x}\sum_{\Delta y}\left(v_x, y_y\right) \cdot \chi(P) \times \Phi\left(v_x, y_y\right)$  (30)

Among them, $\Delta x$ and $\Delta y$ represent the registration area, $\left(v_x, y_y\right)$ the speed of the movement, $\chi(P)$ the number of iterations of the teaching action, and $\Phi\left(v_x, y_y\right)$ the exact position of the movement. According to the above process, the traditional national sports teaching actions are identified:

$\sigma(\Delta x, \Delta y) = \frac{\alpha \times \Phi\left(v_x, y_y\right)}{g(t)}$  (31)

Among them, $g(t)$ represents the time series of the action states and $\alpha$ the weighted-average processing weight of movement recognition. To sum up, the extreme points of the teaching action are detected in the scale space to locate the extreme points of the action range, and the probability that the action range lies outside the exercise area is used to identify the traditional sports teaching actions in colleges and universities.
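Scale-space extremum detection as around Eq. (28) is commonly realized with a difference-of-Gaussians stack; the sketch below is one plausible reading, with the scale ladder and the SciPy-based smoothing as assumptions rather than the paper's exact procedure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_extrema(image, sigmas=(1.0, 1.6, 2.6, 4.2)):
    """Extrema of a difference-of-Gaussians stack over space and scale."""
    stack = np.stack([gaussian_filter(np.asarray(image, float), s) for s in sigmas])
    dog = stack[1:] - stack[:-1]                      # adjacent-scale differences
    points = []
    for s in range(1, dog.shape[0] - 1):
        for y in range(1, dog.shape[1] - 1):
            for x in range(1, dog.shape[2] - 1):
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if dog[s, y, x] in (cube.max(), cube.min()):
                    points.append((s, y, x))          # local extremum in the 3x3x3 cube
    return points

print(len(dog_extrema(np.random.rand(32, 32))))
```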
3 Experimental Analysis

3.1 Experimental Datasets

This paper selects the UTD-MHAD dataset and the MSR Daily Activity 3D dataset as the experimental datasets. The UTD-MHAD dataset contains depth information, skeletal joint position data, RGB video sequences and inertial data. It includes 27 different actions: right-arm swipe to the left, right-arm swipe to the right, right-hand wave, two-handed front clap, right-arm throw, cross arms in chest, basketball shoot, right-hand draw X, right-hand draw circle (clockwise), right-hand draw circle (counterclockwise), draw triangle, bowling (right hand), front boxing, right-hand baseball swing, tennis right-hand forehand swing, arm curl (both arms), tennis serve, two-handed push, right-hand knock on door, right-hand catch an object, right-hand pick up and throw, jogging in place, walking in place, sitting down, standing up, forward lunge (left foot forward), and squat (two arms extended). Each action was performed four times by eight people; after removing three corrupted sequences, 861 action sequences remain. The dataset is challenging because the 27 actions were performed along the line of sight (perpendicular to the video plane).

The MSR Daily Activity 3D dataset contains RGB video, depth information, and skeletal joint position data. It covers 16 types of actions: drinking, eating, reading, talking on a cell phone, writing, using a laptop, using a vacuum cleaner, cheering, sitting still, throwing paper, playing games, lying on the sofa, walking, playing the guitar, standing up, and sitting down. Each type of action was performed by 10 people, each in two modes — once standing and once sitting — for a total of 960 files. Strictly speaking, the dataset therefore has 17 action categories, because the sitting-still action was executed in both the sitting and the standing mode. The dataset was filmed in a real environment with background objects, and the subjects' distance from the camera was not fixed.

3.2 Positions of the Motion Recognition Sensors

Eight sensors are selected to monitor the college traditional national sports teaching movements. Eight nano CMOS image sensors are attached to eight parts of the body, as shown in Fig. 1.
Fig. 1. Sensor placement
3.3 Result Analysis

To highlight the performance of the proposed method, it is compared with the action recognition method based on a neural network and the action recognition method based on the improved ReliefF algorithm, with the following results.

Recognition of Physical Education Actions. The standing long jump in the UTD-MHAD dataset and the running action in the MSR Daily Activity 3D dataset are selected. Following the sensor arrangement of Sect. 3.2, the sensors are tied to the human body to recognize the actions at the different positions. The results are shown in Figs. 2 and 3.
Fig. 2. Action recognition results on the UTD-MHAD dataset (three panels: the recognition method of this paper; the action recognition method based on the improved ReliefF algorithm; the action recognition method based on a neural network)
Fig. 3. Action recognition results on the MSR Daily Activity 3D dataset (three panels over monitoring points 1-8: the recognition method of this paper; the action recognition method based on the improved ReliefF algorithm; the action recognition method based on a neural network)
In these results, a green circle represents success and a red circle represents failure. The results in Figs. 2 and 3 show that only the method in this paper recognizes the physical education actions at all monitoring points in both datasets, while the methods based on the neural network and on the improved ReliefF algorithm each fail in two or three cases. This is because the proposed method predicts the state variables of the teaching actions through the deep neural network and thereby obtains the coordinate data of all relevant nodes; based on these coordinate data, the physical education teaching actions can be identified accurately.
Performance Comparison. In the MSR Daily Activity 3D dataset, a group of running movements was selected as the experimental subject. Following the sensor arrangement in Fig. 1, the performance at the eight positions is monitored, and the proposed method is compared with the other two methods. The accuracy of movement feature extraction, the recognition rate and the recognition speed were tested, with the following results.
Fig. 4. The accuracy of action feature extraction in physical education (x-axis: monitoring point No. 1-8; y-axis: accuracy of motion feature extraction/%; curves: the recognition method of this paper, the method based on the improved ReliefF algorithm, and the method based on a neural network)
From the results in Fig. 4, the accuracy of motion feature extraction of the proposed method stays above 90% at every monitoring point, while the feature extraction accuracy of the methods based on the neural network and on the improved ReliefF algorithm is below 80%. According to the results in Fig. 5, the action recognition rates of the two comparison methods are below 71%, while the recognition rate of the proposed method exceeds 90%, providing stronger technical support for the analysis of physical education teaching. The results in Fig. 6 show that the recognition speed of the proposed method reaches more than 180 frames per second; this higher recognition speed ensures the quality of the action images. In summary, the method in this paper has clear advantages in feature extraction accuracy, recognition rate and recognition speed, because it monitors the teaching actions according to the joint distances and the range characteristics of the actions, and extracts the motion characteristics from the probability values of the moving image pixels. The proposed method therefore performs better in recognizing actions.
Fig. 5. Physical education action recognition rate (x-axis: monitoring point No. 1-8; y-axis: action recognition rate/%; curves: the recognition method of this paper, the method based on the improved ReliefF algorithm, and the method based on a neural network)

Fig. 6. Speed of movement recognition in physical education (x-axis: monitoring point No. 1-8; y-axis: recognition speed of human motion/frames; curves: the recognition method of this paper, the method based on the improved ReliefF algorithm, and the method based on a neural network)
4 Conclusion

In this paper, a deep convolutional neural network is applied to the recognition of traditional national physical education teaching actions in colleges and universities, and the method shows good performance in recognizing these teaching actions. The study still has many limitations; in future research, we hope to analyze the joints in advance and to identify actions according to the motion direction of each joint.
References

1. Basak, H., Kundu, R., Singh, P.K., et al.: A union of deep learning and swarm-based optimization for 3D human action recognition. Sci. Rep. 12(1), 5494 (2022)
2. Ma, C., Fan, J., Yao, J., et al.: NPU RGB+D dataset and a feature-enhanced LSTM-DGCN method for action recognition of basketball players. Appl. Sci. 11(10), 4426 (2021)
3. He, B., Wei, W., Zhang, B., et al.: Improved deep convolutional neural network for human action recognition. Appl. Res. Comput. 36(10), 3107–3111 (2019)
4. Liu, G., Wang, N., Zhou, Y., et al.: Dumbbell motion recognition based on improved ReliefF algorithm. Sci. Technol. Eng. 19(32), 219–224 (2019)
5. Ding, W., Ding, C., Li, G., Liu, K.: Skeleton-based square grid for human action recognition with 3D convolutional neural network. IEEE Access 9, 54078–54089 (2021)
6. Liu, W.: Simulation of human body local feature points recognition based on machine learning. Comput. Simul. 38(6), 387–390+395 (2021)
7. Tsai, M.F., Chen, C.H.: Spatial temporal variation graph convolutional networks (STV-GCN) for skeleton-based emotional action recognition. IEEE Access 9, 13870–13877 (2021)
8. Yuan, Y., Yu, B., Wang, W., et al.: Multi-filter dynamic graph convolutional networks for skeleton-based action recognition. Procedia Comput. Sci. 183, 572–578 (2021)
9. Xie, J., Xin, W., Liu, R., Sheng, L., Liu, X., Gao, X., Miao, Q.: Cross-channel graph convolutional networks for skeleton-based action recognition. IEEE Access 9, 9055–9065 (2021)
10. Liu, S., Liu, D., Muhammad, K., Ding, W.: Effective template update mechanism in visual tracking with background clutter. Neurocomputing 458, 615–625 (2021)
Personalized Recommendation Method for the Video Teaching Resources of Folk Sports Shehuo Based on Mobile Learning

Ying Cui1(B) and Yanuo Hu2

1 Sports Department, Shaanxi University of Chinese Medicine, Xianyang 712046, China
[email protected]
2 Xianyang City Qindu District Teachers Training School, Xianyang 712046, China
Abstract. Xunxian Shehuo is a sports activity that gathers many kinds of folk customs and is a key vehicle for the national fitness strategy. The video teaching resources for folk sports Shehuo are so extensive that it is difficult for learners to find the content they are interested in among so much information. Based on mobile learning theory, this paper constructs a learner model by analyzing learner characteristics, collecting learner data and representing learner features. Learner behaviors are weighted to obtain interest preference features, and the similarity between learners' interest preferences and teaching resources is calculated. A collaborative filtering recommendation algorithm then yields the best personalized recommendation of teaching resources. The experimental results show a maximum recall rate of 96% and a maximum accuracy rate of 98%, which fully proves the effectiveness of the proposed method.

Keywords: Mobile Learning · Folk Sports · Shehuo Video · Teaching Resources · Personalized Recommendation
1 Introduction

The Xunxian Shehuo has a long history and a profound cultural background, fusing excellent folk sports items from different periods. Nowadays Shehuo activities are numerous and take many forms, meeting the physical exercise needs of different age groups and helping to promote the orderly implementation of the national fitness strategy and to improve people's living standards and physical fitness. Folk sports are an important part of national fitness, and Shehuo activities are an indispensable way for folk sports to show their style [1]. It is also proposed to "vigorously develop sports popular with the masses, encourage the development of sports with characteristics suitable for different groups and regions, and support the promotion of Taijiquan, fitness qigong and other ethnic and folk traditional sports". At present, the main Xunxian Shehuo activities include the lion dance, dragon dance, Danhua basket dance, Yangko dance, dry boat dance, stilt dance, bamboo dance, dragon lantern dance, pavilion dance, waist drum dance, drum dance, and so on.
Making use of these folk physical exercise contents can effectively promote the implementation of the national fitness strategy among the people; moreover, Shehuo activities, integrating entertainment, fitness, leisure and tourism, can better adapt to the new situation of rising living standards and rapid economic development. The extensive development of folk Shehuo sports activities reflects, on the one hand, the reality of people's pursuit of folk beliefs and, on the other, their real desire for a high-quality life. Compared with Western sports, Xunxian Shehuo has unique advantages because of its distinctive regional features, broad public base and long history, and in the new era these advantages can be fully utilized to serve the general public.

With the wide application of the Internet and the rapid development of online education, online teaching resources grow exponentially, especially the video teaching resources for folk sports Shehuo. The popularization of the network enables every learner to obtain teaching resource information anytime and anywhere. People pay more and more attention to the co-construction and sharing of digital educational resources, and the demand for high-quality educational resources keeps increasing. The state also lists the construction of digital education resources as a focus of education informatization and puts forward the idea of "bringing modern information technology into full play to promote the sharing of high-quality teaching resources". In recent years, governments at all levels, together with related units, have invested heavily in the construction of educational resources for folk sports, and the construction of digital education resource platforms has become an important part of education informatization and a hot practice. Such platforms incorporate modern information technology and are characterized by digitalization, multimedia, networking and intelligence [2]. Digitization transforms complex educational information into measurable numbers or data; multimedia provides more forms for information exchange in the teaching process; networking makes educational resources shareable, so that educational activities are not limited by time and space; and intelligence humanizes teaching behavior. Digital education resource platforms break through the time and space constraints of traditional education, provide flexible learning methods, supply learners with rich, high-quality educational resources, and create more learning opportunities and a better learning environment.

However, ever more convenient access to information also brings a series of problems. Every day, large numbers of pictures, videos, texts and other teaching resources are published on the network. As this information accumulates rapidly, it becomes difficult for learners to find the content that interests them, which seriously reduces learning efficiency and resource utilization. To overcome the "information overload" caused by massive educational resources, this paper puts forward a personalized recommendation method for folk sports Shehuo teaching resources based on mobile learning. Based on mobile learning theory, learner characteristic data are collected, learner features are represented, and a learner model is built.
According to the characteristics of learners' interest preferences, the similarity between the interest preferences and the resources is calculated, and a collaborative filtering recommendation algorithm is used to obtain the best personalized recommendation of teaching resources.
2 Research on the Personalized Recommendation Method of Folk Sports Shehuo Video Teaching Resources

2.1 Construction of Teaching Resources for Mobile Learning

There is no definitive consensus on the definition of mobile learning; experts and scholars in the field have given different definitions from different perspectives. In-depth analysis shows that "learners learning anytime, anywhere" is considered the core characteristic of mobile learning. Several scholars define mobile learning as learners' use of mobile terminal devices, supported by mobile communication technology, to learn anytime and anywhere — an extension of digital learning. The position of mobile learning among learning styles is shown in Fig. 1.
Fig. 1. Schematic diagram of the position of mobile learning among learning methods (mobile learning is shown as a branch of digital learning within distance learning, alongside face-to-face learning, network learning and distance learning by correspondence)
As shown in Fig. 1, mobile learning is a new model of distance learning. It is based on the development of digital learning, which is itself an expansion of distance education, and understanding "digital learning" helps us better grasp mobile learning. The realization of the mobile learning mode rests on intelligent mobile terminals, mobile communication technology and Internet technology; the development of this software and hardware constitutes the mobile learning environment and is an important factor promoting the development of mobile learning. Mobile learning shares the core characteristics of digital learning but also has its own unique features: its learning environment is mobile and learning is very flexible, helping learners learn anytime and anywhere. At the same time, because the intelligent mobile terminal is private to each learner, personalized learning can be realized more easily. Grasping the definition and connotation of mobile learning is an important theoretical basis for building high-quality mobile learning resources.
Mobile learning is based on digital learning and effectively combines mobile technology, bringing learners a new feeling of learning anytime, anywhere. It is widely recognized as an indispensable learning model for the future, with the following main features:

(a) Mobility. Mobile learning tools are intelligent mobile terminals, so learning activities are not limited to a fixed context and can occur at any time and any place. Learners can learn anywhere and anytime through mobile devices, and can access information on the Internet through wireless networks.

(b) Personalization. Learners differ in foundation, motivation and style, and teaching students in accordance with their aptitude has been a concept pursued by educators through the ages — one in which traditional education has an inherent deficiency. The private nature of mobile terminals and the differences in learners' personalities give mobile learning its personalized character: it not only enables learners to learn anywhere and anytime, but also helps them customize their learning content, pace and progress according to their interests, characteristics and needs.

(c) Situatedness. Learning is not only the passive acceptance of knowledge but also its internalization, that is, the construction of new knowledge on the basis of existing knowledge. According to situated cognition learning theory, only in meaningful practical situations can learning be real learning that truly promotes knowledge construction and helps learners master knowledge. In mobile learning, learners may be in any practical situation and can learn and understand the essence of knowledge through problems in real life.

(d) Collaboration. In distance learning, emotional education during the learning process has always been a concern. In mobile learning, learners can share resources and communicate efficiently through the various channels of the mobile network, and can also interact face to face. Feedback and evaluation between learners and teachers are flexible and diverse, so both the learning effect on the knowledge content and the affective factors in the learning process can be addressed in time.

Based on the above, folk sports Shehuo video teaching resources are built on mobile learning theory. They need to satisfy principles such as strong usability, support for learning anytime and anywhere, and the enhancement of learner autonomy. The structure of the construction model of mobile learning teaching resources is shown in Fig. 2; the construction of the resources is completed according to this process, laying a solid foundation for the subsequent personalized recommendation of teaching resources.
Fig. 2. Structure diagram of the mobile learning teaching resources construction mode (stages: pre-phase analyses — intelligent mobile terminal analysis and learning needs analysis; prototype design of teaching resource content — overall architecture, interface, navigation and content presentation design; overall design and implementation — preparation, development and production of teaching resources; testing and evaluation — implementation and evaluation of teaching resources; with feedback and modification throughout)
2.2 Learner Model Building

The learner model is the basis of the personalized recommendation system for folk sports Shehuo video teaching resources and provides the essential basis for personalized service demands. A deeper study of the factors by which learners select Shehuo video teaching resources makes it possible to capture the personalized needs of learners with different learning goals and preferences [5]. When learners choose folk sports video teaching resources in a network environment, they are influenced by time, place, platform, teaching resources and client, while internal factors such as cognitive ability, knowledge mastery and interest preference play the decisive role. Therefore, this research builds the learner model starting from the learners' own characteristics. The process includes the analysis of learner characteristics, the collection of learner data and the representation of learner features, and the model provides the basis for the personalized recommendation of teaching resources.

Learner Characteristics Analysis. Studies of learner models in personalized recommendation of teaching resources show that current learner modeling suffers from incomplete descriptions of learner characteristics and preference information. Addressing these problems, and combining learners' actual application situations with learning style theory, this research expands and re-classifies part of the existing learner model specification. Four characteristics — basic information, learning style, cognitive level and interest preference — are used to describe the learner model. As static data, basic information is invariable and common and can hardly express learners' individual characteristics, so it is not the focus of the data analysis. By describing the other three characteristics, a multi-level personalized learner model is designed, consisting of four layers: a data acquisition layer, a data layer, a data analysis layer and a presentation layer. The learner model building process is shown in Fig. 3.
Fig. 3. Learner model building process diagram (data acquisition layer: learning style questionnaire, learning interest tag selection, assessment of Shehuo knowledge points, learner behavior data; data layer: performance, extended and preference information; data analysis layer: learning style dimension analysis, learning interest tag analysis, analysis of Shehuo knowledge points and test results, resource browsing, evaluation and download behavior analysis; presentation layer: learning style characteristics, cognitive level characteristics, static and dynamic interest preference characteristics)
As shown in Fig. 3, this section clarifies the learner model building process and supports the subsequent learner data collection and feature representation.

Learner Feature Data Collection. Learner characteristic data is the data foundation of the learner model; the determination of learners' personalized characteristics starts from data collection. In this study, the basic information, learning style and initial interest preference characteristics of the learner model are obtained through registration and questionnaires, while the dynamic interest preference and cognitive level characteristics are obtained by accessing the system's Web logs to dynamically capture learners' learning behaviors and their results during the learning process [6]. According to the four characteristics of the learner model, the learner database of the course learning platform is collected, providing the data basis for representing the personalized features in the learner model. Learner information includes basic information, learning style information, initial interest information, learning behavior information and assessment data. Basic information is entered manually through the learner registration system; as static data it is invariable and common and hardly reflects the degree of individuation, so it is not the focus of the data analysis.

Students' learning style data are stored in a database. This information reflects the learner's preference for resource media types and for the abstraction level of resource content. Questions and answers are assigned different codes: the question stems are coded $\alpha_i$, and each question contains two answers (A, B), so the answers to question $\alpha_1$ are expressed as $\beta_1$ and $\beta_2$; by analogy, $i$ question stems yield $2i$ answer codes, each answer corresponding to a different dimension. Learning style information is the basic data for the subsequent dimension analysis of learning style.
When a learner registers as a new user, the system provides a standardized tag feature library composed of resource features, and the learner checks the knowledge tags of interest according to individual needs. This information reflects the learner's initial preference for learning resources and is static; the learner's initial interest tag record table is the basic data for the subsequent interest analysis. The learner's assessment data — including the learner's name, learning resources, resource names, depth of investigation and so on — are stored in the learning resource database. After completing the learning of each chapter, the learner takes a knowledge point test, and the answers serve as the data basis for analyzing the learning cognitive level; the Shehuo knowledge test questions come from the test question database of the learning resource model. By analyzing the learners' knowledge points and their answers, we obtain the learners' grasp of the learning resources in the actual learning process, that is, their learning cognitive level [7].

Learners' learning behavior information is recorded from the moment the user logs into the recommendation system; the whole learning behavior during autonomous learning with the resources is recorded. The learners' Web logs capture the time of access to resources, the duration of continuous learning, the codes of the learned resources, the test time, and the frequency of clicking test questions. These behavior data not only reflect learners' preferences but can also be related to the characteristics of the learning resources, and serve as the data basis for analyzing the dynamic characteristics of learners' interest in learning resources. Based on the above, the corresponding learner characteristic data are collected to provide data support for the learner model construction.

Learner Feature Representation. This research takes the Felder-Silverman style model as the theoretical guide and uses the matching Solomon Learning Style Scale to measure learners' learning styles. Learning style is first divided into four dimensions — perception, input, processing and understanding — and the two opposite styles and their preferences in each dimension are analyzed to represent the individual learning style characteristics. Learners are tested by questionnaire when they enter the teaching resource recommendation system, and the learning style features obtained from the questionnaire are static. The specific process of acquiring the style features is as follows:

Step 1: A learner's learning style is described by a four-dimensional vector of style types and style tendency values in the four dimensions:

$\chi = \{(\delta_1, \varepsilon_1, \delta_2, \varepsilon_2, \delta_3, \varepsilon_3, \delta_4, \varepsilon_4) \mid \varepsilon_i \in [-1, 1]\}$  (1)

In formula (1), $\chi$ refers to the learner's learning style, $\delta_i$ represents a dimension, and $\varepsilon_i$ the value corresponding to the learning style dimension $\delta_i$.

Step 2: When a learner fills out the Solomon style questionnaire, each question has two choices. The result of an answer is defined as $\varphi_{ij}$ and the result of each question as $\varphi$, where $i$ is the item number and $j$ the selected choice, stored in the computer as 1 or 0.

Step 3: When the learner has completed the questionnaire and submitted the answer data, the computer analyzes the data retrieved from the database. The results of the four dimensions are screened, the numbers of each choice $j$ are classified and accumulated, and the final counts are represented by $C$ and $D$.

Step 4: The scale is used to compare the sizes of $C$ and $D$. The difference is calculated as:

$\gamma = C - D$  (2)
In formula (2), $\gamma$ represents the difference between $C$ and $D$. Based on the result $\gamma$, if $\gamma \ge 0$, return to Step 1; if $\gamma < 0$, $\gamma$ is passed back to $\varphi_{ij}$ and a specific learning style conclusion is obtained.

Step 5: The final test results are expressed as quadruples and passed to the database to obtain the learner's learning style characteristics.

The above process completes the construction of the learner model and prepares for the personalized recommendation of teaching resources.
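A minimal sketch of the style scoring of Steps 2-4 (Eqs. (1)-(2)); the normalization of the tendency value into $[-1, 1]$ by the question count, the dimension names and the toy answers are illustrative assumptions rather than the paper's exact procedure.

```python
def style_dimension(answers):
    """Tendency value of one learning-style dimension from 0/1 answers, cf. Eqs. (1)-(2)."""
    c = sum(1 for a in answers if a == 0)   # count of A-type choices (C)
    d = sum(1 for a in answers if a == 1)   # count of B-type choices (D)
    gamma = c - d                           # Eq. (2): difference between C and D
    return gamma / len(answers) if answers else 0.0  # scaled into [-1, 1], cf. Eq. (1)

# Quadruple (delta_i, eps_i) over the four Felder-Silverman dimensions.
answers = {"processing": [0, 0, 1, 0], "perception": [1, 1, 1, 0],
           "input": [0, 1, 0, 0], "understanding": [1, 0, 1, 1]}
chi = {dim: style_dimension(a) for dim, a in answers.items()}
print(chi)
```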
2.3 Interest Preference Feature Acquisition

Based on the learner model, learner behaviors are weighted, and on this basis the interest preference characteristics are obtained to provide a reference for the subsequent personalized recommendation of teaching resources [4]. Each learner's behavior toward each folk sports Shehuo video teaching resource reflects his or her preference for that resource. Based on the weight of each operation behavior, this section represents the learners' preferences for different tags in weighted form. A learner-tag matrix is created:

$\lambda_{m \times n} = \begin{bmatrix} \lambda_{11} & \lambda_{12} & \cdots & \lambda_{1n} \\ \lambda_{21} & \lambda_{22} & \cdots & \lambda_{2n} \\ \vdots & \vdots & \lambda_{ij} & \vdots \\ \lambda_{m1} & \lambda_{m2} & \cdots & \lambda_{mn} \end{bmatrix}$  (3)
In formula (3), $\lambda_{m \times n}$ represents the learner-tag matrix and $\lambda_{ij}$ the cumulative behavior weight of learner $u_i$ on the teaching resource tag $t_j$. Building on the analysis of a learner's preference for a single resource tag and the learner-resource scoring matrix, the interest preference characteristics of a single learner are analyzed from the learning behavior weights over all historical resources [8]. The known sets of teaching resources, learner behaviors and tags are $X$, $Y$ and $T$. In addition, a time factor can be introduced to adjust learners' interest bias; the algorithm defines learner characteristics based on time parameters, adopting the idea of an adaptive time attenuation function and the Ebbinghaus forgetting curve. The time-factor formula for learner $u_i$ and tag $t_j$ is:

$f^{time} = \vartheta + (1 - \vartheta)e^{-(d_1 - d_{t_j})}$  (4)
In formula (4), $f^{time}$ is the value of the time weight and $\vartheta$ an auxiliary parameter; as $\vartheta$ decreases, the time factor has more influence on learner $u_i$ and tag $t_j$. $d_1$ is the current time and $d_{t_j}$ the last time that tag $t_j$ marked learner $u_i$. Based on the above analysis, the characteristic vector of learner interest preference is:

$\eta_{u_i} = \frac{\lambda_{m \times n} \cdot \omega_i}{\tau} * f^{time}$  (5)
In formula (5), $\eta_{u_i}$ is the characteristic vector of the learner's interest preference, $\omega_i$ represents the weights of the learner behaviors, and $\tau$ is the cofactor of interest preference feature acquisition, with values in $[1, 10]$. The above process completes the acquisition of learners' interest preference features and supports the implementation of the subsequent personalized recommendation of teaching resources.
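For illustration, the tag weighting with time decay of Eqs. (3)-(5) can be sketched as follows; the parameter values, the day-based time unit and the function names are assumptions, not values prescribed by the paper.

```python
import numpy as np

def time_weight(d_now, d_tag, theta=0.4):
    """Ebbinghaus-style time decay f_time, cf. Eq. (4)."""
    return theta + (1 - theta) * np.exp(-(d_now - np.asarray(d_tag, float)))

def preference_vector(tag_row, omega_i, d_now, d_tags, tau=2.0):
    """Interest preference feature vector eta_ui of one learner, cf. Eq. (5).
    tag_row: the learner's row of the learner-tag matrix of Eq. (3)."""
    return tag_row * omega_i / tau * time_weight(d_now, d_tags)

tag_row = np.array([3.0, 0.0, 1.5, 4.0])      # cumulative behavior weights per tag
eta = preference_vector(tag_row, omega_i=1.0, d_now=30.0,
                        d_tags=[29.0, 5.0, 28.0, 30.0])
print(eta)
```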
2.4 Personalized Recommendation of Teaching Resources

Based on the interest preference feature vectors obtained above, the similarity between learners' interest preferences and teaching resources and the connections between teaching resources are calculated, and a collaborative filtering recommendation algorithm yields the best personalized recommendation of teaching resources. There are many ways to calculate the similarity between learners' interests and teaching resources; collaborative filtering of learner data works on learners who share common behavior with the teaching resources. Cosine similarity is used here:

$\mu(\eta_{u_i}, X_j) = \frac{1}{\hat{\sigma} \cdot N(u_i)} * \frac{\omega_i \cdot \omega_j}{|\omega_i| \cdot |\omega_j|}$  (6)
In Formula (6), μ ηui , Xj indicates the similarity between ηui and j Teaching Resource Xj , which is the preference of ui learners. σˆ represents the similarity factor and needs to be set according to the actual situation. N (ui ) represents the total number of learners ui . ωj represents the weight coefficient of the j teaching resources. In the calculation of similarity, the higher the similarity value μ ηui , Xj of teaching resources and learners’ interest preference is, the closer the teaching resources and learners’ interest preference are. Then this teaching resources are worth recommending. The connection between teaching resources is used to measure the close relationship between teaching resources. On the recommendation of teaching resources, the more closely the students’ cognitive knowledge points are connected with the teaching resources. The more this resource meets learners’ learning needs, the more it is recommended. For the same teaching resources include knowledge points, we believe that two knowledge points in the path of equal status. That is, there is no sequence of two knowledge points, their shortest path distance set to 1. If there is no path between Xj and Xk or Xj and Xk are the same knowledge point, set the shortest path distance to 0. When
there is no path between two teaching resources, there is no direct relationship between them. When the two teaching resources are the same, it is considered that the learner has already learned the resource and does not need to repeat it, so the shortest path is set to 0 [9]. Apart from these cases, for any two teaching resources X_j and X_k, the shortest path distance is the retrievable shortest path between them. The formula for calculating the connection degree ζ_jk of teaching resources is:

ζ_jk = ⎧ 0,                X_j = X_k
       ⎨ 1,                X_j, X_k ∈ K_Xi        (7)
       ⎩ d(X_j, X_k)/d_0,  otherwise
In formula (7), K_Xi represents the i-th type of teaching resources, d(X_j, X_k) is the path distance between X_j and X_k, and d_0 represents a unit path distance used as an auxiliary measurement parameter. Based on the calculation result ζ_jk of formula (7), the connectivity ℓ(X_j, X_k) between teaching resources X_j and X_k is determined as:

ℓ(X_j, X_k) = ⎧ ζ_jk / in|X_k|,  ζ_jk ≠ 0
              ⎩ 0,               ζ_jk = 0        (8)
In formula (8), ℓ(X_j, X_k) represents the connection between teaching resources X_j and X_k, and in|X_k| is the in-degree of X_k. Based on the similarity μ(η_ui, X_j) calculated above between learners' interest preferences and teaching resources, the connection between teaching resources is calculated as ℓ(X_j, X_k). A collaborative filtering recommendation algorithm then produces the best personalized recommendation results. In order to better recommend folk sports Shehuo video teaching resources, and to avoid the shortcomings of collaborative filtering based on learners or on teaching resources alone, this study combines learner-based and resource-based collaborative filtering into a hybrid recommendation model to complete the personalized recommendation of teaching resources. The specific steps are as follows:

Step 1: Input the learner-teaching resource matrix, recorded as W(u_N, X_M), where N is the total number of learners and M is the total number of teaching resources. The element W_ij in row i and column j of the matrix represents learner u_i's rating of teaching resource X_j; if learner u_i has not rated X_j, the corresponding element is set to null.

Step 2: Form similar teaching resource sets. For any resources X_j and X_k, obtain the connection degree
ℓ(X_j, X_k). Generally speaking, in the recommendation process the relationship between teaching resources does not change much within a certain time range, so the connection degree ℓ(X_j, X_k) can be calculated offline in advance, stored in a dedicated database table, and updated regularly. For any teaching resource X_j, search the whole teaching resource set and select the first
l teaching resources with the largest connectivity ℓ(X_j, X_k); these are integrated to obtain the similar teaching resource set.
Step 3: Use formula (6) to calculate the similarity μ(η_ui, X_j) between learners' interests and teaching resources. Combined with learner u_i's rating of teaching resource X_j, appropriate teaching resources are selected and recommended to the corresponding learners. Through the above process, the personalized recommendation of video teaching resources of folk sports Shehuo can be realized, providing assistance for the development and promotion of Shehuo folk sports.
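To make the pipeline concrete, the following Python sketch strings together formulas (4)–(6) and Steps 1–3 for a single learner. It is a minimal illustration under stated assumptions, not the authors' implementation: the toy data, the default parameter values (ϑ = 0.3, τ = 5, σ̂ = 0.21 as determined in Sect. 3.2), and the reading of formula (5) as (λ·ω_i/τ)·f_time are assumptions, and the offline connection-degree filtering of Step 2 is omitted for brevity.

```python
import numpy as np

def time_weight(d_now, d_tag, theta=0.3):
    """Formula (4): Ebbinghaus-style decay; theta is the auxiliary parameter."""
    return theta + (1 - theta) * np.exp(-(d_now - d_tag))

def preference_vector(lam, omega, f_time, tau=5.0):
    """Formula (5): interest-preference features from the learner-tag matrix lam,
    behavior weights omega, and the time weight f_time (tau in [1, 10])."""
    return (lam @ omega) / tau * f_time

def similarity(eta, x, sigma_hat=0.21, n_ui=1):
    """Formula (6): scaled cosine similarity between a preference vector and a resource."""
    cos = (eta @ x) / (np.linalg.norm(eta) * np.linalg.norm(x) + 1e-12)
    return cos / (sigma_hat * n_ui)

def recommend(W, lam, omega, f_time, resources, top_k=5):
    """Steps 1-3: score this learner's unrated resources and return the best top_k."""
    eta = preference_vector(lam, omega, f_time)
    scores = {j: similarity(eta, x) for j, x in enumerate(resources) if np.isnan(W[j])}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# toy data: 4 tags, 6 resources; W holds one learner's ratings (NaN = unrated)
rng = np.random.default_rng(0)
lam = rng.random((4, 4))          # learner-tag behavior weights (lambda_{m x n})
omega = rng.random(4)             # learner behavior weights (omega_i)
resources = rng.random((6, 4))    # tag-space feature vectors of resources X_j
W = np.array([4.0, np.nan, 3.5, np.nan, np.nan, 5.0])
print(recommend(W, lam, omega, time_weight(d_now=10, d_tag=7), resources, top_k=3))
```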
3 Experiment and Result Analysis

3.1 Experimental Condition Settings

In order to verify the application performance of the proposed method, 10 different experimental conditions were set to improve the accuracy of the experimental conclusions. The experimental conditions are set up as shown in Table 1.

Table 1. Experimental condition setting table

Condition number    Number of teaching resources/piece    Number of learners
1                   2356                                  124
2                   3012                                  258
3                   4015                                  345
4                   2894                                  201
5                   2101                                  198
6                   2548                                  154
7                   3057                                  320
8                   3491                                  426
9                   5201                                  388
10                  5478                                  295
As shown in Table 1, the number of learners and the number of Shehuo folk sports video teaching resources differ across the 10 experimental conditions, which meets the needs of the personalized recommendation experiments.

3.2 Determination of Experimental Parameters

The parameter σ̂ determines the calculation accuracy of the similarity between learners' interest preferences and teaching resources. Therefore, its optimal value must be determined before the experiment to ensure the best performance of the proposed method.
Fig. 4. Schematic diagram of the relationship between parameter σ̂ and similarity calculation accuracy
The relationship between the parameter σ̂ and the similarity calculation accuracy is shown in Fig. 4. When σ̂ is 0.21, the calculation accuracy of the similarity between the learner's interest preference and the teaching resource reaches its maximum value of 89.5%. Therefore, the optimal value of parameter σ̂ is determined to be 0.21.

3.3 Analysis of Results

Based on the experimental conditions and parameters determined above, a personalized recommendation experiment was carried out, with recall rate and accuracy rate selected as evaluation indicators. The recall rate is the ratio of the number of teaching resources preferred by learners in the recommendation result set to the number of teaching resources preferred by all learners. The accuracy rate is the ratio of the number of successful recommendation results to the number of all recommended teaching resources. In general, the larger the recall rate and accuracy rate, the better the personalized recommendation effect; conversely, the smaller they are, the worse the effect. The recall and accuracy data obtained through the experiments are shown in Fig. 5. As shown in Fig. 5(1), the recall rates obtained by the proposed method are all greater than the given minimum limit, with the maximum reaching 96%. As shown in Fig. 5(2), the accuracy rates obtained by the proposed method are all greater than the given minimum limits, with the maximum reaching 98%. These results show that the proposed method achieves higher recall and accuracy than the given minimums, which fully proves its effectiveness and feasibility.
Fig. 5. Recall and precision data graph
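Since the two indicators are defined set-wise above, they can be computed directly from a recommendation list; a minimal sketch with invented example data:

```python
def recall_and_precision(recommended, preferred):
    """Recall: preferred resources that were recommended / all preferred resources.
    Precision (accuracy rate): successful recommendations / all recommended resources."""
    hits = set(recommended) & set(preferred)
    recall = len(hits) / len(preferred) if preferred else 0.0
    precision = len(hits) / len(recommended) if recommended else 0.0
    return recall, precision

# e.g. 3 of 4 preferred resources appear among 5 recommendations
print(recall_and_precision([1, 2, 3, 4, 5], [2, 3, 5, 9]))  # (0.75, 0.6)
```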
4 Conclusion

In order to improve the accuracy of recommendation for Shehuo folk sports teaching video resources, this study introduces the theory of mobile learning and proposes a new personalized recommendation method for folk sports teaching resources. By analyzing the characteristics of learners, a learner model is constructed. Based on the similarity between learners' interests and resources, a collaborative filtering recommendation algorithm is used to obtain personalized recommendation results. The experimental results show that the recall rate and accuracy rate of this method are higher than 96%. It not only contributes to the dissemination of Shehuo folk sports videos, but also provides support for the development of folk sports. In subsequent research, we will consider collecting users' implicit rating information, such as favorites, browsing time and download records, to reduce the dependence on explicit ratings.
References 1. Zhang, Y., Wang, D.: Integration model of English teaching resources based on artificial intelligence. Int. J. Contin. Eng. Educ. Life-Long Learn. 30(4), 398 (2020) 2. Tucker, B.V., Kelley, M.C., Redmon, C.: A place to share teaching resources: speech and language resource bank. J. Acoust. Soc. Am. 149(4), 147 (2021) 3. Yang, X.: Research on integration method of AI teaching resources based on learning behavior data analysis. Int. J. Contin. Eng. Educ. Life-Long Learn. 30(4), 492 (2020) 4. Li, H., Zhong, Z., Shi, J., et al.: Multi-objective Optimization-based recommendation for massive online learning resources. IEEE Sens. J. 21(22), 25274–25281 (2021) 5. Pei, Z., Wang, Y.: Analysis of computer aided teaching management system for music appreciation course based on network resources. Comput.-Aided Des. Appl. 19(S1), 1–11 (2021) 6. Li, J., Zhang, Y., Qian, C., et al.: Research on recommendation and interaction strategies based on resource similarity in the manufacturing ecosystem. Adv. Eng. Inf. 46(1), 101183 (2020)
7. Wang, H., Fu, W.: Personalized learning resource recommendation method based on dynamic collaborative filtering. Mob. Netw. Appl. 26, 473–487 (2021) 8. She, X.B., Huang, S., Liu, C.Q.: Web resource priority collaborative filtering recommendation based on deep learning. Comput. Simul. 39(2), 431–435 (2022) 9. Atawneh, S., Al-Akhras, M., AlMomani, I., et al.: Collaborative mobile-learning architecture based on mobile agents. Electronics 9(1), 162–162 (2020)
Intelligent Monitoring System of Electronic Equipment Based on Wireless Sensor

Minghua Duan1(B) and Caihui Wu2

1 Department of Urban Rail Transit and Information Engineering, Anhui Communications Vocational and Technical College, Hefei 230051, China
[email protected]
2 China Construction Third Bureau Group South China Co., Ltd., Guangzhou 510000, China
Abstract. Aiming at the problem of poor stability of intelligent monitoring systems for the operating status of electronic equipment, an intelligent monitoring system based on wireless sensors is designed. The system hardware structure is optimized, an online monitoring and analysis platform based on wireless sensors is built, and the system software functions are optimized. This paper presents a mathematical model for online monitoring of the operating status of electronic equipment based on wireless sensors and improves the conventional model: the dimension information table is compressed by encoding and stored in the fact table to reduce storage overhead. The test results show that the performance of the proposed system is significantly better than that of traditional methods.

Keywords: Wireless Sensor · Electronic Equipment · Monitoring System · Running State
1 Introduction

At present, domestic research on the operating status monitoring of electronic equipment mostly focuses on the offline mode, while online monitoring is relatively scarce; some detection functions are realized through virtual instruments, which are only suitable for local detection and cannot support remote monitoring. In view of these problems, it is of practical significance to propose an online monitoring system for the operating status of electronic equipment. Literature [1] studied an online non-contact comprehensive temperature monitoring system that does not affect the safe operation of high-voltage switchgear, transformers and other electrical equipment, in order to effectively ensure operational safety. Monitoring based on omni-directional scanning technology can not only monitor the electrical equipment globally but also focus on bus connection points, cable joints and other parts, and can improve the effectiveness of the monitoring system and the reliability of equipment operation; however, the signal monitoring and abnormal diagnosis functions of that system still need improvement.
Literature [2] designs a wireless-network-based remote monitoring system for the status of electrical equipment. The system collects status signals remotely through wireless sensor networks, de-noises the signals, extracts features from them, and introduces a machine learning algorithm to fit the relationship between the features and the equipment status, establishing a monitoring model from which the status results are obtained. Finally, the monitoring results are sent to management personnel through the Internet, and the remote monitoring system is simulated and tested. The results show that the system can carry out high-precision remote monitoring of the status of electrical equipment, but a corresponding solution cannot be given in time when the system has a bug. Literature [3] mainly studies the application of intelligent sensors in the monitoring of electrical equipment, describing the characteristics of electronic sensors and their applications in the detection of electrical equipment, including data transmission, camera monitoring, information display and automatic control, as well as engine control, safety systems, navigation control and drive systems in intelligent automobiles. On this basis, this paper designs an intelligent monitoring system for the operating status of electronic equipment based on wireless sensors, hoping to optimize online monitoring performance.
2 Intelligent Monitoring System for Electronic Equipment Operating Status

2.1 System Hardware Structure Configuration

The hardware design of the system adopts a B/S architecture as the application layer; a cloud computing platform is added to the whole architecture to open cloud services for the monitoring system, integrate the equipment to be monitored according to a distributed layout, and complete the unified management of multiple transactions. Among them, the application devices need to share and exchange data in real time, and equivalent connections are made between the mainstream SQL Server, MySQL and Oracle database modules to ensure the reliability and security of the output information. A distributed intelligent system for efficient monitoring is established, and the internal network layout is divided into three levels: the first layer is the management application layer, the second is the data transmission layer, and the third is the field device layer. The specific layout is shown in Fig. 1.

Fig. 1. Overall structure of the internal network layout of the system

The wireless sensor network nodes in the system are composed of five parts: sensor module, processor module, wireless communication module, interface module and power management module, as shown in Fig. 2.

Fig. 2. Hardware design module diagram of wireless sensor network node

The sensor module is responsible for the collection and data conversion of information in the monitoring area; the processor module is responsible for controlling the operation of the entire sensor node, storing and processing the data collected by itself and the data sent from other nodes, including data security, communication protocols,
synchronous positioning, power management, task management, etc.; the wireless communication module is responsible for wireless communication with other sensor nodes, exchanging control messages and receiving and transmitting collected data; the power management module provides all the power required for the operation of the sensor node. The traditional status monitoring platform mainly includes three parts: the data acquisition layer, the data storage and analysis layer, and the data access layer. The data acquisition layer transfers business data such as production management and monitoring information to the data warehouse in real time through extraction, transformation and loading. The data storage and analysis layer selects conventional models and metadata and uses OLAP to read business data in the data warehouse. The data access layer mainly provides data mining and statistical queries for staff and users. It should be noted that the traditional condition monitoring platform mainly uses the conventional snowflake model or star model to organize data, which has
poor scalability and high cost, and is insufficient for the storage and optimization needs of massive power data. A new condition monitoring platform is therefore constructed using big data technology, as shown in Fig. 3.
Fig. 3. Condition monitoring platform
The main work of the data acquisition circuit is to collect and process the power generation parameters and environmental parameters of the electronic equipment and transmit the processed data frames to the terminal node, including the current and voltage generated by the equipment and the temperature, humidity, wind speed and irradiance of the current environment. The relevant parameters of the various sensors are shown in Table 1.
Table 1. Sensor related parameters

Sensor type                         Measuring range              Output signal
Current transformer                 0–90 A                       0.6 V
Voltage transformer                 0–320 V                      0–6 V
Irradiance sensor                   0–1600 W/m2                  0–2.6 V
Temperature and humidity sensor     −40 ºC–80 ºC, 0–100%RH       digital signal
Wind speed sensor                   0–32.5 m/s                   0.5–2 V

The input signals of the sensors are mostly voltage outputs, so A/D conversion is required to convert the voltage signal into a digital signal. Many A/D conversion channels are required in the acquisition terminal, so the STM32F103ZET6 is selected. It has three 12-bit analog-to-digital converters, 21 input channels, and 112 multi-functional I/O ports, which fully meets the needs of the monitoring system. As shown in the table, the upper limit of the sensors' output signal is 5 V, while the upper limit of the A/D port's voltage conversion is 3.3 V, so it is necessary to step down the sensors' output signal.
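A simple resistive divider is one conventional way to perform this step-down; the resistor values in the sketch below are illustrative assumptions, not taken from the paper.

```python
def divider_output(v_in, r_top, r_bottom):
    """Output of a two-resistor divider: v_out = v_in * r_bottom / (r_top + r_bottom)."""
    return v_in * r_bottom / (r_top + r_bottom)

# illustrative values: 10 kOhm over 18 kOhm maps a 5 V full-scale signal to ~3.21 V,
# staying under the STM32 A/D ceiling of 3.3 V
print(divider_output(5.0, r_top=10_000, r_bottom=18_000))  # ~3.21 V
```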
The RS232 serial port module is designed to realize communication between the ZigBee coordinator and the 3G module. The circuit adopts Maxim's MAX232 level-conversion driver chip. The RS232 standard generally adopts a 9-wire system, namely DB9 (nine-pin serial port). The MAX232 is mainly composed of four parts: the charge pump circuit, the RS232 driver, the RS232 receiver and the power supply. The charge pump is responsible for generating levels in accordance with the RS232 communication standard and is composed of pins 1–6 of the MAX232 plus four capacitors. On the basis of previous single-purpose operating status monitoring systems for power equipment, a specific implementation scheme of an operation monitoring system based on current big data technology is proposed. The functional structure of the system mainly includes multiple monitoring sets, node performance monitoring, operation monitoring and early warning modules. Monitoring is accomplished through agents and monitoring plug-ins. These plug-ins include data acquisition modules, data processing modules and data transmission modules, as well as monitoring plug-ins (including data receiving and visualization modules) [4, 5]. The functional structure of the system is shown in Fig. 4.
Fig. 4. Functional structure of hardware equipment of the system
The field equipment layer mainly uses data collectors at the basic level to collect the data of the field equipment, and the field electricity meters need to be upgraded. The field
device layer collects real-time data through the power data collectors and transmits it to the concentrator via the RS485 bus. The data transmission layer uses the concentrator to centrally process the received data and transmit it to the server through TCP/IP. Users can log in to a designated management account through a PC to access, manage and monitor the data of each monitoring point. The management application layer is mainly a human-computer interaction interface, providing users with various application functions.

2.2 System Software Function Optimization

To monitor the operating status of power electronic equipment, Web-related technologies are used to monitor the work of the equipment in real time, while the system provides analysis of the corresponding abnormalities. Web service technology is typical here: it is highly practical and can meet the system's demand for Web services from the early planning stage. In addition, the technology is frequently used in software development, offers many powerful functions, integrates a number of practical programmable applications suitable for any development environment, complies with open XML standards, and supports distributed, interoperable applications [6, 7]. Based on an analysis of the characteristics of wireless sensor network monitoring systems, a monitoring system for the running state of electronic equipment based on a wireless sensor network is designed. The system is composed of two parts: a field acquisition unit and a remote control unit. The field acquisition unit includes reduced-function nodes (RFD), a sink node (Sink) and a gateway. RFDs only collect data without routing, and different RFDs cannot communicate with each other; the sink node is responsible for collecting the data sent by the RFDs and selecting the most suitable route to send the data; the gateway sends the received wireless signal to the remote control unit through the wired network. The remote control unit consists of a router, a data storage server and a remote control terminal. The router coordinates the communication between the different on-site acquisition units and the remote control terminal; the data storage server not only collects and stores the data of the different networks but also provides backup for the data of each individual network, which strengthens the reliability of the network system; the remote control terminal provides engineering and technical personnel with real-time data and issues control commands to the network. Existing wireless sensors are used to connect the field acquisition unit and the remote control unit, which ensures reliable and timely transmission. The overall design scheme of the online comprehensive monitoring system for equipment operating status based on the integration of wireless sensor information is shown in Fig. 5. The system is divided into a field layer, a platform layer and a presentation layer.
Fig. 5. The overall design scheme of the online integrated monitoring system for the operation status

The field layer takes the integrated monitoring device of the electronic equipment as its core, realizing the collection, analysis, display, control and alarm functions for the various kinds of monitoring information of the electronic equipment. The platform layer is the coordination and dispatching center of the whole system; it is composed of a variety of microservices and realizes the collection and storage of field layer data and the interaction with presentation layer data. The presentation layer is mainly used to receive user requests,
respond to data display, and provide users with interfaces to interact with the machines [8]. Based on the deployments and the functional structure of the system, the workflow can be divided into a performance monitoring workflow and a job operation monitoring workflow. The performance monitoring procedure includes multiple monitoring sets, performance monitoring and node performance monitoring, and various indicators can be monitored through performance plug-ins, such as network services and host resource operation. Since Icinga has no built-in monitoring functions, all monitoring is completed by installing plug-ins. A plug-in feeds the monitored results back to Icinga, which analyzes the results and presents them as Web page text for users to view; the plug-ins also realize a fault warning effect. Icinga uses NRPE to obtain information about the performance of the monitored hosts. NRPE includes two parts: the inspection part, located on the monitoring host, and the acquisition part, arranged on the terminal power equipment to collect information. The overall flow chart is shown in Fig. 6.

Fig. 6. System performance monitoring workflow

Based on the working principle of wireless sensors, intelligent operation of the equipment is realized through remote collection and centralized processing. However, in long-distance signal transmission, noise signals will inevitably be introduced. The noise source may be fixed-frequency noise generated by the operation of the generator, or white noise coupled into the transmission line [9, 10]. No matter what kind of
noise it is, it is not conducive to high-quality signal transmission, and if the noise problem is not handled well, it will restrict the operating status monitoring function of the LSTM-neural-network generator set. In order to eliminate noise interference, a wavelet algorithm is used to process the signal in the frequency domain and realize noise reduction. Suppose ψ(x) ∈ L²(R), where L²(R) denotes the space of square-integrable real functions, and ϕ(a) is the Fourier transform; ψ is a valid base wavelet when the following equation holds:

μ = ∫_{−∞}^{+∞} ψ(n)g(x) / ‖ϕ(a)‖_{L²(R)} dx ≤ 1        (1)

Calling ψ(n) the base wavelet, it can be deduced from Eq. (1) that the base wavelet must satisfy ϕ(n) = 1, so g(x) must have bandpass properties. Let a and b respectively represent two load points of the power distribution system, and let c represent the electronic equipment node constituted by the two load points a and b. A value of 0 indicates the amount of power supply that the user does not obtain at this load point. The values are then as follows:
⎧ γ_1 = 1
⎪ γ_2 = m_b / [μ(m_a + m_b)]
⎨ γ_3 = μ m_a / (m_a + m_b)        (2)
⎩ γ_4 = μ − 1
The calculation process of the working probability of the system node under normal conditions is:

P(n) = P(γ_1, γ_2, γ_3, γ_4) = μ Σ_b [ P(γ_1, γ_2, γ_3, γ_4) / P(γ_1 = 1 | γ_2, γ_3 = 0, γ_4 = 0) ]        (3)
According to the network model given by the formula, which conforms to the principle of reliable operation of electronic equipment, the reliable power supply performance of the power distribution system can be monitored through wireless sensors. Under normal circumstances, the ID3 and C4.5 algorithms are used to find the most suitable segmentation threshold, providing a basis for the power consumption information acquisition and operation and maintenance monitoring system. The ID3 algorithm can only process discrete data and tends to favor attributes with more values. In contrast, the C4.5 algorithm does not select attributes by the raw information gain but normalizes it into an information gain ratio, defined as:

gain_ratio(Z) = [gain(Q) − P(n)] / split_inf(A_Z)        (4)
The quick sort method is used to order the data by attribute value, and the information gain is then calculated to determine the local threshold. The specific steps are as follows (as illustrated in the sketch after this list):

(1) Arrange the data set Q in ascending order according to the value of the continuous attribute Z, so that the sorted sequence of attribute values is {A_1, A_2, ..., A_n}, where n = |A_Z|.
(2) For each A_i with 1 ≤ i ≤ n − 1, divide Q into two subsets Q_1 = {A_j | A_j ≤ A} and Q_2 = {A_j | A_j > A} according to the value A = (A_i + A_{i+1})/2, and calculate the information gain Gain_A after the division.
(3) Find the largest Gain_A and record the corresponding A as the local threshold A_max.
(4) Find the value in {A_1, A_2, ..., A_n} closest to A_max and use it as the division threshold to complete the discretization of Z.

The algorithm can be used to classify and analyze the characteristics of power supply units, user groups and distribution stations. On the basis of data statistics, the declared capacity information of newly installed users is recorded, and power consumption information is increased or decreased in time to realize real-time monitoring of power consumption information acquisition, operation and maintenance.
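The following is a minimal sketch of the threshold search in steps (1)–(4), using the standard entropy-based information gain as the splitting criterion; the labelled power readings are invented example data, not measurements from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Steps (1)-(4): sort the continuous attribute, try midpoints between
    adjacent values, and keep the candidate with the largest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_a, best_gain = None, -1.0
    for i in range(len(pairs) - 1):
        a = (pairs[i][0] + pairs[i + 1][0]) / 2          # A = (A_i + A_{i+1}) / 2
        q1 = [l for v, l in pairs if v <= a]             # Q1 = {A_j | A_j <= A}
        q2 = [l for v, l in pairs if v > a]              # Q2 = {A_j | A_j > A}
        gain = base - (len(q1) * entropy(q1) + len(q2) * entropy(q2)) / len(pairs)
        if gain > best_gain:
            best_a, best_gain = a, gain
    return best_a, best_gain

# e.g. hourly power readings labelled normal/abnormal
values = [1.2, 1.5, 1.9, 3.4, 3.8, 4.1]
labels = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]
print(best_threshold(values, labels))  # threshold ~2.65, gain = 1.0
```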
For the state information acquisition and processing system required by the combination of traditional power electronic equipment and embedded technology, the real-time running state of the power electronic equipment can be analyzed and diagnosed. The running status information of the equipment is collected, the data collected in real time is transmitted promptly, and the system then carries out the follow-up information diagnosis and analysis. The information collection and transmission in this system are realized on the basis of wireless sensor technology and divided into two aspects: information collection and information transmission. In the information collection part, the parameter terminal nodes of the electronic equipment need to be detected. The specific software operation process is shown in Fig. 7.
Fig. 7. Monitoring terminal software design process
From the above workflow, it can be seen that the main program determines the system configuration information after startup and initializes the system modules. At this point, various instructions can be configured, and once configuration is completed, the specific configuration instructions are executed; if no configuration is made, the system works according to its initial configuration. In a specific working mode, the system communicates with the information acquisition devices of the power electronic equipment, acquires the real-time operating status information of the specific equipment through the power electronic equipment end, and transmits the information to the system server. In the controlled mode, status information is collected and transmitted according to the collection instructions sent by the server, which is responsible for sending the specific information interaction instructions.
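The flow of Fig. 7 (initialize, poll for an analog signal, convert, build a data frame, transmit) can be summarised as a polling loop. The sketch below is a host-side illustration only: read_adc() is a hypothetical stand-in for the STM32 A/D conversion, and appending to a list stands in for the serial transmission of data frames.

```python
import random
import time

def read_adc():
    """Placeholder for the A/D conversion; returns None when no analog signal is present."""
    return random.uniform(0.0, 3.3) if random.random() > 0.3 else None

def acquisition_loop(samples=5):
    """Fig. 7 flow: init -> poll for signal -> convert -> edit data frame -> transmit."""
    frames = []
    while len(frames) < samples:
        value = read_adc()
        if value is None:           # no analog signal: keep polling
            time.sleep(0.01)
            continue
        frame = {"sensor": "voltage", "value": round(value, 3)}  # edit data frame
        frames.append(frame)        # stands in for transmission to the server
    return frames

print(acquisition_loop())
```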
2.3 Realization of Intelligent Monitoring of Electronic Equipment Operation

The field layer is a collection of electronic equipment distributed in different physical locations and connected via the Internet of Things. Each piece of electronic equipment is fitted with a comprehensive monitoring device to realize the monitoring of various electrical quantities, non-electrical quantities and switch states, together with intelligent control. The platform layer is a set of management analysis software and data collection services deployed on the ECS; it is the coordination and dispatching center of the whole system and consists of a variety of microservices. In addition, the background service can accept control commands from the presentation layer to control the electronic devices on the field layer via 4G communication. The SQL Server cloud database stores the various types of monitoring data of the electronic devices at the field level, including a user table, an electronic device information summary table, an electronic device model information table, and storage tables for the monitoring information corresponding to each electronic device. The physical model of the database is shown in Fig. 8.
Fig. 8. Presentation layer HMI software functions
Users in the user table are divided into administrator-level users and ordinary users. Administrator-level users can maintain the database, including adding/deleting users, assigning user rights, adding/deleting electronic devices, and creating/deleting the corresponding monitoring information storage tables. Ordinary users can query and control the information of the electronic equipment within their scope of authority, and the scope
of user authority is specified by the code list in the code field of the electronic equipment. The presentation layer is mainly used to receive user requests, respond with data displays, and realize remote control of the electronic equipment on the field layer; it is the man-machine interface through which users interact with the integrated monitoring system. The presentation layer provides two human-computer interaction modes: a monitoring center computer terminal and a mobile application tablet terminal. The computer side is a Web application with a browser/server (B/S) structure developed with JavaScript, 5th Generation Hypertext Markup Language (HTML5) and Cascading Style Sheets (CSS); the tablet end is an application (app) developed in the Java language and installed on a tablet computer or mobile phone so that operation and maintenance personnel can carry it and perform maintenance at any time. In terms of functions, the computer side adds database maintenance and user management compared with the tablet side, and only users who log in as system administrators have the right to operate these two items; ordinary users can only browse information and perform remote control within limited permissions. The specific functions are shown in Fig. 9. Taking the state monitoring of electronic equipment as an example, and considering the large amount of state monitoring data, a state monitoring main IED is set up for each piece of electronic equipment, and sub-IEDs are set up respectively for dissolved gas in oil monitoring, partial discharge monitoring, on-load tap changer monitoring and winding hot spot temperature monitoring; the main IED realizes data collection and preliminary diagnosis for each sub-IED. Therefore, for the condition monitoring system, the bay layer mainly includes the electronic equipment/reactor condition monitoring main IED, the capacitive equipment/metal oxide arrester condition monitoring main IED, the circuit breaker/GIS condition monitoring main IED, etc., to realize the preliminary self-diagnosis functions of summarizing, processing, comparing and early warning on the relevant monitoring data of the equipment. In addition, according to the monitored state quantities covered by the system, the bay layer should also deploy an environmental monitoring IED to obtain the relevant temperature and humidity information, and a measurement and control IED to obtain a comprehensive analysis of the relevant power information. The master station layer mainly includes the master station monitoring unit, remote diagnosis, decision support and other application systems. The master station monitoring unit gives a comprehensive evaluation of the current health status of the equipment by analyzing the status monitoring data of the whole network. While the data analysis and evaluation results are released throughout the network, they can also be transmitted to remote diagnosis, decision support and other application systems to provide a basis for equipment risk assessment and maintenance strategy formulation.
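As a rough illustration of the main IED's collect-and-preliminarily-diagnose role over its sub-IEDs, the following sketch flags monitored quantities that exceed a warning limit. The quantity names follow the sub-IEDs listed above; the thresholds are invented for the example and do not come from the paper.

```python
# hypothetical warning limits for the monitored quantities; real limits would
# come from equipment standards, not from this sketch
LIMITS = {
    "dissolved_gas_ppm": 150.0,
    "partial_discharge_pc": 500.0,
    "tap_changer_ops": 10000,
    "winding_hotspot_c": 98.0,
}

def main_ied_diagnose(sub_ied_readings):
    """Aggregate sub-IED data and flag quantities that exceed their warning limit."""
    warnings = {k: v for k, v in sub_ied_readings.items()
                if k in LIMITS and v > LIMITS[k]}
    return {"status": "warning" if warnings else "normal", "exceeded": warnings}

print(main_ied_diagnose({"dissolved_gas_ppm": 180.0, "winding_hotspot_c": 72.0}))
```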
Fig. 9. Transformer Status Monitoring IED
3 Analysis of Experimental Results

In order to prove whether the intelligent monitoring system designed here can complete real-time monitoring of faults in the running state of distribution electronic equipment, solve the problem that long-standing information in the fault area cannot be covered, and ensure the normal power consumption of non-faulty equipment, a comparative experiment is used for verification. To ensure the authenticity and reliability of the experimental results, MATLAB is used to process the experimental data and obtain standardized results. The main hardware configuration of the system test concerns the network: a Huawei RH2288 V2 server, or a storage server of the same level, is selected. The client hardware is mainly a PC whose CPU needs at least 4 cores and a minimum frequency of 2.7 GHz, with at least 500 GB of hard disk space and 8 GB of DDR4 memory. The server operating system is based on Linux 3.5.1, and the relational database management system is a commercial version of Oracle; the server system supports VM and VPN services. The client operating system is Microsoft Windows, and the browser needs an IE10 core. The platform used for information acquisition and transmission of the power electronic equipment is required to meet the basic requirements of the DataSocket library and the basic C language library in the performance test. By allowing more users to access the system and process some workflows, the system's reaction efficiency is evaluated against the expected process handling efficiency; the system performance is calculated and analyzed based on the actual data obtained from the evaluation, and if the calculated data meet the corresponding standards, the performance test is passed. In the system performance test, the operational stability of the system must be inspected, and the data and information obtained during the test must be checked against the test standards to judge the completeness of the system development. The system was set to run continuously for 12 h at the rated 150k load, the temperature change values of the loaded electronic equipment were collected in different states, and the operating data were collected every 30 min. The results are shown in Table 2.

Table 2. Transformer temperature changes under different loads (unit: ºC)

Interval    1.0 load    1.5 load    2.0 load    3.0 load
30 min      43.7        45.3        48.2        53.5
60 min      47.5        51.5        55.6        60.5
90 min      52.6        56.3        61.3        70.3
120 min     58.5        68.2        67.6        79.5
According to the table, the operating temperature of the electronic equipment was collected as the load increased in steps of 0.5 times, and the temperature basically rises with the load. Since the winding hot spot temperature of the transformer limits its load capacity, exceeding the internal temperature limit accelerates the aging of the equipment materials, reduces the effective utilization efficiency of the equipment, and can lead to power interruption of the entire line. The electronic equipment data collected by each group were input into the monitoring system, and the actual ambient temperature and hot spot temperature of the equipment were monitored under 24-hour uninterrupted operation. In the actual measurement and data collection process, it was ensured that the monitoring system could accurately record working condition information, capture the safety indicators of circuit operation under different states, and comprehensively evaluate the main influencing factors, so that real-time data sharing could be achieved under the monitoring system. When the load is large, the hot spot temperature of the electronic equipment exceeds 65 °C, at which point equipment failure will occur. The application effect of each group of monitoring systems under different room temperature conditions is shown in Fig. 10 and Fig. 11.

Fig. 10. System monitoring results of this paper

Fig. 11. Traditional system monitoring results
It can be seen from the figures that the statistics of the traditional monitoring system do not follow a change curve similar to that of the room temperature. When the electronic equipment reaches the hot spot temperature extreme, the traditional system cannot issue an early warning immediately, so the equipment continues to work in a high-temperature environment, which is likely to cause damage. The above monitoring data prove that the monitoring system designed in this paper can monitor electronic equipment under different loads and changing conditions, find fault locations in time, and protect the long-term operation of the power distribution equipment. The relevant functions of the developed system were tested, covering four functional components: the login monitoring module, signal acquisition, signal processing and analysis, and abnormal diagnosis. The results are shown in Tables 3 and 4.

Table 3. System login module test

Test title: Complete the verification of the entered user name and password
Testing procedure: Enter user name and password; click the login button
Expected results: If the user name and password are correct, the page jumps; if the password is wrong, an error message is returned
Test result: Meet the expected design
Table 4. System running status monitoring functions

Test title: Test whether the communication is connected normally
Testing procedure: Enter the account name and password to log in; click power transformer monitoring; view the connection status
Expected results: Successfully logged in
Test result: Meet the expected design

Test title: Normal display of the monitoring signal
Testing procedure: Enter the account name and password to log in; click power transformer monitoring; view the monitored waveform
Expected results: Successfully logged in
Test result: Meet the expected design
According to the above functional test results, the existing design problems were addressed, mainly by repairing the bugs in the system functions, after which the performance of the designed system was tested. The corresponding performance test results are shown in Table 5.

Table 5. System performance test results

Test content: response time
Test indicators: Average response time of processing an application
Test result: 1.9 s