Smart Innovation, Systems and Technologies 180
Roumen Kountchev · Srikanta Patnaik · Junsheng Shi · Margarita N. Favorskaya
Editors
Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology Algorithms and Applications, Proceedings of IC3DIT 2019, Volume 2
Smart Innovation, Systems and Technologies Volume 180
Series Editors Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-sea, UK Lakhmi C. Jain, Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology Sydney, Sydney, NSW, Australia
The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/8767
Editors

Roumen Kountchev
Department of Radio Communications and Video Technologies, Technical University of Sofia, Sofia, Bulgaria

Srikanta Patnaik
Department of Computer Science and Engineering, SOA University, Bhubaneswar, Odisha, India

Junsheng Shi
Department of Electro-Optical Engineering, Yunnan Normal University, Yunnan, China

Margarita N. Favorskaya
Informatics and Computer Techniques, Reshetnev Siberian State University of Science and Technology, Krasnoyarsk Krai, Russian Federation
ISSN 2190-3018 ISSN 2190-3026 (electronic) Smart Innovation, Systems and Technologies ISBN 978-981-15-3866-7 ISBN 978-981-15-3867-4 (eBook) https://doi.org/10.1007/978-981-15-3867-4 © Springer Nature Singapore Pte Ltd. 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
International Program Committee
Honorary Chair Prof. Lakhmi C. Jain
General Chairs Prof. Dr. Srikanta Patnaik Prof. Dr. Junsheng Shi Prof. Dr. Roumen Kountchev
Organizing Chair Prof. Yonghang Tai
International Advisory Chair S. R. Dr. Roumiana Kountcheva
Founder of IRNet China Silai Zhou
Co-founder of IRNet China Er. Bin Hu
TPC Members
Prof. Jian Wang, Kunming University of Science and Technology, China
Prof. Xilong Qu, Hunan University of Finance and Economics, China
Dr. Victor Chang, Teesside University, Middlesbrough, UK
Prof. Vladicescu Popentiu, Florin, City University, UK
Prof. Guangzhi Qu, Oakland University, USA
Prof. Dr. Zhengtao Yu, Kunming University of Science and Technology
Prof. V. S. S. Yadavalli, University of Pretoria, South Africa
Prof. Bruno Apolloni, Università degli Studi di Milano, Italy
Prof. Harry Bouwman, Delft University of Technology, Netherlands
Prof. Shyi-Ming Chen, National Taiwan University of Science and Technology
Prof. Yahaya Coulibaly, University Technology Malaysia, Malaysia
Prof. Ing Kong, RMIT University, Australia
Prof. Gerald Midgley, Centre for Systems Studies, University of Hull, UK
Prof. Khubaib Ahmed, Hamdard University, Pakistan
Prof. Moustafa Mohammed Eissa, Faculty of Engineering, Helwan University, Egypt
Dr. Xilang Tang, Air Force Engineering University, China
Dr. Yangjun Gao, Air Force Engineering University, China
Dr. Fernando Boronat Seguí, Universitat Politecnica de Valencia, Spain
Dr. Alexandros Fragkiadakis, Institute of Computer Science (FORTH-ICS), Greece
Dr. Cristina Alcaraz, University of Malaga, Spain
Dr. Mohamed Atef, Assiut University, Egypt
Dr. Weilin Wang, University of Georgia, USA
Dr. Bensafi Abd-Ei-Hamid, World Islamic Sciences and Education University, Jordan
Dr. Yudi Gondokaryono, Institute of Teknologi Bandung, Indonesia
Dr. Hadi Arabshahi, Ferdowsi University of Mashhad, Iran
Dr. Qian Lv, Western Digital, USA
Dr. Alojz Poredo, University of Ljubljana, Slovenia
Dr. Mohamed F. El-Santawy, Cairo University, Egypt
Dr. Tongpin Liu, University of Massachusetts Amherst, USA
Dr. Seema Verma, Banasthali University, India
Sponsors

Yunnan Normal University
IRNet International Academy Communication Center

Interscience Research Network (IRNet) International Academy Communication Center is an independent, non-profit academic institute. It provides scholars and researchers in science, engineering and technology all over the world with a professional, informational and intelligent academic platform and promotes academic research, communication and international cooperation.
Preface
The annual International Conference on 3D Imaging Technology (IC3DIT 2019) took place on August 15–18, 2019, in Kunming, Yunnan, China. The key aspect of this conference was the strong mixture of academia and industry. IC3DIT 2019 provided a forum that brought together researchers and academics as well as practitioners from industry to meet and exchange ideas and recent research development work on all aspects of images, their applications and other related areas. The conference established an effective platform for institutions and industries to share ideas and to present the work of scientists, engineers, educators and students from different countries.

Volume 2 of the conference proceedings includes papers which present new algorithms and techniques for 3D image analysis and reconstruction in various application areas, for big data (BD) processing and for the development of new VR technologies, 3D printers and displays. The volume comprises 60 chapters, which can be divided into the following five groups:

1. Chapters 1–13 present works in areas related to the processing of 3D images and data: 3D digital image correlation (DIC) technique for strain measurement in fiber-reinforced polymer; quality detection and 3D evaluation model of wool fabric; wage supervision service based on BD and iris recognition; block segmentation and assembly technology of 3D printing structure; 3D reconstruction for common view field in hybrid vision system; 3D characteristics observation of ocean waves in coastal areas; 3D reconstruction based on left and right objects; 3D scene dynamic scanning technology of substation based on depth vision; fusion and localization technology of 3D model attitude and environment visualization of power inspection site; BD monitoring of enterprise growth; analysis of BD in library management and service; VR technology in the teaching of 3D animation course; and properties of polymer materials based on the 3D printing technology;

2. Chapters 14–27 are aimed at the development of algorithms for: multi-scale pedestrian detection based on improved single shot multibox detector; activity recognition optimization using triaxial accelerometers; near-duplicate video
detection based on temporal and spatial key points; surface defect detection for the expanded thermoplastic polyurethane midsole based on machine vision; multi-view learning for glioblastoma image contrast enhancement; multi-focus image fusion based on convolutional sparse representation; collaborative sparse representation for pattern classification; a method for irregular pupil positioning; adaptive segmentation of firefly's fuzzy clustering of image; QR 2D code graphics correction based on morphological expansion closure and edge detection; new Hermitian self-orthogonal codes; graphic data analysis model oriented to knowledge base; eyeball image registration and fusion based on SIFT+RANSAC; and digitization of Chinese calligraphy;

3. Chapters 28–36 present new algorithms in various application areas of the VR technologies: augmented reality guidance for ultrasound-guided puncture surgery training; assessment for the brain surgical simulation; visio-haptic simulation for the VR-based biopsy surgical navigation; process driving software for VR fusion; text to image synthesis with segmentation information guidance; VR technology in presentation of campus architecture animation; video mosaic system; brain cognition in the age of artificial intelligence; and library book acquisition in the age of "Internet +";

4. Chapters 37–53 are in the area of the control systems: fast and seamless image stitching of high-resolution Dunhuang murals; control system of double-wavelength high-power laser therapeutic instrument; application of DIC to shear crack measurement of concrete beam; finite element modeling and analysis of seat; analysis of sensitivity and stiffness of the mobile car-door; reliability analysis and control of kinematic characteristics of automotive suspension; modeling and analysis of energy-absorbing box; vehicle seat comfort scoring method; smart home system based on voice control; abnormal condition monitoring of UHV converter valve based on infrared/UV image; flight attitude acquisition system for multi-rotor small aircraft; vertical circulating parking garage; camera calibration based on real-time video; top line location of train pantograph; integrated operation and maintenance platform based on artificial intelligence; application of computer technology in garment production; and decorative pattern design on the basis of parametric technology;

5. Chapters 54–60 are in the area of the new technologies related to 3D displays: binocular color fusion in 3D displays; measurement of binocular fusion limit in stereoscopic display; 3D display using complementary multiband bandpass filters and dynamic scanning backlight; time division multiplexing stereo display based on quantum dot scanning backlight; multi-pixels polymer-dispersed liquid crystal display; intelligent fusion technology of power equipment and business knowledge model for 3D display of grid terminals; and visual fatigue related to parallax.

All chapters of the book were reviewed and passed the plagiarism check. The editors express their thanks to IRNet for the excellent organization, coordination and support. Also, we thank the sponsors, the Organizing Committee members and
the International Program Committee members of IC3DIT 2019, and the authors for their hard work in the preparation of this book. The book will be useful for researchers, students and lecturers who study or work in the area of contemporary 3D imaging technology.
Prof. Roumen Kountchev, Sofia, Bulgaria
Prof. Srikanta Patnaik, Bhubaneswar, India
Prof. Junsheng Shi, Yunnan, China
Prof. Margarita N. Favorskaya, Krasnoyarsk Krai, Russia

Editors
Contents

1. 3D-DIC Technique for Strain Measurement in FRP Strip Externally Bonded to Concrete Specimens (Liu Mei, Sen Guo, Tiansheng Shi, Zejie Pan, Weiwen Li, and Feng Xing)
2. Establishment of Quality Detection and 3D Evaluation Model of Wool Fabric Based on Machine Vision (Yang Chen, Jiajie Yin, Qiangqiang Lin, Shoufeng Jin, and Yi Li)
3. The Research and Design of Wage Supervision Service Platform for Migrant Workers Based on Big Data and Iris Recognition Technology (Dacan Li, Yuanyuan Gong, Guangshang Tang, Guicai Feng, and Linyi Jiang)
4. Research on Block Segmentation and Assembly Technology of 3D Printing Structure (Gang Qiao, Shangwei Liu, Qun Wei, Luting Wei, and Yingjie Wang)
5. A Study on 3D Reconstruction Method for Common View Field in Hybrid Vision System (Chang Lin, Haifeng Zhou, Wu Chen, and Yan Zhang)
6. Three-Dimensional Characteristics Observation of Ocean Waves in Coastal Areas by Microwave Doppler Radar (Longgang Zhang)
7. Three-Dimensional Reconstruction Based on Left Right Objects (Qiang Liu)
8. Research on 3D Scene Dynamic Scanning Modeling Technology of Substation Based on Depth Vision (Yu Hai, Xu Min, Peng Lin, Liu Wei, Sun Rong, Shao Jian, and Wang Gang)
9. Fusion and Localization Technology of 3D Model Attitude and Real Physical Environment Visualization Information of Power Inspection Site (Xu Min, Yu Hai, Hou Zhansheng, Li Nige, Bao Xingchuan, Peng Lin, Wang Gang, Zhu Liang, Lu Hao, Wang Zhifeng, and Zhao Peng)
10. The Research and Design of Big Data Monitoring Platform for Enterprise Growth (Dacan Li, Yuanyuan Gong, Guangshang Tang, Guicai Feng, Yan Zhang, and Weimo Tian)
11. Application Analysis of Big Data in Library Management and Service (Huang Weining)
12. Application of Virtual Reality Technology in the Teaching of 3D Animation Course (Baiqiang Gan, Qiuping Dong, and Chi Zhang)
13. Study on Properties of Polymer Materials Based on 3D Printing Technology (Weixiang Zhang and Lingming Yang)
14. Improved SSD-Based Multi-scale Pedestrian Detection Algorithm (Di Fan, Dawei Liu, Wanda Chi, Xiaoxin Liu, and Yongyi Li)
15. Activity Recognition System Optimisation Using Triaxial Accelerometers (Zhenghui Li, Bo Li, and Julien Le Kernec)
16. Near-Duplicate Video Detection Based on Temporal and Spatial Key Points (Diankun Zhang, Zhonghua Sun, and Kebin Jia)
17. Surface Defect Detection Method for the E-TPU Midsole Based on Machine Vision (Ruizhi Li, Song Liu, Liming Tang, Shiqiang Chen, and Liu Qin)
18. A Multi-view Learning Approach for Glioblastoma Image Contrast Enhancement (Xiaoyan Wang, Zhengzhou An, Jing Zhou, and Yuchou Chang)
19. Multi-focus Image Fusion Based on Convolutional Sparse Representation with Mask Simulation (Chengfang Zhang)
20. A Collaborative Sparse Representation-Based Approach for Pattern Classification (Yaofang Hu and Yunjie Zhang)
21. Research on an Irregular Pupil Positioning Method (Yong Zhao, Shouming Zhang, Huan Lei, Jingqi Ma, and Nan Wang)
22. Adaptive Step Size of Firefly's Fuzzy Clustering Image Segmentation (Yangyang Hu and Zengli Liu)
23. Research on QR 2-D Code Graphics Correction Algorithms Based on Morphological Expansion Closure and Edge Detection (Liu Peng, Liu Wen, Li Qiang, Duan Min, Dai Yue, and Nian Yiying)
24. Some New Hermitian Self-orthogonal Codes Constructed on Quaternary Field (Hongying Jiao, Jinguo Zhang, and Miaohua Liu)
25. Graphic Data Analysis Model Oriented to Knowledge Base of Power Grid Data Center (Liang Zhu, Lin Qiao, Li Bo Xu, and Bi Qi Liu)
26. Eyeball Image Registration and Fusion Based on SIFT+RANSAC Algorithm (Anqi Liu)
27. On Digitization of Chinese Calligraphy (Zhiwei Zhu and Shi Lin)
28. Augmented Reality Guidance System for Ultrasound-Guided Puncture Surgery Training (Zhaoxiang Guo, Hongfei Yu, Zhibao Qin, Wenjing Xiao, Yang Shen, and Meijun Lai)
29. Development of Subjective and Objective Assessment for the Brain Surgical Simulation: A Review (Chengming Zhao, Hongfei Yu, Tao Liu, Yang Shen, and Yonghang Tai)
30. Virtual Haptic Simulation for the VR-Based Biopsy Surgical Navigation (Lin Xu, Chengming Zhao, and Licun Sun)
31. An Experiment Process Driving Software Framework for Virtual–Reality Fusion (Yanxing Liang, Yinghui Wang, and Yifei Jin)
32. Text to Complicated Image Synthesis with Segmentation Information Guidance (Zhiqiang Zhang, Yunye Zhang, Wenfa Liu, Wenxin Yu, Gang He, Ning Jiang, and Zhuo Yang)
33. Application of VR Technology in Presentation of Campus Architecture Animation: A Case Study of City College of WUST (Yu Qian)
34. Design of Video Mosaic System (Fei Yan, Wei-Qi Liu, Yin-Ping Liu, Bao-Yi Lu, and Zhen-Shen Song)
35. Exploring the Formalization of Brain Cognition in the Age of Artificial Intelligence (Yang Libo)
36. On the New Challenges Faced by Library Book Acquisition in the Age of "Internet +" (Dong Na)
37. FSIS: Fast and Seamless Image Stitching of High-Resolution Dunhuang Murals (Ming Chen, Xudong Zhao, and Duanqing Xu)
38. Design of the Control System of 532/940 nm Double-Wavelength High-Power Laser Therapeutic Instrument (Ningning Dong, Jinjiang Cui, and Jiangen Xu)
39. Application of DIC Technology to Shear Crack Measurement of Concrete Beam (Weiwen Li, Yujie Huang, Yuanhui Jiang, Tiansheng Shi, and Feng Xing)
40. Finite Element Modeling and Analysis of Seat Comfort (Kai Ma, Banghui Li, Zhipeng Yan, and Qiaoling Liu)
41. Reanalysis of the Sensitivity and Stiffness of the Mobile Car Door (Kai Ma, Banghui Li, Zhipeng Yan, and Qiaoling Liu)
42. Reliability Analysis and Control Method of Kinematic Characteristics of Automotive Suspension (Kai Ma, Banghui Li, Zhipeng Yan, and Qiaoling Liu)
43. Fast Modeling and Analysis of Energy-Absorbing Box (Kai Ma, Banghui Li, Zhipeng Yan, and Qiaoling Liu)
44. Vehicle Seat Comfort Scoring Method (Kai Ma, Banghui Li, Zhipeng Yan, and Qiaoling Liu)
45. The Smart Home System Based on Voice Control (Chenwei Feng and Huimin Xie)
46. Research on Abnormal Condition Monitoring System of UHV Converter Valve Based on Infrared/UV Image (Hai Yu, Min Xu, Lin Peng, He Wang, and Zhansheng Hou)
47. Research and Design of the Flight Attitude Information Acquisition System for Multi-rotor Small Aircraft (Kun Liu, Shiyou Li, Guangyuan Yang, and Jianbao Guo)
48. Design of Vertical Circulating Parking Garage (Zhanbin Gu)
49. A Camera Calibration Method Based on Real-Time Video (Han Liu, Qi Sun, and Wei Wang)
50. Top Line Location of Train Pantograph Based on Combination Operator (Zhen Tong, Guimei Gu, and Yinliang Yang)
51. Design of Integrated Operation and Maintenance Platform Based on AIOps (Yuqiang Fan, Ke Xu, Dou Wu, Fan Yang, Yu Zeng, Zhenyu Tang, Jiazhou Li, and Xin Wang)
52. Discussion on the Application of Computer Technology in Garment Production (Xiaoxiu Liu)
53. An Experiment on Decorative Pattern Design on the Basis of Parametric Technology (Bing Xia and Fuye Sun)
54. An Experimental Study on Binocular Color Fusion in 3D Displays (Hui Liu, Kai Chen, Qi Xiong, Junshen Shi, and Zaiqing Chen)
55. A Quantitative Measurement of Binocular Fusion Limit in Stereoscopic Display (Qi Xiong, Kai Chen, Hui Liu, Junshen Shi, and Zaiqing Chen)
56. 3D Display Using Complementary Multiband Bandpass Filters and Dynamic Scanning Backlight (Bin Xu, Xueling Li, and Yuanqing Wang)
57. Time-Division Multiplexing Stereo Display Based on Quantum Dot Scanning Backlight (Hanshu Liu, Gangwei Chen, and Yuanqing Wang)
58. Flexible Multi-pixels Polymer-Dispersed Liquid Crystal Display Device (Duan Zhengguang, Wu Qinqin, Wang Yuanqing, and Liu Xuanyi)
59. Research on Intelligent Fusion Technology of Power Equipment and Business Knowledge Model for 3D Display of Grid Terminals (Gang Wang, XiaoDong Zhang, ChengZhi Zhu, Hao Lu, ZhiFeng Wang, and Peng Zhao)
60. Research on Visual Fatigue Related to Parallax (Zihan Sun, Zerui Cheng, Haowen Liang, Hao Jiang, and Jiahui Wang)

Author Index
About the Editors
Prof. Dr. Roumen Kountchev, D.Sc., works at the Faculty of Telecommunications, Department of Radio Communications and Video Technologies, Technical University of Sofia, Bulgaria. He has published 341 papers in journals and conference proceedings, 15 books and 46 book chapters, and holds 20 patents. He is a member of the Euro Mediterranean Academy of Arts and Sciences; President of the Bulgarian Association for Pattern Recognition (member of IAPR); and editorial board member of the IJBST Journal Group, International Journal of Reasoning-based Intelligent Systems, and the international journal Broad Research in Artificial Intelligence and Neuroscience.

Prof. Dr. Srikanta Patnaik works at the Department of Computer Science and Engineering, Faculty of Engineering and Technology, SOA University, Bhubaneswar, India. He has published over 100 papers in international journals and conference proceedings, 2 textbooks and 32 edited volumes. He is the editor-in-chief of the International Journal of Information and Communication Technology, the International Journal of Computational Vision and Robotics (Inderscience Publishing House) and the book series Modeling and Optimization in Science and Technology (Springer), Advances in Computer and Electrical Engineering, and Advances in Medical Technologies and Clinical Practices (IGI-Global).

Prof. Dr. Junsheng Shi is the Dean of the School of Physics and Electronic Information, Yunnan Normal University. He is a member of the China Illuminating Engineering Society Image Technology Specialized Committee, the China Optical Technology Optical Society Professional Committee, and the Chinese Society of Image and Graphics Technical Committee on stereoscopic imaging. He also contributes to the journals Optical Engineering and the Journal of Display Technology. In 2004 and 2008, he received awards for scientific and technological progress and natural science in Yunnan Province. In the last five years, he has completed two major projects and published over 50 papers.
Prof. Dr. Margarita N. Favorskaya is the Head of the Department of Informatics and Computer Techniques at Reshetnev Siberian State University of Science and Technology, RF. She is an IPC member and has chaired invited sessions at over 30 international conferences. She is a reviewer for various journals: Neurocomputing, Knowledge Engineering and Soft Data Paradigms, Pattern Recognition Letters, and Engineering Applications of Artificial Intelligence, and an associate editor of Intelligent Decision Technologies Journal, International Journal of Knowledge-Based Intelligent Engineering Systems and the International Journal of Reasoning-based Intelligent Systems. She is also a reviewer and book editor for Springer.
Chapter 1
3D-DIC Technique for Strain Measurement in FRP Strip Externally Bonded to Concrete Specimens Liu Mei, Sen Guo, Tiansheng Shi, Zejie Pan, Weiwen Li, and Feng Xing
Abstract Three-dimensional digital image correlation (3D-DIC) is widely used in measuring displacement and strain distribution due to its non-contact, high-accuracy and non-destructive nature. In this paper, the 3D-DIC method is used to study the bond performance between a bonded fiber-reinforced polymer (FRP) sheet and a concrete block during a single-shear test. The strain distribution in the FRP sheet during the whole loading process is obtained with the DIC method. Based on the measured strain distribution, the local bond–slip behavior of the FRP–concrete interface can be further analyzed. Therefore, the feasibility of the 3D-DIC system applied in experimental studies on the mechanical properties of civil engineering structures is proved.
L. Mei · S. Guo · T. Shi · W. Li (B) · F. Xing
Guangdong Provincial Key Laboratory of Durability for Marine Civil Engineering, Shenzhen University, Shenzhen, China
e-mail: [email protected]
L. Mei e-mail: [email protected]
S. Guo e-mail: [email protected]
T. Shi e-mail: [email protected]
F. Xing e-mail: [email protected]

Z. Pan
China Construction Design International (Shenzhen), Shenzhen, China
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2020
R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_1
1.1 Introduction

Measuring displacement and strain distribution is a critical step in practical engineering. Digital image correlation (DIC), a contactless measuring technique, is a proper method to measure these parameters because of outstanding features such as low equipment costs, short setup time, no difficulty in cabling, and high measurement accuracy. The DIC method was first proposed by Yamaguchi et al. [1]. In 1982, Peters and Ranson [2] proposed a new approach named 2D-DIC to transform digitized images into local surface displacements. The method has a high level of accuracy when it is used to measure in-plane displacement. However, the 2D-DIC approach will result in a very large error if the displacement is out of plane [3], because this technique is based on a single-camera imaging system, which can only capture the in-plane displacement. Based on the binocular stereo vision system and following the 2D-DIC approach, Luo and Chao et al. [4, 5] built and analyzed a three-dimensional DIC model in which two cameras capture the object images simultaneously. Due to its high flexibility and accuracy, the technique, named 3D-DIC, has a wide application scope [6–8]. In this study, 3D-DIC is used to investigate the displacement and strain distribution of an FRP strip which is bonded to a concrete block in a single-shear test (Fig. 1.1). The displacement and strain values of the FRP sheet can be captured by the 3D-DIC system without using strain gauges and potentiometers; the bond–slip curve is then obtained.
Fig. 1.1 MTS and single-shear test (MTS machine, specimen, and VIC-3D system)
1.2 Basic Principle and Advantages of 3D-DIC System

1.2.1 Basic Principle and Test Setup of 3D-DIC System

The basic principle of the 3D-DIC system is the binocular stereo vision system, which works like the two eyes of the human body. The system needs two cameras at different positions to shoot the same scene and capture two digital images. Through a variety of matching algorithms, the corresponding image pixel points can be obtained from these images. The coordinates of each point can be measured by using the triangulation method (Fig. 1.2) and binocular parallax, analyzing the series of triangles formed between the cameras and the object. Finally, the surface morphology and 3D deformation can be obtained.

The test setup of the 3D-DIC system (Fig. 1.3) is composed of cameras, optical lenses, an image acquisition card, a computer, and the corresponding image storage medium. Two cameras are used to collect the speckle images of the specimens before and after the deformation. Then 3D-DIC can obtain the coordinates, displacement, and strain of any point on the surface of the specimens. During the test, the speckle marking and the image calibration are the critical processes, which will influence the accuracy of the data. Besides, slight environmental vibration and other disturbances will also affect the accuracy.
Fig. 1.2 Basic principle of 3D-DIC
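To make the triangulation step concrete, the sketch below computes a 3D point from a matched pixel pair under an idealized rectified two-camera model; the focal length, baseline, principal point, and pixel coordinates are hypothetical values chosen for illustration, not parameters of the actual system used in this study.

```python
import numpy as np

# Idealized rectified stereo pair: both cameras share focal length f (pixels)
# and are separated by baseline B (mm) along the x-axis. For a matched point,
# depth follows from the similar triangles of Fig. 1.2: Z = f * B / disparity.
f_px = 2000.0          # focal length in pixels (hypothetical)
B_mm = 150.0           # baseline between the two cameras in mm (hypothetical)
cx, cy = 512.0, 512.0  # principal point (hypothetical)

# Matched pixel coordinates of the same surface point in the left/right images
xl, yl = 640.0, 300.0
xr = 610.0

d = xl - xr               # binocular parallax (disparity) in pixels
Z = f_px * B_mm / d       # depth in mm
X = (xl - cx) * Z / f_px  # lateral coordinates from the pinhole model
Y = (yl - cy) * Z / f_px
print(f"3D point: ({X:.1f}, {Y:.1f}, {Z:.1f}) mm")
```

In a real 3D-DIC system the cameras are calibrated and the matching is done by correlating speckle subsets, but this depth-from-disparity relation is the geometric core of the binocular measurement.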
Fig. 1.3 One of the 3D-DIC system
1.2.2 Advantages

There are many other optical methods used to measure deformation, such as holographic interferometry, moiré interferometry, and electronic speckle pattern interferometry [9]. It is difficult to apply these methods to field measurement because they are based on interference theory, which is easily affected by the environment. Compared with the mentioned methods, the 3D-DIC system has advantages such as low environmental requirements, a long observation period, and a wide measuring range. Moreover, 3D-DIC records the full-field deformation of the specimens, which can directly reflect the full-field displacement and strain of the specimens.
1.3 Operation Process of 3D-DIC

1.3.1 Speckle Marking

Before testing, speckles need to be marked on the FRP strip. First, keep the surface of the FRP strip clean and apply a thin coat of matte white spray paint evenly over the surface of the FRP test area. Ten minutes later, spray paint again. Repeat this process three times until the FRP strip has an even matte white surface (Fig. 1.4). After the white paint dries, the speckle marking tool is used to mark speckles of appropriate size on the test area of the FRP.
Fig. 1.4 Matte white spray paint

Fig. 1.5 Speckle marking

During the process, ensure that the
speckles have a certain density, neither too sparse nor too dense (Fig. 1.5). Finally, four red dots are drawn on the four corners of the effective bonding area, so that the scope of the bonding area can be easily identified in the subsequent analysis.
1.3.2 Operation Process

The specific operation process of the 3D-DIC system is as follows:

(1) Equipment installation. First, fix the two lenses on the tripod and connect them with the computer. Ensure the two lenses are in the same horizontal position. Then, install two light plates and adjust the light plate angle to avoid vertical irradiation of the specimen surface. Finally, turn on the computer and run the application "VIC-Snap".

(2) Lens focusing. Click on the crosshair appearing in the GUI interface of the application to adjust the lens direction and keep the crosshairs of the lens aligned with the center of the specimen. After that, activate the focusing state by clicking on the "Sigma" sign on the image to help the lens focus. The focus lock of the lens is tightened to lock the focal length when the image is closest to purple.

(3) Calibration image acquisition. Click on the "Calibration Image" button and then press the space key to acquire the calibration image. During the process of image acquisition, the calibration plate needs to be placed at different positions and angles to ensure the accuracy. A total of 20–40 images are collected to calculate and analyze the error. When the error value is less than 0.1, the acquisition is successful. Otherwise, it needs to be corrected again.

(4) Speckle image acquisition. Click on the "Speckle Image" button and set the frame rate of image capture in "Time Capture". In this test, the frame rate is set to one frame per second. Finally, click on "Start" to capture the speckle images of the specimen surface. At the same time, the 3D-DIC system and the MTS loading device need to run synchronously.

(5) Image analysis and calculation. After collecting the calibration images and speckle images, start the 3D-DIC software and import the two groups of images. Then, select the test area for analysis and calculation. After the operation, establish the coordinate system and select the required points. Finally, export the data.
1.4 Specimens and Materials

Several specimens are designed to study the bond–slip relation of the FRP–concrete interface. All the specimens have the same size of 100 mm × 100 mm × 250 mm. The FRP sheet with a length of 350 mm, a width of 50 mm, and a thickness of 0.167 mm was used in this test, and a 50 mm × 50 mm FRP reinforcing sheet is pasted on each side of the loaded end. The bond length of the FRP sheet is 150 mm. The front view and the top view of the specimen are shown in Fig. 1.6.
Fig. 1.6 Schematic diagram of the specimen: a the front view; b the top view
1.5 Results and Analysis

1.5.1 Strain Distribution of FRP Strips

The effective bonding length of the FRP of all specimens is 150 mm. Along the central axis of the FRP strip from the free end to the loading end, a total of 15 measuring points are selected at intervals of 10 mm. The 3D-DIC system can automatically analyze and calculate the strain value at each measuring point. Figure 1.7 shows the strain distribution diagram of the FRP–concrete specimen surface drawn by the 3D-DIC system. The strain distribution of the FRP in the loading process can be observed intuitively. According to the legend on the right side of the figure, the maximum value of strain is red, while the minimum value is blue and purple. It can be seen that the strain distribution gradually expands downward from left to right, indicating that the shear stress at the interface gradually transfers from the loading end to the free end; when the load is small, only the strain near the loading end shows a blue-green color (0–4000 με). With the increase of the load, the purple color of the zero-strain area gradually fades downward, and the strain value and its range near the loading end continue to increase. When the load reaches a certain value of 15 kN, the strain transfer range rapidly increases and the debonding accelerates. When the load is greater than 15 kN, the bearing capacity gradually decreases until strain appears at the free end, and finally debonding failure occurs.

Fig. 1.7 Strain distribution of specimen (panels at 6 kN, 9 kN, 12 kN, 15 kN, and 22.51 kN)

Figure 1.8 shows the strain distribution curve of the FRP–concrete specimen surface based on the data measured using the 3D-DIC method. At the beginning of the loading (0–14 kN), the strain distribution curve of the FRP sheet near the loading end presents a concave trend, and with the increase of the load, the trend becomes more pronounced. Away from the loading end, the strain of the FRP is very small, indicating that the stress has not yet been transferred to this region. According to the curve, it can be inferred that the effective bond length of the specimens is between 80 and 100 mm.

Fig. 1.8 Strain distribution curve of specimen (strain in με versus distance from loading end in mm, for loads from 4 to 20.5 kN)
1.5.2 Bond–Slip Relation

According to the strain distribution of the FRP, the bond–slip curve of the FRP–concrete interface can be obtained. The specific derivation process is as follows. From the mechanical analysis of the FRP (Fig. 1.9), the equilibrium equation can be given by:

$\tau\, b_f\, \mathrm{d}x + \sigma_f b_f t_f = (\sigma_f + \mathrm{d}\sigma_f)\, b_f t_f$    (1.1)

where $\tau$ is the local bond stress, $\sigma_f$ is the axial stress in the FRP plate, and $b_f$ and $t_f$ are the width and thickness of the FRP plate. The local bond stress distribution can be calculated from Eq. (1.1):

$\tau = t_f \dfrac{\mathrm{d}\sigma_f}{\mathrm{d}x}$    (1.2)

Because $\sigma_f = E_f \varepsilon_f$, Eq. (1.2) can be expressed as:

$\tau = E_f t_f \dfrac{\mathrm{d}\varepsilon_f}{\mathrm{d}x}$    (1.3)

where $E_f$ indicates the elasticity modulus of the FRP.

Fig. 1.9 Mechanical analysis

The local slip can be calculated by:
$s = s_0 + \displaystyle\int_0^x \varepsilon_f\, \mathrm{d}x$    (1.4)

where $s_0$ means the slip of the free end.

Fig. 1.10 Bond–slip curves at different measuring points (45, 105, 115, 125, and 135 mm; local bond stress in MPa versus slip in mm)

Figure 1.10 shows the bond–slip curves at different measuring points. The shape of these curves varies from case to case; however, generally the curves can be divided into two parts: a linear ascending section and a nonlinear descending section. The bond stress reaches its peak at a small value of the interface slip. After reaching the peak value, the bond stress gradually approaches 0 MPa with the further development of the slip. But some measuring points still carry bond stress even at a very large slip, which indicates that the debonded interface can still transfer a part of the bond stress due to the friction and interlock effect between the concrete aggregates. In addition, it can also be observed in the figure that the initial stiffness is larger than the secant stiffness at the maximum bond stress. At the early stage of loading, the FRP–concrete interface is in the elastic deformation stage. With the continuous development of the slip, many small cracks appear in the shallow interface of the concrete, resulting in rapid degradation of the interface stiffness [10].
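As a numerical illustration of Eqs. (1.3) and (1.4), the sketch below differentiates a strain profile along the bond length to obtain the local bond stress and integrates the strain from the free end to obtain the local slip. The strain values and FRP modulus are illustrative placeholders, not the measured data of this test; the thickness is the value given in Sect. 1.4.

```python
import numpy as np

E_f = 230e3   # elasticity modulus of FRP, MPa (illustrative placeholder)
t_f = 0.167   # FRP sheet thickness, mm (from Sect. 1.4)

# Measuring-point positions from the free end (mm) and strains (dimensionless)
x = np.arange(0.0, 150.0 + 1e-9, 10.0)
eps = 1e-6 * np.array([0, 2, 5, 10, 30, 80, 200, 450, 900,
                       1600, 2600, 3900, 5400, 7000, 8600, 9800])

# Eq. (1.3): tau = E_f * t_f * d(eps)/dx, here via a finite-difference gradient
tau = E_f * t_f * np.gradient(eps, x)          # MPa

# Eq. (1.4): s = s0 + integral of eps from the free end (trapezoidal rule)
s0 = 0.0                                       # slip of the free end
s = s0 + np.concatenate(([0.0],
        np.cumsum(0.5 * (eps[1:] + eps[:-1]) * np.diff(x))))  # mm

for xi, ti, si in zip(x, tau, s):
    print(f"x = {xi:5.0f} mm   tau = {ti:6.2f} MPa   slip = {si:6.4f} mm")
```

Pairing the resulting tau and s values point by point gives exactly the kind of bond–slip curves plotted in Fig. 1.10.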
1.6 Conclusions

In this study, the 3D-DIC system is used to measure the strain values of FRP sheets. Several specimens were designed to investigate the bond performance of the FRP–concrete interface. Based on the analysis and discussion presented above, the following conclusions can be drawn:

• The 3D-DIC system is simpler and more intuitive compared with traditional displacement and deformation measuring systems.
• The strain distribution of FRP strips can be estimated with sufficient accuracy using the 3D-DIC system.
• According to the analysis of the strain distribution of the specimens, the effective bond length is about 90 mm. When the bond strength reaches the maximum value, the slip is very small. With the development of the slip, the bond strength decreases rapidly.

Acknowledgements The authors would like to gratefully acknowledge the National Natural Science Foundation of China (Grant Nos. 51678365, 51708359, and 51878415), the Shenzhen Science, Technology and Innovation Commission (SZSTI) Basic Research Program (Nos. JCYJ20170818100641730 and 20170817102125407), and the Scientific Research Startup Fund of Shenzhen University (Grant No. 2016067) for financial support of this study.
References

1. Yamaguchi, I.: A laser-speckle strain gauge. J. Phys. E: Sci. Instrum. 14(11), 1270 (1981)
2. Peters, W.H., Ranson, W.F.: Digital imaging technique in experimental stress analysis. J. Opt. Eng. 21(3), 427–431 (1982)
3. Sutton, M.A., Yan, J.H., Tiwari, V., et al.: The effect of out-of-plane motion on 2D and 3D digital image correlation measurements. J. Opt. Lasers Eng. 46(10), 746–757 (2008)
4. Luo, P.F., Chao, Y.J., Sutton, M.A., et al.: Accurate measurement of three-dimensional deformations in deformable and rigid bodies using computer vision. Exp. Mech. 33(2), 123–132 (1993)
5. Chao, Y.J.: Application of stereo vision to three-dimensional deformation analyses in fracture experiments. J. Opt. Eng. 33(3), 981 (1994)
6. Chen, Y.J., Sun, S.J., Ji, C.M.: Development and application of 3D digital image correlation (3D DIC) in deformation measurement of materials. J. Aeronaut. Mater. 37(4), 90–100 (2017)
7. Hore, S., Chatterjee, S., et al.: Neural-based prediction of structural failure of multi-storied RC buildings. Struct. Eng. Mech. 58(3), 459–473 (2016)
8. Chatterjee, S., Sarkar, S., et al.: Structural failure classification for reinforced concrete buildings using trained neural network based multi-objective genetic algorithm. Struct. Eng. Mech. 63(4) (2017)
9. Dai, S.H., Dong, Y.F.: Study on virtual photoelasticity experiment. J. Liaoning Tech. Univ. (2011)
10. Pan, Z.J.: Experimental Research on the Bond Behavior Between CFRP and Corroded Reinforced Concrete. Shenzhen University, Shenzhen (2018)
Chapter 2
Establishment of Quality Detection and 3D Evaluation Model of Wool Fabric Based on Machine Vision Yang Chen, Jiajie Yin, Qiangqiang Lin, Shoufeng Jin, and Yi Li
Abstract A method for detecting the surface quality of fluff fabrics based on machine vision using MATLAB is proposed. The original fluff and roller axial cutaways were obtained by the principle of light-cutting imaging. The maximum inter-class variance method and morphological method were used to segment the fluff region. The Canny operator effectively extracted the upper edge contour and thickness reference line of the fluff region. The thickness parameter in the vertical direction of the pile region and the coverage degree parameter model in the horizontal direction are established in three dimensions, and the surface quality of the pile fabric is comprehensively evaluated by parameters in two directions. We use MATLAB to obtain the three-dimensional shape of the surface of the fluff fabric, in order to comprehensively analyze and evaluate the surface quality of the fluff.
Y. Chen (B) · J. Yin · Q. Lin · S. Jin · Y. Li
Mechanical and Electrical Engineering, Xi'an Polytechnic University, Xi'an, China
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2020
R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_2

2.1 Introduction

Traditional fluff fabric surface quality testing relies mainly on experienced people judging with the naked eye and by touch [1], which makes it difficult to achieve an objective measurement of the fabric [2, 3]. Machine vision technology has become a popular choice for fabric quality testing [4]. The use of machine vision for fluff detection can avoid subjective detection results, reduce detection grading errors, and improve efficiency and accuracy [5, 6]. T. J. Kang used laser line scanning technology to obtain three-dimensional images of the fabric surface, and effectively extracted the number of hairballs, the hairball area, and the hairball density on the fabric surface [7]. Saharkhiz used a two-dimensional fast Fourier transform to extract the pilling characteristics of the fabric surface in the frequency domain and realize the segmentation of the fabric structure and the pilling area [8]. Xu Bugao reconstructed the three-dimensional image of the fabric surface by using binocular vision technology, and used the seed growth method
with the depth local maximum value as the growth point to detect the pilling state of the fabric [9]. J. M. Carstensen used the functional spectrum of the Fourier transform to analyze the parameters of the hairballs, and realized objective analysis and evaluation of the pilling performance of the fabric [10]. Zhao Daxing et al. performed image sampling on the same section of the fabric surface, extracted feature values, and achieved grade evaluation of fabric surface quality through template matching [11]. After pre-treatment to filter the color fabric and remove uneven illumination, Liu Xiaojun used an algorithm based on edge flow to effectively segment the pilling of the color fabric [12]. Yu Lingqi et al. proposed a method of microscopic optical sectioning: by moving the objective lens of the microscope vertically, depth sequence images of the fabric surface under different focusing were obtained, and the pilling information of the fabric surface was obtained by reconstruction [13]. According to the surface characteristics of the fleece fabric, this paper proposes a method based on machine vision to evaluate the surface quality of fluff fabric. The tangential image is used to obtain the surface contour image of the fluff fabric, extract the edge features and the edge coordinates of the fluff image, and, following the thickness measurement principle, obtain the thickness reference line. A three-dimensional parametric model of the villus surface is constructed from both the vertical and horizontal directions to quantitatively analyze the villus quality, and the three-dimensional topography of the villus surface is synthesized using MATLAB.
2.2 Fluffy Surface Quality Measurement Principle

2.2.1 Measurement System Principle Introduction

In order to obtain the surface profile of the pile fabric, the tangential imaging method of the fabric surface contour shown in Fig. 2.1 is adopted. The fabric to be tested is laid on the conveying device, the light source is placed under the roller, and the industrial camera and the light source are mounted on opposite sides of the fabric to be tested. The backlight thus formed avoids interference from the surface texture and color characteristics of the tested fabric and highlights the fluff profile of its surface; the obtained contour image of the pile fabric is shown in Fig. 2.2a.
2.2.2 Thickness Measurement Principle Studies have shown that when the fabric is scratched, only the fibers on the outer surface of the fabric are grasped. The operation does not have a great influence on the thickness of the fabric itself. Therefore, in theory, the thickness of the pile fabric
2 Establishment of Quality Detection and 3D Evaluation Model …
15
Fig. 2.1 Tangential imaging of fabric surface contour
(a) Fluff image
(b) Roller tangential diagram
Fig. 2.2 Fleece surface contour image. a Fluff image. b Roller tangential diagram
after the hair is grabbed. It is equal to the thickness of the fabric itself plus the thickness of the pile. According to the principle of thickness measurement, as shown in Fig. 2.3, the thickness image of the pile fabric can be obtained by subtracting the
Fig. 2.3 Thickness schematic
image of the pile fabric (Fig. 2.2a) from the image of the axial section of the roller (Fig. 2.2b), both of which are obtained by the camera. The test uses the roller axis tangent in Fig. 2.2b as the basis for assessing the thickness of the pile fabric.
2.3 Extraction of the Upper Edge of the Fluff and the Baseline

The edge line between the upper edge of the pile area and the background area is the upper edge contour of the pile. The upper edge of the pile area is detected by the Canny operator [14]. The upper edge contour and the reference line extracted from the pile fabric are shown in Fig. 2.4, where the difference between the edge curve and the reference line characterizes the thickness of the pile fabric, and the upper edge curve characterizes the distribution of the pile surface on the pile fabric. The upper edge curve and the reference line of the pile fabric in Fig. 2.4 are converted from image coordinates to rectangular coordinates as shown in Fig. 2.5. Multiple consecutive fluff pictures are selected, and MATLAB software is used to synthesize the three-dimensional topography of the villus surface shown in Fig. 2.6. The three-dimensional topography of the villus surface reflects the height deviation of the pile region in the z-axis direction, and reflects the spacing of the surface features of the pile surface in the x- and y-axis directions. The surface quality of the pile fabric is objectively evaluated by constructing a thickness parameter in the z-axis direction and an undulation parameter in the x- and y-axis directions.
Fig. 2.4 Fleece edge contour extraction (upper edge of fluff and baseline; height in pixels)

Fig. 2.5 Contour edge curve in the coordinate system (height in pixels versus horizontal position, 200–2400 pixels)
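The segmentation and edge steps described above can be sketched with OpenCV in place of the paper's MATLAB implementation: Otsu's method (the maximum inter-class variance method) binarizes the backlit image, a morphological closing cleans the fluff region, and the upper-edge curve is then read out per column; the Canny operator applied to the cleaned mask yields the corresponding contour. The file name, kernel size, and the assumption that the fluff appears dark against the bright backlight are illustrative.

```python
import cv2
import numpy as np

img = cv2.imread("fluff_profile.png", cv2.IMREAD_GRAYSCALE)  # backlit contour image

# Maximum inter-class variance (Otsu) segmentation; the fluff silhouette is
# assumed dark against the bright backlight, so invert to make it foreground
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Morphological closing fills small gaps inside the fluff region
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

# Canny edges of the cleaned mask give the region contour used in the paper
edges = cv2.Canny(binary, 50, 150)

# Upper-edge curve g: for each column, the first foreground row from the top
w_img = binary.shape[1]
g = np.argmax(binary > 0, axis=0).astype(float)  # row index of first fluff pixel
g[binary.max(axis=0) == 0] = np.nan              # columns containing no fluff

print(f"mean upper-edge row: {np.nanmean(g):.1f} px over {w_img} columns")
```

Stacking the per-image curves g from consecutive frames, row by row, is what produces the three-dimensional surface topography synthesized in MATLAB.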
Fig. 2.6 Fluff detection system
2.4 Villus Surface Quality Estimation Parameter Model

2.4.1 Selection of Parameter Model

(1) Villus Thickness Parameter Model

In two dimensions, according to the thickness measurement principle, the thickness of the pile is the difference between the distance from a point on the upper edge of the pile to the reference line and the thickness of the fabric. Let the upper edge curve of the fluff be g(x, y) and the reference line be t(x, y); then the average thickness of the regional fluff h(x, y) is

$h(x, y) = \dfrac{K \sum_{i=1}^{n} \left| g(x_i, y_i) - t(x_i, y_i) \right|}{n} - k$    (2.1)
where n is the number of edge points, k is the thickness of the fabric itself, and K is the resolution of the object surface. The thickness of the fluff region is characterized by the average thickness; the larger the value, the thicker the fluff region, and vice versa. Similarly, in the three-dimensional model of the pile fabric, the average thickness of the pile H(x, y) is as shown in Formula (2.2):

$H(x, y) = \dfrac{\sum_{i=1}^{n} h_i}{n}$    (2.2)
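A minimal sketch of Eqs. (2.1) and (2.2): given the extracted upper-edge curve g and the baseline t in pixel coordinates for a sequence of images, the per-image fluff thickness and its average over the sequence follow directly. K and k are the values reported in Sects. 2.5.1 and 2.5.2; the edge arrays themselves are synthetic placeholders.

```python
import numpy as np

K = 0.047   # object-surface resolution, mm per pixel (Sect. 2.5.1)
k = 2.036   # thickness of the fabric itself, mm (Sect. 2.5.2)

def fluff_thickness(g, t):
    """Eq. (2.1): average fluff thickness of one image, in mm."""
    return K * np.mean(np.abs(g - t)) - k

# Placeholder edge data: upper-edge rows g and baseline rows t for each image
rng = np.random.default_rng(1)
images = [(1100 + 20 * rng.standard_normal(1500),   # g: upper edge (pixel rows)
           1230 * np.ones(1500))                    # t: roller-axis baseline
          for _ in range(700)]

h = np.array([fluff_thickness(g, t) for g, t in images])
H = h.mean()                                        # Eq. (2.2): sequence average
print(f"H = {H:.2f} mm over {len(h)} images")
```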
(2) Coverage Parameter Model

In the horizontal direction, the undulating state of the surface l(x, y, z) of the pile characterizes the coverage of the surface profile of the pile. Therefore, the least-squares surface of the upper edge of the pile is selected as the reference surface, and the fluff relief parameter is constructed [15]. The mathematical expression of the fluff coverage degree parameter Gq is given in Formula (2.3), where S is the orthographic projection of the sampling area onto the reference plane:

$G_q = \sqrt{\dfrac{1}{S} \displaystyle\iint_S l^2(x, y, z)\, \mathrm{d}x\, \mathrm{d}y}$    (2.3)
(2.3)
S
Gq represents the standard deviation of the distribution of the points on the surface profile of the fluff [16]. This parameter can reflect the extent to which the contour of the upper edge of the fluff deviates from the reference plane, that is, the degree of undulation of the surface profile of the fluff can be expressed to some extent. The smaller the contour of the pile surface, the flatter the surface of the pile, and the better the coverage of the pile.
2.5 Test

2.5.1 Test Platform

According to the research needs, a Daheng industrial area-array camera was selected for the experiment; the camera resolution is 1024 × 1024 pixels and the frame rate is 59 fps. According to the detection range and the size and range of the actual fluff, the lens was finally determined to be a Computar lens with a focal length of 8 mm. The light source is a strip LED light source. The construction of the fluff detection test platform is shown in Fig. 2.6. The system has a resolution of 12.5, with K = 0.047 mm/pixel.
2.5.2 Test Data Analysis

Fig. 2.7 Fabric type map: a blue; b white; c black

Three different color fluff fabrics were selected for testing, namely a blue fabric, a white fabric, and a black fabric, as shown in Fig. 2.7. The thickness of the fabric itself was measured to be 2.036 mm. The thickness variation of the pile region is shown
in Fig. 2.8a, c, e. Continuous fluff pictures were selected, and MATLAB software was used to synthesize a three-dimensional model of the fluff surface, as shown in Fig. 2.8b, d, f. It can be seen from the thickness curves in Fig. 2.8a, c, e that in Fig. 2.8a the fabric fluff thickness is high and the thickness variation is the largest; in Fig. 2.8e the fabric fluff thickness is low, and the thickness variation fluctuation is minimal; in Fig. 2.8c the fabric fluff thickness and thickness undulation are between those of Fig. 2.8a, e. Since the average thicknesses of the fluff in the two groups of Fig. 2.8c, e are close, it is impossible to effectively distinguish and compare their fluff quality by thickness alone. As can be seen from the three-dimensional topography in Fig. 2.8, the degree of surface undulation of Fig. 2.8e, f is significantly lower than that of Fig. 2.8b. Therefore, Fig. 2.8b has the lowest surface coverage of the fluff. Figure 2.8d shows that the undulation of the left side of the figure is relatively large, the right and middle undulations are small, and the degree of fluff coverage is good. Both sides of Fig. 2.8f have a large degree of undulation, and the middle part has a low degree of undulation and good
Fig. 2.8 Thickness curves of the fluff and three-dimensional topography of the fluff surface (height in mm versus number of pictures): a blue fluff fabric thickness curve; b 3D topography of blue fluff fabric surface; c white fluff fabric thickness curve; d 3D topography of white fluff fabric surface; e black fluff fabric thickness curve; f 3D topography of black fluff fabric surface
Table 2.1 Wool fabric thickness and coverage parameters

Fabric type    H(x, y)/mm    Gq/mm
(a)            5.12          7.21
(b)            3.87          4.86
(c)            2.35          3.42
coverage. These two parameters are combined to compare the surface fluff quality of the fabrics. It can be seen from Table 2.1 that the thickness of fabric c is significantly smaller than that of fabrics a and b. Fabric a has the greatest degree of undulation. Both the thickness and the surface relief of fabric b are between those of a and c. The test results show that the results of the method are substantially consistent with the appearance of the fluff fabrics shown in Fig. 2.7, and specific parameter values for evaluating the quality of the fluff can be given.
2.6 Conclusion

A tangential imaging method for the fabric surface contour was designed. Through backlight imaging, images of the fluff region and of the axial section of the roller were obtained without interference from the color and texture of the fabric. The complete fluff and roller regions were segmented by the maximum between-class variance (Otsu) method together with morphological processing, and the Canny operator was used to extract the upper edge contour of the fluff region and the thickness reference line. A three-dimensional evaluation model based on pile thickness and pile coverage was established from the thickness measurement principle to judge the surface quality of the pile; it describes the fluff surface quality in both the vertical and horizontal directions. The results show that the method quantifies the parameters of the surface fluff quality of a fabric after fluffing.
Chapter 3
The Research and Design of Wage Supervision Service Platform for Migrant Workers Based on Big Data and Iris Recognition Technology

Dacan Li, Yuanyuan Gong, Guangshang Tang, Guicai Feng, and Linyi Jiang

Abstract With the development of society, more and more migrant workers are working in cities. Their situation is complex, their forms of employment are varied, and supervision is difficult. In recent years, arrears and arbitrary deductions of migrant workers' wages have been reported frequently, which greatly harms their legitimate rights and interests. To better solve the problems of wage payment and wage supervision for migrant workers and to better safeguard their legitimate rights and interests, it is necessary to design and develop a wage supervision service platform that addresses these problems.
3.1 Background Introduction

Migrant workers are people who are registered as peasants but are mainly engaged in non-agricultural work, with wages as their main source of income, during China's period of economic and social transformation. The large number of migrant workers in cities has not only promoted urban economic development and absorbed surplus rural labor but also increased farmers' income, changing the backward situation of rural areas and enhancing China's comprehensive national strength [1, 2]. In recent years, the problem of unpaid migrant workers' wages has seriously troubled the government and society. Policies have been issued from the central government down to local governments to ensure that migrant workers' wages are paid in full and on time. In practice, however, limited government manpower makes real-time monitoring difficult.

D. Li · Y. Gong (B) · G. Tang · G. Feng Shiyuan College of Nanning Normal University, Nanning 530226, China e-mail: [email protected] L. Jiang Guizhou Lutai Technology Co., Ltd., Guizhou Shuanglong Airport Economic Zone, Guiyang 550002, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_3
When problems are exposed, the resulting disputes not only affect social stability but also increase the difficulty for the government to handle them [3, 4]. The supervision of migrant workers' wages has therefore become a major pain point in society. To solve these problems, we designed this platform, which aims to realize real-time supervision and analysis of migrant workers' status; dynamic management of construction sites, attendance and wages; and scientific, accurate and efficient wage supervision. Putting the platform online can effectively solve the problem of wage arrears, which is of great practical significance for safeguarding the legitimate rights and interests of migrant workers.
3.2 Technology Introduction

3.2.1 Big Data Technology

In the era of big data, data applications have penetrated all walks of life, and traditional data mining and analysis can no longer meet the needs of industry development. Big data technology brings a new perspective to business analysis and industry development and fully unleashes the influence of data on social development [5]. The big data technology stack is huge and complex; its basic technologies include data acquisition, data pre-processing, distributed storage, databases, data warehouses, machine learning, parallel computing and visualization, at different technical levels. A general data processing framework is mainly divided into data acquisition and pre-processing, data storage, data cleaning, data query and analysis, and data visualization, as shown in Fig. 3.1.
Fig. 3.1 Big data processing framework: structured and unstructured data pass through data acquisition and preprocessing, data storage and data cleaning, up to BI applications (multidimensional analysis of large data, data dashboards, data early warning)
Fig. 3.2 Iris recognition process: iris image acquisition, image enhancement, iris image normalization, image preprocessing and iris registration, feature extraction and coding, iris image feature matching
Big data analysis extracts value from massive and complex data. Its most valuable part is predictive analysis: through data visualization, statistical pattern recognition, data description and other forms of data mining, it helps data scientists understand the data better and then make predictive decisions [6].
3.2.2 Iris Recognition Technology

Human iris tissue is an ideal basis for identification because of its uniqueness, high stability, lifelong invariance and resistance to deception [7]. Iris recognition generally proceeds in four steps: iris image acquisition, image pre-processing, feature extraction and feature matching, as shown in Fig. 3.2.
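The matching step can be made concrete with a short sketch. The chapter does not prescribe a particular algorithm, so the classic approach of comparing binary iris codes by fractional Hamming distance is assumed here; the decision threshold and all names are illustrative.

```python
import numpy as np

def hamming_distance(code_a: np.ndarray, code_b: np.ndarray, mask: np.ndarray) -> float:
    """Fractional Hamming distance between two binary iris codes,
    counting only bit positions that are valid in both templates."""
    valid = mask.astype(bool)
    return float(np.count_nonzero(code_a[valid] != code_b[valid]) / valid.sum())

def identify(probe_code, probe_mask, gallery, threshold=0.32):
    """Match a probe iris code against enrolled templates.

    gallery: dict mapping worker_id -> (code, mask).
    threshold: a typical iris-code decision value, used here for illustration.
    """
    best_id, best_d = None, 1.0
    for worker_id, (code, mask) in gallery.items():
        d = hamming_distance(probe_code, code, probe_mask & mask)
        if d < best_d:
            best_id, best_d = worker_id, d
    return best_id if best_d <= threshold else None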
3.3 Functional Demand Analysis of Wage Supervision Platform

Human iris tissue is unique, stable, immutable and deception-resistant, which makes iris recognition the most ideal basis for identification in software design and development [8]. Its uniqueness far exceeds that of fingerprint recognition, and it can hardly be counterfeited; only such a mechanism can guarantee that the collected data are genuine. The overall use-case diagram is shown in Fig. 3.3. According to the actual needs of migrant workers' wage supervision and the design objectives of the system, its main functions include: migrant worker information management, iris attendance management, news announcement management, enterprise project management, site management, wage management, migrant workers' rights protection management and multi-dimensional analysis of large data.
Fig. 3.3 The overall use case diagram: the actors (migrant workers, enterprise leading users, wage supervisors, system administrator) interact with the wage supervision service platform's use cases (migrant worker information management, iris attendance management, news announcement management, enterprise project management, site management, wage management, migrant workers' rights protection management, multi-dimensional analysis of large data)
3.4 Logical Architecture Design of Wage Supervision Platform

Based on the analysis of the business and functional requirements of the big data service platform for migrant workers' wage supervision, the system's logical architecture is divided into a presentation layer, a business logic layer and a data layer, as shown in Fig. 3.4.

a. Presentation Layer

The presentation layer consists of the human–computer interaction pages, through which operation instructions are issued and then processed internally by the business logic layer [9].
Fig. 3.4 System logic architecture diagram: the presentation layer (login interface; PC-side, APP-side, WeChat applet and other UI pages) sits above the business logic layer (the eight functional modules plus business support, system settings and application integration), which in turn rests on the data layer (data warehouse and big data analysis technology over migrant workers' basic information, attendance, wage, site and backup data)
b. Business Logic Layer

The business logic layer realizes the functions of the platform, completes its business processing and feeds the results back to the presentation layer for users.

c. Data Layer

The data layer implements the data storage and data processing of the platform and supports the business logic layer by extracting, transforming and loading data [10]. It mainly comprises migrant workers' basic information data, attendance data, wage data, site data and backup data, as well as other unstructured data.
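The separation of the layers can be illustrated with a minimal sketch: the presentation layer talks only to the business logic layer, which in turn is the only client of the data layer. Class and method names are hypothetical, and an in-memory list stands in for the real database.

```python
class DataLayer:
    """Data layer: storage and retrieval (an in-memory stand-in for the database)."""
    def __init__(self):
        self._attendance = []                       # rows of (worker_id, timestamp)

    def save_attendance(self, worker_id, timestamp):
        self._attendance.append((worker_id, timestamp))

    def attendance_of(self, worker_id):
        return [t for w, t in self._attendance if w == worker_id]


class AttendanceService:
    """Business logic layer: validates requests and delegates persistence."""
    def __init__(self, data: DataLayer):
        self._data = data

    def record_check_in(self, worker_id, timestamp):
        if not worker_id:
            raise ValueError("worker_id required")  # business rule, not storage
        self._data.save_attendance(worker_id, timestamp)

    def attendance_of(self, worker_id):
        return self._data.attendance_of(worker_id)


# A presentation-layer page (PC, APP or applet) would only ever call the service:
service = AttendanceService(DataLayer())
service.record_check_in("W-001", "2019-06-01T08:00:00")
print(service.attendance_of("W-001"))
```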
3.5 Network Topology Structure Design of Wage Supervision Platform

Network topology refers to how transmission media connect the system's hardware devices in the network. The big data service platform for migrant workers' wage supervision is deployed on the servers of the supervision department.
Fig. 3.5 System network topology: the information center hosts the application, web, database and FTP servers with a hot-standby server behind an intranet core switch; branches reach the system through secondary switches, and end users and handheld-device users connect over the Internet and the 5G/WiFi mobile network
The servers include an application server, a web server, a database server and an FTP server. Branches access the system through the intranet, extranet and mobile network to complete the data processing of migrant workers' wage supervision. The network topology is shown in Fig. 3.5.
3.6 Conclusion

The platform integrates advanced technologies such as iris recognition, network technology and databases. It has a complete set of core iris recognition algorithms and product technology, seamlessly integrates biometric products with online management software for migrant workers, and establishes an intelligent and reliable system platform. Compared with a traditional IC-card attendance system, the iris system has the following advantages: (a) Reliability: iris recognition is currently recognized as the most accurate biometric method. (b) Simplicity of operation: no card punching or physical contact is needed; a glance is enough. (c) Irreplaceability: because the iris is unique, workers must be present in person when checking in, so proxy clock-ins are impossible.
Using its iris recognition equipment, the platform registers the iris and related information of migrant workers in advance and establishes a real-name identity database. Compared with traditional monitoring systems, the platform not only fully captures the identity information of migrant workers but also facilitates their management by public security departments. Workers authenticate through the platform's iris recognition devices, and the system automatically records, stores and uploads the attendance data to the management platform and to each worker's work record. Both labor companies and regulators thus have a clear view of workers' status, which is the biggest highlight of the system. The big data wage supervision service platform can further guarantee the legitimate rights and interests of migrant workers and exert supervisory and managerial pressure on enterprises in arrears. It also improves the efficiency of the supervisory departments, enables detailed management of migrant workers' information resources, and allows the competent government departments to supervise wage payment in a timely manner and prevent wage arrears in advance.
References

1. Duan, F.: Research on the application of wage management system in units. Zhishi Jingji 11, 116–117 (2019)
2. Li, Y., Zhang, G., Zhang, Q.: Development of management system based on Spring MVC. Microcomput. Appl. 34(11), 119–123 (2018)
3. Baldassar, L., Baldock, C.V., Wilding, R.: Families Caring Across Borders: Migration, Ageing and Transnational Caregiving. Palgrave Macmillan, Basingstoke (2007)
4. Schwenken, H., Heimeshoff, L.-M. (eds.): Domestic Workers Count: Global Data on an Often Invisible Sector. Kassel University Press, Kassel (2011)
5. Wildes, R.P., Asmuth, J.C., Green, G.L., Hsu, S.C., Kolczynski, R.J., Matey, J.R., McBride, S.E.: A machine-vision system for iris recognition. Mach. Vis. Appl. 9(1), 1–8 (1996)
6. Jiao, X., Huang, H.: Design and implementation of enterprise salary management system. China Comput. Commun. (5), 119–120, 124 (2018)
7. Park, C., Lee, J.: Extracting and combining multimodal directional iris features. In: Springer LNCS 3832: International Conference on Biometrics, pp. 389–396 (2006)
8. Peng, Y., Di, Y.: Influencing factors of entrepreneurship behavior of new generation migrant workers in Beijing, Tianjin and Hebei. Jiangsu Agric. Sci. 47(4), 316–320 (2019)
9. Jiang, Y., Li, X., Jiang, J.: Probe into the application of intelligent site cost management and control system. Financ. Account. 1, 70–71 (2019)
10. Zhang, S.: On the governance of migrant workers from the perspective of social development: a case study of southern. Beijing Cult. Rev. 4, 126–133 (2018)
Chapter 4
Research on Block Segmentation and Assembly Technology of 3D Printing Structure Gang Qiao, Shangwei Liu, Qun Wei, Luting Wei, and Yingjie Wang
Abstract There have been reports of attempts and experiments with 3D printing technology in the construction industry. However, because building structures are huge and complex, 3D printing of architectural structures faces difficulties in printing size and materials. This paper introduces the digital modeling principle, the block segmentation principle, the printer applicability principle and the corresponding assembly methods for 3D printing of architectural structures. Practical results have been obtained through experiments with different types of 3D printers, which break complex components into parts: at low cost, components are formed simply by modeling and printing, and the whole building structure is completed by combining different modes of connection. With the expansion of 3D printing technology and materials, it can be predicted that block segmentation and assembly molding by 3D printing will become another branch of prefabricated building.
4.1 Introduction

3D printing technology has gradually entered people's daily understanding and applications, and there have been successful attempts at 3D printing in industrial buildings. 3D printing gives designers the advantages of customization, singular shapes, green construction and model visualization, and the continuous expansion and substitution of applicable printing materials make people more confident about 3D-printed structural engineering. However, the current
G. Qiao · Y. Wang Powerchina Road Bridge Group Co. Ltd., Beijing, China S. Liu · Q. Wei (B) North China University of Water Resources and Electric Power, Zhengzhou, China e-mail: [email protected] L. Wei School of Civil Engineering and Architecture, University of Jinan, Jinan, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_4
problem is that 3D printing is limited in size, which makes it impossible to print complex or huge structures in one pass. Moreover, a single material cannot meet engineering needs, given the requirements on structural shape, partitioning and material properties. Block segmentation and assembly for 3D printing has therefore become a practical and widely discussed method in structural engineering. The block segmentation and assembly process for 3D printing in building structure engineering proposed in this paper has solved the key technical problems and has been refined in engineering practice. Building information modeling (BIM) is a data tool applied in engineering design, construction and management: by integrating building data into an information model and sharing and transmitting it across the whole project lifecycle of planning, operation and maintenance, engineers and technicians can correctly understand and respond to all kinds of building information. 3D printing can extract comprehensive data through BIM and thus has strong application prospects.
4.2 3D Printing Technology

3D printing technology dates back to the mid-1990s and works as follows: based on photocuring or paper lamination, colored liquid or powder printing materials are deposited layer by layer by a computer-controlled print head and operating platform, so that the 3D model stored in the computer is "printed" as a physical entity. This combination of digital model, printer device and control platform under the unified coordination of a computer is called 3D printing technology [1].
4.2.1 Introduction to Three Technologies Commonly Used in 3D Printing

(a) Fused deposition modeling (FDM)

It works as follows: a filamentous hot-melt material is heated and melted while, under computer control, the extrusion head selectively deposits it on the worktable according to the cross-section profile, forming one layer after rapid cooling. The table then drops by one layer thickness to form the next layer until the whole solid is built. Without expensive components such as lasers, FDM printers are affordable and can use a variety of materials, generally supplied on spools. However, the best achievable accuracy is about 0.1 mm, and the screw precision and stiffness of the operating platform often cause large errors. Printing overhangs requires support material, which is difficult to avoid [2–4].
(b) Selective laser sintering (SLS) of powder materials

A layer of powder is spread on the upper surface of the formed part and heated to just below its sintering point. The control system scans the laser beam over the powder layer according to the cross-section profile, raising the powder temperature to the melting point so that it sinters and bonds with the part below. After one layer is completed, the worktable drops by one layer thickness and the spreading roller lays a uniform, dense powder layer for sintering the next cross section, until the whole model is completed. The advantages of SLS show in many aspects: many materials are available, including polymer, metal, ceramic, gypsum and nylon powders; metallic powder in particular is one of the hottest trends in 3D printing, owing to its simple manufacturing process.

(c) Stereolithography apparatus (SLA) of photosensitive resin

Liquid photosensitive resin fills the vat and solidifies rapidly under the ultraviolet laser beam emitted by the laser (ultraviolet for SLA, infrared for SLS). At the start of molding, the lifting table sits below the liquid surface by exactly one section-layer thickness. The laser beam, focused by a lens, scans the cross-section profile along the liquid surface according to the machine's instructions; the resin in the scanned area solidifies rapidly, completing one cross section as a thin plastic layer. The table then drops by one layer thickness and the next cross section is cured, building the 3D entity layer by layer. SLA is the most widely used process, with the longest development history and the most mature technology; of the rapid prototyping machines installed worldwide, about 60% use stereolithography.
4.2.2 Four Common Data Formats for 3D Printing

(a) STL (stereolithography)

STL is a 3D graphics file format developed by 3D Systems in 1988 for rapid prototyping. It was not created specifically for 3D printing but happened to suit it. Unlike feature-based solid models, STL represents the 3D CAD model as a triangular mesh: it describes only the geometry of the object, carries no color or material information, and, since each triangle is stored independently by its three vertices, records no topology. Owing to its simple, compact format, STL became the de facto data standard of rapid prototyping systems with the rapid rise of 3D Systems. STL files come in two types, text (ASCII) and binary; the former is human-readable and more universally supported.
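The ASCII variant is simple enough to generate by hand. The sketch below writes a minimal ASCII STL file, computing each facet normal from the counterclockwise vertex order (right-hand rule); it is an illustration of the format, not a production exporter, and all names are illustrative.

```python
import numpy as np

def write_ascii_stl(path: str, triangles: np.ndarray, name: str = "model") -> None:
    """Write triangles of shape (n, 3, 3) -- n facets of three XYZ vertices,
    ordered counter-clockwise -- as an ASCII STL file."""
    with open(path, "w") as f:
        f.write(f"solid {name}\n")
        for v0, v1, v2 in triangles:
            n = np.cross(v1 - v0, v2 - v0)          # outward normal, right-hand rule
            n = n / (np.linalg.norm(n) or 1.0)      # guard against degenerate facets
            f.write(f"  facet normal {n[0]:e} {n[1]:e} {n[2]:e}\n")
            f.write("    outer loop\n")
            for v in (v0, v1, v2):
                f.write(f"      vertex {v[0]:e} {v[1]:e} {v[2]:e}\n")
            f.write("    endloop\n  endfacet\n")
        f.write(f"endsolid {name}\n")

# One facet of a unit right triangle in the z = 0 plane:
write_ascii_stl("demo.stl", np.array([[[0, 0, 0], [1, 0, 0], [0, 1, 0]]], float))
```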
(b) OBJ (Object)

The OBJ file is a standard 3D model format developed by Alias|Wavefront for its workstation-based modeling and animation software Advanced Visualizer, and it is well suited to data exchange between 3D software packages. For models created in 3ds Max or LightWave, exporting OBJ files is a good option for exchange across a variety of software. OBJ mainly supports polygon models and excludes animation, material properties, texture paths, dynamics, particles and so on. Because of this convenience, most 3D CAD software currently supports the OBJ format, and most 3D printers can print from it.

(c) Additive manufacturing file format (AMF)

AMF is the 3D printing format recommended by the American Society for Testing and Materials (ASTM). It builds on the STL format currently used by 3D printers and makes up for its weaknesses: the new format records color, material and the internal structure of the object. Based on XML (extensible markup language), AMF is readable both by computers and by humans and can easily be extended with new tags in the future. The standard can record a single material, assign different materials to different parts, or grade the proportion of two materials, with the internal structure described by numerical formulae. It can also specify surfaces for printed images and the most efficient printing direction, and new data information and standard structures are under development.

(d) 3D manufacturing format (3MF)

Compared with STL, the 3MF format describes a 3D model more completely: besides geometry, it retains internal information, color, material, texture and other characteristics. 3MF is likewise an XML-based, extensible format supported by many major vendors. Jointly launched by Microsoft, Autodesk, Dassault Systèmes, Netfabb, SLM, HP and Shapeways in 2015, 3MF has great potential to replace the defective data formats. In addition to these four formats, the WRL (VRML) format is also available for 3D printing of virtual reality files. This paper focuses on the uses and processing methods of the STL format.
4.2.3 Basic Requirements for the 3D Printing STL Data Format

When exporting STL files, the following points should be observed to ensure that the file is efficient and does not stall the printing process.
(a) Common vertex rule: two adjacent triangular facets may share only two vertices; a vertex of one facet must not fall on the edge of any adjacent facet.
(b) Orientation rule: the normal vector of each facet must point outward, with the three vertices ordered by the right-hand rule (counterclockwise); adjacent facets must not have contradictory orientations.
(c) Value rule: the vertex coordinates of each facet must be positive; negative values and zero cause failure.
(d) Filling rule: no surface of the 3D model may be omitted; every surface must be covered with triangular facets.
(e) Euler's formula: the numbers of vertices V, edges E and faces F in an STL file must satisfy Euler's formula for a closed surface, V − E + F = 2.
(f) Prevention of gaps, i.e., lost triangular facets: triangulating the intersection of highly curved surfaces can produce this defect, which shows up as spurious cracks or holes (regions with no triangles) in the displayed STL model, violating the filling rule. In this case, small triangular facets should be added along the cracks or holes.
(g) Avoidance of degenerate facets, whose edges are collinear: this usually arises in the algorithm converting a 3D entity to an STL file, when triangles generated for different entities meet at an intersection line.
(h) Inspection of overlapping facets: overlap is mainly caused by numerical error during triangulation; triangle vertices are stored as floating-point numbers rather than integers, and overlaps must be avoided in the STL format.
(i) Unambiguous topology: by the common vertex rule, any edge of the model may be shared by exactly two triangles; when more than two triangles share an edge, the topology becomes ambiguous. Such problems can occur when triangulating a plane with a sharp angle or the intersection of different entities, or from errors in control parameters when the STL file is generated.

Because of these defects, the validity of the STL data must be checked before printing; otherwise a defective STL file will cause many problems in the rapid prototyping system, such as geometric distortion of the prototype or even a crash in severe cases. An STL file that satisfies the above nine conditions can be printed reliably; a sketch of such a validity check follows this list.
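Several of these rules can be checked mechanically. The sketch below verifies the shared-edge condition of rules (d) and (i) — every edge of a closed, manifold mesh belongs to exactly two facets — and Euler's formula from rule (e), on a triangle soup given as vertex tuples; names and structure are illustrative.

```python
from collections import Counter

def check_stl_mesh(triangles):
    """Check shared-edge and Euler conditions for a triangle soup.

    triangles: iterable of three vertex tuples, e.g. ((x,y,z), (x,y,z), (x,y,z)).
    A closed, manifold mesh must have every edge shared by exactly two
    facets and satisfy V - E + F = 2 (Euler's formula, genus 0).
    """
    edges = Counter()
    vertices = set()
    faces = 0
    for tri in triangles:
        faces += 1
        vertices.update(tri)
        for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            edges[frozenset((a, b))] += 1       # undirected edge
    manifold = all(c == 2 for c in edges.values())     # rules (d)/(i)
    euler_ok = len(vertices) - len(edges) + faces == 2  # rule (e)
    return manifold, euler_ok

# A regular tetrahedron passes both checks:
t = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
tetra = [(t[0], t[2], t[1]), (t[0], t[1], t[3]), (t[1], t[2], t[3]), (t[0], t[3], t[2])]
print(check_stl_mesh(tetra))    # (True, True)
```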
4.3 Block Segmentation and Assembly Technology of 3D Printing Components

4.3.1 Block Segmentation and Assembly Process of a 3D Printing Component

For large components, the limited build size of the printer means the model must be split into sub-blocks, printed, and then assembled. The constraints from digital modeling through printing are shown in the process diagram (Fig. 4.1).
Fig. 4.1 Flow chart for block segmentation and assembly of 3D printing component
4.3.2 Selection of 3D Printing Sub-block Orientation and Position

Splitting into blocks also involves choosing the printing orientation. For the powder medium of a Z Corporation printer, regardless of color or binder, the part is supported by the powder bed itself, and the surrounding powder is finally removed to obtain the required entity. For FDM, however, selecting an appropriate printing orientation can reduce the printed support structures and the subsequent cleanup work.
4.4 Printing Cases

A long-span suspension bridge, standing 280 m above the valley floor with a total length of 1578 m, was printed at a scale of 1:900. To fit the FDM build volume (500 × 500 × 400 mm), the outer contour of the whole model, 1700 × 380 × 100 mm, was divided into 10 sub-blocks following the process in Fig. 4.1. Figure 4.2 shows the result of block segmentation and assembly, and Fig. 4.3 shows the connection modes of the printed blocks: bolting, convex–concave joints and wedge grooves.
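The size constraint alone already dictates a lower bound on the number of sub-blocks. The naive axis-aligned split sketched below yields 4 blocks for this model; the 10 blocks actually used follow structural and geological boundaries, which a simple grid split does not capture. Function names are illustrative.

```python
import math

def grid_split(model_mm, printer_mm):
    """Minimum number of axis-aligned sub-blocks so that every block fits
    the printer's build volume (a size check only; real cut planes would
    follow structural boundaries)."""
    counts = [math.ceil(m / p) for m, p in zip(model_mm, printer_mm)]
    return counts, math.prod(counts)

# Bridge model 1700 x 380 x 100 mm on an FDM volume of 500 x 500 x 400 mm:
print(grid_split((1700, 380, 100), (500, 500, 400)))    # ([4, 1, 1], 4)
```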
4.5 Conclusion

3D printing technology is gradually being extended and put into practice. The block segmentation and assembly process for 3D-printed structures presented in this paper realizes large moldings on small machines. The technology can also be combined with auxiliary components and post-processing: CNC-machined components, cut components and surface treatment extend the capabilities of segmented, assembled prints and give prefabricated building technology more options for printed molding. In the future, the whole modeling-and-printing workflow should be standardized, with specifications comparable to steel structure detailing. Combining other printing methods can reduce the supports required and increase practicality, so that 3D printing block segmentation and assembly technology can play a greater role.
Fig. 4.2 Schematic diagram of 3D printing sub-block index: 01 Approach Bridge Sub-block on Jiangjin Bank, 02–03 Main Girder and Deck, 04 Approach Bridge Sub-block on Xishui Bank, 05 Approach Bridge on Xishui Bank, 06 Geological block a (upper), 07 Geological block a (lower), 08 Geological block b, 09 Geological block c (lower), 10 Geological block d
Fig. 4.3 Schematic diagram of model block connection form
References

1. Wei, Q., Yin, W.B., Liu, S.W.: Research progress of digital graphic information fusion system in BIM technology. Beijing: Sciencepaper Online (2014). http://www.paper.edu.cn/releasepaper/content/201403-758
2. Keough, I.: What Revit Wants: Dynamo Revit Test Framework (2013). https://www.revitforum.org/blog-feeds/16596-what-revit-wants-dynamo-revit-test-framework-ian-keough.html
3. Steel Structure and Engineering Research Institute, North China University of Water Resources and Electric Power: Report on 3D Printing Technology of Sunxihe Bridge, p. 10 (2018)
4. de Matos, C.R., de Oliveira Miranda, A.C.: The use of BIM in public construction supervision in Brazil. Organ. Technol. Manag. Constr.: Int. J. 10(1) (2018)
Chapter 5
A Study on 3D Reconstruction Method for Common View Field in Hybrid Vision System Chang Lin, Haifeng Zhou, Wu Chen, and Yan Zhang
Abstract A hybrid vision system composed of an RGB-D camera and an omnidirectional camera is proposed to overcome the time-consuming, low-precision feature matching and complicated reconstruction of traditional hybrid vision systems, and a 3D reconstruction method for the common view field of this system is presented. First, the epipolar geometry of the system is used to determine its common view field. Second, SURF feature points are extracted from the common view field and the RGB image, and corresponding matches are found with the nearest neighbor method. Third, 3D reconstruction of the matched points is realized through the depth information and the coordinate transformation of the system. Finally, the feasibility and effectiveness of the algorithm are verified experimentally.
5.1 Introduction

The rapid development of computer vision and artificial intelligence provides broad application prospects for mobile robots. In robotics research, vision-based 3D reconstruction is an important and difficult problem, so reconstructing objects quickly and accurately is a key goal of computer vision. In recent years, hybrid vision systems consisting of monocular vision and panoramic vision have been widely used in mobile robots [1–4], and the advent of the RGB-D camera provides another

C. Lin · H. Zhou (B) · W. Chen Marine Engineering College and Key Laboratory of Fujian Province Marine and Ocean Engineering, Jimei University, Xiamen, Fujian 361021, China e-mail: [email protected] C. Lin · Y. Zhang School of Mechanical and Electrical Engineering, Putian University, Putian 351100, China C. Lin · H. Zhou · Y. Zhang Key Laboratory of Modern Precision Measurement and Laser, Nondestructive Testing in Fujian Universities and Colleges, Putian, Fujian 351100, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_5
effective method for 3D reconstruction [5–8]. In Ref. [5] the authors present a novel mapping system that robustly generates highly accurate 3D maps with an RGB-D camera. The approach requires no further sensors or odometry and applies to small domestic robots such as vacuum cleaners as well as flying robots such as quadrocopters; it copes robustly with challenging scenarios such as fast camera motion and feature-poor environments while remaining fast enough for online operation. In this paper, a hybrid vision system is built. The epipolar geometry within the system is analyzed, and a method to determine the common view field of the hybrid system is proposed. SURF [9, 10] features are extracted in the common view field, the corresponding feature points are matched, and the 3D reconstruction of the scene is completed. The method uses only the RGB-D camera to obtain the color and depth information of the environment simultaneously; the matching method and equipment are simple, and the reconstruction accuracy is high.
5.2 Hybrid Vision System

5.2.1 Introduction of the Hybrid Vision System Model

The hybrid vision system used in this paper is shown in Fig. 5.1. It is composed of an RGB-D camera and a panoramic camera. The RGB-D camera is a depth sensor with three outputs — IR image, RGB image and depth information — which can be used to measure 3D point data in space. The panoramic camera collects scene information over 360° × θ, where 360° is the horizontal field of view and θ is the vertical field of view, determined by the camera's own parameters. In the vertical direction, the elevation angle is 15° and the depression angle is 60°, so θ = 75°.
5.2.2 The Imaging Principle of the RGB-D Camera

The RGB-D camera combines an IR camera and an RGB camera. Triangulation between the IR transmitter and the IR camera [6] yields the depth d of each pixel of the RGB image. The internal parameters K_p of the RGB camera can be obtained by calibration, after which the pixel coordinates (u, v, 1) of a feature point can be transformed into the coordinates (X_p, Y_p, Z_p) of the spatial point in the RGB camera coordinate system, as in Eq. (5.1):

[X_p, Y_p, Z_p]^T = d · K_p^{-1} · [u, v, 1]^T    (5.1)

where d is the depth of the pixel obtained by the IR camera and K_p is the internal parameter matrix of the RGB camera.
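Equation (5.1) translates directly into code. The following minimal sketch back-projects one pixel; the intrinsic matrix values are illustrative placeholders, not the calibration of the authors' camera.

```python
import numpy as np

def backproject(u: float, v: float, d: float, K: np.ndarray) -> np.ndarray:
    """Eq. (5.1): lift pixel (u, v) with depth d to a 3D point in the
    RGB camera frame, [X, Y, Z]^T = d * K^{-1} [u, v, 1]^T."""
    return d * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

# Illustrative intrinsics (focal lengths and principal point in pixels):
K_p = np.array([[525.0,   0.0, 320.0],
                [  0.0, 525.0, 240.0],
                [  0.0,   0.0,   1.0]])
print(backproject(400, 300, 1.2, K_p))    # a point 1.2 m deep, right of centre
```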
Fig. 5.1 Hybrid vision system model
5.2.3 The Imaging Principle of the Panoramic Camera

The imaging model of the hyperboloid panoramic camera comprises two projections: from the space point to the mirror point, and from the mirror point to the image point. The principle is shown in Fig. 5.2. By the optical property of the hyperboloid, light rays directed toward O_m converge to O_o after reflection by the hyperboloid mirror and are imaged on the camera image plane. Choosing O_m as the origin of the hyperboloid mirror coordinate system, the mirror equation is

(Z_m + c)^2 / a^2 − (X_m^2 + Y_m^2) / b^2 = 1    (5.2)

where a, b, c are the parameters of the hyperboloid mirror, with a^2 + b^2 = c^2; O_m is the upper focus of the mirror and O_o the lower focus. In this coordinate system, a spatial point P on the mirror is written P_m = (X_m, Y_m, Z_m)^T. Its projection onto the panoramic camera image plane is given by Eq. (5.3):

(Z_m + 2c) · [p_u, p_v, 1]^T = K_o · [X_m, Y_m, Z_m + 2c]^T    (5.3)
Fig. 5.2 Panoramic camera projection model
where K_o is the internal parameter matrix of the panoramic camera and (p_u, p_v) are the pixel coordinates of the projection of P_m on the image plane.
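A corresponding sketch of Eq. (5.3) follows; it simply normalizes K_o · [X_m, Y_m, Z_m + 2c]^T by its third component (Z_m + 2c). The intrinsics and the mirror parameter c are illustrative values, not the authors' calibration.

```python
import numpy as np

def project_panoramic(P_m: np.ndarray, K_o: np.ndarray, c: float) -> np.ndarray:
    """Eq. (5.3): project a mirror point P_m = [X_m, Y_m, Z_m]^T (expressed
    at the upper focus O_m) into the image plane of the camera sitting at
    the lower focus O_o, a distance 2c below."""
    X, Y, Z = P_m
    p = K_o @ np.array([X, Y, Z + 2 * c])
    return p[:2] / p[2]            # divide by (Z_m + 2c) to get (p_u, p_v)

# Illustrative numbers only:
K_o = np.array([[800.0, 0.0, 1024.0], [0.0, 800.0, 768.0], [0.0, 0.0, 1.0]])
print(project_panoramic(np.array([0.10, 0.05, -0.02]), K_o, c=0.04))
```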
5.3 The Feature Points 3D Reconstruction

5.3.1 SURF Feature Extraction

Compared with SIFT, the SURF feature extraction method not only greatly improves computational speed but also performs better under scale change, rotation, illumination, radiometric and geometric transformations. SURF approximates the determinant-of-Hessian (DoH) Gaussian second-order templates with box filters, so that filtering the image reduces to simple additions; moreover, the integral image method makes the computation time independent of the filter template size. The detailed algorithm is given in [9, 10].
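The integral-image trick mentioned above is easy to make concrete. In the sketch below (a generic summed-area table, not the authors' implementation), the sum over any box costs four array lookups regardless of the box size, which is why SURF's box filtering is so fast.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table with a zero top row/left column for easy indexing."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def box_sum(ii: np.ndarray, r0: int, c0: int, r1: int, c1: int) -> float:
    """Sum of img[r0:r1, c0:c1] from four lookups, independent of box size."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.arange(16.0).reshape(4, 4)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3), img[1:3, 1:3].sum())    # both 30.0
```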
5.3.2 SURF Feature Matching

The Euclidean distance between feature vectors is used as the matching metric. Let N_1 and N_2 be the SURF feature sets of images I_p and I_o, respectively. For any feature point n^i_{I1} of N_1, let n^j_{I2} and n^{j'}_{I2} be its nearest and second-nearest neighbors in N_2 by Euclidean distance, with corresponding distances d_{ij} and d_{ij'}. If d_{ij} ≤ t · d_{ij'} (t = 0.6), n^i_{I1} and n^j_{I2} are considered a match; otherwise the pair is removed. Traversing the feature points of I_p yields all matching pairs.
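A minimal sketch of this nearest/second-nearest ratio test using OpenCV is shown below. SURF is patented and lives in the opencv-contrib package (cv2.xfeatures2d); if it is unavailable in a given build, cv2.SIFT_create() is a drop-in substitute. The Hessian threshold is an illustrative value, and t = 0.6 follows the text.

```python
import cv2

def surf_ratio_match(img1, img2, t: float = 0.6):
    """Nearest/second-nearest ratio test (d1 <= t * d2) on SURF descriptors.

    Requires opencv-contrib-python; SURF may be disabled in some builds.
    Returns a list of matched point-coordinate pairs."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp1, des1 = surf.detectAndCompute(img1, None)
    kp2, des2 = surf.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for m, n in matcher.knnMatch(des1, des2, k=2):
        if m.distance <= t * n.distance:        # keep only unambiguous matches
            good.append((kp1[m.queryIdx].pt, kp2[m.trainIdx].pt))
    return good
```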
5.3.3 3D Reconstruction of Feature Points

The RGB-D camera provides the RGB image and the depth of each corresponding point. Let [x^i_p, y^i_p, z^i_p]^T be the coordinates of a feature point in the RGB-D camera coordinate system; it is transformed into the panoramic camera coordinate system to give [x^i_{p2m}, y^i_{p2m}, z^i_{p2m}]^T through Eq. (5.4):

[X^i_{p2m}, Y^i_{p2m}, Z^i_{p2m}]^T = R^{-1}_{p2m} · [X^i_p, Y^i_p, Z^i_p]^T − T_{p2m}    (5.4)
where (R_{p2m}, T_{p2m}) are the extrinsic parameters of the hybrid vision system and Z^i_{p2m} is the depth of the feature point. Let [x^i_o, y^i_o, z^i_o]^T be the coordinates of the panoramic image feature point in the panoramic camera coordinate system; then Z^i_o = Z^i_{p2m}. From the pinhole imaging principle, the coordinates of the spatial point are obtained as

[x^i_o, y^i_o, z^i_o]^T = K^{-1}_o · [u_o, v_o, 1]^T · z^i_{p2m}    (5.5)

where K_o is the internal parameter matrix of the panoramic camera and [u_o, v_o]^T are the pixel coordinates of [x^i_m, y^i_m, z^i_m]^T.
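Equations (5.4) and (5.5) chain together as follows; the extrinsics R_{p2m}, T_{p2m} and all numeric values are placeholders for illustration, not the calibrated parameters of the authors' system.

```python
import numpy as np

def rgbd_to_panoramic(P_p, R_p2m, T_p2m):
    """Eq. (5.4): move a point from the RGB-D camera frame into the
    panoramic camera frame using the system's extrinsic parameters."""
    return np.linalg.inv(R_p2m) @ P_p - T_p2m

def pano_point_from_pixel(u_o, v_o, z_p2m, K_o):
    """Eq. (5.5): back-project the matched panoramic pixel with the
    transferred depth Z_o = Z_p2m to obtain the reconstructed 3D point."""
    return (np.linalg.inv(K_o) @ np.array([u_o, v_o, 1.0])) * z_p2m

# Illustrative values only:
R = np.eye(3)
T = np.array([0.05, 0.0, 0.10])
P_pano = rgbd_to_panoramic(np.array([0.3, 0.1, 1.2]), R, T)
K_o = np.array([[800.0, 0.0, 1024.0], [0.0, 800.0, 768.0], [0.0, 0.0, 1.0]])
print(pano_point_from_pixel(1100.0, 700.0, P_pano[2], K_o))
```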
5.4 Results

A hybrid vision system was built on an AS-R mobile robot by mounting the panoramic and RGB-D cameras on it; its internal and external parameters were obtained beforehand. The resolution of the RGB and depth images is 640 × 480, and the resolution of the panoramic image is 2048 × 1536.
Table 5.1 The points of feature matching results

Method            Feature matching (FM)   Mismatching number (MN)   Correct matching rate (P) (%)
Paper method      41                      3                         92.7
Literature [11]   42                      7                         83.3

*The correct matching rate is P = (FM − MN)/FM
Table 5.2 Distance from reconstruction point to fitting plane

Method            Mean/mm   Maximal value/mm   Standard deviation/mm [12]
Paper method      1.3       3.4                1.3
Literature [11]   1.5       4.7                1.4
5.4.1 Feature Matching Point 3D Reconstruction

5.4.1.1 Reconstruction Result
The hybrid vision system was used to capture a carton in the office; the common view field of the system was obtained, and SURF feature matching and 3D reconstruction of the panoramic image feature points were carried out.
5.4.1.2 Reconstruction Results of the Method in Literature [11]

See Fig. 5.4 and Tables 5.1 and 5.2.
5.4.2 Discussion

(1) The results of the proposed method are compared with those of the method in literature [11] (see Figs. 5.3d and 5.4a); the common region of the hybrid vision system is shown in Fig. 5.3c. With the method of [11] the correct matching rate is 83.3%, with many mismatched pairs, whereas the SURF-based matching used here reaches 92.7%, significantly reducing the number of false matches (see Table 5.1).

(2) From the matching results and the depth information obtained by the RGB-D camera, the 3D feature points of the panoramic images are reconstructed (Figs. 5.3e and 5.4b). The fitted plane and its projection onto the X-Y plane are shown in Figs. 5.3f and 5.4c. Comparing the reconstructions, the results of this method are better than those of literature [11].
Fig. 5.3 Reconstruction results of feature points in panoramic images. a RGB image. b Panoramic image. c Common visual field. d SURF feature matching result. e Feature point 3D reconstruction. f Panoramic feature points projected onto the X-Y plane
The mean distance from the reconstructed points to the fitting plane is smaller than that of reference [11], and the standard deviation of the point-to-plane distances is also smaller [12].
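The Table 5.2 statistics can be reproduced, given reconstructed points on a nominally planar surface, by fitting a least-squares plane and measuring point-to-plane distances. The sketch below uses an SVD-based fit on synthetic data; it illustrates the evaluation and is not the authors' code.

```python
import numpy as np

def plane_fit_residuals(points: np.ndarray):
    """Fit a least-squares plane through 3D points (n, 3) and return the
    (mean, max, std) of point-to-plane distances, as in Table 5.2."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                            # direction of least variance
    d = np.abs((points - centroid) @ normal)   # perpendicular distances
    return d.mean(), d.max(), d.std()

# Synthetic near-planar carton face with ~1 mm measurement noise:
rng = np.random.default_rng(1)
pts = np.c_[rng.uniform(0, 100, 200), rng.uniform(0, 100, 200),
            rng.normal(0, 1.0, 200)]
print(plane_fit_residuals(pts))
```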
Fig. 5.4 Reconstruction results of the method in literature [11]. a SIFT feature matching result. b Feature point 3D reconstruction. c Panoramic feature points projected onto the X-Y plane
5.5 Conclusion

In this study, a new 3D reconstruction method for the common view field of a hybrid vision system combining an RGB-D camera and a panoramic camera is presented. The method makes full use of the wide viewing angle of the panoramic camera and the readily available depth information of the RGB-D camera. The overlapping parts of the images are determined by the epipolar geometry relationship, and SURF feature matching is used to reduce mismatching and to improve matching efficiency and accuracy. Future work will address the rapid reconstruction of multiple indoor objects and mutual occlusion, the elimination of erroneous SURF matches, and the correction of panoramic camera image distortion.

Acknowledgements This research is jointly supported by the National Natural Science Foundation of China under contract (No. 51179074), by Fujian Natural Science Foundation (No. 2018J01495), by the Fujian Young and Middle-aged Teacher Education and Research Project (No. JAT170507, No. JAT170507(P)); Modern Key Laboratory of Precision Measurement and Laser Nondestructive Testing in Colleges and Universities of Fujian Province (No. 2016JZA018 and B17119); by Jimei University Research Funding (No. ZQ2013007); by Putian Science and Technology Project (No. 2018RP4002).
References

1. Meilland, M., Comport, A.I., Rives, P.: Dense omnidirectional RGB-D mapping of large-scale outdoor environments for real-time localization and autonomous navigation. J. Field Robot. 32(4), 474–503 (2015)
2. Fernández, C., Fernández-Llorca, D., Sotelo, M.A.: A hybrid vision-map method for urban road detection. J. Adv. Transp. 30(1), 1–21 (2017)
3. Zhou, Q., Zou, D., Liu, P.: Hybrid obstacle avoidance system with vision and ultrasonic sensors for multi-rotor MAVs. Ind. Robot: Int. J. 45(2), 227–236 (2018)
4. Whelan, T., Kaess, M., Johannsson, H., Fallon, M., Leonard, J.J., McDonald, J.: Real-time large-scale dense RGB-D SLAM with volumetric fusion. Int. J. Robot. Res. 34(4–5), 598–626 (2015)
5. Endres, F., Hess, J., Sturm, J., Cremers, D., Burgard, W.: 3-D mapping with an RGB-D camera. IEEE Trans. Robot. 30(1), 177–187 (2013)
6. Huang, A.S., Bachrach, A., Henry, P., Krainin, M., Maturana, D., Fox, D., Roy, N.: Visual odometry and mapping for autonomous flight using an RGB-D camera. In: Robotics Research, pp. 235–252. Springer, Cham (2017)
7. Scona, R., Jaimez, M., Petillot, Y.R., Fallon, M., Cremers, D.: StaticFusion: background reconstruction for dense RGB-D SLAM in dynamic environments. In: 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 1–9. IEEE, Brisbane (2018)
8. Laidlow, T., Bloesch, M., Li, W., Leutenegger, S.: Dense RGB-D-inertial SLAM with map deformations. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6741–6748. IEEE, Vancouver (2017)
9. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: European Conference on Computer Vision, pp. 404–417. Springer, Berlin (2006)
10. Khoshelham, K., Elberink, S.O.: Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors 12(2), 1437–1454 (2012)
11. He, B., Chen, Z.: Determination of the common view field in hybrid vision system and 3D reconstruction method. Robot 33(5), 614–620 (2011)
12. Manikandan, S.: Measures of dispersion. J. Pharmacol. Pharmacother. 2(4), 315 (2011)
Chapter 6
Three-Dimensional Characteristics Observation of Ocean Waves in Coastal Areas by Microwave Doppler Radar Longgang Zhang
Abstract Wind waves and swell usually coexist in real seas, both offshore and in coastal areas, and this often produces multiple peaks in the wave spectrum. Directional wave spectra and radial velocities of ocean waves obtained from the microwave Doppler radar developed by Wuhan University are used as data sets to observe three-dimensional characteristics. The radar directly measures the radial velocity of the ocean surface relative to the radar over a range of distances, and the velocity spectrum is then converted into the wave spectrum through an assigned conversion function. The three-dimensional chart of the radial velocity shows the direction and speed of the waves, and the variation between wave crest and trough is also reflected in it. The three-dimensional features of the wave radial velocity and the wave spectrum are distinct and play an important role in describing the shape and characteristics of ocean waves.
6.1 Introduction

The ocean wave spectra measured at specific locations are generally considered to be the sum of wave components separated from each other in direction, frequency, or generating events [1–3]. When the sea state comprises several wave systems, the spectrum is usually multi-peaked, each peak representing either local wind waves or swell independent of the local wind. The energy characteristics of ocean waves are reflected in the three-dimensional chart of the wave spectrum. Several observations have shown that wind waves and swell can occur simultaneously on the high seas and in coastal areas, with a relatively high probability worldwide [4]: Soares obtained a range of 23–26% from experiments in Portuguese coastal areas [4], and Moon and Oh found that the percentage of spectra with
L. Zhang (B) Institute of Color and Image Vision, Yunnan Normal University, Kunming 650500, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_6
multi-peaks was 25%, based on 17,750 spectra measured in coastal areas around Korea [5]. Nowadays, more oceanographers are using electromagnetic waves to detect ocean waves and obtain their three-dimensional images; in this process, navigation radar and microwave Doppler radar have gradually become important tools for three-dimensional wave imaging, since traditional wave-measuring tools such as buoys cannot observe waves in all weather and over a wide area. Navigation radar obtains the three-dimensional energy map of the waves by measuring the energy of the wave echo: each scan yields a chart containing the energy values of the coordinate points over the horizontal and vertical range. Its disadvantage, however, is that it cannot measure the radial velocity of the waves, which carries much important information about wave development. According to the Doppler principle, microwave Doppler radar readily obtains the radial velocity sequence of the wave echo by waveform coherence. The three-dimensional chart of this radial velocity sequence is a function of time and range, and it clearly reflects how the waves change and develop over time and distance. The wave spectrum can be derived from the radial velocity series through an assigned conversion function, and its three-dimensional chart shows the relationship between wave energy, frequency and direction.
6.2 3D Spectrum Pre-processing

There are usually several spectral peaks in a wave spectrum, and some of them are caused by noise, so the three-dimensional wave spectrum must be pre-processed. Portilla adopted an iterative smoothing method to reduce the influence of false peaks [6]; the method applies a two-dimensional discrete convolution that performs a weighted averaging over adjacent points. In addition, Gerling proposed a watershed algorithm, a wave-system separation method based on spectral geometry that does not consider the geophysical properties of the waves [7]. Through repeated discrete convolutions, false spectral peaks caused by noise can be eliminated, leaving the real wave spectrum containing the wave energy. To eliminate the influence of singularities, the time series of each range cell must also be cleaned in the three-dimensional charts of radial velocity and signal-to-noise ratio; after removing singularities, the radial velocity spectrum at each range is obtained by Fourier transform, and the wave spectrum is then calculated through the conversion function. The three-dimensional chart of the signal-to-noise ratio (SNR) shows the influence of the wave on the radar echo at the crest and trough.
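A minimal sketch of such iterative convolution smoothing is given below; the 3 × 3 averaging kernel and the iteration count are assumptions, and a symmetric boundary is used on both axes for simplicity, although the azimuth axis of a directional spectrum is periodic and would ideally be wrapped.

```python
import numpy as np
from scipy.signal import convolve2d

def smooth_spectrum(E: np.ndarray, iterations: int = 3) -> np.ndarray:
    """Iteratively smooth a directional spectrum E(f, theta) with a 3x3
    averaging kernel to suppress spurious noise peaks."""
    kernel = np.full((3, 3), 1.0 / 9.0)
    for _ in range(iterations):
        E = convolve2d(E, kernel, mode="same", boundary="symm")
    return E
```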
6.3 Radar Data

A microwave Doppler radar has been developed by Wuhan University for directional spectrum measurement. The radar works in S band with a main frequency of 3 GHz, and preliminary spectrum results confirm its performance in ocean wave measurement [8]. The wave data used in this paper were collected from December 2012 to May 2016 in coastal areas of the South China Sea. The radar is a coherent system: the spectrum is calculated from the Doppler shift of the ocean surface velocity rather than from the backscattered power used in non-coherent navigation radar [9], so the radar works well in extreme weather and at night when nothing is visible. During the observation experiment, the radar was installed 20 m above sea level. Six horn antennas with a beam width of 30° cover a sector of 180°; they work in time-division multiplexing mode, with only one antenna active at any moment. The three-dimensional image of radial velocity and the directional wave spectrum are measured every 3 min. Under the set transmitting power, valid data are obtained between 300 and 2000 m from the radar, with a range resolution of 7.5 m. In addition, an anemometer placed 10 m above sea level near the radar records the wind direction and speed every 10 min. The position of the radar and the coverage of the six antennas are displayed in Fig. 6.1.
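For a coherent radar, the radial velocity of a range cell follows from the Doppler shift as v = f_d · λ/2. The sketch below estimates f_d from the peak of the Doppler spectrum of a complex echo series; the pulse repetition frequency and the synthetic signal are illustrative, not the radar's actual processing chain.

```python
import numpy as np

def radial_velocity(echo: np.ndarray, prf_hz: float, freq_hz: float = 3e9) -> float:
    """Estimate the radial velocity of a range cell from the Doppler shift
    of its complex echo series: v = f_d * lambda / 2 (positive = toward radar)."""
    lam = 3e8 / freq_hz                                   # ~0.1 m at S band
    spec = np.fft.fftshift(np.abs(np.fft.fft(echo)))
    freqs = np.fft.fftshift(np.fft.fftfreq(len(echo), d=1.0 / prf_hz))
    f_d = freqs[np.argmax(spec)]                          # peak Doppler frequency
    return f_d * lam / 2.0

# Synthetic cell moving at 1.0 m/s toward the radar, sampled at 1 kHz:
t = np.arange(2048) / 1000.0
echo = np.exp(2j * np.pi * (2 * 1.0 / 0.1) * t)           # f_d = 2v/lambda = 20 Hz
print(round(radial_velocity(echo, 1000.0), 2))            # ~1.0
```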
Fig. 6.1 The location of microwave Doppler radar and coverage of six antennas
6.4 3D Characteristics of Ocean Waves

To analyze the details, the 3D chart of the directional wave spectrum at 20:26, 16 December 2012, obtained by pre-processing the measured spectrum, is shown in Fig. 6.2. The wave spectrum shows the distribution of wave energy over frequency and azimuth, the vertical axis giving the power spectral density in m²/(Hz·degree). Two peaks appear in the spectrum, showing that two wave systems were present in this sea area during the observation period; they are two systems of different frequency traveling in the same direction. Both the strongest and the weakest wave systems travel at 134°; the peak frequency of the strongest is 0.12 Hz and that of the weakest is 0.24 Hz. This accords with the actual law of wave development [10]. In addition, to show further observations, the 3D chart of the directional wave spectrum at another time is displayed in Fig. 6.3.
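Identifying the wave systems amounts to finding local maxima of E(f, θ). A simple sketch is shown below; the neighbourhood size and relative threshold are assumptions, and the synthetic spectrum imitates the two peaks of Fig. 6.2.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def find_wave_systems(E: np.ndarray, freqs, azimuths, rel_threshold: float = 0.1):
    """Locate spectral peaks (candidate wave systems) in a directional
    spectrum E(f, theta): points that are local maxima within a 3x3
    neighbourhood and exceed a fraction of the global maximum."""
    local_max = (E == maximum_filter(E, size=3)) & (E > rel_threshold * E.max())
    return [(freqs[i], azimuths[j], E[i, j]) for i, j in zip(*np.nonzero(local_max))]

# Two bumps at (0.12 Hz, 134 deg) and (0.24 Hz, 134 deg), as in Fig. 6.2:
f = np.linspace(0.05, 0.45, 80)
az = np.linspace(0, 359, 120)
F, A = np.meshgrid(f, az, indexing="ij")
E = (np.exp(-((F - 0.12) / 0.02) ** 2 - ((A - 134) / 15) ** 2)
     + 0.5 * np.exp(-((F - 0.24) / 0.02) ** 2 - ((A - 134) / 15) ** 2))
print(find_wave_systems(E, f, az))
```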
(Figure: 3D surface plot of power spectral density versus frequency (Hz) and azimuth (degree).)
Fig. 6.2. The 3D chart of the directional wave spectrum at 20:26 16 December 2012
(Figure: 3D surface plot of power spectral density versus frequency (Hz) and azimuth (degree).)
Fig. 6.3. The 3D chart of the directional wave spectrum at 21:33 16 December 2012
The directions of the wave systems with the maximum and minimum peak energy are both 165°, and the azimuth of the wave system with the second peak energy is 80°. The peak frequency of the wave system with the maximum peak energy is 0.13 Hz, and that of the system with the second-largest peak energy is 0.11 Hz. The wave spectrum is calculated from the radial velocity series measured by the radar, and the three-dimensional chart of the radial velocity series is an important representation of wave development in time and space. Next, the radial velocity sequence is analyzed. The plane view of the three-dimensional sequence diagram of radial velocity measured by the radar at 08:08, 18 December 2013 is shown in Fig. 6.4. In this chart, the abscissa represents the time series and the ordinate the range series, so the chart shows the change of wave radial velocity over a certain span of time and distance. During this period, the radial velocity series of the waves appears as stripes, and the slope of the stripes characterizes the speed of the dominant wave [11]. Where the radial velocity is greater than zero, the water particles of the wave are moving toward the radar; where it is less than zero, they are moving away from the radar.
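To make the identification of wave systems as spectral peaks concrete, the following sketch locates local maxima in a synthetic directional spectrum E(f, θ). The frequency/azimuth grid, the Gaussian-shaped systems, and the 10% threshold are illustrative assumptions, not the radar's actual output.

```python
# Minimal sketch: each wave system appears as a local maximum of the
# directional spectrum E(f, theta); find such peaks above a threshold.
import numpy as np
from scipy.ndimage import maximum_filter

freqs = np.linspace(0.05, 0.5, 64)     # Hz
azims = np.linspace(0.0, 360.0, 72)    # degrees
F, A = np.meshgrid(freqs, azims, indexing="ij")
# two synthetic systems at (0.12 Hz, 134 deg) and (0.24 Hz, 134 deg)
E = (np.exp(-((F - 0.12) / 0.02) ** 2 - ((A - 134) / 15) ** 2)
     + 0.4 * np.exp(-((F - 0.24) / 0.03) ** 2 - ((A - 134) / 15) ** 2))

# a peak is a point equal to the local maximum of its neighbourhood
local_max = (E == maximum_filter(E, size=5)) & (E > 0.1 * E.max())
for i, j in zip(*np.nonzero(local_max)):
    print("wave system: f=%.2f Hz, azimuth=%.0f deg, energy=%.2f"
          % (freqs[i], azims[j], E[i, j]))
```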
6.5 Conclusion

The data analysis in this paper shows that the microwave Doppler radar developed by Wuhan University can be used for wave observation. By deploying the radar on the seashore, on an island, or on a ship, wave spectrum measurement and observation of the radial velocity of ocean waves in time and space can be implemented.
(Figure: color-coded map of radial velocity (from -1.5 to 1.5 m/s) versus time series and range series.)
Fig. 6.4. Three-dimensional sequence diagram of radial velocity at 08:08 18 December 2013
The three-dimensional characteristics of the ocean wave were shown in this paper. More information about ocean waves can be obtained through subsequent three-dimensional image processing. Acknowledgements This work has been supported by the Doctor Start Project of Yunnan Normal University (00900205020503127).
References 1. Hanson, J.L., Phillips, O.M.: Automated analysis of ocean surface directional wave spectra. J. Atmos. Ocean. Technol. 18, 277–293 (2000) 2. Portilla, J.: Spectral partitioning and identification of wind sea and swell. J. Atmos. Ocean. Technol. 26(1), 107–122 (2009) 3. Yilmaz, N., Özhan, E.: Characteristics of the frequency spectra of wind-waves in Eastern Black Sea. Ocean Dyn. 64(10), 1419–1429 (2014) 4. Soares, C.G.: On the occurrence of double peaked wave spectra. Ocean Eng. 18, 167–171 (1991) 5. Moon, I.J.: A study of the characteristics of wave spectra over the seas around Korea by using a parametric spectrum method. Acta Ocean. Taiwan (AOT), 37(1), 31–46 (1998). ISSN: 0379-7481 6. Portilla, J., Ocampo-Torres, F.J., Monbaliu, J.: Spectral partitioning and identification of wind sea and swell. J. Atmos. Ocean. Technol. 26, 107–122 (2009) 7. Gerling, T.: Partitioning sequences and arrays of directional ocean wave spectra into component wave systems. J. Atmos. Ocean. Technol. 9, 444–458 (1992) 8. Chen, Z., Fan, L., Zhao, C., Jin, Y.: Ocean wave directional spectrum measurement using microwave coherent radar with six antennas. IEICE Electron Express 9, 1542–1549 (2012)
9. Amartya, M., Nilanjan, D.: Smart Computing with Open Source Platforms, pp. 56–60. CRC Press (2019) 10. Chen, Z., Zhang, L., Zhao, C., Chen, Z.: A practical method of extracting wind sea and swell from directional wave spectrum. J. Atmos. Ocean. Technol. 32(11), 2147–2159 (2015) 11. Chen, Z., Zhao, C., Jiang, Y.: Wave measurements with multi-frequency HF radar in the East China Sea. J. Electromagn. Waves Appl. 25(7), 1031–1043 (2011)
Chapter 7
Three-Dimensional Reconstruction Based on Left and Right Objects
Qiang Liu
Abstract Two images are selected from the input images for initial reconstruction, and the camera corresponding to the first image is set as the reference camera. The other cameras are then added in turn, with their positions expressed relative to the reference camera. The epipolar geometry of each newly added image is found, and the points in the image consistent with the reconstructed three-dimensional points are extracted to infer the correspondence between two and three dimensions. The color information of the image is saved, the gray values are extracted, and matching and fusion are performed on the grayscale images; the saved color information is then added back onto the fused grayscale image.
7.1 Research Background and Significance

Three-dimensional (3D) reconstruction based on left and right image matching is one of the earliest 3D reconstruction methods to be put into practice. A 3D reconstruction system has different preset conditions and technical requirements in different application fields, so the kernel algorithm also differs correspondingly. Applications mainly include reconstruction systems in the medical field, real-time reconstruction systems for robot navigation, industrial high-precision reconstruction systems including 3D printing, and real-time 3D reconstruction systems in photogrammetry. The basic idea of 3D reconstruction is to simulate the way human eyes perceive a scene, which is simple and reliable. Three-dimensional reconstruction technology is mainly divided into active and passive types. The active type emits light or energy sources such as lasers, sound waves, or electromagnetic waves toward the target object and receives the returned waves to obtain the depth information of the object. Active measurement has four methods: the Moiré fringe method, the time-of-flight method, the structured light method, and the triangulation method [1]. Passive methods generally use reflections from the surrounding environment, such as natural light: a camera acquires images, and the stereoscopic spatial information of the object is then computed by a specific algorithm [2].
Q. Liu (B) Yunnan Normal University Kunming, Kunming 650500, China
e-mail: [email protected]
Passive measurements include shape-from-texture methods, shape-from-shading methods, and stereo vision methods. Their camera hardware requirements and cost are low: instead of a dedicated transmitter and receiver as in TOF or structured light systems, an ordinary consumer RGB camera is used. Because images are collected directly under ambient light, such systems can be used both indoors and outdoors, whereas TOF and structured light can only be used indoors. In scenes with rich texture and small illumination variation, the application effect is better than that of other 3D reconstruction techniques.
7.2 Technical Route

The technical route is as follows (flowchart): (1) get two pictures from different angles; (2) calibrate to get the camera parameters; (3) get the epipolar geometry; (4) register and fuse the gray images; (5) build pseudo-color diagrams; (6) show the final reconstruction results.
7.2.1 Offline Calibration

In image measurement and machine vision applications, a geometric model of camera imaging must be established in order to determine the relationship between the three-dimensional geometric position of a point on the surface of a space object and its corresponding point in the image [3]. The parameters of this geometric model are the camera parameters [4]. Under most conditions these parameters must be obtained through experiments and calculation; the process of solving them (internal parameters, external parameters, and distortion parameters) is called camera calibration. Whether in image measurement or machine vision applications, calibration of the camera parameters is a critical step: the accuracy of the calibration results and the stability of the algorithm directly affect the accuracy of the results produced by the camera [5]. Figure 7.1 shows the checkerboard grid used for camera calibration; by collecting views of the checkerboard from different angles, the internal and external parameters of the camera are calibrated.
Fig. 7.1 Calibrated checkerboard lattice
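As a hedged illustration of this offline calibration step, the sketch below uses OpenCV to detect checkerboard corners in several views and solve for the intrinsic matrix, distortion coefficients, and per-view extrinsics. The 9 × 6 pattern size and the image file names are assumptions, not values taken from this chapter.

```python
# Minimal calibration sketch: checkerboard corner detection followed by
# cv2.calibrateCamera, which returns intrinsics, distortion, and extrinsics.
import cv2
import numpy as np

pattern = (9, 6)                                   # assumed inner-corner grid
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in ["view1.png", "view2.png", "view3.png"]:   # hypothetical images
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# K: intrinsic matrix; dist: distortion (k1, k2, p1, p2, k3); rvecs/tvecs: extrinsics
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection error:", ret)
```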
The world coordinate system, also called the measuring coordinate system, is a three-dimensional rectangular coordinate system that serves as a reference for describing the spatial position of the camera and the object to be measured [6]. The position of the world coordinate system can be chosen freely according to the actual situation [7]. The camera coordinate system is a three-dimensional rectangular coordinate system [8] whose origin is at the optical center of the lens and whose x and y axes are parallel to the two sides of the image plane [9, 10]. Camera coordinates are related to world coordinates through the camera parameters obtained after calibration. The transformation equation is:

$$\begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \quad (1)$$

where $R$ is the $3 \times 3$ rotation matrix, $t$ is the $3 \times 1$ translation vector, $(x_c, y_c, z_c, 1)^T$ is the homogeneous coordinate in the camera coordinate system, and $(x_w, y_w, z_w, 1)^T$ is the homogeneous coordinate in the world coordinate system. World coordinates are converted to pixel coordinates by:

$$Z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{d_x} & 0 & u_0 \\ 0 & \frac{1}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \end{bmatrix} \quad (2)$$
Table 7.1 Camera parameters
Perspective transformation: $A = \begin{bmatrix} a_x & \gamma & u_0 \\ 0 & a_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}$, 4 degrees of freedom
Tangential and radial distortion: $p_1, p_2$ and $k_1, k_2$, 5 degrees of freedom
External parameters: $R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$, $T = \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}$, 6 degrees of freedom

Substituting $a_x = f/d_x$ and $a_y = f/d_y$ into Eq. (2) gives the compact form

$$Z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} a_x & 0 & u_0 & 0 \\ 0 & a_y & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_W \\ Y_W \\ Z_W \\ 1 \end{bmatrix}$$
where $f$ is the focal length of the camera, generally in mm; $a_x$ and $a_y$ incorporate the pixel dimensions $d_x$ and $d_y$; and $(u_0, v_0)$ is the center of the image. Table 7.1 shows the calibrated camera parameters; $a_x = f/d_x$ and $a_y = f/d_y$ are called the normalized focal lengths on the x and y axes.
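Putting Eqs. (1) and (2) together, a world point is mapped to pixel coordinates by one matrix chain. The sketch below applies that chain with assumed intrinsics (a_x, a_y, u_0, v_0, zero skew) and an assumed identity pose; none of these values come from the chapter.

```python
# Minimal sketch: project a homogeneous world point to pixel coordinates
# using the combined intrinsic/extrinsic matrix chain of Eq. (2).
import numpy as np

a_x, a_y, u0, v0 = 800.0, 800.0, 320.0, 240.0    # assumed intrinsics
K = np.array([[a_x, 0.0, u0, 0.0],
              [0.0, a_y, v0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])             # 3x4 projection, zero skew
R, t = np.eye(3), np.array([0.0, 0.0, 0.0])      # assumed camera pose
Rt = np.vstack([np.hstack([R, t[:, None]]), [0, 0, 0, 1]])

Xw = np.array([0.5, -0.2, 4.0, 1.0])             # homogeneous world point
uvw = K @ Rt @ Xw                                # equals Z_C * (u, v, 1)
u, v = uvw[:2] / uvw[2]
print("pixel coordinates: (%.1f, %.1f)" % (u, v))
```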
7.2.2 Stereo Correspondence

According to the disparity map, the depth value is calculated using the geometric relationship between the focal length f and the baseline b, and the three-dimensional coordinates are then obtained using the camera's internal parameters. The imaging of the same object point in the two cameras is shown in Fig. 7.2. The depth calculation formula is given below, and a depth map can be generated by traversing the image. By the triangle similarity law:

$$\frac{z}{f} = \frac{x}{X_L} = \frac{x - b}{X_R} = \frac{y}{Y_L} = \frac{y}{Y_R} \quad (1)$$

where $b$ is the distance between the cameras and $f$ is the focal length. The disparity is

$$d_p = \mathrm{disp}(x, z) + (c_{x2} - c_{x1}) \quad (2)$$
Fig. 7.2 Disparity map
The imaging model is the pinhole model; knowing Z and the camera's internal parameters, the three-dimensional point coordinates can be calculated, thereby generating a three-dimensional point cloud. From Eq. (1), the following is obtained:

$$\begin{cases} X = \dfrac{X_L}{X_L - X_R} \cdot b \\ Y = \dfrac{Y_L}{X_L - X_R} \cdot b \\ Z = \dfrac{f}{X_L - X_R} \cdot b \end{cases} \quad (3)$$

where $\mathrm{disp}(x, z)$ in Eq. (2) represents the disparity map coordinate value.
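For illustration, the following sketch computes a disparity map with OpenCV's semi-global block matching and converts it to depth through the relationship Z = f·b/(X_L − X_R) from Eq. (3). The focal length, baseline, and image files are assumptions, and a rectified stereo pair is presupposed.

```python
# Minimal sketch: disparity via SGBM, then depth = f * b / disparity.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical rectified pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> px

f, b = 800.0, 0.12            # assumed focal length (px) and baseline (m)
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * b / disparity[valid]   # metres, from the triangulation formula
```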
7.3 Results and Analysis

Two images are selected from the input images for initial reconstruction, and the camera corresponding to the first image is set as the reference camera. Taking the first picture as the standard, the subsequent pictures are matched to it in turn. The analysis process is: (1) find the epipolar geometry of the newly added image, which can be represented by a 3 × 3 matrix of rank 2; (2) extract the points of the image that are consistent with the reconstructed points and use them to infer the correspondence between two and three dimensions. After each calculation, the initial reconstruction results are optimized (Figs. 7.3 and 7.4). The main function of the grayscale-to-pseudo-color code is to make pixels with higher brightness in the grayscale map tend toward red, while lower-brightness pixels tend toward blue; overall, as the gray value decreases, the pseudo-color gradually changes from red through green to blue (Fig. 7.5).
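A minimal sketch of this grayscale-to-pseudo-color step: OpenCV's JET colormap sends low intensities toward blue, mid intensities toward green, and high intensities toward red, matching the behaviour described above. The input file name is an assumption.

```python
# Minimal sketch: map a fused grayscale image to a pseudo-color image.
import cv2

gray = cv2.imread("fused_gray.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
pseudo = cv2.applyColorMap(gray, cv2.COLORMAP_JET)          # blue -> green -> red
cv2.imwrite("pseudo_color.png", pseudo)
```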
Fig. 7.3 Left and right image of the object
Fig. 7.4 Gray scale map
Fig. 7.5 Pseudo-color chart
7.4 Conclusion

This paper introduces the main steps of 3D reconstruction, including camera calibration, the stereo matching algorithm, and image-based 3D reconstruction. In the process of matching two images, the scene structure and camera motion can be recovered from the image sequence, and the results can be optimized by bundle adjustment.
Fig. 7.6 3D image display
As can be seen from the results shown in Fig. 7.6, the method maps the fused grayscale image and the color image well and reconstructs an image of the object.
References 1. Zhang, H., Zhang, C., Yang, W., Chen, C.Y.: Localization and navigation using QR code for mobile robot in indoor environment. In: Proceedings of the IEEE International Conference on Robotics and Biomimetics, Zhuhai, China, pp. 2501–2506 (2015) 2. Du, J.C., Teng, H.C.: 3D laser scanning and GPS technology for landslide earthwork volume estimation. Autom. Constr. 16(5), 657–663 (2007) 3. Nefti-Meziani, S., Manzoor, U., Davis, S., Pupala, S.K.: 3D perception from binocular vision for a low cost humanoid robot NAO. Robot. Auton. Syst. 68, 129–139 (2015) 4. Prats, M., Martínez, E., Sanz, P.J., Del Pobil, A.P.: The UJI librarian robot. Intell. Serv. Robot. 1, 321 (2008) 5. Fuentes-Pacheco, J., Ruiz-Ascencio, J., Rendón-Mancha, J.M.: Visual simultaneous localization and mapping: a survey. Artif. Intell. Rev. 43, 55–81 (2015) 6. Will, P.M., Pennington, K.S.: Grid coding: a novel technique for image processing. Proc. IEEE 60(6), 669–680 (1972) 7. El-Hakim, S.F.: Three-dimensional modeling of complex environments (2000) 8. Pandey, A., Pandey, S., Parhi, D.: Mobile robot navigation and obstacle avoidance techniques: a review. Int. Robot. Autom. 2, 00022 (2017) 9. Scharstein, D., Szeliski, R.: High-accuracy stereo depth maps using structured light, vol. 1, pp. 195–202 (2003) 10. Levoy, M., Pulli, K., Curless, B., et al.: The digital Michelangelo project: 3D scanning of large statues. In: Conference on Computer Graphics and Interactive Techniques, pp. 131–144. ACM Press/Addison-Wesley (2000)
Chapter 8
Research on 3D Scene Dynamic Scanning Modeling Technology of Substation Based on Depth Vision
Yu Hai, Xu Min, Peng Lin, Liu Wei, Sun Rong, Shao Jian, and Wang Gang
Abstract State Grid's intelligent transportation inspection system takes "self and environmental awareness" as one of its core functions. Environmental modeling and intelligent reconstruction are the basis for panoramic information collection and organization in field operations. Using multiple sensing technologies and highly efficient intelligent image processing algorithms, the approach eliminates the need to deploy additional base stations or sensors at the job site: a 3D model can be formed by scanning the environment directly at the job site with a portable terminal. The various types of on-site information are then reorganized with the spatial dimension as the primary axis, supplemented by the time dimension and logical relationships. This technology will provide new automation and intelligence solutions for grid field operations, improve operation modes, increase work efficiency, and achieve active safety protection.
8.1 Introduction

Under the traditional electric power inspection operation mode, the operation of power equipment is complicated and the component logic is not visible, especially for the highly integrated modules and complex logical relationships of intelligent substations. Inspection personnel cannot complete tasks independently, and dependence on the manufacturer is high. During operation, dangerous situations such as misoperation and mistakenly entering energized intervals are likely to occur, resulting in casualties; meanwhile, the actual working time of an inspection job is less than half of the total, with more than 70% of the time spent on work preparation and communication confirmation.
Y. Hai (B) · X. Min · P. Lin · W. Gang Global Energy Interconnection Research Institute Co., Ltd, Nanjing, Jiangsu 210003, China
e-mail: [email protected]
State Grid Key Laboratory of Information and Network Security, Nanjing 210003, China
L. Wei · S. Rong · S. Jian Electric Power Research Institute of State Grid Jiangsu Electric Power Company, Nanjing, Jiangsu 210003, China
This traditional method lacks perception and understanding of the operating subject, the working environment, and the work target. To realize autonomous modeling, cognitive understanding, and positioning and navigation of the work environment, portable terminal equipment is used as the center of information collection, processing, and interaction, guided by the work tasks. Research on low-cost autonomous fast scanning and 3D modeling of the operating environment is one of the core key technologies for automation and intelligent development in the industrial field, laying the foundation for the power industry's intelligent operation terminals, automated operation robots, and extended business application support systems [1]. The specific research includes a deep visual 3D scanning modeling algorithm for the indoor and outdoor environments of substation operation, improving the accuracy, reliability, and efficiency of low-cost sensor pose estimation and scene reconstruction in complex and noisy environments [2]; hybrid layered scanning modeling technology for large-scale scenes, which solves the problems of segmented scanning, sub-map splicing, topology creation and optimization, and synchronous object scanning; and region learning and power transformation environment understanding from scene point clouds, which studies clustering and segmentation of the scene point cloud and realizes semantic labeling of the regional structure of the power transformation environment [3].
8.2 Deep Visual 3D Scanning Modeling Algorithm for Indoor and Outdoor Environments of Substation Operations

Given that the sensor pose (the operator's own pose) and the work environment map are completely unknown, a three-dimensional scanning modeling algorithm for indoor and outdoor substation working environments based on a depth vision sensor is studied, aiming to improve the accuracy, reliability, and efficiency of low-cost sensor pose estimation and scene reconstruction in complex, noisy environments. The specific research plan is as follows.
8.2.1 Semi-dense SLAM Based on RGB-D in Complex Indoor Working Environments

Because the indoor substation environment is often complicated, classic indoor 3D dense mapping methods such as RTAB-Mapping have problems such
as poor real-time performance, large memory consumption, and low precision; dense mapping is therefore not suitable for the indoor substation environment. To this end, based on the actual indoor substation operating environment, a three-dimensional semi-dense map scanning and modeling optimization algorithm based on an RGB-D sensor is studied. The classical ORB-SLAM method requires sufficient point features to compute inter-frame registration, but substations contain scenes lacking significant visual features (such as white walls) and repetitive scenes (such as switch cabinets), which can easily cause the SLAM calculation to fail. Compared with previous methods, this study improves on the ORB-SLAM framework: an inter-frame pose optimization method fusing point and line features is proposed, in which the LSD line segment detector is used to extract line features and perform registration, and bundle adjustment is used to solve the relative pose estimation between frames. At the same time, an improved closed-loop detection strategy is studied to reduce the redundant key frames stored in the background and lower the probability of false closed-loop detection, thereby improving the computational efficiency and modeling accuracy of the three-dimensional environment modeling method [4].
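As a rough sketch of the fused point/line front end, the code below extracts ORB keypoints and matches them between frames, and extracts LSD line segments from each frame; the pose optimization that fuses the two feature types (bundle adjustment) is omitted, and the frame file names are assumptions. Note that the LSD detector's availability varies across OpenCV builds.

```python
# Minimal sketch: point features (ORB) plus line features (LSD) per frame.
import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)   # hypothetical frames
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

# line segments via LSD; present as cv2.createLineSegmentDetector in many builds
lsd = cv2.createLineSegmentDetector()
lines1 = lsd.detect(img1)[0]
lines2 = lsd.detect(img2)[0]
print(len(matches), "point matches;", len(lines1), "and", len(lines2), "line segments")
```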
8.2.2 Sparse SLAM Based on a Binocular Camera in Complex Outdoor Working Environments

A two-dimensional summary (sparse) map creation method based on visual odometry is newly proposed in this paper as a compromise for achieving outdoor work guidance, since it is difficult for low-cost vision sensors to create dense three-dimensional maps of outdoor environments. A path/topology hybrid summary map creation method based on visual odometry is therefore proposed [5]. The environment summary map is a hybrid map that contains both the metric path and the scene points of interest; it retains the streamlined character of a topological map while supporting global positioning and global path guidance along the job path. On this basis, an operating-environment summary map creation and positioning/navigation module based on the headset and the IMU is researched and developed. Figure 8.1 shows the imaging principle of a binocular camera.
8.3 Hybrid Hierarchical Scanning Modeling for Large-Scale Scenes

For the large-scale indoor power grid operating environment, a hybrid layered scanning modeling method is studied to solve the problems of segmented scanning, sub-map splicing, topology creation and optimization, synchronous object scanning, etc.
Fig. 8.1 Binocular camera parallax principle
Online scanning and optimized environment scanning can be realized on a low-cost mobile computing platform. The specific research plan is as follows (Fig. 8.2). The study establishes a hybrid map model carrying equipment and regional information: the bottom layer is a metric semi-sparse sub-map, and the upper level is the environment topology. Device information (location, feature, name, status) and area information (area type) are associated with the topological nodes where they are located, giving the map rich site semantic information. This paper studies a three-dimensional modeling method for multi-session indoor environment scanning based on sub-maps: binocular/RGB-D visual sensor information is used to create sub-maps and anchor points with the feature-based ORB-SLAM algorithm, and the multi-frame scene images saved by the adjacent sub-maps before and after an anchor point are used to recover the transformation matrix between adjacent sub-maps.
Fig. 8.2 Multi-stage hierarchical map creation
The sparse sub-maps are spliced offline. The method supports batch-by-batch and interval-based environmental scanning operations in a large-scale environment and reduces the workload of scanning the entire environment in one pass. On the basis of the above method, device object recognition technology is introduced during environment modeling to realize the labeling of substation equipment. Algorithm design: (1) In a continuous mapping process, a sub-map is created and saved by means of a button, and the transformation relationship between sub-maps is calculated by point cloud registration. (2) After the map is created, it can be extended again (multi-stage mapping); the newly created sub-map should have an overlapping area with the already established map. The DBoW algorithm is used to detect overlapping regions between sub-maps, and the point cloud registration algorithm is then used to calculate the transformation relationship between the two sub-maps with overlapping regions, thereby adding the newly established sub-map to the global map [6].
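A hedged sketch of step (2): once an overlapping region between two sub-maps has been flagged (e.g., by DBoW), the transformation between them can be estimated by point cloud registration. Below, Open3D's point-to-point ICP is used as a stand-in; the file names, correspondence threshold, and identity initial guess are assumptions.

```python
# Minimal sketch: estimate the rigid transform mapping sub_b into sub_a's frame.
import numpy as np
import open3d as o3d

sub_a = o3d.io.read_point_cloud("submap_a.pcd")   # hypothetical sub-maps
sub_b = o3d.io.read_point_cloud("submap_b.pcd")

result = o3d.pipelines.registration.registration_icp(
    sub_b, sub_a,
    max_correspondence_distance=0.1,              # metres (assumed)
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
T_ab = result.transformation                      # 4x4 transform between sub-maps
print(T_ab)
```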
8.4 Regional Learning and Substation Environment Understanding Under Scene Point Clouds

Extracting high-level object features from simplified, originally discrete 3D point clouds is the basis for object recognition and localization in indoor scenes. By clustering and segmenting the scene point cloud, regional structure learning and semantic marking of the indoor substation working environment are realized.
8.4.1 Point Cloud Pretreatment

The depth image data acquired by the RGB-D sensor is converted into a discrete three-dimensional point cloud, and the point cloud data is denoised and sampled. To reduce the amount of subsequent calculation and enhance the real-time performance of the application, the point cloud data must be downsampled while maintaining its characteristics. Commonly used point cloud sampling methods include the growing neural gas algorithm and the voxel grid method. Because the voxel grid method is simple and efficient, requires no complex topology, and can simplify the point cloud as a whole while preserving its characteristics, this paper uses the voxel grid method, based on the PCL point cloud library, to sample the point cloud data.
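The chapter performs voxel-grid sampling with the PCL library; as a compact stand-in, the sketch below shows the same operation through Open3D's API. The input file and the 5 cm voxel size are assumptions.

```python
# Minimal sketch: voxel-grid downsampling of an RGB-D point cloud.
import open3d as o3d

cloud = o3d.io.read_point_cloud("scene.pcd")        # hypothetical input cloud
down = cloud.voxel_down_sample(voxel_size=0.05)     # one point per 5 cm voxel
print(len(cloud.points), "->", len(down.points), "points")
```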
8.4.2 Point Cloud Clustering

The first step in understanding a scene based on point clouds is to extract the point cloud clusters of specific objects from the point cloud image of the complex scene containing the target objects, via a point cloud segmentation algorithm. Using the segmentation algorithm described above, typical planes in the scene such as the ground and walkways are extracted, exploiting the planar characteristics of three-dimensional indoor scenes: the normal vectors of all points are calculated, a Gaussian mixture model is used to cluster the normal set of the whole three-dimensional point cloud, and the random sample consensus (RANSAC) algorithm is then used to fit planes within each cluster. Each cluster yields several planes, and the entire point cloud is finally split into a collection of planes. In the process of building the map, deep learning scene recognition technology is further used to classify the scene, realizing clustering and segmentation of the area categories in the indoor substation working environment (such as control room, power distribution room, and corridor) and thereby obtaining regional functional semantic annotation in the established map.
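A hedged sketch of the clustering-and-segmentation pipeline just described: estimate per-point normals, cluster the normal set with a Gaussian mixture model, and then fit a plane inside each cluster with RANSAC. The four-component mixture, the thresholds, and the input file are assumptions, and Open3D/scikit-learn stand in for the original implementation.

```python
# Minimal sketch: normals -> GMM clustering of normals -> RANSAC plane per cluster.
import numpy as np
import open3d as o3d
from sklearn.mixture import GaussianMixture

cloud = o3d.io.read_point_cloud("indoor_scene.pcd")   # hypothetical scene
cloud.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
normals = np.asarray(cloud.normals)

labels = GaussianMixture(n_components=4).fit_predict(normals)  # cluster normal set

planes = []
for k in range(4):
    idx = np.nonzero(labels == k)[0]
    segment = cloud.select_by_index(idx)
    # RANSAC plane fit inside this normal cluster
    model, inliers = segment.segment_plane(
        distance_threshold=0.02, ransac_n=3, num_iterations=500)
    planes.append(model)          # (a, b, c, d) for ax + by + cz + d = 0
```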
8.5 Conclusions

Environment modeling and intelligent reconstruction are the basis of panoramic information collection and organization for field operations. Based on sensor technology and image processing algorithms, and without deploying additional base stations or sensors, a three-dimensional model can be formed by a portable terminal directly scanning the environment on the spot. All kinds of on-site information can then be reorganized with the spatial dimension as the primary axis and the time dimension and logical relationships as auxiliary. This technology will provide new automation and intelligence solutions for grid field operation, improve operation modes, increase operational efficiency, and realize active safety protection. Acknowledgements Thanks for the support of the science and technology project of State Grid Corporation "Research on Intelligent Reconfiguration and Cognitive Technology of Complex Dynamic Work Environment Based on Deep Vision" (SGJSDK00PXJS1800444).
References 1. Stewénius, H., Engels, C., Nistér, D.: Recent developments on direct relative orientation. ISPRS J. Photogramm. Remote Sens. 60(4), 284–294 (2006) 2. Schall, O., Belyaev, A., Seidel, H.-P.: Adaptive feature-preserving non-local denoising of static and time-varying range data. Comput.-Aided Des. 40(6), 701–707 (2008) 3. Gálvez-López, D., Salas, M., Tardós, J.D.: Real-time monocular object SLAM. Robot. Auton. Syst. 75, 435–449 (2016) 4. Newcombe, R.A., et al.: KinectFusion: real-time dense surface mapping and tracking. In: IEEE International Symposium on Mixed and Augmented Reality, pp. 127–136. IEEE (2011) 5. Izadi, S., et al.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: ACM Symposium on User Interface Software and Technology, pp. 559–568. ACM (2011) 6. Durand, F., Dorsey, J.: Fast bilateral filtering for the display of high-dynamic-range images. ACM Trans. Graph. 21(3), 257–266 (2002)
Chapter 9
Fusion and Localization Technology of 3D Model Attitude and Real Physical Environment Visualization Information of Power Inspection Site
Xu Min, Yu Hai, Hou Zhansheng, Li Nige, Bao Xingchuan, Peng Lin, Wang Gang, Zhu Liang, Lu Hao, Wang Zhifeng, and Zhao Peng
Abstract In this paper, closed-loop detection based on depth-vision scene recognition is adopted to overcome the problem that visual odometry cannot perform initial localization, as well as the disadvantage that position and attitude can only be estimated incrementally, which easily accumulates positioning errors. Through closed-loop detection, the currently acquired scene image is matched in real time against the scenes in the map, and an assisting binocular-vision attitude adjustment algorithm yields topological positioning of high accuracy. In this way, the navigation and positioning accuracy based on depth vision sensors in the indoor environment of the substation is greatly improved, and accurate positioning of substation personnel is realized. Dangerous areas can thus be delimited and the on-site operation of personnel safely controlled, which greatly improves personal safety.
9.1 Introduction

In the traditional power inspection operation mode, the operation of power equipment is complex and the logic of each part is not clear, especially with the highly integrated modules and complex logical relationships of intelligent stations, so inspection personnel cannot complete tasks on their own and rely heavily on the manufacturer. In the process of operation, dangerous situations such as wrong operations and wrongly entered intervals easily occur, causing casualties.
X. Min (B) · Y. Hai · H. Zhansheng · B. Xingchuan · P. Lin · W. Gang · Z. Liang Global Energy Interconnection Research Institute, Nanjing, Jiangsu 210003, China
e-mail: [email protected]
X. Min · L. Nige State Grid Key Laboratory of Information & Network Security, Nanjing 210003, China
L. Hao · W. Zhifeng · Z. Peng State Grid Shanxi Electric Power Information & Communication Company, Shanxi, Xian 710048, China
However, the actual operating time of an inspection job is less than half of the total; more than 70% of the time goes to communication preparation and confirmation. The three-dimensional model enables real-time acquisition of field environment information. Accurate positioning and adaptive navigation technology can combine operation content with experience and knowledge, provide intuitive and convenient operation guidance for operators, and improve the quality, efficiency, and safety of power grid field operations. First, dynamic-scanning 3D modeling technology based on depth vision can quickly and cheaply build 3D models of the work site and is suitable for large-scale promotion and application. Second, portable terminal equipment is used to realize rapid three-dimensional positioning of personnel and target equipment, helping operators arrive quickly at the work site and guiding them through work tasks in the set operation sequence, improving their efficiency. Through reconstruction and positioning technology based on the field environment, the positions of workers and the status of equipment can be determined to prevent accidents and effectively reduce accidental losses.
9.2 Binocular Posture and Visual Information Fusion Positioning Technology

This paper proposes a set of effective methods to solve the aforementioned problems. First, for the indoor and outdoor scenes of the substation, a point cloud array is constructed using a depth vision sensor to realize rapid modeling of the field operation environment. Second, the bag-of-words method is used to generate image signatures; each signature is represented over a visual dictionary set that is incrementally created online. Based on OpenCV, surface features are extracted from the point cloud array to obtain visual words. Then, a closed-loop detection algorithm is used to correct the initial positioning information and obtain accurate initial positions for field personnel. Finally, a relative attitude estimation algorithm computes the pose between the image currently taken by the binocular camera and the corresponding image stored when the map was created; that is, the rotation matrix R and the translation matrix T between the two scene images are recovered through image feature matching and registration, yielding accurate position information. The procedure includes the following steps. Step 1: For the indoor and outdoor scenes of the substation, the point cloud array is constructed using the depth vision sensor to realize rapid modeling of the field operation environment. Image depth information and the position and posture of field scene points are obtained through the ZedSDK provided by 3D lab, which matches the Zed camera: the ZedSDK uses the binocular images collected by a Zed camera to calculate the binocular parallax and then recovers the depth information of the 3D scene. The visual odometry output is then obtained by attitude estimation, and the position of the current field scene point is determined from it. Since visual odometry cannot give the initial position and attitude, additional IMU (Inertial Measurement Unit) geomagnetic measurements are needed to determine the current heading of the system [1].
of the visual odometer. Since the visual odometer cannot give the initial position and attitude, it is necessary to obtain additional IMU (Inertial Measurement Unit) geomagnetic measurements to determine the direction of the current system [1]. Step 2: closed-loop detection. When a closed loop hypothesis is considered valid, a closed loop link is established between the new anchor and the old anchor. There is no fusion at the closed-loop anchor point [2], but only one link is added between them. This not only preserves the different characteristics of the same point, but also allows a better estimation of future closed-loop assumptions, which are important in highly dynamic environments or in environments that change over time, such as day to night or weather. Step 3: when selecting the closed loop hypothesis, make sure to visit the current field attraction on the exploration map. Assuming that the current position is detected in the closed loop to form a closed loop, there is a field scenic spot in the map (i.e., re-visit), and the global posture of field scenic spot stored in the node is obtained [3], it is considered that the current posture is similar to the global posture retained by the midfield scenic spot in the map. Step 4: In the actual environment, although the field scenic spots detected by the closed loop are roughly in the same position as the field scenic spots in the map, it is impossible for the visual sensor to maintain the same position and posture in the two acquisition processes. In order to obtain more accurate positioning results, the relative position and attitude of the current binocular camera image acquisition position and the image acquisition position in the atlas need to be calculated when creating the map. That is, through image feature matching and registration, the rotation matrix R and the translation matrix T between the two scene images are recovered, and the accurate position information after correction is finally obtained [4]. Step 5: Divide the dangerous area, connect the auxiliary system of electric field operation with the D6000 system of production management of the national network, obtain the current live equipment and regional data, and automatically delimit the coordinates of the operation area through the current work tasks of the field workers, alarm and remind the field workers who are beyond the scope of the operation area, so as to ensure the personal safety of the field workers. 1. Users wear smart helmets for on-site operation and log in the on-site operation assistance system. 2. Open the binocular vision sensor on the smart helmet to get the dof data. 3. The point cloud array is used to calculate the position coordinates of personnel in real time. 4. By means of closed detection and attitude correction algorithm [5], the accuracy of personnel positioning coordinates is optimized to obtain real-time positioning information of field operators. 5. Docking with D6000 production management system to obtain live equipment and regional coordinate information of substation; 6. Determine in real time whether there are operators beyond the safe operation range. For the dangerous personnel beyond the scope of operation, the dangerous area shall be timely warned by means of smart helmet vibration/sound alarm, etc.,
and the background monitoring personnel shall be notified in time to maximize the protection of the personal safety of the field personnel. In order to verify the positioning accuracy of the wearable outdoor positioning system proposed in this paper, a rectangular area with a length of 14 m and a width of 6 m was selected as the experimental site in the school. On the boundary of the area with a circumference of 40 m, 20 locations were calibrated as scene points at intervals of 2 M. The experimental hypothesis is that 20 locations are calibrated as scene points. The location of these 20 points is known, that is, the location of these 20 calibration points is regarded as ground truth (Fig. 9.1). From Fig. 9.2, it can be seen qualitatively that the visual odometer positioning results after introducing Closed-loop Detection and pose compensation have deviations in local areas, but after Closed-loop Detection and pose compensation, the deviation is quickly eliminated. When the visual odometer deviates, the positioning error will only increase.
Fig. 9.1 Flow diagram of dangerous area division and warning algorithm
(Figure: line chart of positioning error (Error/m, 0 to 1) versus frame id (0 to 25), comparing error1 with closed-loop detection and pose compensation, error2 without, and their average errors.)
Fig. 9.2 Positioning error line chart
9.3 Conclusions

This paper introduces a method of binocular posture and visual information fusion positioning for the electric patrol scene. Closed-loop detection based on depth-vision scene recognition overcomes the inability of visual odometry to perform initial positioning and its disadvantage of estimating position and attitude only incrementally, which accumulates positioning errors. Through closed-loop detection, the currently acquired scene image is matched in real time against the map database, and with the assistance of the binocular vision attitude adjustment algorithm, topological positioning results of high accuracy are obtained. This greatly improves the navigation and positioning accuracy in the substation's indoor environment based on depth vision sensors, realizes accurate positioning of substation personnel, and delimits dangerous areas, achieving safety management and control of on-site operations and greatly improving the personal safety of site operators.
References 1. Gálvez-López, D., Tardós, J.D.: Bags of binary words for fast place recognition in image sequences. IEEE Trans. Robot. 28(5), 1188–1197 (2012) 2. Angeli, A., Filliat, D., Doncieux, S., et al.: A fast and incremental method for loop-closure detection using bags of visual words. IEEE Trans. Robot. 24(5), 1027–1037 (2008) 3. Cummins, M., Newman, P.: Probabilistic appearance based navigation and loop closing. In: IEEE International Conference on Robotics and Automation, pp. 2042–2048. IEEE (2007)
4. Cummins, M., Newman, P.: Highly scalable appearance-only SLAM: FAB-MAP 2.0. Robot. Sci. Syst. (2009). https://doi.org/10.15607/RSS.2009.V.039 5. Mouragnon, E., Lhuillier, M., Dhome, M., et al.: Real time localization and 3D reconstruction. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 363–370. IEEE Computer Society (2006)
Chapter 10
The Research and Design of Big Data Monitoring Platform for Enterprise Growth
Dacan Li, Yuanyuan Gong, Guangshang Tang, Guicai Feng, Yan Zhang, and Weimo Tian
Abstract With the rapid increase of high-quality, top-notch enterprises in China, helping enterprise users optimize management and control, accelerate process innovation, and comprehensively improve product competitiveness and operating efficiency is the main problem facing government departments at this stage. Therefore, based on the actual demand of industrial manufacturing enterprises for big data applications, government departments need to comprehensively apply the new generation of information technologies, such as the Internet of Things, cloud computing, and big data, to design and develop an enterprise growth monitoring platform that meets these challenges. Bringing the big data monitoring platform online makes full use of existing industrial economic and industry-wide operating data resources. In accordance with government requirements for high-quality economic development and big data monitoring of key enterprises' growth, building the monitoring platform into a unified-caliber evaluation of enterprise growth and operation can provide important data support for the government to accurately grasp the regional economy and the growth and operation of key enterprises.
10.1 Background Introduction

With the continuous optimization and upgrading of the industrial structure and the landing of preferential policies for the development of high-quality, top-notch enterprises, a large number of domestic big data leading enterprises have settled, driving rapid growth of the regional economy.
D. Li · Y. Gong (B) · G. Tang · G. Feng Shiyuan College of Nanning Normal University, Nanning 530226, China
e-mail: [email protected]
Y. Zhang Hebei University of Economics and Business, Shijiazhuang 050061, China
W. Tian Guizhou Shuanglong Airport Economic Zone, Guiyang 550002, China
With the rapid increase of high-quality, top-notch enterprises, how to supervise and manage them is the main problem the government must solve. For this reason, government departments urgently need to design and implement a big data monitoring platform for enterprise growth. By collecting, gathering, cleaning, and integrating the business, production, environmental, and circulation data of industrial manufacturing enterprises, and through analysis, modeling, and application mining across different dimensions, the platform can help enterprise users optimize management and control, accelerate process innovation, and improve product competitiveness and operating efficiency in an all-round way [1]. At the same time, it helps the relevant regulatory authorities strengthen monitoring and early warning of industries and markets and improves the level of scientific decision-making and service capabilities.
10.2 Introduction to Big Data Technology

Big data refers to data sets that cannot be captured, managed, and processed by conventional software tools within a certain time range [2]. It is a huge, fast-growing, and diversified information asset that requires new processing modes to provide stronger decision-making power, insight, and process optimization ability [3]. Big data has five main technical features (proposed by IBM), as shown in Fig. 10.1. Volume: data can range from hundreds of terabytes to tens or hundreds of PB, or even EB [4]. Variety: big data includes data in various formats and forms. Velocity: much big data must be processed within a certain time limit. Veracity: the processing results must attain a certain accuracy. Value: big data contains deep value, and its mining and utilization will bring huge commercial value.
Fig. 10.1 Technical characteristics of big data
10.3 Functional Demand Analysis

At the present stage, strategic emerging technology enterprises are the main force building an innovation-oriented central city. However, information barriers, weak supervision, a one-dimensional evaluation model, and untimely information on enterprise behavior have become the "pain points" of government supervision [5]. How to use big data to break the original mode of thinking, improve the scientific level of supervision of state-owned enterprises, and innovate ideas, mechanisms, and measures has always been a key task of government supervision. At present, the new generation of information technologies represented by the internet, big data, and artificial intelligence is changing daily, bringing significant and far-reaching impacts on economic and social development, state management, social governance, and people's lives. Under the guidance of relevant government departments, it is urgent to follow the principle of "mindset first, easy tasks before difficult ones", highlight problem and demand orientation, and use big data and cloud computing to build a big data monitoring platform for enterprise growth with pre-event warning, in-process supervision, and post-event disposal [6]. According to the government's regulatory requirements for enterprises, the platform is designed with eight functional modules: home page management, background investigation of investment enterprises, dynamic monitoring of stock enterprises, forewarning of key enterprises, supervision of public opinion information, data collection management, data conversion and loading, and operation and maintenance management. By achieving multi-level supervision through one platform, it promotes resource sharing, data sharing, and cross-validation across industries, so that government supervision of enterprises is quantifiable, supported, and traceable. The platform monitors the operation of key enterprises, tracking their key production indicators and warning of abnormal production and operation conditions. It can provide decision support for investment promotion, analysis of regional industries, and early warning for vulnerable industries. The overall use case diagram of the big data monitoring platform is shown in Fig. 10.2. The platform effectively integrates and processes massive internet data together with government data and monitors the operation of enterprises. Qualitative and quantitative analysis of capacity, inventory, leverage, cost, and short boards assists government decision-making. Integrating multi-dimensional information such as industry intelligence, market environment, and enterprise reporting, it dynamically supervises key enterprises and guides the market to identify overall industry risks in time. It can also build customer portraits, tap consumer needs, assist SMEs in operating efficiently, and provide business and financial management services. At the same time, it monitors the operating status of enterprises, intelligently warns of abnormal situations, and assists the supervision of government departments.
(Figure: use case diagram in which ordinary users, data analysts, system administrators, and leaders interact with the eight functional modules of the platform.)
Fig. 10.2 Big data monitoring platform for enterprise growth use case diagram
10.4 System Development and Design

10.4.1 Overall Architecture Design

The big data monitoring platform for enterprise growth designed here is a means for the government to innovate enterprise supervision and early warning. Its innovation is that the platform builds a risk monitoring model for key enterprises in the region: through enterprise risk monitoring, it issues early warnings for enterprises with identified problems or those that have reached a threshold value, indicates the risk level, and dynamically assesses business risk, tax risk, relocation risk, and so on. The platform is designed with a B/S (browser/server) structure. The logical architecture of the platform is shown in Fig. 10.3. Presentation layer: the presentation layer is the display page of the system [7]. Operation instructions are conveyed through the function buttons of the UI pages; this layer is mainly the interface for data input and feedback. Operation instructions are issued by the user through the presentation layer and are analyzed and executed by the business logic layer; the platform's business processing results are fed back from the business logic layer to the presentation layer for use by the relevant government regulatory departments.
(Figure: logical architecture with a presentation layer (login interface, data query, data monitoring, data analysis, other presentation pages), a business logic layer (the eight functional modules), and a data layer built on big data and data warehouse technology (Hadoop) holding structured and unstructured data.)
Fig. 10.3 Logical architecture design diagram
Business logic layer: the business logic layer is the core layer of the system and contains the various business functions, which handle the different kinds of business. After the presentation layer issues instructions, the business logic layer receives them and performs the corresponding operations; if data is needed, the data layer supplies it to help complete the business function. Data layer: the data layer is mainly responsible for data storage, management, and maintenance of the platform, and also handles data analysis within the platform [8]. The data layer provides the basic data consumed by the business logic layer and is the foundation that guarantees the platform's operation [9, 10]. It conveniently realizes internal data processing and fast internal data operations, and it also provides data security mechanisms to guarantee system security.
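As a toy illustration of the three-layer split described above (not the platform's actual code), the sketch below wires a presentation function to a business logic layer that draws on a data layer; all class names, records, and values are hypothetical.

```python
# Minimal sketch of the presentation / business logic / data layer separation.
class DataLayer:
    """Data layer: stores and serves basic enterprise records."""
    def __init__(self):
        self._records = {"ACME": {"revenue": 120.5, "tax_risk": "low"}}

    def fetch(self, enterprise):
        return self._records.get(enterprise)

class BusinessLogicLayer:
    """Business logic layer: performs the monitoring/evaluation operations."""
    def __init__(self, data):
        self.data = data

    def assess(self, enterprise):
        record = self.data.fetch(enterprise)
        if record is None:
            return "unknown enterprise"
        return "warning" if record["tax_risk"] != "low" else "normal"

def presentation_layer(enterprise, logic):
    """Presentation layer: relays the user's instruction and shows feedback."""
    print(f"{enterprise}: {logic.assess(enterprise)}")

presentation_layer("ACME", BusinessLogicLayer(DataLayer()))
```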
10.4.2 System Functional Architecture Design

The system functions can be designed based on the analysis of the business processes and demands of big data monitoring of enterprise growth. Compared with other monitoring systems, it can not only show the overall development of enterprises, the types of enterprises in focus, and the development of key industries, but also provide timely analysis and early warning for the enterprises in the region, strongly supporting industrial dispersion, layout planning, and other work.
(Figure: functional structure comprising eight subsystems: home page management, background investigation of investment enterprises, dynamic monitoring of stock enterprises, forewarning of key enterprises, supervision of public opinion information, data collection management, data conversion and loading, and operation and maintenance management.)
Fig. 10.4 Functional structure design diagram
The functional structure of the big data monitoring platform for enterprise growth is shown in Fig. 10.4.
10.4.3 Network Topology Structure Design

The application servers, web servers, and database servers of the enterprise growth big data monitoring platform are all deployed in the regional government network center. Other departments can apply for access to the platform's data through the intranet. Excellent hardware is necessary to ensure that the system operates normally and effectively in accordance with the design requirements. The network topology of the big data monitoring platform is shown in Fig. 10.5.
10.5 Conclusion

The big data monitoring platform for enterprise growth designed by our team is an early-warning means for the government to innovate enterprise supervision. It is a new management model that builds a risk monitoring model for key enterprises in the region. Through enterprise risk monitoring and management, it can provide important data support for the government to accurately grasp the regional economy and the growth and operation of key enterprises.
(Figure: network topology in which the network equipment center (server and hot standby server) connects through a switch and router to the enterprise supervisory authority, internet users, the system administrator, and intranet users.)
Fig. 10.5 Network topology structure design diagram
References 1. Gantz, J., Reinsel, D.: The digital universe in 2020: big data, bigger digital shadows, and biggest growth in the far east, pp. 1–3. International Data Corporation (2012) 2. Lee, J.: Big data technology trend. Hallym ICT Policy J. 2, 14–19 (2015) 3. Kim, J.: Big data utilization and related technique and technology analysis. Korea Contents Assoc. J. 10(1), 34–40 (2012) 4. Seo, Y., Kim, W.: Information visualization process for spatial big data. J. Korea Spat. Inf. Soc. 23(6), 109–116 (2015) 5. Wu, M.: The change of enterprise internal audit based on big data era. Mod. Bus. Trade Ind. 40(5), 95–96 (2019) 6. Xia, Y.: Analysis of the impact of big data on enterprise financial management. China Venture Cap. (1), 140 (2019) 7. Wang, X., Yao, K., Zhang, J., Yu, L.: On-line energy consumption monitoring system for key energy-using enterprises in typical industrial cities and its application practice. Energy of China 41(3), 22–24, 39 (2019) 8. Tian, W.: Design and implementation of big data monitoring platform for enterprise growth. Master Thesis, Beijing Institute of Technology (2019) 9. Duan, B., Zhu, F., Xia, B., Yang, L., Zuo, H., Guo, Y.: Operation monitoring and management of power grid enterprises based on big data application. Manage. Obs. (7), 14–15, 18 (2019) 10. Xu, G., Dai, Z., Cui, X.: Research and implementation of quality and safety regulatory platform for the whole process management of wisdom construction site. Urban Geotech. Invest. Surv. 1, 37–40 (2019)
Chapter 11
Application Analysis of Big Data in Library Management and Service Huang Weining
Abstract Books are the ladder of human progress, while the library is the ocean of books and the storehouse of knowledge. A library holds many kinds of books, and their management and the associated services have long been a difficult problem. With the technological changes of the information age, big data technology has excelled in many fields, including library management and services. Applying big data technology to libraries enables information-based management of books; facilitates book inquiry, sorting, and recycling; and improves work efficiency and service levels.
11.1 Introduction
The era of big data is coming, and big data technology will bring revolutionary changes to all major fields of application; this is the opportunity and challenge given by the times. At the same time, the development of the internet and booming online e-books have also had a great impact on libraries. Against this background, libraries should use big data technology to improve service levels, compete for readers, and secure a foothold for themselves. Sally Feldman, President of the American Library Association, in an interview with the Chronicle of Higher Education, used the STACK project of the Syracuse University Library as an example to demonstrate the role of American university libraries in the use of big data. The Egyptian Library established an intellectual embassy at the University of Arish [1], and Khurshid analyzed the abilities and skills university librarians need to implement big data analytics in libraries [2]. Research on the practical experience of foreign big data innovation services can serve as a reference for the service transformation of domestic libraries. However, there are relatively few specific applications in library management and services in China. This paper analyzes the application of big data in library management and services, hoping to assist relevant workers.
Fig. 11.1 Per capita borrowing analysis chart
The library book decision analysis system in the big data era includes various functions, such as borrowing time analysis, borrowing distribution analysis, and per capita loan analysis, as shown in Fig. 11.1.
11.2 Big Data Technology Overview
Although the era of big data has arrived, the concept of big data is not well known to the public. A common misconception is that a large amount of data is big data. In fact, big data cannot be measured by data volume alone; it is a product of the development of modern information technology. In summary, big data is a way to aggregate, retrieve, and analyze information [3]. It can extract information from a keyword or a topic, filter out the effective information, and perform integrated classification and correlation analysis to produce precise, insightful data support. The outstanding feature of big data is diversification, and diversification implies strong variability and real-time behavior. Therefore, the information retrieved at different times may change greatly, which is also the external manifestation of big data's visualization, precision, and immediacy. The advent of big data has helped major enterprises around the world achieve transformation and upgrading, and it will bring about changes in production, operation, and many other aspects across numerous fields.
11.3 Analysis of Library Management Service Problems
11.3.1 Computing Capability Is Weak
At present, most libraries still maintain a traditional computer management system network. The system construction mainly uses a LAN rather than an interactive, all-round data management system. This leaves the library management system behind the times and creates disadvantages when carrying out more comprehensive management of book resources [4]. Given the social atmosphere of advocating universal reading and people's growing spiritual needs, the library has a stable audience base, and the number of people visiting libraries has increased, which places higher requirements on library management and scheduling. Under these circumstances, the library should optimize its computing management system as soon as possible to improve computing power. Suitable computing and storage systems already exist, such as HDFS (the Hadoop Distributed File System). Built on the Hadoop framework, it offers high fault tolerance, strong flexibility and scalability, and can meet storage requirements at large scale [5].
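To make the idea concrete, here is a minimal illustrative sketch of storing and retrieving a catalog file in HDFS using the Python hdfs client; the namenode URL, user name, file path, and sample records are hypothetical placeholders, not details from this chapter.

# Minimal sketch: storing and retrieving catalog records in HDFS.
# Assumes a reachable HDFS namenode with WebHDFS enabled; the URL,
# user, and paths below are placeholders, not values from the text.
from hdfs import InsecureClient

client = InsecureClient('http://namenode.example.org:9870', user='library')

# Write a catalog snapshot (CSV text) into the distributed store.
records = "isbn,title,location\n9787111000001,Example Title,Stack A-3\n"
client.write('/library/catalog/books.csv', data=records,
             encoding='utf-8', overwrite=True)

# Read it back; HDFS block replication provides the fault tolerance
# mentioned above.
with client.read('/library/catalog/books.csv', encoding='utf-8') as reader:
    print(reader.read())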
11.3.2 Lack of Optimization of Network Resources
The rapid development of the internet has led to an increase in the amount of network data stored by libraries, compounded by the proliferation of data across numerous libraries. The resulting volume of information is huge, and the traditional vertical network architecture is incapable of handling it and cannot process the data effectively. Therefore, only by combining big data technology to optimize the storage of network resources and data can the management requirements be met.
11.4 The Advantages of Big Data in Library Management and Services
11.4.1 Improve the Efficiency of Collection and Integration of Books and Resources
The application of big data has penetrated all aspects of people's lives, and skilled, accurate use of big data technology has become one of the competitive advantages of enterprises. For example, iCourt, a legal-services enterprise, built a visual, data-driven business network with big data and quickly became a leading company in the
legal profession. Big data is highly efficient, providing great convenience for readers and libraries in gathering and accessing information. A library holds a great variety of books. Although there are fixed methods for searching books in a given order, these methods still require manual operation, which wastes a lot of readers' time and energy. With big data, a library-specific database can be formed, and all the books in the library can be incorporated into the system; for books the library does not hold, network resources are integrated into the library's search system. Big data technology not only enriches resources but also enables readers to gather the books and information they need in the shortest time [6].
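As an illustration of such a unified search system, the following minimal sketch builds a simple inverted index over book titles so a single query covers both shelf holdings and integrated network resources; the sample records and field names are invented for the example.

# Minimal sketch of a unified catalog: an inverted index mapping
# title words to book records, so one reader query returns all
# matching holdings in a single lookup. Sample data is illustrative.
from collections import defaultdict

books = [
    {"id": 1, "title": "Big Data Fundamentals", "source": "shelf B2"},
    {"id": 2, "title": "Library Data Services", "source": "e-resource"},
]

index = defaultdict(set)
for book in books:
    for word in book["title"].lower().split():
        index[word].add(book["id"])

def search(query):
    """Return ids of books whose titles contain every query word."""
    ids = [index[w] for w in query.lower().split()]
    return set.intersection(*ids) if ids else set()

print(search("data"))  # {1, 2}: shelf and network resources together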
11.4.2 Optimize Library Service Projects
The library should be positioned as the place where books are read and where readers exchange ideas and share opinions. In actual operation, each library considers how to alleviate the anxiety brought by fast-paced life and provide people with a peaceful atmosphere of returning to books. This is the library's advantage over online bookstores and the reason its management services matter. Looking at the currently popular new library business models, it is not difficult to find that libraries have turned into comprehensive leisure spaces, becoming cultural spaces that provide people with more leisure projects; the Tianjin Binhai Library, with its extremely beautiful interior structure, gives readers a good reading experience and a pleasant casual experience.
11.4.3 Promote Resource Sharing and Form a Library Sharing System
As far as school libraries are concerned, a school's own students can browse digital resources directly through the campus network, while outsiders usually cannot. This keeps digital resources relatively closed, which is not conducive to building a learning atmosphere for the whole population. Big data technology can unify various digital resources and promote their effective sharing, so that all university libraries form a networked system of university digital libraries, improving the utilization and sharing rates of resources. The construction of university online libraries in China, such as the Superstar Library, is in full swing, and resources are gradually increasing, with academic resources being the most prominent [7].
11.5 The Difficulties and Solutions of Big Data Applied to Libraries
11.5.1 The Library Big Data System Is Not Perfect
Libraries are dominated by paper books, including many precious volumes dating back decades or even hundreds of years. Book ordering still follows the traditional sorting approach, often referred to as the partition mode: library titles are grouped under English letters, and readers searching for a book must go to the designated area and look up the group and number of each book. This wastes a lot of readers' time and produces many errors and problems. In addition, most libraries still tend to follow their past operational development model, so introducing big data is difficult and must be gradual. Facing this situation, we should not be too eager to promote big data applications; we must fully consider the problem of management inertia, conduct pilot projects in libraries with suitable conditions, and summarize the experience gained before extending it to other libraries. Moreover, library managers' awareness has a great impact on library management, so efforts should be made to strengthen their understanding of the benefits of big data technology and enhance their awareness of using big data technology to manage libraries.
11.5.2 Lack of Skills Among Librarians
In the traditional library mode, librarians only need to know the book partitions and book numbers in the library. After the introduction of big data, however, librarians must have strong information search and retrieval abilities: in addition to finding the location of books in the library, they also need to retrieve information available throughout the information network. Therefore, one of the main problems in applying big data to library management and services is the shortage of librarians who fully master big data search engine technology. The corresponding solution is to accelerate the professional and technical training of library managers and to conduct regular information retrieval training so they can adapt to the new model of the big data era [8].
11.5.3 The Management Mode Is Outdated and Rigid
At present, people go to the library not just to read; more of them are pursuing a good reading atmosphere and a quiet, comfortable reading environment. However, most of China's library management models remain traditional: administrators only provide basic services such as finding books and borrowing
books. The service model is relatively simple and cannot attract readers with higher requirements [9]. The most important thing for a library should be the creation of the overall environment, rather than improving hardware conditions alone. On the basis of upgrading facilities and equipment, libraries should also focus on creating a cultural atmosphere. In response to this situation, libraries should use big data to provide personalized, accurate services and reader exchange activities, better meet readers' diverse needs, use their distinctive features to compete for attention, expand their audience, and attract more readers to choose the library as the main place to read and enjoy reading.
11.6 Conclusion
Big data technology provides a new way for libraries to develop in the new era. This paper has analyzed the problems in current library management, the future application of big data in libraries, and the status quo of big data applications. Libraries not only need the awareness to apply big data technology but also need to formulate reasonable strategies, make proper use of big data, and improve their ability to use it. Only in this way can libraries keep pace with the times and achieve transformation and upgrading in the new era.
References
1. http://www.idcun.com/plus/list.php?tid=6&TotalResult=10111&PageNo=54. Accessed 20 April 2019
2. Ahmad, K.: An analysis of academic librarians competencies and skills for implementation of Big Data analytics in libraries, vol. 3, pp. 201–216. ProQuest (2019)
3. Yao, Y., Xu, S.H.D., Hao, Q.: Knowledge Innovation Service of University Libraries Based on Big Data (2018)
4. Research. China Electro-Chemical Education, vol. 2, pp. 110–117 (2019)
5. Sun, L.B.: On the application of big data in library management and service. Inside Outside Lantai 13, 44–45 (2019)
6. Li, D.Y.: Application analysis of big data in library management and service. J. Sci. Technol. Econ. Guide 04, 168 (2019)
7. Zhang, Q.Y.: Research on digital resource service innovation of university libraries in the age of big data. Inf. Record. Mater. 1, 163–164 (2019)
8. Li, W.: Application of big data in library management and service. Inf. Comput. (Theoretical Edition) 9, 126–127 (2019)
9. Liu, F.: The application of big data processing in library information management. Electron. World 1, 180–181 (2016)
Chapter 12
Application of Virtual Reality Technology in the Teaching of 3D Animation Course Baiqiang Gan, Qiuping Dong, and Chi Zhang
Abstract This paper starts from the teaching practice of virtual reality technology in a 3D animation course and expounds the connotation and features of virtual reality technology and the curriculum teaching revolution generated by its integration with 3D animation technology. It focuses especially on the innovation in teaching ideas for the 3D animation course, the enrichment of teaching methods, the enthusiasm of learners, and the curriculum applications brought by virtual reality technology. It provides references for researchers involved in the teaching reform of virtual reality technology and 3D animation courses.
12.1 Virtual Reality Technology
Virtual reality technology is an emerging simulation technology that uses a computer to transform two-dimensional images into a three-dimensional virtual interactive scene [1], enabling learners to experience and feel real-world situations from multiple perspectives and immerse themselves in a virtual environment for interactive learning. Virtual reality technology, which has the three features of immersion, interactivity, and imagination, is widely used in architectural design, industrial design, education, medicine, and other fields, providing learners with a three-dimensional virtual situation conducive to understanding the features of various industries. With the continuous development of virtual reality technology, its application in the field of education has become particularly prominent. Many universities and education and training institutions have integrated virtual reality technology into the curriculum to enhance the
immersion, experience, and feeling of learners. This also brings opportunities and challenges to the study of curriculum reform, teaching methods, and means.
12.2 Integration of Virtual Reality Technology and 3D Animation Technology
Virtual reality technology is a combination of technology and art. It can simulate and act on the learner's vision, hearing, touch, kinesthesia, and so on to create an immersive feeling. In addition, the various elements in the 3D virtual interactive scenario constructed by virtual reality technology, such as houses, characters, and vehicles, are not only links reflecting the real world but also contact points for learners to learn interactively in the virtual environment. 3D animation technology is a mature 3D simulation technology with powerful 3D modeling and animation capabilities, which can efficiently produce high-quality models and smooth animation [2]. From the perspective of 3D modeling, realistic scene elements are created through three aspects: modeling, texture mapping, and lighting simulation. From the perspective of 3D animation, 3D software such as C4D, Maya, and 3ds Max can be used to simulate the various actions a character requires, but the character movements simulated by these packages cannot interact with the learner. The integration of virtual reality technology and 3D animation technology makes up for the shortcomings of each technology and provides broad space for the development of 3D animation courses. Traditional 3D animation course teaching tends to focus on software usage and animation production, so the characters produced do not evoke emotional resonance [3]. The integration of virtual reality technology not only gives characters vitality; the learner can also feel the reality of the character and the scene in the three-dimensional virtual interactive scene, observe the movement details of the character from multiple perspectives, and interact with the character, thereby obtaining experience that cannot be gained in the real world.
12.3 The Importance of Virtual Reality Technology to the Teaching of 3D Animation Courses
12.3.1 Innovation in Teaching Ideas
The 3D animation course is highly practical, and its contents come from real enterprise project cases. In the teaching process, a modular, project-oriented teaching method is adopted, in which teachers and learners are two actively interacting parties; it is no longer the traditional cramming method of teaching. The features of
virtual reality technology can enhance the interaction between teachers and learners, motivate learners, strengthen their ability to learn independently, and enhance the emotional connection between teachers and learners as well as among learners themselves.
12.3.2 Enrichment of Teaching Methods
Teachers mostly teach 3D animation courses with multimedia materials such as courseware, text, audio, and video animation, which learners use as aids to understand the knowledge points. The teacher then demonstrates, and the students follow the demonstration steps to practice and consolidate what they are learning [4]. In this process, learners must summarize experience and skills according to their own understanding and feelings, and it is difficult for them to fully grasp the intent of the knowledge. The integration of virtual reality technology enriches and compensates for the shortcomings of the teaching methods of 3D animation courses. It can construct the virtual teaching situations learners need according to the course knowledge [5], so that learners can observe and appreciate the concrete information presented by the situation from an all-round, multi-view perspective and understand the course knowledge more deeply and thoroughly.
12.3.3 Motivate Learners to Learn Independently
Virtual reality technology gives learners an immersive experience and motivates them to learn independently. In the 3D animation course, many abstract concepts and principles are difficult to understand through teacher-led instruction alone; virtual reality technology presents these abstract contents to the learner in concrete form [6]. Guided by the teacher, or entering the virtual teaching situation autonomously, the learner can optimize and reorganize the presented concrete content and select appropriate nodes to study, thereby deepening the understanding and application of the knowledge.
12.3.4 Implementation of Interactive Learning
The interactive nature of virtual reality technology makes the teaching of 3D animation courses more vivid and flexible. The 3D animation course itself has strong advantages: its content is mainly based on 3D modeling and 3D animation, and it can reconstruct various scenes of the real world. Generally, multimedia video is used for teaching, so learners can only acquire skills by repeatedly watching instructional videos; it is difficult to observe the angles and structures of models and animations,
and interactive learning is impossible. Virtual reality technology allows the learner to see a 3D model's angles, structure, components, sizes, colors, textures, maps, lights, and other information in real time in the virtual scene [7], and the learner can change the effects of the model and animation according to personal preference to see the results immediately. Moreover, integrating the course content with virtual reality technology makes the constructed virtual scenes and characters more realistic. By setting interactive contact points in the virtual scene, the learner actively triggers nodes to realize interaction during learning, thereby building an emotional connection between the learner and the virtual scene and improving learning efficiency.
12.4 The Application of Virtual Reality Technology in the Teaching of 3D Animation Courses
12.4.1 Teaching of 3D Animation Course Theory
Applying VR technology to the teaching of 3D animation requires three basic conditions of learners. First, learners can skillfully use 3D software such as 3ds Max and Maya. Second, learners understand the basic rules of animation movement. Third, learners are familiar with the operation of virtual reality devices and platforms. Therefore, the author chose Class 1 and Class 2 of the Grade 17 animation major as experimental subjects. Before the start of the 3D animation course, they had studied 3ds Max software fundamentals, graphic design, animation principles, Unity, and other courses, so they met the conditions for teaching the 3D animation course with VR technology. Virtual reality technology runs through the whole process of 3D animation course teaching. VR technology is therefore applied to the teaching of 3D animation courses using the "four-step experiential teaching" method, helping learners complete the 3D animation course project practice in an experiential manner. Specifically, it is divided into four parts: project case display experience, virtual simulation experiment experience, interactive evaluation experience, and finished-product display experience. The project case display experience and interactive evaluation experience refer to the immersive and interactive experiences created by virtual reality technology, whose main intention is to enable learners to find and explore problems in the virtual reality teaching context [8] and to raise their enthusiasm for learning the project-case content. The virtual simulation experiment experience and the finished-product display experience are the practical experiences within virtual reality technology [9], mainly cultivating learners' practical ability in the virtual reality teaching situation and letting them experience production as in a real environment, so as to improve their ability in real working environments and posts. Throughout the practical teaching process, learners have both the immersive experience of the virtual environment and the practical experience of a real job. This combination
Fig. 12.1 The four-step experiential teaching method for VR technology applied to course teaching (project case demonstration experience → virtual simulation experiment experience → interactive evaluation experience → finished-product display experience; learners discover and explore problems in virtual reality scenarios while enhancing practical training and self-exploring learning)
of virtual and real gives learners a different experience every time they study the 3D animation course. Figure 12.1 shows the four-step experiential teaching method for VR technology applied to course teaching.
12.4.2 Teaching of 3D Animation Course Practice
VR technology has been applied to the teaching of many courses in our school. The most outstanding is the mechanical design course, which uses VR technology for teaching. On the one hand, it changes the status quo of traditional teacher instruction and student operation, using 3D animation to show 2D drawings and avoiding the dangers of hands-on operation. On the other hand, the virtual reality situation enables students to experience a real working environment and improves their initiative and operational awareness. According to the experience and suggestions from the implemented courses, VR technology is best applied to courses that are strongly practical and demand solid knowledge of principles and job recognition. Adopting the four-step experiential teaching method is therefore beneficial to integrating VR technology into the teaching of 3D animation courses and enhances the learner's learning experience and interest. The author takes the teaching of the cartoon character modeling project in the 3D animation course as an example to illustrate the implemented teaching process. First, before class the teacher sorts out the theoretical knowledge and skill points of cartoon characters and prepares the corresponding teaching plan to create a virtual reality environment. In the formal class, the teacher guides the learner into the virtual reality environment and presents a cartoon character sample; the learner can split and combine the model to observe its structure and wiring, and explore and discover problems. Then the teacher demonstrates the principles and key techniques of the model in the virtual reality environment. Second, the learner practices according to the teacher's presentation and explanation. Third, while making a cartoon model, the learner can interact and communicate with other learners in the virtual reality environment. Finally, the learner outputs the finished model and submits it to the virtual reality platform for display, where other learners evaluate it objectively and give suggestions. In order to test the feasibility of the above method and the resulting improvement in teaching effect, the author conducted the following comparative experiment.
Table 12.1 Data testing of two teaching methods

Test items | 17 animation class 1 | 17 animation class 2
Number of students absent from class in traditional teaching methods | 6 | 4
Number of students absent from class in new technology teaching methods | 0 | 0
Number of students experimented with traditional teaching methods | 19 | 23
Number of students experimented with new technology teaching methods | 30 | 30
Number of assignments completed by traditional teaching methods | 19 | 21
Number of assignments completed by new technology teaching methods | 30 | 30
The experiment was conducted on Class 1 and Class 2 of the Grade 17 animation major, which met the test conditions; each class has 30 students. Using the cartoon character modeling project in the 3D animation course as the test content, the author compared the traditional 3D animation teaching mode with the teaching method that applies VR technology to 3D animation, in terms of the number of students absent, interest in learning, learning effect, homework completion rate, and so on. As the data in Table 12.1 show, the number of students absent from class under the new technology teaching method is much lower than under the traditional teaching method, and learning interest, learning effect, and homework completion rate are also higher than in the traditional teaching mode. We issued 60 questionnaires on traditional 3D animation teaching versus VR-based 3D animation teaching, and all 60 were collected. The feedback from students is that the traditional 3D animation teaching mode is boring and makes it difficult to concentrate on learning, whereas the teaching method applying VR technology to 3D animation stimulates students to learn independently, effectively arousing their enthusiasm and curiosity.
12.5 Conclusion
From the teaching practice of the 3D animation course, we can see the positive influence and significance of virtual reality technology. Realistic virtual teaching situations stimulate learners' enthusiasm and initiative to learn independently, and learners change from passively accepting knowledge to actively exploring it. In the course of teaching, the author needs to create virtual reality situations and learning tasks for students, raise questions, and guide students to
study independently in situations and solve problems, so as to help students master the knowledge content and improve the quality of course teaching. Acknowledgements Research on Interactive Design of 3D Animation Based on Virtual Reality Technology (Project Number: 2018GkQNCX042), (Project Number: NY-2018KYYB-6), Exploration and Practice of “MOOC+SPOC” Hybrid Teaching Model Oriented to Deep Learning (Project Number: 19GGZ006), Research and Practice of Modern Apprenticeship Course Information-based Teaching Based on TPACK Theory (Project Number: NY-2019CQJGZD-01), Digital Media and Animation Professional Group Teaching Team Building (Project Number: NY-2019CQTD-03).
References
1. Chen, L.Y., Wang, B.Z.: Application of 3D virtual reality technology in animation teaching. Future Dev. 2(10), 87–89 (2011)
2. Wang, Y.Y.: Discussion on the application of virtual reality technology in college education. J. Chongqing Univ. Sci. Technol. 7(4), 89–90 (2014)
3. Meng, H.: Digital media technology curriculum practice based on 3DMax and virtual reality. Decorate 8(10), 134–135 (2018)
4. Li, X.P., Sun, Z.W., Zhang, S.G.: Research on virtual reality teaching design under the perspective of influence. China Educ. Technol. 9(12), 120–127 (2018)
5. Zhao, M.C., Sun, C.Y.: Exploration and practice of virtual simulation experiment teaching. Res. Explor. Lab. 9(4), 90–93 (2017)
6. Watanuki, K., Kojima, K.: Knowledge acquisition and job training for advanced technical skills using immersive virtual environment. J. Adv. Mech. Des. Syst. Manuf. 8(1), 48–57 (2016)
7. Slater, M., Sanchez, V.: Transcending the self in immersive virtual reality. IEEE Trans. Learn. Technol. 7(17), 255–259 (2014)
8. Limniou, M., Roberts, D., Papadopoulos, N.: Full immersive virtual environment CAVETM in chemistry education. Comput. Educ. 10(2), 584–593 (2016)
9. Garris, R., Ahlers, R., James, E.: Games, motivation and learning: a research and practice model. Simul. Gaming 9(4), 441–467 (2013)
Chapter 13
Study on Properties of Polymer Materials Based on 3D Printing Technology Weixiang Zhang and Lingming Yang
Abstract 3D printing is a new processing technology that has developed rapidly in recent years. In addition to industrial production and civil fields, it is also widely used in aerospace, military, medical, and other fields. Materials used for 3D printing usually need prominent adhesion performance to ensure a strong bond between layers and thus support the three-dimensional forming of products. Generally speaking, polymer materials such as wire, photosensitive resin, and polymer powder can be used for 3D printing. The experimental results show that 3D-printed engineering plastics exhibit obvious anisotropy: the material is brittle in the transverse direction and fractures before reaching the plastic stage, but it shows clear elastic–plastic behavior in the vertical direction, first passing through a linear elastic stage and then through a plastic softening stage.
13.1 Introduction
Three-dimensional printing is a rapid prototyping technology based on digital models. Compared with traditional material-forming methods, 3D printing has many advantages, such as a wide application field, high production efficiency, and environmental friendliness. Materials for 3D printing usually need prominent bonding properties to ensure strong bonding between layers and support the three-dimensional forming of products. Generally speaking, polymer materials such as photosensitive resins and polymer powders can be used for 3D printing [1, 2]. To further improve the competitiveness of science and technology in China, it
is an important task to develop new 3D printing materials. Depending on the mechanism of each 3D printing technology, the performance requirements for the selected polymer materials differ. In general, besides excellent processability and printability, the materials also need certain functionality to enable product applications in various fields. For example, products used in tissue engineering need non-toxic polymer substrates with excellent tissue biomimetics and biocompatibility. Modifying polymer materials and endowing the prepared materials with special functionality are prerequisites for expanding the application prospects of 3D printing. At present, polylactic acid (PLA) is the most widely used plastic for 3D printing. PLA is an environmentally friendly material that can be completely biodegraded. Compared with traditional molding technologies such as injection molding, the mechanical properties of 3D-printed specimens are generally poor, which seriously affects their application; it is therefore of great significance to study the effect of 3D printing parameters on the mechanical properties of PLA. PLA is a semi-crystalline polymer with excellent machinability that can be 3D printed by melt deposition. In addition, PLA has excellent biodegradability and biocompatibility, and its biological toxicity is low; it can be used in 3D-printed products such as tissue engineering scaffolds and medical devices. Tang et al. prepared a PLA material and tested its properties [3]. The results show that the melting point of the PLA material is 130.65 °C, the melt flow rate is 9.58 g/(10 min), and the tensile strength is 63.2 MPa. Because the PLA material has excellent flowability and mechanical properties, it readily forms filaments that resist breaking during melt-deposition 3D printing. However, the mechanical brittleness and degradability of PLA greatly limit its application. At present, the main way to improve the mechanical brittleness of PLA is to blend it with inorganic, organic, or polymer materials to toughen it. For example, Zhang et al. [4] added a series of rigid particles and toughening agents to a PLA matrix by melt blending, assisted by synergistic toughening agents, to improve the toughness of the material. Besides PLA, polycaprolactone (PCL) is also a biodegradable polyester material. Under melt-processing conditions, the viscoelastic and rheological properties of PCL are outstanding, so it can be processed by melt-deposition 3D printing. In addition, PCL has a certain stability; the corresponding 3D-printed products can be used in tissue engineering, where their in vivo lifetime can be as long as 6 months. In addition to the common polymer materials mentioned above, photosensitive resins, polymer hydrogels, and polymer powders can also be used for 3D printing. For example, Tang et al. prepared nano-silica-modified photosensitive resins [5]. When the mass fraction of nano-silica particles was only 1%, the tensile strength, elastic modulus, and impact strength of the modified resins reached 110 MPa, 7.8 GPa, and 4.02 kJ/m, respectively, increases of 1.2, 2.9, and 1.1 times over the unmodified resin.
Fig. 13.1 Geometric data of the specimen (unit: mm)
13.2 Experimental Methods
13.2.1 3D Printing Specimens
In this paper, acrylonitrile butadiene styrene (ABS) is selected and the melt deposition method is used for 3D printing. First, ABS material is supplied by the feeder; the ABS melts and is extruded as it passes through the print nozzle. The extruded material solidifies rapidly on cooling, binds to the already-solidified material, and accumulates layer by layer. Because ABS is extruded as filaments during 3D printing, the arrangement direction of the filaments may have an important influence on the mechanical behavior of the material.
13.2.2 Tensile Test
According to the national standard (GB/T 1039-92), the specimens were made into dog-bone specimens; their dimensions are shown in Fig. 13.1. After clamping the two ends of a specimen, the loading speed of the test machine is controlled at 2 mm/min, which can be regarded as quasi-static loading. In this experiment, a displacement sensor and a force sensor, both placed in the upper chuck, measure the force on and displacement of the specimen in tension.
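For illustration, the force and displacement readings can be converted to engineering stress and strain as in the sketch below; the gauge dimensions and readings used here are hypothetical placeholders, since the actual specimen geometry is given in Fig. 13.1.

# Sketch: converting tensile-test sensor readings into engineering
# stress and strain. The gauge dimensions and readings below are
# placeholders; real values come from the dog-bone geometry (Fig. 13.1).
import numpy as np

GAUGE_WIDTH_MM = 10.0      # hypothetical gauge-section width
GAUGE_THICKNESS_MM = 4.0   # hypothetical gauge-section thickness
GAUGE_LENGTH_MM = 50.0     # hypothetical initial gauge length
AREA_MM2 = GAUGE_WIDTH_MM * GAUGE_THICKNESS_MM

force_n = np.array([0.0, 400.0, 800.0, 1200.0])      # force sensor
displacement_mm = np.array([0.0, 0.25, 0.50, 0.80])  # displacement sensor

stress_mpa = force_n / AREA_MM2            # N / mm^2 equals MPa
strain = displacement_mm / GAUGE_LENGTH_MM # dimensionless

print(stress_mpa.max())  # peak engineering stress in MPa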
13.3 Results and Discussion
Figure 13.2 shows the effect of printing temperature on the tensile and bending properties of PLA. As shown in Fig. 13.2, the tensile strength and elongation at break of PLA first increase and then decrease with increasing printing temperature, peaking at 210 °C at 57.0 MPa and 5.0%, respectively. The flexural strength and elastic modulus of PLA likewise first increase and then decrease with increasing printing temperature.
Fig. 13.2 Effect of temperature on tensile strength
When the printing temperature is 215 °C, the flexural strength and elastic modulus of PLA are largest, at 79.6 MPa and 2556 MPa, respectively. 3D printing is a layer-by-layer technology, so weld lines often form between layers, and mechanical failure usually begins at a weld line. A proper printing temperature (210–215 °C) allows better fusion between PLA layers, thus compensating for the mechanical defects of the weld marks. In reference [6], Wang studied the relationship between the tensile properties of glass fiber and printing temperature, and the results are consistent with those of this paper. The experimental results also show that with increasing printing speed, the tensile strength, flexural strength, and flexural elastic modulus of the PLA samples increase, while the elongation at break first increases and then decreases. When the printing speed is increased from 40 to 80 mm/s, the tensile strength and flexural strength of PLA increase from 55.9 MPa and 76.3 MPa to 57.8 MPa and 78.9 MPa, respectively, and the flexural modulus increases from 2506 to 2550 MPa. As the printing speed increases, the orientation of the PLA molecular chains increases, raising the tensile strength, flexural strength, and flexural elastic modulus.
13.4 Conclusion
3D printing is a new plastics processing technology. By modifying the aliphatic polyesters PLA and PCL for use in 3D printing, they can be widely applied in the medical and biological fields; other materials with poor biocompatibility can also be modified to improve their mechanical properties, allowing 3D printing to prepare plastics for industrial, military, aerospace, and other fields. With the expansion and improvement of material types, 3D printing will gradually become a mainstream plastics processing technology. PLA and PCL have good biocompatibility and can be used in tissue engineering; they usually need to be strengthened and toughened with inorganic materials before melt-deposition 3D printing, and blending with other aromatic polyesters can effectively improve the dimensional stability and mechanical properties of the products. Non-food-grade ABS
cannot be used in the biomedical and tissue engineering fields, and its dimensional stability is poor, but blending with styrene block copolymers can effectively improve the dimensional stability of the material and the mechanical properties of the products. Acknowledgements This research was financially supported by Major Natural Science Research Projects in Colleges and Universities of Jiangsu Province (17KJA430012).
References
1. Dey, N., Mukherjee, A.: Embedded Systems and Robotics with Open Source Tools. CRC Press (2017)
2. Mukherjee, A., Dey, N.: Smart Computing with Open Source Platforms. CRC Press (2019)
3. Tang, T.M., Lu, Y.: Preparation and properties of PLA for environment-friendly 3D printing material. China Synth. Resin Plast. 32(6), 21–23 (2015)
4. Zhang, X.N., He, W.G.: Study on toughening modification of PLA. Plast. Sci. Technol. 41(6), 63–66 (2013)
5. Tang, F.L., Mo, J.H., Xue, S.L.: Study on radiation-curable composition modified by nano-silica. Polym. Mater. Sci. Eng. 23, 210–213 (2007)
6. Wang, C.H.: Properties of 3D printing glass fiber/thermoplastic polyurethane blended materials (in Chinese). Rubber Ind. 66(8), 596–601 (2019)
Chapter 14
Improved SSD-Based Multi-scale Pedestrian Detection Algorithm Di Fan, Dawei Liu, Wanda Chi, Xiaoxin Liu, and Yongyi Li
Abstract This paper proposes an improved pedestrian detection algorithm based on the SSD (Single Shot MultiBox Detector) model. The algorithm mainly addresses the problem that SSD's detection of small-scale pedestrians is not ideal. It improves the original SSD model by introducing a shallower feature map: pedestrian detection is carried out using the features of different output layers in the model, and the multi-layer detection results are combined to improve detection of small-scale pedestrians. A Squeeze-and-Excitation module is introduced in the additional feature layers of the SSD model; the improved model automatically learns the importance of each channel, enhancing useful features and suppressing features that are not useful for the current task, thereby further improving the algorithm's ability to detect small-scale pedestrians. Experiments show that the accuracy of the proposed algorithm on the INRIA dataset reaches 93.17%, with a missed detection rate as low as 8.72%.
14.1 Introduction
With the rapid development of video image processing technology, video sequence analysis algorithms have been widely used in many fields [1]. Pedestrian detection is an important part of video sequence analysis. Combined with pedestrian tracking, pedestrian recognition, and other technologies, it is applied in artificial intelligence systems, vehicle-assisted driving systems, and other fields [2]. At present, there are basically two kinds of pedestrian detection algorithms: traditional methods and deep learning methods [3]. Traditional pedestrian detection algorithms improve performance by optimizing
the image features. Viola proposed a "Haar-Like + AdaBoost" algorithm that introduced the rectangular filter banks used in face detection into the pedestrian detection task [4]. Felzenszwalb proposed an algorithm combining HOG (Histogram of Oriented Gradients) features with the DPM (Deformable Part Model), which obtained better results in pedestrian detection [5]. Dollar proposed the Integral Channel Features (ICF) method, which uses the integral image to quickly compute each class of features and combines them with a classifier, improving both model performance and detection speed [6]. Nam [7] further improved the ACF (Aggregated Channel Features) framework: to make the features more effective for classification, each channel feature was locally decorrelated to obtain Locally Decorrelated Channel Features (LDCF). Although these algorithms achieved good detection results, feature methods based on manual design have difficulty adapting to occluded and small-sized pedestrians in natural scenes. Deep learning has made important progress in many areas, such as artificial intelligence and object detection. Girshick proposed a target detection algorithm based on the Region-based Convolutional Network (R-CNN) [8], but its computational cost is high and its efficiency low, which limits the potential of the CNN (Convolutional Neural Network). Girshick then proposed the Fast R-CNN (Fast Region-based Convolutional Network) detection algorithm, which greatly improves detection speed [9]. Kaiming He proposed the Faster R-CNN (Faster Region-based Convolutional Network) detection algorithm [10], which uses a region proposal network to generate prediction boxes. Other detection algorithms such as YOLO (You Only Look Once) [11] and SSD (Single Shot MultiBox Detector) [12] treat target localization as a regression problem: they regress the target position and category directly in the output layer, which improves both the accuracy and the speed of detection. However, when deep-learning-based target detection methods handle multi-scale pedestrians, problems remain, such as unsatisfactory detection of small-scale pedestrians and a high missed detection rate. Addressing the problems of these two kinds of pedestrian detection methods, this paper improves a model based on SSD. The model adds lower-level feature maps as detection branch networks and introduces pedestrian context information on the feature maps of the different detection branches. To further consider the relationships between the channels of each feature-map layer, a Squeeze-and-Excitation module is introduced to make the model more general, which effectively improves the detection accuracy for small-scale pedestrians.
14.2 Improved SSD Pedestrian Detection Model
The SSD algorithm is a target detection algorithm that directly predicts target positions and categories. The backbone of SSD is VGG-16 (Visual Geometry Group Network-16), and the auxiliary network consists of additional feature extraction layers. The network structure is shown in Fig. 14.1.
Fig. 14.1 SSD network architecture
In the SSD detection algorithm, the feature maps of six convolution layers at different scales are used as detection layers. Each detection layer is convolved with two 3 × 3 convolution kernels to obtain two outputs: the confidences for classification and the position coordinates for regression. The results of each layer are combined and passed to the loss layer, which synthesizes the results of all detection layers and obtains the final detections through Non-Maximum Suppression. Building on the SSD model, this paper optimizes its network structure; the improved model's network structure is shown in Fig. 14.2. The model takes conv3_3, conv4_3, conv7, conv8_2, conv9_2, conv10_2, and conv11_2 as detection branch networks to realize multi-scale detection of pedestrians. Second, pedestrian context information is introduced on the feature layers (conv4_3, conv8_2, and conv10_2) to improve the model's detection of small-scale pedestrians. Finally, the Squeeze-and-Excitation module strategy is introduced in the conv7, conv8, conv9, conv10, and conv11 convolution modules: the model automatically learns a weight score for each channel and, based on these scores, enhances features useful for detection and suppresses useless ones, further improving its ability to detect small-scale pedestrians.
Fig. 14.2 Improved SSD network structure
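As a rough illustration of one such detection branch (not the authors' exact code), the following tf.keras sketch applies two 3 × 3 convolutions to a feature map to produce per-anchor class confidences and box offsets; the anchor count, class count, and feature-map shape are assumptions.

# Sketch of a single SSD-style detection branch: two parallel 3x3
# convolutions over one feature map yield class confidences and box
# regression offsets. Anchor/class counts are illustrative.
import tensorflow as tf

NUM_ANCHORS = 5   # e.g. aspect ratios 1:1, 2:1, 1:2, 3:1, 4:1
NUM_CLASSES = 2   # pedestrian vs. background

feature_map = tf.keras.Input(shape=(38, 38, 512))  # e.g. conv4_3

# Per-location, per-anchor class confidences.
conf = tf.keras.layers.Conv2D(NUM_ANCHORS * NUM_CLASSES, 3,
                              padding='same', name='conf')(feature_map)
# Per-location, per-anchor box offsets (cx, cy, w, h).
loc = tf.keras.layers.Conv2D(NUM_ANCHORS * 4, 3,
                             padding='same', name='loc')(feature_map)

branch = tf.keras.Model(feature_map, [conf, loc])
branch.summary()  # outputs of all branches are later merged and NMS-ed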
Fig. 14.3 Characteristic fusion schematic of conv4_3 and conv5_3
14.2.1 Introduction of Context Information
Small-scale pedestrians often lack sufficient feature information, which is a key issue in pedestrian detection. Many studies at home and abroad have shown that rich context information describes the intrinsic relations of an object and helps improve the accuracy of small-scale pedestrian detection, but adding context information also admits environmental noise [13–15]. Therefore, to balance context information against environmental noise, pairs of feature layers (conv4_3 and conv5_3, conv8_2 and conv9_2, conv10_2 and conv11_2) are fused, with deconvolution used to keep the feature-map sizes consistent. The size of conv4_3 is 38 × 38 and that of conv5_3 is 19 × 19: conv5_3 is first expanded to 38 × 38 by a deconvolution operation and then fused with conv4_3 to obtain the final fused feature map, as shown in Fig. 14.3. Since conv8_2 and conv9_2, and conv10_2 and conv11_2, are handled in the same way, those operations are not repeated.
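A minimal sketch of this fusion step is given below, assuming 512-channel maps and concatenation as the merge operation (details the text does not specify): the 19 × 19 map is upsampled to 38 × 38 by a stride-2 deconvolution and combined with the 38 × 38 map.

# Sketch of the context-fusion step: upsample the deeper 19x19 map by
# deconvolution to 38x38, then merge it with the shallower map.
# Channel counts and the concatenation choice are assumptions.
import tensorflow as tf

conv4_3 = tf.keras.Input(shape=(38, 38, 512))
conv5_3 = tf.keras.Input(shape=(19, 19, 512))

# 2x deconvolution: (19 - 1) * 2 + 2 = 38 spatial resolution.
up = tf.keras.layers.Conv2DTranspose(512, kernel_size=2, strides=2,
                                     padding='valid')(conv5_3)

# Fuse shallow detail with deeper context.
fused = tf.keras.layers.Concatenate(axis=-1)([conv4_3, up])

model = tf.keras.Model([conv4_3, conv5_3], fused)
print(model.output_shape)  # (None, 38, 38, 1024)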
14.2.2 SSD Improvement Layer Network Structure Design
To enhance the SSD detection algorithm's ability to detect small-sized pedestrians, the model draws on the Squeeze-and-Excitation module of Squeeze-and-Excitation networks [16]. The model learns the relationships between feature channels and uses them to increase the weight of effective features and reduce the weight of invalid ones, thereby improving the detection ability of the improved model. Since the structures of the conv7, conv8,
conv9, conv10, and conv11 convolution modules are similar, the conv8 module is taken as an example to demonstrate the improvement of the layer's network structure; the schematic is shown in Fig. 14.4. First, the Squeeze operation compresses the input tensor U along its spatial dimensions, turning the 256 feature maps into a 1 × 1 × 256 sequence that carries global information to some extent. Its formula is shown in (14.1):

z_c = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j), \quad U = [u_1, u_2, \ldots, u_c, \ldots, u_{256}] \quad (14.1)
where U is the input tensor, W and H are the width and height of the feature map, u_c is the feature map of the c-th channel, and z_c is the real number at the c-th position of the sequence. Then, the Excitation operation is a mechanism similar to the gates in a recurrent neural network: by learning the parameters W it generates a weight for each channel, explicitly modeling the correlations between feature channels, as shown in (14.2):

s = \sigma(W_2 \, \delta(W_1 z)) \quad (14.2)
Here, \sigma is the Sigmoid activation function, \delta is the ReLU (Rectified Linear Unit) function, W_1 \in \mathbb{R}^{64 \times 256}, W_2 \in \mathbb{R}^{256 \times 64}, z = [z_1, z_2, \ldots, z_c, \ldots, z_{256}], and s = [s_1, s_2, \ldots, s_c, \ldots, s_{256}]. To improve the model's detection speed and aid generalization, two fully connected layers and a nonlinear activation layer are introduced: the first fully connected layer is the dimensionality-reduction layer, with parameter W_1 and reduction ratio 4, followed by the ReLU activation; the second fully connected layer is the dimensionality-increasing layer, with parameter W_2. Finally, the Scale operation multiplies the tensor by the scalars channel-wise, re-calibrating the original features to obtain the final output tensor, as shown in (14.3).
Fig. 14.4 Improved conv8 schematic diagram
\tilde{u}_c = s_c \times u_c \quad (14.3)

Here, \tilde{U} = [\tilde{u}_1, \tilde{u}_2, \ldots, \tilde{u}_c, \ldots, \tilde{u}_{256}] is the output tensor, u_c is the feature map of the c-th channel, s_c is the importance of the c-th feature map, and \tilde{u}_c is the product of s_c and u_c. Since conv8_1 is handled similarly to conv8_2, the process is not described again.
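A minimal tf.keras sketch of the Squeeze-and-Excitation block defined by (14.1)–(14.3), with 256 channels and reduction ratio 4 as stated above, might look as follows; the input spatial size is an arbitrary placeholder.

# Sketch of the Squeeze-and-Excitation block of (14.1)-(14.3):
# global average pooling (Squeeze), two fully connected layers with
# reduction ratio 4 (Excitation), then channel-wise re-scaling (Scale).
import tensorflow as tf

def se_block(inputs, channels=256, ratio=4):
    # (14.1) Squeeze: spatial average -> one scalar per channel
    z = tf.keras.layers.GlobalAveragePooling2D()(inputs)
    # (14.2) Excitation: W1 reduces to channels/ratio, W2 restores
    s = tf.keras.layers.Dense(channels // ratio, activation='relu')(z)
    s = tf.keras.layers.Dense(channels, activation='sigmoid')(s)
    # (14.3) Scale: re-weight each channel of the input map
    s = tf.keras.layers.Reshape((1, 1, channels))(s)
    return tf.keras.layers.Multiply()([inputs, s])

x = tf.keras.Input(shape=(10, 10, 256))   # placeholder conv8 feature map
model = tf.keras.Model(x, se_block(x))
print(model.output_shape)  # (None, 10, 10, 256)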
14.3 Model Training and Analysis of Experimental Results
14.3.1 Data Preprocessing and Parameter Settings
This paper uses the pedestrian data in the PASCAL VOC dataset together with manually annotated data to augment the INRIA dataset for auxiliary training of the model. The INRIA pedestrian dataset is a standard static pedestrian detection dataset with accurate annotations and rich scenes. It consists of a training set and a test set: the training set contains 614 images (with 2416 pedestrian samples) and the test set contains 288 images (with 1126 pedestrian samples) [17]. The PASCAL VOC dataset is a standard dataset for object classification and detection; this paper extracts 9583 images containing pedestrians, with their annotation information, as part of the training set. In addition, this paper uses 2000 real-scene pictures provided by Pengying Wu and manually annotates pedestrian samples with pose changes, occlusions, and small targets to further expand the training set [18]. The experimental platform is built on the Ubuntu 16.04 operating system with an Nvidia GTX TITAN (12 GB) discrete GPU, using the TensorFlow open-source framework and Python. During training, the input image size is 300 × 300. To suit the pedestrian detection task, the aspect ratios of the default boxes are set to 1:1, 2:1, 1:2, 3:1, and 4:1. The initial learning rate is 0.001; after 50,000 iterations it is reduced to 0.0001, after 80,000 iterations to 0.00001, and training is terminated after 100,000 iterations. The Adam optimizer is chosen for network training, with a batch size of 32, a weight decay coefficient of 0.0005, and a momentum parameter of 0.95.
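For illustration, the quoted learning-rate schedule and optimizer could be expressed in TensorFlow as below; mapping the quoted momentum parameter to Adam's beta_1 is an interpretation, and the weight decay term is omitted from this sketch.

# Sketch of the stated training setup: piecewise-constant learning
# rate (0.001 -> 0.0001 -> 0.00001 at 50k and 80k steps) with Adam.
# Model and loss definitions are omitted; beta_1=0.95 is our reading
# of the quoted "momentum parameter", and weight decay is not shown.
import tensorflow as tf

schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[50_000, 80_000],
    values=[0.001, 0.0001, 0.00001])

optimizer = tf.keras.optimizers.Adam(learning_rate=schedule, beta_1=0.95)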
14.3.2 Performance Evaluation Index
The trained model is tested and evaluated on the INRIA dataset. For evaluation, let B_{dt} be the final detection box and B_{gt} the ground-truth box: when the overlap rate (IOU) between B_{dt} and B_{gt} is greater than 0.5, the detection box matches the ground-truth box. The IOU calculation formula is
as shown in (14.4):

\text{IOU} = \frac{\text{area}(B_{dt} \cap B_{gt})}{\text{area}(B_{dt} \cup B_{gt})} \quad (14.4)
Here, B_dt is the final detection box, B_gt is the ground-truth box, IOU is the overlap rate between B_dt and B_gt, and area(·) denotes the area of the corresponding region. After matching, a detection box (B_dt) that matches no ground-truth box is counted as a false detection, and a ground-truth box (B_gt) that matches no detection box is counted as a missed detection. According to the evaluation method provided by Dollár, we calculate the False Positives Per Image (FPPI) and Miss Rate of each image to evaluate the performance of the model. The formulas for FPPI and Miss Rate are shown in (14.5) and (14.6):

FPPI = FP / (TN + FP)    (14.5)
Here, TN indicates the number of non-pedestrian samples, FP indicates the number of non-pedestrian samples judged to be pedestrian samples, and FPPI is the average number of false detections per image.

Miss Rate = FN / (FN + TP)    (14.6)
Here, TP indicates the number of pedestrian samples and FN indicates the number of pedestrian samples judged to be non-pedestrian samples. Miss Rate indicates the missed detection rate per image.
14.3.3 Experimental Results In order to verify the effectiveness of each improvement proposed in this paper, three models are set up for experimental verification. Model 1: pedestrian context information is introduced on three feature maps (conv4_3, conv8_2 and conv10_2) of the SSD model. Model 2: conv3_3, conv4_3, conv7, conv8_2, conv9_2, conv10_2 and conv11_2 are used as detection branch networks. Model 3: the Squeeze-and-Excitation Module strategy is introduced in the conv7, conv8, conv9, conv10 and conv11 convolution modules of the SSD model. The test results of the three models on the INRIA test set are shown in Fig. 14.5. According to the test results, the miss rate of the three models on the INRIA test set is significantly lower than that of the SSD model, which shows that each improvement of the detection model can improve its detection performance.
Fig. 14.5 Test diagrams of different models on INRIA datasets
In this paper, several popular detection algorithms are compared with the improved SSD algorithm on the INRIA dataset, and the comparison of the different algorithms is shown in Fig. 14.6.
Fig. 14.6 Miss Rate-FPPI curves of different algorithms on the INRIA dataset
Fig. 14.7 Test effect chart
Experiments show that, compared with traditional pedestrian detection algorithms, the miss rate of the algorithm in this paper is reduced to varying degrees; compared with deep learning detection algorithms, the miss rate is also significantly reduced. This indicates that the improved algorithm has a better detection effect on the INRIA dataset and can obtain better detection results. The improved SSD model considers both the shallow and deep features of pedestrians, as well as the relationships between different feature layers and between channels within the same layer, which improves the detection of small-scale and occluded pedestrians. It is of great significance for solving the detection problems of illumination, occlusion and small-scale pedestrians in real scenes. Several pedestrian images from real scenes are selected for experimental testing, and the test results are shown in Fig. 14.7. As the test effect diagrams show, although the pedestrians in the test images differ in illumination intensity, occlusion and scale, the detection method of this paper obtains accurate detection results, which basically meets the requirements of the pedestrian detection task in real scenes.
14.4 Conclusion In this paper, an improved SSD-based pedestrian detection algorithm is proposed to address the SSD model's poor accuracy on small-scale pedestrians. In the base network of SSD, a shallower feature map is introduced as a detection branch network. Context information is introduced into different detection branch networks (Conv4_3, Conv8_2, Conv10_2) to enhance the intrinsic connections of the object and improve the accuracy of small-scale pedestrian detection. The ability of the model to detect small-scale pedestrians is further improved by modeling the interrelationships between the channels in each feature layer. Experiments show that
the improved method has better performance and higher accuracy than the SSD model, especially for small-scale pedestrians. In the future, we can further study simplifying the network structure to improve the detection speed without affecting the detection accuracy.
References 1. Pal, G., Rudrapaul, D., Acharjee, S.: Video shot boundary detection: a review. In: Emerging ICT for Bridging the Future-Proceedings of the 49th Annual Convention of the Computer Society of India CSI, vol. 2, pp. 119–127. Springer, Cham (2015) 2. Li, X.D., Ye, M., Li, T.: Summary of research on target detection based on convolution neural network. Comput. Appl. Res. 34(10), 2881–2886+2891 (2017) 3. Benenson, R., Omran, M., Hosang, J., Schiele, B.: Ten years of pedestrian detection, what have we learned? In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8926, pp. 613–627. Springer, Cham (2015) 4. Viola, P., Jones, M., Snow, D.: Detecting pedestrians using patterns of motion and appearance. Int. J. Comput. Vision 63(2), 153–161 (2005) 5. Felzenszwalb, P., Girshick, R.B., McAllester, D.: Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Mach. Intell. 32(9), 1627–1645 (2009) 6. Dollár, P., Tu, Z., Perona, P.: Integral channel features. In: British Machine Vision Conference, pp. 91.1–91.11, London (2009) 7. Nam, W., Dollár, P., Han, J.H.: Local decorrelation for improved pedestrian detection. In: Advances in Neural Information Processing Systems, pp. 424–432 (2014) 8. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014) 9. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015) 10. Ren, S., He, K., Girshick, R.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015) 11. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016) 12. Liu, W., Anguelov, D., Erhan, D.: SSD: single shot multibox detector. In: European Conference on Computer Vision, pp. 21–37. Springer, Cham (2016) 13. Oliva, A., Torralba, A.: The role of context in object recognition. Trends Cogn. Sci. 11(12), 520–527 (2007) 14. Divvala, S., Hoiem, D., Hays, J.: An empirical study of context in object detection. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1271–1278. IEEE Press, Miami (2009) 15. Fang, S., Li, Y., Liu, X., Fan, D.: An improved multi-scale face detection algorithm based on SSD model. Inf. Technol. Inf. 02, 39–42 (2019) 16. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018) 17. INRIA Person Dataset: http://pascal.inrialpes.fr/data/human/. Accessed 19 Sep 2018 18. Wu, P.Y., Zhang, J.M., Peng, J., Lu, C.Q.: Research on man detection in real scene with multilayer convolutional characteristics. J. Intell. Syst. 02, 306–315 (2019)
Chapter 15
Activity Recognition System Optimisation Using Triaxial Accelerometers Zhenghui Li, Bo Li, and Julien Le Kernec
Abstract Activity recognition is required in various applications such as motion analysis and health care. The accelerometer is a small, economical, easily deployed and high-performance sensor, which can continuously provide acceleration data from the body part on which it is worn. Previous research on human activity recognition systems using triaxial accelerometers has mainly focused on a single sensor placement and rarely states the reason for the choice of placement. This paper presents an optimisation method for a triaxial accelerometer when the sensor is placed on different body parts. The statistical-characteristics-based algorithms use data from a motion-capture database to classify six classes of daily living activities. Feature selection is performed using principal component analysis (PCA) over a range of features, selecting robust and sensitive features that contribute strongly to classification performance. Activity classification is performed using the support vector machine (SVM) and K-nearest neighbour (K-NN), and the results are compared. Based on the HDM05 Mocap database with six activity types (overall 89 motions) collected from five subjects, the best place for a wearable accelerometer is the waist, followed by the chest, head, left wrist, right wrist, humerus and femur. Based on these preliminary results, multiple accelerometers and data fusion methods are utilised to further increase classification accuracy, which rises by 6.69% for SVM and 7.99% for KNN. For two sensors, the best placement is the waist with the left wrist, followed by the waist with the right wrist, waist with chest, waist with humerus, waist with head and waist with femur. The result provides a guideline for sensor placement when developing an activity recognition system.
Z. Li Glasgow College, University of Electronic Science and Technology of China, Chengdu 611731, China B. Li (B) College of Science, Chongqing University of Technology, Chongqing 400054, China e-mail: [email protected] J. Le Kernec School of Engineering, University of Glasgow, Glasgow, UK
15.1 Introduction People wish to increase their quality of life and live as long as possible, which is pushing towards new healthcare delivery models [1]. In recent years, the number of seniors who live alone keeps increasing. The risks of multiple chronic diseases and critical events such as falls pose a significant threat to their life expectancy. The current worldwide situation is pushing towards new healthcare models, and different solutions have been trialled to resolve these issues. The detection and recognition of human activities provide valuable information that can be leveraged as a part of the healthcare system. With advanced micro-electromechanical system (MEMS) technology, many connected objects, like a watch or a ring, can be designed with embedded sensors for monitoring the activities of an individual. Activity recognition information can be examined at the macro level to infer the pattern of life, at a finer level to extract gait metrics, and even to identify the person [2, 3]. Human activity recognition technologies are also extensively used in the entertainment industry, such as films, animations and games [4, 5]. Computing and sensing technologies can be combined with innovative perspectives to achieve a motion classification system; three main types of technologies are applied in this field: wearable devices, camera-based sensing [6] and radar sensing [7]. The camera-based or vision-based approach is one of the most researched, as it stems from the prominent field of computer vision, which provides a complex and practical framework [6]. However, it is perceived as an invasion of privacy, and potential disputes over image rights deter the use of this technology at home. Additionally, it is easily affected by lighting conditions: in both weak and strong light, cameras cannot guarantee the quality of the images, and in night scenarios the vision-based method cannot work properly without a light source [8, 9]. Radar systems are relatively new in this area, with the advantages of high accuracy, safety and non-obstructive illumination [7]. A radar transmits a signal that interacts with the target and is backscattered to the radar [10]. The main approaches use micro-Doppler signatures, range information, range-Doppler and cadence velocity diagrams, or combinations of those representations, for activity classification. Radar sensors collect a signal instead of a real image, which greatly reduces the risk of invasion of privacy. However, radar systems must be placed at a fixed location, and the classification performance heavily relies on the position and number of radars. Wearable devices are mobile sensors, with which activities are monitored from a more first-person perspective. Concurrently, many wearable devices include capabilities for both data collection and data analysis. Compared with other methods, wearable sensors are considered a more economical and more flexible approach to human activity recognition, given their ease of deployment, capability for continuous tracking and absence of privacy invasion. In this paper, we mainly focus on human activity recognition using accelerometers. Acceleration is one of the most direct ways of reflecting the movements of the human body. Over the last decades, accelerometers have become small, low cost and high performance. Their accuracy is continuously improving, due to the maturity of the
manufacturing and measurement technology [11]. Accelerometers are available as single-, dual- and triaxial sensors, of which triaxial accelerometers are the most frequently used. They measure acceleration along three mutually perpendicular axes, which means any motion can be decomposed into three directions; this makes the recognition task more reliable and effective. Therefore, accelerometers are adopted in this project to achieve the task of human motion classification. Studies on human motion classification using triaxial accelerometers started several years ago and have made great achievements [12]. Recognition approaches can generally be classified into two types [13]: threshold-based [14] and machine-learning-based. Machine learning approaches frequently used for human activity include the hidden Markov model [14], support vector machine [14] and K-nearest neighbour [15]. Many studies have focused on algorithm optimisation and on feature processing (extraction and selection) optimisation. However, the accuracy of the results is also heavily dependent on sufficient raw acceleration data. Each body part has its own acceleration when a person performs activities, and one position may produce the same accelerations for completely different motions, which leads to false alarms; models based on acceleration data obtained from different body parts perform entirely differently. Thus, this study aims to place sensors on diverse body parts and utilise data fusion technology to optimise the classification performance [16, 17]. The organisation of this paper is as follows: Sect. 15.2 presents a review of existing studies on accelerometers for activity recognition. Section 15.3 introduces the setup of data collection and processing. Section 15.4 reports the results. Section 15.5 provides the conclusion based on the results.
15.2 Related Works A large body of research concentrates on activity recognition using accelerometers. Initially, accelerometers were mainly placed at the waist, because it is the central part of the body and provides more stable data than any other part. Mantyjarvi et al. [18] used a belt accelerometer to acquire signals and then generated feature vectors; the feature sets were very limited and only one classification algorithm was implemented. With the trend toward wearable electronic devices such as e-watches and with better hardware support, the placements of accelerometers were expanded, leading to increased classification accuracy [12]. Besides, more classification algorithms were introduced. Naranjo-Hernández et al. [19] designed a system in which an accelerometer smart sensor was placed at the lower back; monitoring was achieved holistically on the device with three processing modules. With the improvement of classification due to the choice of sensor placement and advances in classification algorithms, research has tended to become more complex, using more features, more locations and fusion to achieve better results. These works employed more complex algorithms such as supervised learning technologies to
tackle classification problems. Lee et al. [20] proposed a one-dimensional convolutional neural network (1-D CNN) for recognising human activity using triaxial accelerometer data. Zhang et al. [21] compared the classification performance of four different classifiers (K-nearest neighbour, naive Bayesian classifier, support vector machine and their sparse-representation-based classification). Pannurat et al. [13] placed sensors at different body parts and ranked the performance of several joints for fall detection at different stages under certain conditions; they found that the waist gave the highest classification accuracy. Li et al. [14] used a wearable device with a data fusion approach; the device involves an accelerometer, a gyroscope, a magnetometer, an inertial sensor and a radar sensor, and the overall performance was much better after applying data fusion technologies. However, the study of classification optimisation with respect to sensor placement is still insufficient. Therefore, this paper aims to optimise recognition performance with respect to sensor placement and data fusion technologies, using SVM and KNN algorithms. More information about the simulation setup is given in the next section.
15.3 Data Description The HDM05 database provides researchers with both ASF/AMC (skeleton-based) and C3D (3D trajectory-based) formats. The ASF/AMC format is chosen for two reasons: the bone length is constant, and ASF/AMC does not have many redundant markers [22]. The ASF file records skeleton data (see Fig. 15.1) and the AMC file records motion data. Fig. 15.1 Skeletal Kinematic chain model consisting of rigid bones that are flexibly connected by joints, which are highlighted by circular markers and labelled with joint names [22]
Table 15.1 List of human activities

Number    Activity description
I.        Grabbing up
II.       Jumping
III.      Sitting
IV.       Standing
V.        Running
VI.       Walking

Table 15.2 Table of features of acceleration for each joint (without feature selection)

Features                     No.
Mean                         3
Standard deviation           3
RMS (root mean square)       3
Variance                     3
Range                        3
Minimum                      3
Median absolute deviation    3
The number of features       21
In this paper, seven joints are chosen: right wrist, left wrist, head, chest, waist, right femur and right humerus. Five human subjects performed all the activities, which were recorded at a constant frequency of 120 Hz. There are 89 motion files overall, covering six types of activities: grabbing, jumping, sitting, standing, walking and running (Table 15.1). Acceleration data was obtained using the MOCAP toolbox [23]. Extracting effective features from the original acceleration data ensures that the classification is efficient and accurate. In this case, features were extracted from the pre-processed acceleration data as summarised in Table 15.2. Principal component analysis is a common and effective dimensionality-reduction method in human activity recognition. The core of PCA is the principle of variance maximisation via the covariance matrix: the original vectors are replaced by fewer, linearly independent vectors, also called the principal components, while preserving as much as possible of the information contained in the original vectors. After using PCA, the dimension of the feature space is reduced from 21 (Table 15.2) to 7.
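As a sketch of this feature pipeline, the snippet below computes the 21 statistical features of Table 15.2 per motion sample, reduces them to seven principal components, and sets up classifiers comparable to those used later; the paper works in MATLAB, so this scikit-learn version is only an equivalent illustration, and acceleration_samples and labels are hypothetical inputs.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def motion_features(acc):
    """21 features of Table 15.2: seven statistics per axis for a
    (frames, 3) array of triaxial acceleration."""
    mad = np.median(np.abs(acc - np.median(acc, axis=0)), axis=0)
    feats = [acc.mean(axis=0), acc.std(axis=0),
             np.sqrt((acc ** 2).mean(axis=0)),      # RMS
             acc.var(axis=0),
             acc.max(axis=0) - acc.min(axis=0),     # range
             acc.min(axis=0), mad]
    return np.concatenate(feats)                    # shape (21,)

X = np.stack([motion_features(a) for a in acceleration_samples])
X7 = PCA(n_components=7).fit_transform(X)           # 21 -> 7 dimensions

# Quadratic-kernel SVM and weighted KNN with K = 10, as in Sect. 15.4.
svm = SVC(kernel="poly", degree=2).fit(X7, labels)
knn = KNeighborsClassifier(n_neighbors=10, weights="distance").fit(X7, labels)
```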
15.4 Results and Analysis The prepared dataset is first used to evaluate the performance of the SVM classifier. In this paper, a quadratic-kernel SVM classifier is used. Then, the classifier is changed to a weighted KNN with K = 10. The average classification performance is shown below. Note that all training sets were processed by PCA, and the dimension of the feature space is seven. From Table 15.3, it is clear that principal component analysis greatly reduces the running time, by 20.3% and 53.2% for SVM and KNN, respectively. Tables 15.4 and 15.5 show the confusion matrices of classification at the waist. Standing and jumping are the most difficult pair to recognise among all activities. Besides, sitting is also easily confused with these two activities. In fact, not only for the waist but also for the other sensor placements, these three activities are difficult to identify, especially standing and jumping. To further improve our classification result, two accelerometers are concurrently utilised for collecting data. With data fusion technology, the features extracted from both accelerometers are fused at feature level and fed into SVM and KNN. The waist is kept as it has had the best performance. The features extracted from the waist will be combined and fused with features from the other six parts. That is,

Table 15.3 A running time comparison between before and after using PCA for both algorithms

        Running time before PCA (s)    Running time after PCA (s)
SVM     2.7562                         2.1957
KNN     2.0795                         0.9735
Table 15.4 Confusion matrix for results from the waist and SVM algorithm
Table 15.5 Confusion matrix for results from the waist and KNN algorithm
there are six combinations, which are the waist with chest, the waist with head, the waist with left wrist, the waist with right wrist, the waist with humerus and the waist with femur (Fig. 15.2). Figure 15.3 shows the accuracy of the six combinations of sensor placements. The waist data remains as a control group to compare with the fused-data classification performance. The waist + left wrist has the best performance, followed by the waist + right wrist, waist + chest, waist + humerus, waist + head and waist + femur. Generally, the SVM algorithm still performs better than the KNN algorithm. After data
Fig. 15.2 A performance comparison on accuracy of classification for the SVM and the KNN
Fig. 15.3 A performance comparison on accuracy of classification for the SVM and the KNN (using data fusion)
Table 15.6 Confusion matrix for results from the waist and SVM algorithm (with data fusion)
Table 15.7 Confusion matrix for results from the waist and KNN algorithm (with data fusion)
fusion, the accuracy of classification is higher than without data fusion. Tables 15.6 and 15.7 are the confusion matrices for the waist + left wrist. From these two tables, it can be observed that the false alarms among standing, jumping and sitting have greatly decreased. Besides, for the SVM algorithm, grabbing, walking and running are recognised perfectly, i.e. with 100% accuracy. The accuracy on the easily mis-recognised motions (standing, jumping and sitting) increases greatly, by 5.00% and 10.1% for SVM and KNN, respectively. The left wrist and the right wrist are considered separately in this paper because we want to investigate the influence of wearing the sensor on different sides. The classification results show that models trained on left-wrist data and on right-wrist data have almost the same accuracy, with a difference of less than 1%, so the effect of the side is negligible.
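A minimal sketch of the feature-level fusion described above: the per-sample feature vectors of the two placements are concatenated before dimensionality reduction; X_waist and X_wrist are hypothetical (samples, 21) feature arrays.

```python
import numpy as np
from sklearn.decomposition import PCA

# Feature-level fusion for a two-sensor setup: concatenate the features
# of the waist and the second placement per motion sample, then reduce
# dimensionality as before and feed the result to SVM/KNN.
X_fused = np.concatenate([X_waist, X_wrist], axis=1)   # (samples, 42)
X_fused7 = PCA(n_components=7).fit_transform(X_fused)
```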
15.5 Conclusion In this paper, classification approaches and data fusion methods for discriminating human daily movements are proposed. All 89 data samples from the HDM05 database, covering six classes, were used for acquiring accelerations and extracting features. Both SVM and KNN algorithms were used for classification. The preliminary results show that the performance of SVM is generally better than that of KNN. It is also clear that for the single-accelerometer condition, the waist is the optimal placement for the wearable device, followed by the chest, head, wrist (both left and
right), humerus and femur. After using data fusion methods, the overall accuracy at the waist is 92.75% for SVM and 85.82% for KNN. In addition, the average accuracy is increased to 94.60% and 89.60% from 87.91% and 81.61% for SVM and KNN, respectively. The proposed sensor placement for a single sensor is the waist. If wearing more than one sensor, the proposed placements would be the waist and the wrist (left or right). This paper gives a preliminary guideline on the detection performance for sensor positions. Acknowledgements This work is supported by the Natural Science Foundation Project of Chongqing Science and Technology Commission (Grant No. cstc2015jcyjBX0113).
References 1. Pang, Z., Zheng, L., Tian, J., Sharon, K., Dubrova, E., Chen, Q.: Design of a terminal solution for integration of in-home health care devices and services towards the internet-of-things. Enterp. Inf. Syst. 9(1), 86–116 (2015) 2. Cippitelli, E., Fioranelli, F., Gambi, E., Spinsante, S.: Radar and RGB-depth sensors for fall detection: a review. IEEE Sens. J. 17(12), 3585–3604 (2017) 3. Chen, Z., Li, G., Fioranelli, F., Griffiths, H.: Personnel recognition and gait classification based on multistatic micro-doppler signatures using deep convolutional neural networks. IEEE Geosci. Remote Sens. Lett. 15(5), 669–673 (2018) 4. Yuksek, M., Barshan, B.: Human activity classification with miniature inertial and magnetic sensor signals. In: 19th European Signal Processing Conference, pp. 956–960, Spain (2011) 5. Jovanov, E., Milenkovic, A., Otto, C., Groen, P.: A wireless body area network of intelligent motion sensors for computer assisted physical rehabilitation. J. Neuro-Eng. Rehabil. 2(1), 1–10 (2005) 6. Mubashir, M., Shao, L., Seed, L.: A survey on fall detection: principles and approaches. Neurocomputing 100, 144–152 (2013) 7. Amin, M.G., Zhang, Y., Ahmad, F., Dominic, K.C.: Radar signal processing for elderly fall detection: the future for in-home monitoring. IEEE Signal Process. Mag. 33(2), 71–80 (2016) 8. Xue, Z., Ming, D., Song, W., et al.: Infrared gait recognition based on wavelet transform and support vector machine. Pattern Recogn. 43(8), 2904–2910 (2010) 9. Ashour, A.S., Beagum, S., Dey, N., et al.: Light microscopy image de-noising using optimized LPA-ICI filter. Neural Comput. Appl. 29, 1517 (2018) 10. Chen, V.C., Li, F., Ho, S., Wechesler, H.: Analysis of micro-Doppler signatures. IEEE Proc.Radar, Sonar Navig. 150(4), 271–276 (2003) 11. Yu, Y.: Current situation and development trend of silicon micro-resonant accelerometer. Sci. Technol. Innov. 4(1), 22–23 (2019) 12. Cornacchia, M., Ozcan, K., Zheng, Y., Velipasalar, S.: A survey on activity detection and classification using wearable sensors. IEEE Sens. J. 17(2), 386–403 (2017) 13. Pannurat, N., Thiemjarus, S., Nantajeewarawat, E.: A hybrid temporal reasoning framework for fall monitoring. IEEE Sens. J. 17(6), 1749–1759 (2017) 14. Li, H., Shrestha, A., Heidari, H., Le Kernec, J., Fioranelli, F.: A multisensory approach for remote health monitoring of older people. IEEE J. Electromagn. RF Microw. Med. Biol. 2(2), 102–108 (2018) 15. Gupta, P., Dallas, T.: Feature selection and activity recognition system using a single triaxial accelerometer. IEEE Trans. Biomed. Eng. 61(6), 1780–1786 (2014)
16. Dey, N., Ashour, A.S., Beagum, S., Pistola, D.S., Gospodinov, M., Gospodinova, E.P., Tavares, J.M.R.: Parameter optimization for local polynomial approximation-based intersection confidence interval filter using genetic algorithm: an application for brain MRI image de-noising. J. Imaging 1(1), 60–84 (2015) 17. Jagatheesan, K., Anand, B., Samanta, S., Dey, N., Ashour, A.S., Balas, V.E.: Particle swarm optimisation-based parameters optimisation of PID controller for load frequency control of multi-area reheat thermal power systems. Int. J. Adv. Intell. Paradig. 9(5–6), 464–489 (2017) 18. Mantyjarvi, J., Himberg, J., Seppanen, T.: Recognizing human motion with multiple acceleration sensors. In: 2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace (Cat.No.01CH37236), vol. 2, pp. 747–752. USA (2001) 19. Naranjo-Hernández, D., Roa, L.M., Reina-Tosina, J.M., Estudillo-Valderrama, Á.: SoM: a smart sensor for human activity monitoring and assisted healthy ageing. IEEE Trans. Biomed. Eng. 59(11), 3177–3184 (2012) 20. Lee, S., Yoon, S., Cho, H.: Human activity recognition from accelerometer data using Convolutional Neural Network. In: IEEE International Conference on Big Data and Smart Computing (Big Comp), pp. 131–134, Jeju (2017) 21. Zhang, M., Sawchuk, A.A.: Human daily activity recognition with sparse representation using wearable sensors. IEEE J. Biomed. Health Inform. 17(3), 553–560 (2013) 22. Müller, M., Röder, T., Clausen, M., Eberhardt, B., Krüger, B., Weber, A.: Documentation Mocap Database HDM05. Technical report, No. CG-2007–2, ISSN 1610-8892. Universität Bonn (2007) 23. Burger, B., Toiviainen, P.: Mocap toolbox—a MATLAB toolbox for computational analysis of movement data. In: R. Bresin (ed.), Proceedings of the 10th Sound and Music Computing Conference (SMC), Sweden (2013)
Chapter 16
Near-Duplicate Video Detection Based on Temporal and Spatial Key Points Diankun Zhang, Zhonghua Sun, and Kebin Jia
Abstract With the development of the Internet and video editing technologies, there are a large number of near-duplicate videos on the Internet today. This causes a lot of trouble in video content retrieval and copyright protection, and it is time-consuming to manually classify a large number of near-duplicate videos. A method is proposed here to automatically recognize and classify near-duplicate videos based on temporal and spatial key points. This method extracts key frames, the proportion of each video segment, the average gray level and the average composition ratio as the key information of a video, which is then used to identify near-duplicate videos. For near-duplicate videos, this method works well.
16.1 Introduction Near-duplicate videos are identical or approximately identical videos close to an exact duplicate of each other, but different in some details [1]. With the development of video editing and web sharing technology, video copyright issues on the Internet have become more serious, because some people use copyrighted video resources for editing and re-publishing [2]. Therefore, an efficient and accurate near-duplicate video classification method is very necessary. A variety of video classification methods exist, each with its own characteristics. The first is the content-based classification method, which provides the best classification results for content retrieval [3]. The second is deep-learning-based video classification; these methods mainly achieve classification by extracting abstract features from the data. For example, some techniques based on convolutional neural networks classify well in situations with a large number of videos [4]. These classification methods are certainly good, D. Zhang · Z. Sun · K. Jia Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China Z. Sun (B) · K. Jia Beijing Laboratory of Advanced Information Networks, Beijing 100124, China e-mail: [email protected]
because they can classify some abstract concepts. But for near-duplicate videos, they are usually not suitable because they are too complicated. To represent a whole video, a global signature has been proposed [5, 6], which can be efficiently managed and computed, but such features are not suitable for retrieving near-duplicate videos. Taking both spatial and temporal information into consideration, some methods have been proposed to lower the matching cost [7–9]. However, these methods perform poorly under strong spatial transformations, which often occur in near-duplicate videos. In this algorithm, the extraction of video features mainly includes two parts: video segmentation and segment feature extraction. Compared with the above methods, this method is more suitable for near-duplicate video classification because it is simpler and faster and does not require a lot of sample training; moreover, it is also very accurate. In the experiments, the algorithm achieves both recall and precision above 80% on the CC_WEB_VIDEO [1] dataset. As the results in Sect. 16.5 show, the average time cost for a few minutes of SD video is less than the length of the video itself, so the algorithm is capable of real-time processing of SD video and potentially of HD video. The rest of this paper is organized as follows. In the next section, we review the related work. In Sect. 16.3, the methods presented herein are described in detail. Section 16.5 analyzes the results of the experiment. Section 16.6 summarizes and concludes the paper.
16.2 The Classic Key Frame Extraction and Video Shot Segmentation Technology Video key frames represent the main content of a video unit, and the selection of the key frames affects the descriptive power of the video unit. When extracting key frames, the dissimilarity between two adjacent frames is the primary selection criterion. The extracted key frames can then be used as the basis for video segmentation.
16.2.1 Video-Shot-Based Method Among video-shot-based methods, some classic approaches include the frame average method and the histogram average method. In efficient matching and clustering of video shots [10], a method is proposed that extracts key frames by calculating the maximum distance in the feature space. Zhang et al. [11] extract key frames using color histograms.
16.2.2 Visual-Feature-Based Method The distribution of visual features often varies little within a video shot compared with that between different video shots. This method extracts key frames based on changes in the color, texture and other visual information of each frame [12–14]. When this information changes significantly, the current frame is considered to be a key frame. The frame difference (distance) between f_i and f_j is defined as formula (16.1), in which (x, y) is the position of a pixel:

D(f_i, f_j) = Σ_{x,y} |f_i(x, y) − f_j(x, y)|    (16.1)
16.2.3 Cluster-Based Method First, a cluster center is initialized; then the distance between the current frame and the center is calculated, and the current frame is either assigned to an existing cluster or becomes the center of a new cluster, until all frames have been processed. The frame closest to the center of each cluster is taken as a key frame. A related method is key frame extraction based on clustering [15]. The disadvantage of this method is that it is computationally expensive.
16.3 Temporal and Visual Feature Extraction In order to ensure the efficiency of near-duplicate video detection, the key frame extraction process should not be computationally complicated. Here, a content-based key frame extraction method is adopted. The method is mainly divided into the following steps.
16.3.1 Video Segmentation First, we segment the video and extract its segment features, including key frames and some minor (fuzzy) features, such as the average grayscale of the video segment, the average composition ratio of the video segment, and the proportion of time (frames) that the video segment occupies in the entire video. The frame difference is normalized with Eq. (16.2), mainly so that videos with different resolutions can be handled consistently.
D(f_i, f_j) = (1 / (L × W)) Σ_{x,y} |f_i(x, y) − f_j(x, y)|    (16.2)
where L and W are the numbers of pixels the frame contains in width and height. We use the adjacent frame difference to detect abrupt shot changes: when the frame difference is greater than a threshold, it is considered the beginning of a new shot, which is the beginning of a new video segment. This is the easiest case to distinguish. Long-shot image sequences are relatively difficult to segment because there is no dramatic change between adjacent frames. However, when a long shot continues for a while and the picture changes greatly, the difference between the current frame and the first frame of the segment becomes large; when this difference exceeds a threshold, it can be judged that this is a long shot and the video is segmented from here. Combining the above two methods achieves more accurate segmentation of the video.
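A sketch of this dual-rule segmentation with OpenCV, assuming grayscale differences and illustrative threshold values (the paper does not publish its thresholds):

```python
import cv2
import numpy as np

def segment_video(path, t_cut=30.0, t_long=25.0):
    """Segment a video using the two rules above: a large normalised
    adjacent-frame difference (Eq. 16.2) marks an abrupt shot change,
    and a large difference from the segment's first frame splits a
    long shot."""
    def gray(f):
        return cv2.cvtColor(f, cv2.COLOR_BGR2GRAY).astype(np.int32)

    cap = cv2.VideoCapture(path)
    ok, frame = cap.read()
    prev = first = gray(frame)
    n = prev.size                      # L x W pixels
    boundaries, idx = [0], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        idx += 1
        cur = gray(frame)
        if (np.abs(cur - prev).sum() / n > t_cut or
                np.abs(cur - first).sum() / n > t_long):
            boundaries.append(idx)     # a new segment starts here
            first = cur
        prev = cur
    cap.release()
    return boundaries
```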
16.3.2 Video Segment Feature Extraction After segmentation with the above scheme, we need to extract key information from each segment. The key information is mainly divided into four parts: key frame, proportion of video segment, average gray level and average composition ratio. (1) Key frame and proportion of video segment For key frames, we take the first frame of a normal video segment, or the middle frame of a long-shot segment, as the key frame. The proportion of video is the time proportion of each video segment in the whole video; we save it when segmenting. The length of each segment is used as the weight of the segment matching result, which plays a role in determining the classification result of the video. (2) Average gray level The average gray level of a segment is obtained by averaging the average gray levels of the frames in the segment. The average gray level of each frame is calculated as shown in formula (16.3):

G(f_i) = (1 / (L × W)) Σ_{x,y} |f_i(x, y)|    (16.3)
(3) Average composition ratio For the average composition ratio, optimal global thresholding based on Otsu's method is used here [16]. Binarizing the image with this threshold retains the image information relatively well. The proportion of 1-valued pixels in the resulting binary image gives a rough composition ratio, and the composition ratios within a segment are averaged to obtain the average composition
ratio of the segment. The main idea of Otsu's method is maximisation of the inter-class variance; the formula for the variance between classes is shown in formula (16.4):

σ_B²(k) = [m_G P_1(k) − m(k)]² / (P_1(k)[1 − P_1(k)])    (16.4)
where m_G is the global mean (the average gray of the entire image), P_1(k) is the probability that a pixel's gray level is less than k, and m(k) is the cumulative mean up to level k, as shown in formula (16.5), where p_i is the value of level i in the video frame histogram:

m(k) = Σ_{i=0}^{k} i · p_i    (16.5)
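In practice, both fuzzy features can be computed in a few lines; the sketch below relies on OpenCV's built-in Otsu thresholding instead of implementing Eqs. (16.4) and (16.5) by hand.

```python
import cv2

def average_gray(gray_frame):
    """Average gray level of one frame (Eq. 16.3)."""
    return gray_frame.mean()

def composition_ratio(gray_frame):
    """Binarise with Otsu's optimal global threshold and return the
    proportion of 1-valued pixels, i.e. the rough composition ratio."""
    _, binary = cv2.threshold(gray_frame, 0, 1,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary.mean()
```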
(4) SIFT feature point extraction for key frames We decided to use the SIFT [17] algorithm here because SIFT features are only slightly affected by common post-processing of an image (such as scaling, rotation, down-sampling or up-sampling). So, after extracting a key frame, we perform a SIFT operation on it, obtain the SIFT key points of all the key frames of the class, and store all the key points in an n × 128 matrix.
16.3.3 Video Segment Matching After feature extraction, the features of a video are stored and form a class feature. For a new video, the same feature extraction is performed, and the result is compared with the existing class features. In the search process, a rule-based classifier is borrowed. First, rough matching is performed: the current video segment is compared with the fuzzy features of the video segments stored in the existing class. If the average gray level, the average composition ratio and the difference between the numbers of key frame feature points are each within a specific range, the rough match is considered passed, similar to a rule-based classifier. After that, the key frame of the current video segment is matched against the key frame feature points in the existing class; considering that SIFT produces a certain number of mismatches, the two key frames are considered to match if the number of matching points is greater than a threshold, which also means the two video segments match, and the segment is considered to belong to the matched class. After all the segments of the video have been matched, the classification results are weighted according to the proportion of the video length occupied by each segment. Finally, the most likely classification result is taken as the final result.
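A sketch of the key-frame matching step with OpenCV's SIFT; the match-count threshold and Lowe's ratio test stand in for the unspecified threshold and the mismatch filtering mentioned above.

```python
import cv2

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher(cv2.NORM_L2)

def keyframes_match(kf_a, kf_b, min_matches=20, ratio=0.75):
    """Count good SIFT correspondences between two key frames; the two
    frames (and hence the two video segments) are considered to match
    when the count exceeds min_matches."""
    _, des_a = sift.detectAndCompute(kf_a, None)   # descriptors, n x 128
    _, des_b = sift.detectAndCompute(kf_b, None)
    if des_a is None or des_b is None:
        return False
    good = [p for p in matcher.knnMatch(des_a, des_b, k=2)
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) >= min_matches
```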
Fig. 16.1 a Flowchart for similar detection for given samples. b Flowchart for automatic recognition of similar videos
16.4 Two Classification Methods 16.4.1 Similar Detection for Given Samples This method can be used for piracy detection on the network. Given some sample videos of certain classes, similar videos are searched for among the unknown video data and classified based on these classes. The segment features of the given samples are stored in the class as the criterion for classification. In this method, the content used for comparison is not expanded at run time. This method has good accuracy and speed, and the detection speed does not slow down as the number of detections increases. The flowchart of this process is shown in Fig. 16.1a.
16.4.2 Similar Video Automatic Classification This method can be used for local similar-video classification, such as organizing mobile phone albums. It is characterized by not being given a sample set. For a series of unknown videos, the method follows these principles. For a video, if no class it belongs to is found, it is treated as a new class and its features are stored. Considering that many videos in this environment are similar but also have certain differences, the above matching weighting mechanism has been modified. When the
weighted result does not belong to any class outright, but the weighted result of one known class is greater than 25% and the weighted sum of all other classes is less than 10%, the video is considered to belong to this known class. This method has both advantages and disadvantages: it makes the matching relatively tolerant and gives the method a certain learning ability, but it also easily accumulates errors, that is, a classification error may cause many subsequent errors. This is closely related to the thresholds set in the method. If the threshold is too high, the classification is too strict, so that videos with slight differences cannot be grouped into one class; if the threshold is too low, the classification is too fuzzy, resulting in poor results. Therefore, the thresholds in this method can be adjusted according to the actual situation to achieve a better effect. For a video, if its class is found but some segments of the video do not belong to the class, the features of those segments are added to the features of the class. Following the above principles, the method achieves a good similar-video classification effect on an unknown video set in our measurements. The flowchart of this process is shown in Fig. 16.1b.
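The weighted decision rule can be sketched as follows; the 25% and 10% thresholds come from the text, while the cut for an outright majority is an assumption of ours.

```python
def assign_class(weighted_scores, strong=0.5, weak=0.25, others_max=0.10):
    """Weighted voting rule of Sect. 16.4.2. weighted_scores maps a
    class id to the summed, length-weighted proportion of matched
    segments for one video."""
    best = max(weighted_scores, key=weighted_scores.get)
    rest = sum(v for k, v in weighted_scores.items() if k != best)
    if weighted_scores[best] > strong:
        return best        # clear match to an existing class
    if weighted_scores[best] > weak and rest < others_max:
        return best        # fuzzy match allowed by the modified rule
    return None            # no class found: treated as a new class
```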
16.5 Experimental Results 16.5.1 Dataset Introduction and Experiment Procedure The dataset is CC_WEB_VIDEO [1]. In the experiment, we used 300 videos in the "1. The lion sleeps tonight" category of the dataset for testing (the total length is about 8 h 30 min), applying both methods: similar detection for other videos given a sample video (similar to finding pirated videos) and automatic classification of the 300 videos, and we report the corresponding precision, recall and time curves.
16.5.2 Analysis of Results For the first method, the tested performance is very good. Given a sample video, the method can quickly and accurately identify all near-duplicate videos (videos with the same content as the sample video but post-processed to varying degrees). After many tests, the correct rate on similar videos reaches more than 90%. The relationship between the average time consumption of this method and the number of videos is shown in Fig. 16.2a. It can be seen from the time curve that the number of processed videos has no effect on the average processing time; the fluctuations in the curve are mainly due to differences in video length. In the experiment on 300 videos, the recall rate reached 96.67% and the precision rate reached 100%. Therefore, this method has practical utility.
Fig. 16.2 a Average time for video detection (Method 1). b Average time for video detection (Method 2)
For the second method, automatic video classification, the number of classes obtained is large, so a certain search cost is required; in this case, the time per video grows with the number of videos. The relationship between the number of classified videos and the average time spent per video is shown in Fig. 16.2b. In the experiment on 300 videos, the recall rate reached 93.80% and the precision rate reached 90.00%. Compared to the previous method, both the classification effect and the time performance are reduced. The reduction in classification effect is mainly due to (1) the lower threshold for judging an unknown video as belonging to an existing class, which may allow mismatches of individual video segments to affect the classification result of the entire video, and (2) the mechanism of adding new features, through which a misclassified video affects subsequent video classification.
16.6 Conclusion According to the experimental results, the errors of the algorithm mainly appear in scenes where shots switch and move quickly. Under such conditions, more video segments are generated and SIFT cannot easily extract effective features, so the error rate is relatively high and the efficiency relatively low. For scenes where the camera is relatively stable (such as some narrative videos), this method has higher efficiency and accuracy. Comparing the above two methods, the precision of Method 1 is very high because its matching mechanism is relatively strict; this mechanism sometimes lowers the recall, but, in general, both the recall and the precision are relatively high. Method 2 can classify automatically, but it also introduces errors: the recall and precision are clearly reduced, and the efficiency is also reduced. Therefore, the classification result of Method 2 is better used as a reference rather than as a strict classification result. Generally speaking, these two methods are more suitable for
scenes with relatively slow shots or high resolution, because the characteristics of these scenes are more obvious. Acknowledgements Thanks for the support of the Beijing Laboratory of Advanced Information Networks under Grant No. JJ042001201801.
References 1. Wu, X., Ngo, C., Hauptmann, A.: CC_WEB_VIDEO: near-duplicate web video dataset 2. Aufderheide, P., Jaszi, P., Brown, E.N.: The Good, the Bad, and the Confusing: User-Generated Video Creators on Copyright. Center for Social Media, School of Communication, American University, Washington, DC (2007) 3. Sudhir, G., Lee, J.C.M., Jain, A.K.: Automatic classification of tennis video for high-level content-based retrieval. In: Proceedings IEEE International Workshop on Content-Based Access of Image and Video Database, pp. 81–90 (1998) 4. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Li, F.F.: Large-scale video classification with convolutional neural networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014) 5. Huang, Z., Shen, H.T., Shao, J., Zhou, X., Cui, B.: Bounded coordinate system indexing for real-time video clip search. ACM Trans. Inf. Syst. (TOIS) 27(3), 17 (2009) 6. Liu, L., Lai, W., Hua, X.S., Yang, S.Q.: Video histogram: a novel video signature for efficient web video duplicate detection. Springer, pp. 94–103 (2007) 7. Su, P.C., Wu, C.S.: Efficient copy detection for compressed digital videos by spatial and temporal feature extraction. Multimed. Tools Appl. 1–23 (2015). https://doi.org/10.1007/s11042-015-3132-1 8. Zhu, Y., Huang, X., Huang, Q., Tian, Q.: Large-scale video copy retrieval with temporal-concentration SIFT. Neurocomputing 187, 83–91 (2016) 9. Ren, J., Chang, F., Wood, T., Zhang, J.R.: Efficient video copy detection via aligning video signature time series. In: Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, p. 14. ACM (2012) 10. Yeung, M.M., Liu, B.: Efficient matching and clustering of video shots. In: Proceedings of IEEE ICIP, pp. 338–341 (1995) 11. Zhang, H.J., Wu, J.H., Zhong, D.: An integrated system for content-based video retrieval and browsing. Pattern Recogn. 30(4), 643–658 (1997) 12. He, X., Lu, G.H.: Algorithm of key frame extraction based on image similarity. Fujian Comput. 5, 73–74 (2009) 13. Liu, H.Y., Meng, W.T., Liu, Z.: Key frame extraction of online video based on optimized frame difference. In: 9th International Conference on Fuzzy Systems and Knowledge Discovery, pp. 1238–1242 (2012) 14. Ding, H.L., Chen, H.X.: Key frame extraction algorithm based on shot content change ratio. Comput. Eng. 13, 225–231 (2009) 15. Pan, R., Tian, Y.M., Wang, Z.: Key-frame extraction based on clustering. In: 2010 IEEE International Conference on Progress in Informatics and Computing, pp. 867–871 (2010) 16. Gonzalez, R.C., Woods, R.E., Eddins, S.L.: Digital Image Processing Using MATLAB, 2nd edn (2009) 17. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
Chapter 17
Surface Defect Detection Method for the E-TPU Midsole Based on Machine Vision Ruizhi Li, Song Liu, Liming Tang, Shiqiang Chen and Liu Qin
Abstract The expanded thermoplastic polyurethane (E-TPU) midsole is an emerging product, and there is as yet little industrial research on its surface defect detection; the detection of E-TPU midsoles is a new and developing field. At present, the inspection of E-TPU products still relies on manual detection, which is not only costly but also cannot satisfy the requirement of real-time online monitoring in modern industry. Therefore, a surface defect detection method based on machine vision is proposed in this paper. First, a second-time difference method is used to weaken the influence of background light in the collected product images and to extract potentially defective parts. Then, the differences of adjacent elements are calculated by the second-order difference method, which further tests convexity–concavity, and an appropriate threshold is selected to identify whether these suspicious parts indicate quality issues. In order to improve the detection throughput and monitor the equipment at any moment, we use MATLAB parallel computing to detect different products simultaneously. The results show that this method can effectively detect and identify various defects on the E-TPU midsole's surface with high detection efficiency, and it can meet the requirements of industrial real-time monitoring. However, the detection of smaller physical defects needs further study.
R. Li · L. Tang School of Science, Hubei Minzu University, Enshi, China S. Liu · S. Chen (B) · L. Qin School of Advanced Materials and Mechatronic Engineering, Hubei Minzu University, Enshi, China e-mail: [email protected]
17.1 Introduction With the development of technology and theory in computer vision as well as in the mechanical industry, an emerging discipline named machine vision has formed, in which the image of a detected object is acquired by a camera and then analyzed, detected and identified by a computer. Machine vision is a technique in which machines measure and judge instead of human eyes. It is not limited to replicating the human eye: like the human brain, it extracts information from the image of a target object, understands, analyzes and processes that information, draws conclusions from the results, and applies them to detection, localization and control [1]. The working process can roughly be divided into the following steps: (1) The optical image of the detected object is collected by the camera and transmitted to the image processing system. (2) The received image information is analyzed and processed by the computer to obtain discriminable processing results. (3) Judgments and decisions are made on the processing results, the results are displayed on screen and instructions are issued. Compared with traditional manual inspection, machine vision has incomparable advantages [2] such as real-time performance, a high degree of automation, non-contact operation, high precision and reliability. As a non-contact, flexible means of inspection, machine vision is advantageous in its spectral sensitivity, rich information perception and convenient analysis using modern information theory and technology, accomplishing both qualitative and quantitative detection tasks; it is an important technical means to improve and guarantee the quality of modern industrial products [3, 4] and is widely applied in many industries. For example, Härter et al. [5] described the evaluation of defects in electronic components. Radovan et al. [6] introduced the detection of textile fabric defects based on a color image processing system. Ghazvini et al. [7] proposed the two-dimensional wavelet transform and statistical features to detect ceramic tiles, and Deng et al. [8] used multinomial spline wavelets to detect ceramic tiles. Singhka et al. [9] introduced the classification of surface defects on steel plates based on machine vision. Expanded thermoplastic polyurethane (E-TPU) [10] is a new type of environmentally friendly material and an emerging product worldwide. Detection and testing equipment for its products is still scarce at present, so we introduce an E-TPU midsole surface quality detection method based on machine vision. There are many detection methods for common injection-molded products, such as histogram statistics, the co-occurrence matrix, autocorrelation, the Fourier transform and the Gabor filter [11]. Moreover, an image processing algorithm based on an improved Laplace transform is developed in [12, 13]. Paper [14] used a fuzzy logic algorithm and paper [15] used singular value decomposition to process images. Zhao et al. [16] improved the statistical average difference method and the extensive gray correlation method.
17.2 Relevant Knowledge 17.2.1 E-TPU Material E-TPU is a new type of TPU foaming material made of thermoplastic polyurethane particles through physical foaming. The structure of the TPU particles is changed and recombined, yielding a polymer material composed of numerous highly resilient, springy and light TPU foam particles [10] (Fig. 17.1). When E-TPU is used to make a sole, it can greatly reduce the impact force on the feet: the sole can be compressed to half its original size when pressed and quickly rebounds to its original shape when the pressure disappears. According to the characteristics of the material, E-TPU particles can be produced with a density in the range 0.15–0.25 g/cm³. The material is non-toxic, biodegradable and recyclable, and it offers advantages such as low density, high rebound, abrasion resistance, corrosion resistance, good low-temperature performance and environmental friendliness.
Fig. 17.1 a TPU material. b E-TPU particles. c E-TPU midsole. d Cross section
17.2.2 The Theory of Second-Order Difference Method

The expression formula of the second-order difference algorithm [17] is as follows:

$$D_{ij} = \sqrt{H_{ij}^2 + V_{ij}^2 + D1_{ij}^2 + D2_{ij}^2}, \qquad (17.1)$$

$$H_{ij} = [A_{(i-1)j} - A_{ij}] - [A_{ij} - A_{(i+1)j}], \qquad (17.2)$$

$$V_{ij} = [A_{i(j-1)} - A_{ij}] - [A_{ij} - A_{i(j+1)}], \qquad (17.3)$$

$$D1_{ij} = [A_{(i-1)(j-1)} - A_{ij}] - [A_{ij} - A_{(i+1)(j+1)}], \qquad (17.4)$$

$$D2_{ij} = [A_{(i+1)(j-1)} - A_{ij}] - [A_{ij} - A_{(i-1)(j+1)}]. \qquad (17.5)$$
Convexity–concavity is represented by the value of the second-order difference, and the larger the second-order difference value is, the greater the difference is between the element and the adjacent eight elements.
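As an illustration, a minimal NumPy sketch of Eqs. (17.1)-(17.5) is given below, assuming a grayscale image and assuming the four directional differences combine as the root of the sum of squares, as in Eq. (17.1); the function name and float conversion are our own choices.

```python
import numpy as np

def second_order_difference(A):
    """Second-order difference map D (Eqs. 17.1-17.5): four directional
    second differences around each interior pixel, combined as the root
    of the sum of squares. A large D_ij means pixel (i, j) differs
    strongly from its eight neighbours (convexity/concavity)."""
    A = A.astype(np.float64)
    c = A[1:-1, 1:-1]                                # centre pixels A_ij
    H  = (A[:-2, 1:-1] - c) - (c - A[2:, 1:-1])      # Eq. (17.2)
    V  = (A[1:-1, :-2] - c) - (c - A[1:-1, 2:])      # Eq. (17.3)
    D1 = (A[:-2, :-2] - c) - (c - A[2:, 2:])         # Eq. (17.4)
    D2 = (A[2:, :-2] - c) - (c - A[:-2, 2:])         # Eq. (17.5)
    return np.sqrt(H**2 + V**2 + D1**2 + D2**2)      # Eq. (17.1)
```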
17.3 Proposed Method

17.3.1 The Theory of Second-Time Difference

In the process of image acquisition, the image information acquired by the camera differs from the real image of the product because of the interference of background light. The acquired image information can be expressed as follows:

$$A_{ij} = \gamma_{ij} + \beta_{ij}, \qquad (17.6)$$

$$A_{(i+1)j} = \gamma_{(i+1)j} + \beta_{(i+1)j}, \qquad (17.7)$$

$$A_{(i+2)j} = \gamma_{(i+2)j} + \beta_{(i+2)j}, \qquad (17.8)$$

where A is the grayscale value of the image captured by the camera, β is the luminance of the background light, and γ is the actual grayscale value of the detected object. The first difference:

$$\Delta_1 = A_{(i+1)j} - A_{ij} = (\gamma_{(i+1)j} + \beta_{(i+1)j}) - (\gamma_{ij} + \beta_{ij}) = (\gamma_{(i+1)j} - \gamma_{ij}) + (\beta_{(i+1)j} - \beta_{ij}), \qquad (17.9)$$

$$\Delta_2 = A_{(i+2)j} - A_{(i+1)j} = (\gamma_{(i+2)j} - \gamma_{(i+1)j}) + (\beta_{(i+2)j} - \beta_{(i+1)j}). \qquad (17.10)$$
As mentioned above, since the light source is the same, Δβ = β(i+1)j − βij is approximately constant. For injection products without defects, Δγ = γ(i+1)j − γij is approximately equal to 0; otherwise there will be a large difference on account of a quality problem. The second difference:

$$\Delta^2 = \Delta_2 - \Delta_1 = [(\gamma_{(i+2)j} - \gamma_{(i+1)j}) + (\beta_{(i+2)j} - \beta_{(i+1)j})] - [(\gamma_{(i+1)j} - \gamma_{ij}) + (\beta_{(i+1)j} - \beta_{ij})] \approx \gamma_{(i+2)j} - 2\gamma_{(i+1)j} + \gamma_{ij}. \qquad (17.11)$$

If Δ² ≈ 0, the product is up to standard; otherwise the next test is carried out. Further testing is still required to avoid identifying the irregular texture of the product as defects.
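A few lines of NumPy illustrate the idea: applying np.diff twice along each column computes Δ² directly, and the background term β drops out when its column-wise increments are near constant. The sketch is ours, not code from the paper.

```python
import numpy as np

def second_time_difference(A):
    """Apply the first difference twice along each column (Eq. 17.11).
    With A = gamma + beta, a background term beta whose column-wise
    increments are (near) constant cancels, leaving approximately
    gamma_(i+2)j - 2*gamma_(i+1)j + gamma_ij."""
    d1 = np.diff(A.astype(np.float64), axis=0)  # Delta_1, Delta_2, ...
    return np.diff(d1, axis=0)                  # Delta^2, ~0 when defect-free
```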
17.3.2 Detection Algorithm

Owing to the E-TPU production process, methods used for other industrial products, namely template selection, registration, and background subtraction, cannot be applied; defect detection must rely on processing the collected product images themselves. The essence of defect detection is to distinguish abnormal regions from normal regions, i.e., to find regions whose gray values differ greatly from the average gray value of the whole image. In general, the defect area is very small compared with the normal area of the whole image.

Step 1: Image Partitioning. An image is processed in the form of a matrix, and thus image partitioning is matrix partitioning. The smaller the subblock size is, the higher the detection accuracy will be; the subblock size is chosen according to the requirements. If the image cannot be partitioned evenly, the remaining part and its adjacent elements are reused to form a subblock of the required size; if the remaining part were kept as a subblock on its own, its smaller sample space would be likely to lead to false detection. Obtain the average value of each subblock and form the matrix

$$I = \begin{bmatrix} A_{11} & \cdots & A_{1N} \\ \vdots & \ddots & \vdots \\ A_{M1} & \cdots & A_{MN} \end{bmatrix}, \quad \text{where } A_{mn} = \frac{\sum a_{ij}}{w \times w}$$

and w is the subblock size.
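The following sketch illustrates Step 1 under one plausible handling of uneven sizes: edge-replicating padding, which approximates the paper's reuse of adjacent elements and is our assumption.

```python
import numpy as np

def block_means(img, w):
    """Step 1: partition the image into w x w subblocks and return the
    matrix I of subblock averages, A_mn = sum(a_ij) / (w * w). Uneven
    sizes are handled here by edge-replicating padding, an approximation
    of the paper's reuse of adjacent elements."""
    img = img.astype(np.float64)
    ph = (-img.shape[0]) % w                      # rows to pad
    pw = (-img.shape[1]) % w                      # columns to pad
    img = np.pad(img, ((0, ph), (0, pw)), mode="edge")
    M, N = img.shape[0] // w, img.shape[1] // w
    return img.reshape(M, w, N, w).mean(axis=(1, 3))
```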
Step 2: Extract the Suspect Elements. For the rows and columns of I, the gray difference values of adjacent elements are calculated, forming an (M − 1) × N matrix P of row differences and an M × (N − 1) matrix Q of column differences. Calculate the average value p_i for each row of P and set the threshold k; likewise, calculate the average value q_j for each column of Q. Find the difference values outside the threshold intervals L_i = [p_i − k, p_i + k] and L_j = [q_j − k, q_j + k], and extract the two corresponding elements.

$$P = \begin{bmatrix} P_{11} & \cdots & P_{1N} \\ \vdots & \ddots & \vdots \\ P_{(M-1)1} & \cdots & P_{(M-1)N} \end{bmatrix} = \begin{bmatrix} A_{21}-A_{11} & \cdots & A_{2N}-A_{1N} \\ \vdots & \ddots & \vdots \\ A_{M1}-A_{(M-1)1} & \cdots & A_{MN}-A_{(M-1)N} \end{bmatrix} \qquad (17.12)$$

$$Q = \begin{bmatrix} Q_{11} & \cdots & Q_{1(N-1)} \\ \vdots & \ddots & \vdots \\ Q_{M1} & \cdots & Q_{M(N-1)} \end{bmatrix} = \begin{bmatrix} A_{12}-A_{11} & \cdots & A_{1N}-A_{1(N-1)} \\ \vdots & \ddots & \vdots \\ A_{M2}-A_{M1} & \cdots & A_{MN}-A_{M(N-1)} \end{bmatrix} \qquad (17.13)$$

$$p_x = \frac{\sum_{l=1}^{N} P_{xl}}{N}, \quad x \in [1, M-1] \qquad (17.14)$$

$$q_y = \frac{\sum_{k=1}^{M} Q_{ky}}{M}, \quad y \in [1, N-1]. \qquad (17.15)$$

For the above P, calculate the difference value P̃ of two adjacent elements in each column again:

$$\tilde{P} = \begin{bmatrix} \tilde{P}_{11} & \cdots & \tilde{P}_{1N} \\ \vdots & \ddots & \vdots \\ \tilde{P}_{(M-1)1} & \cdots & \tilde{P}_{(M-1)N} \end{bmatrix} = \begin{bmatrix} P_{21}-P_{11} & \cdots & P_{2N}-P_{1N} \\ \vdots & \ddots & \vdots \\ P_{(M-1)1}-P_{(M-2)1} & \cdots & P_{(M-1)N}-P_{(M-2)N} \end{bmatrix}. \qquad (17.16)$$
Similarly, calculate the difference value Q̃ of two adjacent elements in each row of Q, and set the threshold t. If a difference value in P̃ or Q̃ is not within the threshold interval [0 − t, 0 + t], the two elements involved in P or Q are extracted (three elements in I). These elements are the parts that may be defective. There are three possibilities for the two elements in P or Q that cause the second-time difference value anomaly: (1) one value is within the threshold interval L and one value is outside it (two of the three relevant elements have already been extracted in the first difference); (2) both values are within the threshold interval L (these elements were not detected in the first difference); (3) both values are outside the threshold interval L (these elements were detected in the first difference).
The purpose of the second-time difference is to detect the second case above and determine whether there is a quality problem; such omissions arise mainly from non-uniform background light.

Step 3: Detect Suspicious Elements. After all the suspicious elements have been extracted, the second-order difference method is used to verify further whether these elements are defective regions. This verification is based on the original image. Select an appropriate value T as the threshold: if the value D_ij in Eq. (17.1) is greater than T, the region is considered defective and the product is judged defective; otherwise, the product is qualified.
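Step 2 can be written compactly with NumPy differences. The sketch below flags subblocks touched by first-difference outliers (threshold k) or by second-time-difference outliers (threshold t); the mapping of flagged differences back onto elements of I, and the defaults k = t = 30 from Sect. 17.4, reflect our reading of the text.

```python
import numpy as np

def extract_suspects(I, k=30.0, t=30.0):
    """Step 2: flag suspicious subblocks of the block-mean matrix I.
    First difference: P (rows) and Q (columns), Eqs. (17.12)-(17.13);
    entries outside [p_i - k, p_i + k] (resp. q_j +- k) are suspicious.
    Second-time difference: entries of P-tilde / Q-tilde outside [-t, t]
    are also suspicious (Eq. 17.16). Returns a boolean mask over I."""
    I = I.astype(np.float64)
    P = np.diff(I, axis=0)                                  # (M-1) x N
    Q = np.diff(I, axis=1)                                  # M x (N-1)
    sus = np.zeros(I.shape, dtype=bool)

    outP = np.abs(P - P.mean(axis=1, keepdims=True)) > k    # vs row average p_i
    outQ = np.abs(Q - Q.mean(axis=0, keepdims=True)) > k    # vs column average q_j
    outPt = np.abs(np.diff(P, axis=0)) > t                  # P-tilde outliers
    outQt = np.abs(np.diff(Q, axis=1)) > t                  # Q-tilde outliers

    # each flagged difference involves two (or three) adjacent elements of I
    sus[:-1, :] |= outP; sus[1:, :] |= outP
    sus[:, :-1] |= outQ; sus[:, 1:] |= outQ
    sus[:-2, :] |= outPt; sus[1:-1, :] |= outPt; sus[2:, :] |= outPt
    sus[:, :-2] |= outQt; sus[:, 1:-1] |= outQt; sus[:, 2:] |= outQt
    return sus
```

Step 3 then evaluates the second-order difference map of the original image at the flagged locations and compares it against T.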
17.4 Experiment and Analysis

17.4.1 Experimental Conditions and Methods

System: Windows 10 64-bit; CPU: Intel(R) Core(TM) i5-6600K @ 3.50 GHz × 4; GPU: GTX 1060; memory: 8 GB; development environment: MATLAB 2018b.
17.4.2 The Experiment Content and Result

During the experiment, images with different resolutions, defects, and subblock sizes were tested, the detection time was recorded, and it was checked whether the detected defect conforms to the actual defect of the product. The surface image of the E-TPU midsole is tested according to the method proposed in this paper; the steps are shown in Fig. 17.2. After inputting the image and converting it to grayscale, the image is partitioned according to the subblock size. Then, the second-time difference method is used to extract suspicious elements, with k as the threshold for the first difference and t as the threshold for the second difference. Finally, the convexity-concavity of the extracted suspicious elements is identified by the second-order difference method. If the value D of the second-order difference is less than the preselected threshold T, the product is judged as qualified; otherwise, it is judged as defective. The thresholds were chosen from experimental experience: k = t = 30 and T = 70 in this experiment. Before testing defective products, qualified products were tested to verify the stability of the system. The result of each step is shown in Fig. 17.3. The running time for different subblock sizes is shown in Table 17.1. Then, several kinds of defects
Fig. 17.2 Flowchart of surface defect detection algorithm
Fig. 17.3 Processing procedure of a qualified product: (a) original image, (b) image after partitioning, (c) second-time difference, (d) second-order difference

Table 17.1 The running time of different blocks (qualified samples, original size 196 * 291)

Block size | Independent time (s) | Parallel time (s) | False rate (%)
2 * 2      | 0.142                | 0.895             | 44
3 * 3      | 0.119                | 0.454             | 8
4 * 4      | 0.106                | 0.439             | 38
Fig. 17.4 Preprocessing image: (a) scorch products, (b) scorch test results, (c) impurity products, (d) impurity test results, (e) contaminated products, (f) contaminated test results, (g) scrap products, (h) scrap test results, (i) plastic residue products, (j) plastic residue test results, (k) indentation products, (l) indentation test results
were detected and identified. The result and the running time are shown in Fig. 17.4 and Table 17.2.
17.4.3 Analysis of Experimental Results

The analysis of the experimental results is as follows, and the accuracy rate is shown in Table 17.3: (1) Although a smaller subblock size yields a higher detection accuracy, some of the defects it flags cannot be recognized by human eyes; at the same time, if the subblock size is too large, defects may be missed or misjudged. (2) Parallel computing plays an important role for time-consuming independent programs, but for less time-consuming programs it increases the processing time; moreover, the running time of parallel computing is bounded by the most time-consuming independent program. (3) This method has a satisfactory effect on defects with obvious color difference, such as scorch, impurities, and contamination.
Table 17.2 The running time of different defects

Defect          | Original size | Block size | Independent running time (s) | Parallel time (s)
Scorch          | 207 * 694     | 2 * 2      | 0.315                        | 0.912
                |               | 3 * 3      | 0.160                        | 0.657
                |               | 4 * 4      | 0.204                        | 0.682
Contaminate     | 406 * 1544    | 2 * 2      | 0.674                        | 1.307
                |               | 3 * 3      | 0.375                        | 1.172
                |               | 4 * 4      | 0.247                        | 1.049
Plastic residue | 233 * 317     | 2 * 2      | 0.247                        | 0.687
                |               | 3 * 3      | 0.142                        | 0.627
                |               | 4 * 4      | 0.116                        | 0.611
Impurity        | 243 * 512     | 2 * 2      | 0.133                        | 0.692
                |               | 3 * 3      | 0.095                        | 0.534
                |               | 4 * 4      | 0.103                        | 0.425
Scrap           | 155 * 177     | 2 * 2      | 0.207                        | 0.737
                |               | 3 * 3      | 0.124                        | 0.537
                |               | 4 * 4      | 0.103                        | 0.434
Indentation     | 451 * 870     | 2 * 2      | 1.644                        | 3.011
                |               | 3 * 3      | 0.142                        | 0.627
                |               | 4 * 4      | 0.116                        | 0.611
Table 17.3 Accuracy analysis

                                  | Sample size | Accuracy rate (%)
Qualified samples                 | 50          | 92
Defect samples (chemical defects) | 50          | 100
Defect samples (physical defects) | 50          | 72
The total number of samples       | 150         | 89.33
(4) Some large physical defects, such as scrap, can also be accurately identified. However, for some small physical defects, such as plastic residue and indentation, the desired effect cannot be achieved.
17.5 Conclusion

In order to satisfy the demand for real-time detection in industrial production, a surface defect detection method for the E-TPU midsole based on machine vision is proposed in this paper. The experimental results obtained from various images show that the method has short detection time and high accuracy: it detects defects on the product surface quickly and effectively and then distinguishes whether the image is defective. According to the forming principle of the product, parallel computing is used to detect multiple products simultaneously, which greatly reduces the detection time. However, further study is needed to achieve the goal of detecting small physical defects.

Acknowledgements This work was supported in part by the Natural Science Foundation of China under Grant 61561019 and the Outstanding Young Scientific and Technological Innovation Team of the Hubei Provincial Department of Education (T201611).
References

1. Bian, Z.G.: Development of machine vision technology. China Instrum. (6), 40-42+65 (2015)
2. Newman, T.S., Jain, A.K.: A survey of automated visual inspection. Comput. Vis. Image Underst. 61(2), 231-262 (1995)
3. Li, Y., Gu, P.: Free-form surface inspection techniques state of the art review. Comput. Aided Des. 36(13), 1395-1417 (2004)
4. Xie, X.: A review of recent advances in surface defect detection using texture analysis techniques. ELCVIA: Electron. Lett. Comput. Vis. Image Anal. 7(3), 1-22 (2008)
5. Härter, S., Klinger, T., Franke, J., Beer, D.: Comprehensive correlation of inline inspection data for the evaluation of defects in heterogeneous electronic assemblies. In: 2016 Pan Pacific Microelectronics Symposium (Pan Pacific), pp. 1-6. IEEE (2016)
6. Radovan, S., Papadopoulos, G.D., Georgoudakis, M., Mitropulos, P.: Vision system for finished fabric inspection. In: Machine Vision Applications in Industrial Inspection X, vol. 4664, pp. 97-103. International Society for Optics and Photonics (2002)
7. Ghazvini, M., Monadjemi, S.A., Movahhedinia, N., Jamshidi, K.: Defect detection of tiles using 2D-wavelet transform and statistical features. World Acad. Sci. Eng. Technol. 49(901-904), 1 (2009)
8. Deng, S., Latifi, S., Regentova, E.: Document segmentation using polynomial spline wavelets. Pattern Recogn. 34(12), 2533-2545 (2001)
9. Singhka, D.K.H., Neogi, N., Mohanta, D.: Surface defect classification of steel strip based on machine vision. In: International Conference on Computing and Communication Technologies, pp. 1-5. IEEE (2014)
10. Shan, T.K., Ma, W.L., Qin, L., Yang, T.: Research on the foaming mechanism and properties of thermoplastic polyurethane elastomer foam materials prepared from supercritical carbon dioxide. China Rubber Ind. 65(05), 514-517 (2018)
11. Wu, J.W., Yan, J.Q., Fang, Z.H., Xia, Y.: Surface defect detection of slab based on the improved adaboost algorithm. J. Iron Steel Res. (9), 14 (2012)
12. Świłło, S.J., Perzyk, M.: Automatic inspection of surface defects in die castings after machining. Arch. Foundry Eng. 11 (2011)
13. Świłło, S.J., Perzyk, M.: Surface casting defects inspection using vision system and neural network techniques. Arch. Foundry Eng. 13(4), 103-106 (2013)
14. Wong, B.K., Elliott, M.P., Rapley, C.W.: Automatic casting surface defect recognition and classification (1995)
15. Lu, C.J., Tsai, D.M.: Automatic defect inspection for LCDs using singular value decomposition. Int. J. Adv. Manuf. Technol. 25(1-2), 53-61 (2005)
16. Zhao, Y.F., Gao, C., Wang, J.G.: Research on surface defect detection algorithm for industrial products based on machine vision. Comput. Appl. Softw. 29(02), 152-154 (2012)
17. Song, L.M., Li, Z.Y., Chang, Y.L., Xing, G.X., Wang, P.Q., Xi, J.T.: A color phase shift profilometry for the fabric defect detection. Optoelectron. Lett. 10(4), 308-312 (2014)
Chapter 18
A Multi-view Learning Approach for Glioblastoma Image Contrast Enhancement Xiaoyan Wang, Zhengzhou An, Jing Zhou, and Yuchou Chang
Abstract Image enhancement techniques can enhance the structure of a lesion and filter out irrelevant information through image processing. They can strengthen the contrast of an image and therefore improve diagnostic accuracy. Current image enhancement techniques apply global enhancement with strategies such as histogram equalization. However, local contrast information may be lost because of the use of a global enhancement, and global enhancement may introduce unnecessary information from irrelevant background tissues. For this reason, local contrast information needs to be incorporated into the global enhancement procedure. We propose a multi-view learning approach for glioblastoma image contrast enhancement. Each local enhancement is accomplished by single-view learning, and the final enhancement is therefore a result of multi-view learning. Experimental results demonstrate that the proposed method outperforms traditional global contrast enhancement techniques.
X. Wang (B) · J. Zhou
School of Physics and Electronic Engineering, Yuxi Normal University, Yuxi 653100, Yunnan, China
e-mail: [email protected]

Z. An
School of Mathematics and Information Technology, Yuxi Normal University, Yuxi 653100, Yunnan, China

Y. Chang
Computer Science and Engineering Technology Department, University of Houston-Downtown, Houston 77002, TX, USA

18.1 Introduction

Magnetic resonance imaging (MRI) [1] obtains images of the human body from the characteristics of nuclear spin motion. Movements of the patient's body caused by the long imaging time can result in artifacts in the reconstructed image or a reduction in signal-to-noise ratio (SNR). Parallel magnetic resonance imaging (parallel MRI)
overcomes the limitation that traditional MRI is constrained by radio frequency (RF) hardware and magnetic field gradient performance, which leads to excessively long acquisition times. Parallel MRI [2] uses multiple phased-array coils to receive the required signals simultaneously. Differences in coil sensitivity encode the spatial signal, the number of gradient encodings in the phase direction is gradually reduced, and the imaging speed is therefore increased by shortening the scanning time.

Image contrast enhancement has important applications, especially in medical image analysis, because visual inspection of medical images is necessary for the diagnosis of many diseases. Medical images often have low contrast due to the limitations of the modality and the imaging conditions. Although details often exist in high-frequency signals, they are embedded in a large number of low-frequency background signals, which may reduce visibility. Therefore, proper high-frequency enhancement can improve visual effects and facilitate diagnosis. Image enhancement improves the visual effect of the image and produces a format more suitable for human or machine analysis; meaningful information can be highlighted and useless information suppressed.

Histogram equalization distributes the brightness better over the histogram and therefore yields higher contrast within low-contrast regions. This method is effective when both foreground and background are dark or bright. The traditional histogram equalization technique is the most widely used global image enhancement method: it remaps the image's data using the cumulative histogram distribution. Although this method is simple, it does not take local information into account; in addition, it can produce excessive noise enhancement. To solve this problem, an improved histogram equalization technique was proposed by Pizer et al. [3]: local window estimation is used in an adaptive histogram equalization algorithm based on multiple contexts in the image with limited contrast. Each region is processed using histogram equalization, and neighboring regions are then combined using bilinear interpolation to eliminate artifacts caused by region boundaries. A contrast enhancement algorithm called brightness-preserving weight clustering histogram equalization (BPWCHE) was proposed in [4]: it assigns the non-zero bins of the original image histogram to separate clusters, computes each cluster's weight, reduces the number of clusters by merging pairs of neighboring clusters, and lets the resulting clusters define the partitions of the output image histogram. Ghita et al. proposed a texture-enhanced histogram equalization technique using total variation (TV) [5], which alleviates the intensity saturation effects introduced by standard contrast enhancement.

In this paper, we propose a multi-view-learning-based image enhancement for brain glioblastoma images. Each local enhancement is considered as single-view learning, and the final enhancement is the multi-view learning result. The paper is organized as follows. The first and second sections provide the introduction and background. The proposed method is given in the third section. Results and conclusions are presented last.
18.2 Background

In some practical problems, the same object can be described in a variety of different ways or from different viewpoints; the multiple descriptions constitute a multi-view of the object [6]. In this paper, a subscript, as in x_i, denotes the ith data point, and a superscript, as in x^(i), denotes the ith view of the data, so that multi-view data can be expressed as x = {x^(1), x^(2), ..., x^(n)}, where each x^(i) represents one view. Multi-view data widely exist in the real world. For example, in the webpage classification problem, a webpage may be classified according to the information contained in the webpage itself or by using the information contained in the hyperlinks linking to it; webpage data can therefore be represented by two views, where the feature set depicting the webpage itself constitutes the first view and the feature set depicting the hyperlink information constitutes the second view. Multi-views are used to represent different feature sets of the data, but they can also represent different sources of data: for different acquisition devices of the same data source, the multiple acquisition results form different views. In addition, multi-views can represent different relationships between data.

A histogram is a representation of image intensity information, and many image processing algorithms can be implemented on top of histograms. For example, a histogram is used to adjust the contrast of an image for image enhancement; a histogram can also be used to perform large-scale lossless information hiding. If a grayscale image's pixel values are concentrated in a limited range, the contrast of the image is unsatisfactory; if the histogram is spread over the entire intensity range, the contrast is much better. This is the basic idea behind the histogram equalization algorithm for image enhancement.
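A minimal NumPy sketch of global histogram equalization via the cumulative histogram, for an 8-bit grayscale image; the guard for constant images is our addition.

```python
import numpy as np

def histogram_equalize(img):
    """Global histogram equalization of an 8-bit grayscale image: grey
    levels are remapped through the cumulative histogram so the output
    intensities spread over the whole [0, 255] range."""
    hist = np.bincount(img.ravel(), minlength=256)  # intensity histogram
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                       # first occupied bin
    scale = max(cdf[-1] - cdf_min, 1)               # guard for constant images
    lut = np.clip(np.round((cdf - cdf_min) / scale * 255.0), 0, 255).astype(np.uint8)
    return lut[img]                                 # apply the lookup table per pixel
```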
18.3 Proposed Method

When a medical image is observed, it is described from multiple views and then analyzed based on the different views; image information obtained from multiple views is referred to as a multi-view image. Clustering, a classic unsupervised learning method in machine learning, is also developing rapidly in multi-view settings. Data constructed from different views of the same object are analyzed so as to preserve the independent information of each single-view representation, while collaboration between views seeks the information the views have in common. Finally, an integrated decision-making method obtains clustering results from a global viewpoint. Since multi-view learning comprehensively studies the data information of the observed objects from various perspectives, it reaches decisions that seek common ground while preserving differences, and it outperforms traditional data analysis based on only one perspective (a single view); the decision results are more comprehensive and reliable.
The representation of data is one of the key and difficult issues in machine learning, since the learning effect is often influenced by how data are represented. For an object in the real world, features are extracted and the object is represented by a feature vector x = {x_1, x_2, ..., x_n}, where n is the number of features. The features reflect the nature of the object, so that they can be used to learn the target concept. However, for a given learning problem, the minimum required feature set is unknown; without a priori information, more features can be extracted and provided to the learner in the hope of better prediction performance. Furthermore, the development of data collection technology allows objects to be described in more complex and diverse ways, which also produces more features. Among the features describing an object, some have different attributes, so combining them directly is not suitable; it is more appropriate to use a multi-view representation that represents the data as multiple feature sets and then learn different feature sets with different learning methods. Even when the features could be learned by the same learner, multi-view learning may have advantages over single-view learning. For example, the webpage classification problem involves the information contained in the webpage itself and the information contained in hyperlinks to the webpage. Both the web view and the hyperlink view can be represented as text vectors and learned by the same learner; however, if the two views are merged into one, the feature vector loses its original meaning and the dimension of the feature space may increase. In addition, the multi-view representation can exploit the strengths of each view, using unlabeled data for collaborative learning to improve performance.

For a single brain glioblastoma image, an operator divides it into multiple overlapping sub-images. The traditional histogram equalization method is applied to each sub-image, producing local contrast on each one; each sub-image is thus considered a single view for enhancing local contrast. After the contrast enhancement of all sub-images is completed, majority voting over all locally enhanced sub-images is applied to each pixel, so that each pixel's grayscale value is determined by the majority of the multi-view contrast enhancement operations. The block diagram of the proposed method is shown in Fig. 18.1. The proposed method collects multiple contrast enhancement results from a group of multi-view contrast detectors; each result takes part in a majority voting process that decides whether a pixel belongs to the glioblastoma region or to normal tissue. Each view focuses on different local regions of the brain image. As a classical ensemble learning technique, majority voting produces the final contrast enhancement result for the brain tumor image.
Fig. 18.1 The block diagram of the proposed method
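Below is a minimal sketch of the multi-view procedure under stated assumptions: the window size and stride are illustrative (the chapter does not give them), and since grey levels are not binary, the per-pixel majority vote is approximated by the median over the overlapping enhanced views.

```python
import numpy as np

def _equalize(img):
    # Global histogram equalization, as in the sketch in Sect. 18.2.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.clip(np.round((cdf - cdf_min) / max(cdf[-1] - cdf_min, 1) * 255.0), 0, 255)
    return lut.astype(np.uint8)[img]

def multiview_enhance(img, win=64, stride=32):
    """Multi-view enhancement sketch: each overlapping sub-image is one
    'view', equalized on its own; the per-pixel vote over the views is
    approximated here by the median. win/stride are our placeholders."""
    H, W = img.shape
    views = []
    for r in range(0, max(H - win, 0) + 1, stride):
        for c in range(0, max(W - win, 0) + 1, stride):
            v = np.full((H, W), np.nan)                       # NaN = view not covering
            v[r:r + win, c:c + win] = _equalize(img[r:r + win, c:c + win])
            views.append(v)
    fused = np.nanmedian(np.stack(views), axis=0)             # combine across views
    fused = np.where(np.isnan(fused), img, fused)             # uncovered pixels kept
    return fused.astype(np.uint8)
```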
18.4 Experimental Results

The BRATS 2013 dataset [7] is used for performance evaluation; it contains multiple MR glioblastoma images. An image without contrast enhancement serves as the original image. The traditional histogram equalization method [8] and the exact histogram equalization method [9] are compared against the proposed method. As shown in Fig. 18.2, the proposed method makes the contrast between the tumor region and the normal tissue region larger than the traditional histogram equalization method and the exact histogram equalization method do. Furthermore, as shown in Fig. 18.3, the proposed method strengthens the contrast between lesion and normal tissue regions where the traditional and exact histogram equalization methods cannot [10].
18.5 Conclusion

A glioblastoma image enhancement technique is proposed. Each sub-image is considered a single-view learner, and the final result is determined by multiple local enhancements on the sub-images. The proposed method outperforms the traditional histogram equalization method and the exact histogram equalization method. Future work will focus on parameter tuning, such as the size and number of sub-images [10]. In the future, we hope to apply the proposed method to more applications [11-16].
Fig. 18.2 Image enhancement for GBM tumor images. A reference image (a) is given for comparing image enhancement methods: traditional histogram equalization (b), exact histogram equalization (c), and the proposed method (d) [10]
Fig. 18.3 Image enhancement for GBM tumor images. A reference image (a) is given for comparing image enhancement methods: traditional histogram equalization (b), exact histogram equalization (c), and the proposed method (d) [10]
Acknowledgements This work was supported in part by the National Natural Science Foundation of China (No.61563055, No.61871373, and No.81729003).
References

1. Haacke, M., Thompson, M., Venkatesan, R., Brown, R., Cheng, Y.: Magnetic Resonance Imaging: Physical Principles and Sequence Design, 1st edn. Wiley-Liss (1999)
2. Larkman, D.J., Nunes, R.G.: Parallel magnetic resonance imaging. Phys. Med. Biol. 52, 15-55 (2007)
3. Pizer, S.M., Amburn, E.P., Austin, J.D., Cromartie, R., Geselowitz, A., Greer, T., Romeny, B.T.H., Zimmerman, J.B.: Adaptive histogram equalization and its variations. Comput. Vis. Graph. Image Process. 39(3), 355-368 (1987)
4. Chauhan, R., Bhadoria, S.: An improved image contrast enhancement based on histogram equalization and brightness preserving weight clustering histogram equalization. In: International Conference on Communication Systems and Network Technologies (2011)
5. Ghita, O., Ilea, D.E., Whelan, P.F.: Texture enhanced histogram equalization using TV-L1 image decomposition. IEEE Trans. Image Process. 22(8), 3133-3144 (2013)
6. Zhao, J., Xie, X., Xu, X., Sun, S.: Multi-view learning overview. Inf. Fusion 38, 43-54 (2017)
7. BRATS 2013 (2013). https://www.smir.ch/BRATS/Start
8. Gonzalez, R., Woods, R.: Digital Image Processing. Pearson Education (2004)
9. Coltuc, D., Bolon, P., Chassery, J.M.: Exact histogram specification. IEEE Trans. Image Process. 15(5), 1143-1152 (2006)
10. Wang, X., An, Z., Wang, H., Chang, Y.: MR brain image enhancement via learning ensemble. In: The 2018 IEEE International Conference on Intelligence and Safety for Robotics, ISBN 978-1-5386-5546-7, pp. 282-285 (2018)
11. Dey, N., Ashour, A.S., Beagum, S., Pistola, D.S., Gospodinov, M., Gospodinova, E.P., Tavares, J.M.R.S.: Parameter optimization for local polynomial approximation based intersection confidence interval filter using genetic algorithm: an application for brain MRI image de-noising. J. Imaging 1(1), 60-84 (2015)
12. Kale, G.V., Patil, V.H.: A study of vision based human motion recognition and analysis. Int. J. Ambient Comput. Intell. (IJACI) 7(2), 75-93 (2016)
13. Beagum, S., Dey, N., Ashour, A.S., Sifaki-Pistolla, D., Balas, V.E.: Nonparametric de-noising filter optimization using structure-based microscopic image classification. Microsc. Res. Tech. 80(4), 419-429 (2017)
14. Wang, H., Wang, X., Chang, Y.: Automatic clustering of natural scene using color spatial envelope feature. In: Proceedings of the International Conference on Computing and Artificial Intelligence (ICCAI), pp. 144-148 (2018)
15. Ashour, A.S., Dey, N.: Adaptive window bandwidth selection for direction of arrival estimation of uniform velocity moving targets based relative intersection confidence interval technique. Ain Shams Eng. J. 9(4), 835-843 (2018)
16. Wang, H., Zhou, Y., Su, S., Hu, Z., Liao, J., Chang, Y.: Adaptive Volterra filter for parallel MRI reconstruction. EURASIP J. Adv. Signal Process. 2019, 34 (2019)
Chapter 19
Multi-focus Image Fusion Based on Convolutional Sparse Representation with Mask Simulation Chengfang Zhang
Abstract In order to further improve the contrast and clarity of the fused image, a multi-focus image fusion algorithm based on convolutional sparse representation with mask simulation (CSRMS) is proposed. Firstly, auxiliary variable alternation with additive mask simulation is applied to learn the convolutional dictionary filters. Then, we propose a CSRMS-based multi-focus image fusion framework in which each source image is decomposed into a base layer and a detail layer. Lastly, six classical multi-focus image pairs are used to demonstrate that our method outperforms the CSR-based method in terms of both objective assessment and visual quality. Moreover, the brightness of our method's results is higher than that of the other methods, and boundary artifacts are avoided.
C. Zhang (B)
Sichuan Police College, Luzhou 646000, Sichuan, China
e-mail: [email protected]

19.1 Introduction

Owing to the limited focusing ability of optical lenses, clear and detailed information appears only in the focused area of an image, while information in the unfocused area is not easy to observe directly. To accurately extract useful information from multi-focus source images and obtain a clearer image that is more conducive to human observation, multi-focus image fusion methods have been proposed in recent years. Moreover, fusion technology has been widely used in many fields such as machine identification, target recognition, and artificial intelligence.

In recent years, multi-focus image fusion methods based on multi-resolution analysis transforms have been proposed. The non-subsampled contourlet transform (NSCT) has been the most popular multi-resolution image fusion method; it avoids the pseudo-Gibbs phenomenon of the contourlet-transform-based fusion algorithm [1]. However, the NSCT-based method has two main drawbacks: low contrast and difficulty in choosing the number of NSCT decomposition layers. To improve visual appearance, an over-complete dictionary is used in sparse representation and applied to
multi-focus image fusion [2]. Although the sparse representation (SR) based fusion method is superior to the NSCT-based method, it uses a DCT-based dictionary and does not consider the relevant information between source images. The joint sparsity representation (JSR) model was first introduced into multimodal image fusion in [3]; its fused results indicate that more spatial details of the source images are integrated into the fused image. Many improvements on JSR-based fusion have since been proposed that demonstrate better performance than SR fusion [4, 5]. However, sparse-domain fusion methods suffer from high sensitivity to misregistration and high computational cost. The alternative convolutional sparse representation was first introduced into image fusion to remedy these defects [6]. To overcome the drawback of the "absolute-maximum" measurement in sparse-domain fusion methods and the difficulty of choosing basis functions in NSCT-based methods, Liu [7] introduced a typical convolutional neural network (CNN) for multi-focus image fusion.

To lower memory requirements and reduce boundary overlap, additive mask simulation [8] is introduced into both convolutional dictionary learning and convolutional sparse coding in our paper (the proposed fusion method is abbreviated CSRMS when convenient). Convolutional BPDN dictionary learning with mask simulation is first used to learn the convolutional dictionary filters. Then, each source image is decomposed into a low-pass layer and a high-pass layer, and two fusion strategies are applied to the low-pass and high-pass layers, respectively (the detailed process can be found in Sect. 19.3). Finally, the fused image is obtained.

The paper is organized as follows. Convolutional dictionary learning with mask simulation is described in Sect. 19.2. Section 19.3 presents the CSRMS-based fusion method. The experimental results are given in Sect. 19.4, and Sect. 19.5 concludes this paper.

Table 19.1 The objective assessment for the first multi-focus image

Methods | QMI | Qe | QTE | QNCIE
NSCT    | 0.9761 | 0.5736 | 0.4189 | 0.8313
SR [2]  | 1.1176 | 0.6168 | 0.4317 | 0.8387
JSR [3] | 0.9907 | 0.5745 | 0.4213 | 0.8318
CSR [6] | 1.0905 | 0.6420 | 0.4301 | 0.8370
CNN [7] | 1.2059 (2) | 0.6506 (2) | 0.4356 (2) | 0.8443 (2)
OUR     | 1.2172 (1) | 0.6556 (1) | 0.436 (1)  | 0.8446 (1)
Fig. 19.1 Part of the training image datasets
19.2 Convolutional BPDN Dictionary Learning with Mask Simulation

Wohlberg [8] indicates that auxiliary variable alternation with additive mask simulation (abbreviated AVA-AMS when convenient) converges faster than the primary variable alternation with mask decoupling (PVA-MD) algorithm. In this paper, we use AVA-AMS to learn the convolutional dictionary (the size of the dictionary filters is 8 × 8 × 32 for fair comparison). Part of the USC-SIPI training image set [3] is shown in Fig. 19.1.
19.3 Proposed CSRMS-Based Fusion Method Given a dictionary with M filters {dm } ∈ R n×n×m (n < m, m ∈ {0, 1, . . . , M}) (dictionary filters are {dk } ∈ R 8×8×32 in this paper) and source image s, W indicates diagonal weighting matrix with mask. The main idea of CSRMS-based image fusion can be written as M 2 M 1 xm 1 + C (x M ) dm ∗ xm − s + λ arg min 2 m=0 {xm } m=0
(19.1)
2
where ∗ is convolution operator, the indicator function of set L = {x ∈ R N : W x = 0} is C (•). The CSRMS-based algorithm is presented below: Input: test source images {s A , s B }, AVA-AMS dictionary filter {dm } ∈ R n×n×m (m ∈ {0, 1, . . . , M}).
162
C. Zhang
19.3.1 Source Multi-focus Image Decomposition

Low-frequency components {l_A, l_B} and high-frequency components h_k (k = A, B) are obtained by decomposing the two source multi-focus images {s_A, s_B}.
19.3.2 Fusion of High-Frequency Components

Using the learned dictionary filter set, the fused coefficient maps x_{f,m}, m ∈ {1, ..., M}, of the high-frequency components h_k (k = A, B) are obtained by Eqs. (19.2) and (19.3):

$$\{x_{k,m}\} = \arg\min_{\{x_{k,m}\}} \frac{1}{2}\Big\|\sum_{m=0}^{M} d_m * x_{k,m} - h_k\Big\|_2^2 + \lambda\sum_{m=0}^{M}\|x_{k,m}\|_1 + \iota_C(x_{k,M}) \qquad (19.2)$$

$$x_{f,m} = x_{k^*,m}, \quad k^* = \arg\max_{k}(x_{k,m}) \qquad (19.3)$$

Thus, the fused high-frequency component h_F is reconstructed by

$$h_F = \sum_{m=1}^{M} d_m * x_{f,m} \qquad (19.4)$$
19.3.3 Fusion of Low-Frequency Components

The "absolute-maximum" rule is used to fuse the low-frequency components into l_F.
19.3.4 Image Reconstruction

The final fused image s_F is reconstructed by

$$s_F = h_F + l_F \qquad (19.5)$$
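A compact sketch of the fusion stage, Eqs. (19.2)-(19.5), assuming the coefficient maps x_{k,m} have already been computed by a convolutional sparse coding solver for Eq. (19.2). Reading the arg-max in Eq. (19.3) as a per-pixel choice by the larger l1 activity over the M maps is our interpretation; the FFT-based reconstruction is one standard way to evaluate Eq. (19.4).

```python
import numpy as np
from numpy.fft import fft2, ifft2

def fuse_csr(xA, xB, d, lA, lB):
    """Fuse precomputed coefficient maps xA, xB (shape M x H x W) and
    low-pass layers lA, lB with dictionary filters d (shape M x n x n)."""
    M, H, W = xA.shape
    useA = np.abs(xA).sum(axis=0) > np.abs(xB).sum(axis=0)   # per-pixel activity
    xf = np.where(useA[None, :, :], xA, xB)                  # fused maps x_{f,m}

    # h_F = sum_m d_m * x_{f,m}, evaluated in the Fourier domain (Eq. 19.4)
    hF = np.zeros((H, W))
    for m in range(M):
        Dm = fft2(d[m], s=(H, W))                            # zero-padded filter
        hF += np.real(ifft2(Dm * fft2(xf[m])))

    lF = np.where(np.abs(lA) >= np.abs(lB), lA, lB)          # absolute-maximum rule
    return hF + lF                                           # Eq. (19.5)
```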
19.4 Experiments and Analysis

To verify the fusion performance of our algorithm, six multi-focus image pairs (Fig. 19.2) are tested and six methods are compared. Four evaluation metrics are used: QMI (normalized mutual information) [9, 10], Qe (edge-dependent fusion quality index) [10, 11], QTE (mutual information with Tsallis entropy) [10, 12], and QNCIE (nonlinear correlation information entropy) [10, 13]. For each of these Q-series metrics, a larger value indicates better fusion performance. All fusion methods are implemented in MATLAB R2016a on a 3.20-GHz CPU with 4.00 GB of RAM.

Tables 19.1, 19.2, 19.3, 19.4, 19.5, and 19.6 give the objective assessment of the different methods on the six multi-focus image fusion tasks. For each metric, the rank of each of the best two values is given as a digit in parentheses. It can be seen from the tables that our method outperforms the compared methods, except that Qe and QTE on the third image and QTE on the fourth image are slightly lower than those of the CNN-based method; in those cases the proposed method still ranks second. Compared with the CSR-based method, QMI, Qe, QTE, and QNCIE increase on average by 20.91%, 1.99%, 1.72%, and 1.78%, respectively. As a whole, our algorithm is superior to the other five methods.

The six multi-focus fusion results are shown in Figs. 19.3, 19.4, 19.5, 19.6, 19.7, and 19.8. The first two figures of each experiment show near-
Fig. 19.2 The multi-focus source images used in our experiments
Table 19.2 The objective assessment for the second multi-focus image

Methods | QMI | Qe | QTE | QNCIE
NSCT    | 0.8734 | 0.5333 | 0.3956 | 0.8245
SR [2]  | 1.0508 | 0.6011 | 0.4104 | 0.8337
JSR [3] | 0.8449 | 0.5230 | 0.3893 | 0.8232
CSR [6] | 0.9671 | 0.5794 | 0.4078 | 0.8289
CNN [7] | 1.1371 (2) | 0.6177 (1) | 0.4223 (1) | 0.8387 (2)
OUR     | 1.1455 (1) | 0.6109 (2) | 0.4158 (2) | 0.8389 (1)
Table 19.3 The objective assessment for the third multi-focus image

Methods | QMI | Qe | QTE | QNCIE
NSCT    | 0.7601 | 0.5868 | 0.3969 | 0.8204
SR [2]  | 0.8805 | 0.5949 | 0.4022 | 0.8252
JSR [3] | 0.7339 | 0.5601 | 0.3896 | 0.8194
CSR [6] | 0.8887 | 0.6161 | 0.4010 | 0.8256
CNN [7] | 1.0682 (2) | 0.6197 (2) | 0.4074 (1) | 0.8346 (2)
OUR     | 1.0717 (1) | 0.6209 (1) | 0.4058 (2) | 0.8347 (1)
Table 19.4 The objective assessment for the fourth multi-focus image

Methods | QMI | Qe | QTE | QNCIE
NSCT    | 0.8256 | 0.7698 | 0.4025 | 0.8216
SR [2]  | 1.1115 | 0.7842 | 0.4188 | 0.8421
JSR [3] | 0.7903 | 0.7701 | 0.4035 | 0.8201
CSR [6] | 0.9418 | 0.7846 | 0.4218 | 0.8293
CNN [7] | 1.1347 (2) | 0.7847 (2) | 0.4243 (2) | 0.8449 (2)
OUR     | 1.1576 (1) | 0.7865 (1) | 0.4252 (1) | 0.8470 (1)
Table 19.5 The objective assessment for the fifth multi-focus image

Methods | QMI | Qe | QTE | QNCIE
NSCT    | 1.3993 | 0.8883 | 0.4472 | 0.8670
SR [2]  | 1.5342 | 0.8992 | 0.4576 | 0.8781
JSR [3] | 1.2121 | 0.8775 | 0.4224 | 0.8545
CSR [6] | 1.2440 | 0.8898 | 0.4395 | 0.8565
CNN [7] | 1.5402 (2) | 0.8999 (2) | 0.4582 (2) | 0.8786 (2)
OUR     | 1.5408 (1) | 0.8999 (1) | 0.4583 (1) | 0.8786 (1)
Table 19.6 The objective assessment for the sixth multi-focus image

Methods | QMI | Qe | QTE | QNCIE
NSCT    | 1.3247 | 0.8009 | 0.4596 | 0.8579
SR [2]  | 1.4730 | 0.8225 | 0.4628 | 0.8694
JSR [3] | 1.0944 | 0.8004 | 0.4367 | 0.8439
CSR [6] | 1.1667 | 0.8059 | 0.4618 | 0.8479
CNN [7] | 1.4837 (2) | 0.8235 (2) | 0.4642 (2) | 0.8705 (2)
OUR     | 1.4939 (1) | 0.8236 (1) | 0.4653 (1) | 0.8713 (1)
Fig. 19.3 The first multi-focus image fusion results obtained with different methods
Fig. 19.4 The second multi-focus image fusion results obtained with different methods
focused and far-focused source images (Figs. 19.3a, 19.4a, 19.5a, 19.6a, 19.7a, and 19.8a are near-focused images; Figs. 19.3b, 19.4b, 19.5b, 19.6b, 19.7b, and 19.8b are far-focused images). It can be seen that more spatial detail has been preserved in the fused images (see the digits "11" in Fig. 19.3 and the desktop texture in Fig. 19.4). Moreover, the proposed method preserves edge information better than the other methods (see the junction of the gear and the bottle in Fig. 19.5 and the intersection between the flower and the wall in Fig. 19.6). The brightness of the proposed algorithm's results is higher than that of the other methods on the whole (see the fire balloon in Fig. 19.7 and the leopard in Fig. 19.8).
Fig. 19.5 The third multi-focus image fusion results obtained with different methods
Fig. 19.6 The fourth multi-focus image fusion results obtained with different methods
19.5 Conclusion

In this paper, mask simulation is introduced into CSR-based fusion to overcome the boundary overlap caused by the non-compact spatial representation of the mask in the discrete Fourier transform (DFT) domain. The main innovation of our method is that the convolutional dictionary filters are learned using AVA-AMS. The contribution of this paper can be summarized in two points: (1) we introduce auxiliary variable alternation with additive mask simulation (AVA-AMS) into convolutional dictionary filter learning; the AVA-AMS algorithm converges much faster than the primary variable alternation with mask
Fig. 19.7 The fifth multi-focus image fusion results obtained with different methods
Fig. 19.8 The sixth multi-focus image fusion results obtained with different methods
decoupling (PVA-MD) algorithm; and (2) we propose a novel CSRMS-based fusion method and apply it to multi-focus images. Experimental results demonstrate that the CSRMS-based method achieves superior performance in preserving spatial and edge information. Code for some of the compared image fusion methods is available at http://www.escience.cn/people/liuyu1/index.html.

Acknowledgements This work is supported by the Luzhou Science and Technology Program (2019-SYF-34), the Scientific Research Project of the Sichuan Public Security Department (Grant 201917), and the Sichuan Science and Technology Program (2019YFS0068 and 2019YFS0069).
References

1. Zhang, Q., Guo, B.L.: Multifocus image fusion using the nonsubsampled contourlet transform. Signal Process. 89(7), 1334-1346 (2009)
2. Yang, B., Li, S.T.: Multifocus image fusion and restoration with sparse representation. IEEE Trans. Instrum. Meas. 59(4), 884-892 (2010)
3. Yin, H.T., Li, S.T.: Multimodal image fusion with joint sparsity model. Opt. Eng. 50(6), 067007 (2011)
4. Gao, Z.S., Zhang, C.F.: Texture clear multi-modal image fusion with joint sparsity model. Opt. - Int. J. Light Electron Opt. 130 (2016)
5. Zhang, C.F., Yi, L.L., Feng, Z.L., Gao, Z.S., Jin, X., Yan, D.: Multimodal image fusion with adaptive joint sparsity model. J. Electron. Imaging 28(1) (2019)
6. Liu, Y., Chen, X.: Image fusion with convolutional sparse representation. IEEE Signal Process. Lett. 23(12), 1882-1886 (2016)
7. Liu, Y., Chen, X., Peng, H., Wang, Z.F.: Multi-focus image fusion with a deep convolutional neural network. Inf. Fusion 36, 191-207 (2017)
8. Wohlberg, B.: Boundary handling for convolutional sparse representations. In: IEEE International Conference on Image Processing. IEEE (2016)
9. Qu, G.H., Zhang, D.L., Yan, P.F.: Information measure for performance of image fusion. Electron. Lett. 38(7), 313-315 (2002)
10. Liu, Z., Blasch, E., Xue, Z.Y., Zhao, J.Y., Wu, W.: Objective assessment of multiresolution image fusion algorithms for context enhancement in night vision: a comparative study. IEEE Trans. Pattern Anal. Mach. Intell. 34(1), 94-109 (2011)
11. Piella, G., Heijmans, H.: A new quality metric for image fusion. In: Proceedings of International Conference on Image Processing (2003)
12. Cvejic, N., Canagarajah, C.N., Bull, D.R.: Image fusion metric based on mutual information and Tsallis entropy. Electron. Lett. 42(11), 626-627 (2006)
13. Wang, Q., Shen, Y., Jin, J.: Performance evaluation of image fusion techniques. Image Fusion: Algorithms Appl. 19, 469-492 (2008)
Chapter 20
A Collaborative Sparse Representation-Based Approach for Pattern Classification Yaofang Hu and Yunjie Zhang
Abstract Sparse representation-based classification (SRC) has been an important approach in pattern classification and widely applied to various fields of visual recognition. However, there are some practical factors to consider in real-world problems, such as pixel damage, block occlusion, illumination, and position change. In recent years, researchers have brought a large number of improvements and proposed various effective algorithms. We improve the degree of sparse representation (SR) based on the probabilistic collaborative representation framework in this paper. Furthermore, a new algorithm is proposed and successfully applied to face recognition. The effectiveness and efficiency of our method are demonstrated experimentally.
Y. Hu · Y. Zhang (B)
Department of Mathematics, Dalian Maritime University, Dalian 116026, China
e-mail: [email protected]

20.1 Introduction

Pattern classification has developed rapidly in recent years. Generally speaking, pattern classification methods [1] include parametric methods and non-parametric methods; the major distinction between them is that non-parametric methods directly use the training samples to determine which classes the test samples belong to. A widely used type of non-parametric classifier is the distance-based classifier, whose principle is to assign an unknown sample to the category at the shortest distance. Recently, the SRC algorithm [2] has attracted wide attention: a classification method for target identification based on the sparse representation (SR) computed by l1-norm minimization. Subsequently, Zhang et al. [3] argued that collaborative representation is more important than l1-norm sparsity in SRC, and the collaborative representation classification (CRC) algorithm was therefore presented for visual identification and other pattern classification tasks. Based on the SRC and CRC methods, many other improved algorithms for image technology have emerged.
Influenced by CRC and probabilistic subspace methods, a probabilistic collaborative representation-based classifier (ProCRC) was proposed by Cai et al. [4] from the perspective of probability for pattern classification. Similarly, we put forward an improved algorithm based on prior knowledge. In our paper, a more effective classification method is presented within the probabilistic collaborative representation framework, which makes the test sample most likely to be classified into the correct target class. Empirical experiments illustrate that this method better reflects the sparseness of the classification representation. By adding a weighting matrix, the items that determine the classification are given more weight. The representation coefficients and the weighting matrix in the objective function are updated iteratively until convergence. The rest of the paper is arranged as follows. In Sect. 20.2, we review some relevant methods. We then detail the proposed approach in Sect. 20.3. Section 20.4 reports experiments on a face image dataset, and conclusions are drawn in Sect. 20.5.
20.2 Related Work

Our algorithm builds on several classical SR-related algorithms [2, 4, 5], which also inspired its design. In this section we introduce them and give our own understanding of each.
20.2.1 Sparse Representation-Based Classification

Wright et al. [2] presented an effective algorithm and applied it to robust face recognition. The algorithm remains effective for face recognition with noise and occlusion and can also solve other image processing problems successfully. Given sufficient training samples A_i = [v_{i,1}, v_{i,2}, ..., v_{i,n_i}] ∈ R^{m×n_i} of the ith object class, and based on the linear subspace principle, we represent a test sample y of category i as

$$y = \alpha_{i,1} v_{i,1} + \alpha_{i,2} v_{i,2} + \cdots + \alpha_{i,n_i} v_{i,n_i} \qquad (20.1)$$

The matrix A = [A_1, A_2, ..., A_k] represents all training samples of the k target classes. Assuming these training samples are known, y can also be expressed as

$$y = A_1\alpha_1 + A_2\alpha_2 + \cdots + A_k\alpha_k = A x_0 \qquad (20.2)$$

where the coefficient vector x_0 = [0, ..., 0, \alpha_{i,1}, \alpha_{i,2}, ..., \alpha_{i,n_i}, 0, ..., 0]^T ∈ R^n is related to the ith class. According to SR theory and related research, the sparse solution of the above equation can be obtained as

$$\hat{x}_0 = \arg\min \|x\|_0 \quad \text{s.t.} \quad Ax = y \qquad (20.3)$$

which is also equivalent to

$$\hat{x}_1 = \arg\min \|x\|_1 \quad \text{s.t.} \quad \|Ax - y\|_2 \le \varepsilon \qquad (20.4)$$

For each category i, define the characteristic function δ_i that selects the coefficients associated with class i. One can then reconstruct y from the training samples of class i as ŷ_i = A δ_i(x̂_1). According to the minimum residual between y and ŷ_i, the category of the test sample is determined as

$$\min_i r_i(y) = \|y - A\delta_i(\hat{x}_1)\|_2 \qquad (20.5)$$
The algorithm of SRC implementation reveals the importance of l1 -norm in sparse representation and lays a foundation for future research.
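A self-contained sketch of the SRC pipeline: since Eq. (20.4) needs an l1 solver, ISTA on the Lagrangian (lasso) form is used here as a stand-in, and the values of lam and iters are illustrative.

```python
import numpy as np

def ista_l1(A, y, lam=0.01, iters=500):
    """Solve min_x 0.5*||Ax - y||_2^2 + lam*||x||_1 by ISTA, a simple
    stand-in for the l1-minimization of Eq. (20.4)."""
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = x - (A.T @ (A @ x - y)) / L         # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return x

def src_classify(A, labels, y, lam=0.01):
    """SRC decision rule (Eq. 20.5): reconstruct y from each class's
    coefficients and pick the class with minimum residual.
    `labels` is a 1-D integer array giving the class of each column."""
    x = ista_l1(A, y, lam)
    classes = np.unique(labels)
    res = [np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(res))]
```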
20.2.2 Weighted Group Sparse Representation Classification

Tang et al. [6] proposed the weighted group sparse representation classification (WGSRC) algorithm, which remedies a deficiency of SRC: when a test sample is represented, each category plays an irreplaceable role. WGSRC assigns weights according to the contribution of the training samples, combining local differences and discriminative weights to constitute the weighted group sparse representation. Its advantage is that it considers not only structural differences but also discriminative recognition ability. The weighted l_{2,1}-norm model is

$$\beta^* = \min_{\beta} \left\|y - X S^{-1}\beta\right\|_2^2 + \lambda\sum_{i=1}^{c}\|\beta_i\|_2 \qquad (20.6)$$

Here, y is the test sample, X is the training sample matrix, and λ is the group sparse regularization parameter. The matrix S = diag([s_1, s_2, ..., s_c]) with s_i = [s_{i1}, s_{i2}, ..., s_{in}]^T, where s_{ik} = w_i d_{ik}, d_{ik} = exp(‖y − x_{ik}‖_2/σ²), and σ² is the bandwidth, is used to evaluate which training samples contribute most to representing a test sample. With the above representation, the sparse coefficient β* of y is computed and used to determine the class of the test sample y.
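For illustration, Eq. (20.6) can be solved by proximal gradient descent with group soft-thresholding; the sketch below assumes a diagonal S given by its weight vector and is our own rendering, not the authors' code.

```python
import numpy as np

def wgsrc_coefficients(X, s, y, groups, lam=0.01, iters=300):
    """Proximal-gradient sketch of Eq. (20.6):
    min_b ||y - X S^{-1} b||_2^2 + lam * sum_i ||b_i||_2,
    with S = diag(s) and `groups` holding each column's class label."""
    Xs = X / s                                    # X S^{-1} for diagonal S
    L = 2.0 * np.linalg.norm(Xs, 2) ** 2          # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        g = b - 2.0 * (Xs.T @ (Xs @ b - y)) / L   # gradient step
        for c in np.unique(groups):               # group soft-thresholding (prox)
            idx = groups == c
            nc = np.linalg.norm(g[idx])
            b[idx] = max(0.0, 1.0 - lam / (L * nc + 1e-12)) * g[idx]
    return b
```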
20.2.3 Collaborative Representation-Based Classification

Zhang et al. [3] raised the question of whether the l1-norm or the collaborative representation plays the critical role in applications such as visual recognition. They studied the problem in depth and proposed the collaborative representation classification (CRC) algorithm. To verify the viewpoint that collaborative representation is what matters, a brief explanation is given below. To simplify the computation, the regularized least squares method is used in CRC, giving the objective function

$$\hat{\rho} = \arg\min_{\rho} \|y - X\rho\|_2^2 + \lambda\|\rho\|_2^2 \qquad (20.7)$$

where λ is the regularization parameter. Although this sparsity is weaker than that induced by the l1-norm, it still yields a certain amount of sparsity in the solution ρ̂. The solution is

$$\hat{\rho} = (X^T X + \lambda I)^{-1} X^T y \qquad (20.8)$$

Let P = (X^T X + λI)^{-1} X^T, so the solution can be written as ρ̂ = Py. The minimum residual decides which class the test sample y belongs to:

$$\text{identity}(y) = \arg\min_i \{r_i\} \qquad (20.9)$$

where r_i = ‖y − X_i ρ̂_i‖_2 / ‖ρ̂_i‖_2.
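The closed form (20.8) makes CRC cheap: P depends only on X and can be precomputed once and reused for every query. A minimal NumPy sketch:

```python
import numpy as np

def crc_classify(X, labels, y, lam=0.001):
    """CRC with regularized least squares: rho = P y with
    P = (X^T X + lam*I)^(-1) X^T (Eq. 20.8), then the class-wise
    regularized residual of Eq. (20.9)."""
    n = X.shape[1]
    P = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T)  # precomputable projector
    rho = P @ y
    classes = np.unique(labels)
    res = []
    for c in classes:
        idx = labels == c
        r = np.linalg.norm(y - X[:, idx] @ rho[idx]) / (np.linalg.norm(rho[idx]) + 1e-12)
        res.append(r)
    return classes[int(np.argmin(res))]
```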
20.2.4 The Probabilistic Collaborative Representation-Based Classifier

Cai et al. [4] proposed the ProCRC algorithm based on CRC. According to the mathematical derivation in that article, the probability can be calculated by

$$P(l(y) = k) \propto \exp\left[-\left(\|y - X\hat{\alpha}\|_2^2 + \lambda\|\hat{\alpha}\|_2^2 + \frac{\gamma}{K}\|X\hat{\alpha} - X_k\hat{\alpha}_k\|_2^2\right)\right] \qquad (20.10)$$

The first two terms are the same for all classes, and thus we can compute

$$p_k = \exp\left(-\|X\hat{\alpha} - X_k\hat{\alpha}_k\|_2^2\right) \qquad (20.11)$$

Then the query sample is classified correctly by

$$l(y) = \arg\max_k \{p_k\} \qquad (20.12)$$

The Gaussian kernel can be replaced by the Laplacian kernel to make probabilistic collaborative representation classification more robust, so the probability is measured by

$$P(l(y) = l(x)\,|\,l(x) \in l_X) \propto \exp\left(-\kappa\|y - x\|_1\right) \qquad (20.13)$$

By taking the partial derivative, the robust ProCRC (R-ProCRC) model is obtained as

$$\hat{\alpha} = \arg\min_{\alpha}\ \|y - X\alpha\|_1 + \lambda\|\alpha\|_2^2 + \frac{\gamma}{K}\sum_{k=1}^{K}\|X\alpha - X_k\alpha_k\|_2^2 \qquad (20.14)$$

Therefore, the test sample can be classified according to (20.12).
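Since the Gaussian-kernel objective behind Eqs. (20.10)-(20.12) is quadratic in α, its minimizer has a closed form. The sketch below derives it by zeroing the gradient; this is our reconstruction, not code from [4].

```python
import numpy as np

def procrc_classify(X, labels, y, lam=0.001, gamma=0.01):
    """Sketch of ProCRC classification (Eqs. 20.10-20.12): solve the
    quadratic collaborative objective in closed form, then pick the
    class with the largest p_k. Parameter values are placeholders."""
    m, n = X.shape
    classes = np.unique(labels)
    K = len(classes)
    G = np.zeros((n, n))
    for c in classes:
        Xb = X.copy()
        Xb[:, labels == c] = 0.0                  # X with class c's block zeroed
        G += Xb.T @ Xb
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n) + (gamma / K) * G, X.T @ y)
    Xa = X @ alpha
    scores = []
    for c in classes:
        idx = labels == c
        scores.append(-np.linalg.norm(Xa - X[:, idx] @ alpha[idx]) ** 2)  # log p_k
    return classes[int(np.argmax(scores))]
```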
20.3 Proposed Method

To facilitate the description of the algorithm, we first define some mathematical symbols and expressions. Given k known classes, the training samples are expressed as X = [X_1, ..., X_i, ..., X_k], where X_i = [x_{(i−1)n+1}, x_{(i−1)n+2}, ..., x_{in}]. Suppose there are n training samples in each category; the test sample is represented by the column vector y.
20.3.1 Description of the Proposed Method

Some conventional algorithms ignore the differences between classes; we remedy this deficiency with a weighted least squares term in our approach. In addition, inspired by probabilistic collaborative representation classification, we consider the effect of each class of training samples in the linear representation. Our main purpose is to improve recognition accuracy by enlarging the discriminative differences between classes and reducing intra-class differences. Therefore, we set the objective function as follows:

$$\min_{\alpha}\ \frac{\gamma}{K}\sum_{i=1}^{K}\|X\alpha - X_i\alpha_i\|_2^2 + \lambda_1\|\alpha\|_1 + \lambda_2\|\alpha\|_2^2 + \frac{1}{2}(X\alpha - y)^T W_X (X\alpha - y) \qquad (20.15)$$
where γ , λ1 , and λ2 are positive constants. In our method, these coefficients help to balance the relationship between theterms of target function. The weighted matrix W X can be written as W X (i, i) = 1 |X (i, :)α − yi |, where X(i,:) refers to the ith row of X and y is a column vector representing test sample. According to the above formula, we take the derivative with respect to α and the solution to α can be obtained by taking the derivative as follows: ∂ ∂α
K γ X α − X i αi 22 K i=1
∂ = ∂α
K γ X αi 2 i 2 K i=1
K γ T X Xi α =2 K i=1 i (20.16)
where the equality X α − X i αi = X i α = X 1 α1 + · · ·+ X i−1 αi−1 + X i+1 αi+1 + · · · + X k αk is for ease of calculation, and X i = X − Si = X 1 , . . . , X i−1 , 0, X i+1 , . . . X k , Si = [0, . . . , X i , . . . , 0]. Next, we have ∂ λ1 α1 + λ2 α22 = (λ1 A + 2λ2 I )α (20.17) ∂α
where $A = \operatorname{diag}(|\alpha_1|^{-1}, |\alpha_2|^{-1}, \ldots, |\alpha_k|^{-1})$ and I is the identity matrix. Finally, we get

$\frac{\partial}{\partial \alpha}\left[\frac{1}{2}(X\alpha - y)^T W_X (X\alpha - y)\right] = X^T W_X (X\alpha - y)$   (20.18)
Therefore, the derivative of (20.15) with respect to α is

$2\frac{\gamma}{K} \sum_{i=1}^{K} \bar{X}_i^T \bar{X}_i \alpha + (\lambda_1 A + 2\lambda_2 I)\alpha + X^T W_X (X\alpha - y)$   (20.19)
According to the properties of convex functions, setting this derivative to zero gives the optimal solution

$\hat{\alpha} = \left(\frac{\gamma}{K} \sum_{i=1}^{K} \bar{X}_i^T \bar{X}_i + \lambda_1 A + \lambda_2 I + X^T W_X X\right)^{-1} X^T W_X y$   (20.20)

Our proposed method iteratively updates W_X and the coefficient vector α until convergence. The method combines the advantages of CRC and SRC by applying both norms in the target function and solving for them jointly. Accordingly, we name our method probabilistic collaborative sparse representation classification, abbreviated ProCSRC for convenience. Based on the analysis and derivation above, the procedure of ProCSRC is summarized in Table 20.1.
Table 20.1 The ProCSRC algorithm
1. Input: the matrix of training samples X and a test sample y
2. Standardize the columns of X to have unit l2-norm
3. Solve the model (20.15), updating W_X and α̂ iteratively until convergence
4. Calculate the residuals $r_i(y) = \|y - X_i \hat{\alpha}_i\|_2$
5. Output: $\operatorname{identity}(y) = \arg\min_i \{r_i(y)\}$
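A compact sketch of the iteration in Table 20.1 is given below. It follows the closed form (20.20), handling the l1 term through the diagonal matrix A and the weight matrix W_X through its diagonal; parameter defaults and all names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def procsrc(X, y, labels, gamma=1e-2, lam1=1e-3, lam2=1e-3, n_iter=20, eps=1e-8):
    """Iteratively solve Eq. (20.15) via the closed form (20.20), updating A and W_X."""
    X = X / (np.linalg.norm(X, axis=0, keepdims=True) + eps)   # unit l2-norm columns
    d, n = X.shape
    classes = np.unique(labels)
    K = len(classes)
    # Precompute sum_i Xbar_i^T Xbar_i, where Xbar_i = X - S_i (class i's columns zeroed)
    M = np.zeros((n, n))
    for c in classes:
        Xbar = X.copy()
        Xbar[:, labels == c] = 0.0
        M += Xbar.T @ Xbar
    alpha = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        A = np.diag(1.0 / (np.abs(alpha) + eps))               # reweighting for the l1 term
        w = 1.0 / (np.abs(X @ alpha - y) + eps)                # diagonal of W_X
        XtW = X.T * w                                          # X^T W_X
        lhs = (gamma / K) * M + lam1 * A + lam2 * np.eye(n) + XtW @ X
        alpha = np.linalg.solve(lhs, XtW @ y)                  # Eq. (20.20)
    residuals = [np.linalg.norm(y - X[:, labels == c] @ alpha[labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]
```

In this reading, the l1 penalty is handled by iterative reweighting via A, which is why A and W_X must be refreshed in every pass before the linear solve.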
20.4 Experimental Verification

In this section, we validate our method on a public database to prove the effectiveness of ProCSRC. Comparison with previous classification methods shows that ProCSRC improves the accuracy of classification. We demonstrate that the recognition accuracy of ProCSRC is better than that of other methods for face recognition with random corruption on the Extended Yale B dataset. There are three parameters in the proposed method: γ, λ1, and λ2. In the experiments, we set γ = 10^-2; for λ1 and λ2, we use cross-validation to determine specific values. The Extended Yale B [7, 8] database is used since it is commonly used in the original papers evaluating SRC and CRC. We randomly select a specific percentage of pixels in each test image and replace them with values uniformly distributed in [0, 255]. In the random corruption experiment, the training set consists of 30 randomly selected images from each subject, and the rest of the images form the test set. To demonstrate the effectiveness of the ProCSRC algorithm, we conduct a set of comparisons with R-ProCRC, with pixel corruption ratios selected at intervals of 0.05. Multiple experiments are carried out to determine in which range our method has a preferable recognition rate. Table 20.2 shows that when the corruption ratio is 0.2, the recognition rate of ProCSRC on face images is slightly lower than that of R-ProCRC, while at all other corruption ratios it is higher; the difference in recognition rate is largest when the corruption ratio is 0.65. This shows that the ProCSRC method is stable under random corruption of face images.
20.5 Conclusions and Discussions

In this paper, a novel and effective algorithm is presented for face recognition. We improve the ProCRC algorithm to make the target function sparser and the recognition rate better. In the norm representation, l1-minimization yields a sparser solution and the l2-norm effectively avoids overfitting; we combine these two terms to obtain a more optimized sparse solution. The difference between the test sample and the training samples of each class is enlarged by the weighting coefficients in our method.
Table 20.2 Recognition rate (%) with random corruption

Corruption ratio   R-ProCRC   ProCSRC
0                  95.05      96.08
0.05               94.43      96.00
0.1                96.15      96.31
0.15               95.84      96.08
0.2                96.62      96.08
0.25               95.13      95.53
0.3                94.43      96.47
0.35               94.51      95.21
0.4                94.03      95.21
0.45               93.01      93.96
0.5                90.97      91.84
0.55               88.15      88.46
0.6                83.52      84.14
0.65               73.31      76.92
0.7                62.40      63.34
0.75               46.23      47.10
0.8                31.32      31.56
According to the experimental results on face images, the ProCSRC algorithm effectively improves the recognition rate under the condition of random corruption.
References

1. Boiman, O., Shechtman, E., Irani, M.: In defense of nearest-neighbor based image classification. In: Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pp. 1–8 (2008)
2. Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)
3. Zhang, L., Yang, M., Feng, X.: Sparse representation or collaborative representation: which helps face recognition? In: Proceedings of 2011 IEEE International Conference on Computer Vision (ICCV 2011), pp. 471–478 (2011)
4. Cai, S., Zhang, L., Zuo, W., Feng, X.: A probabilistic collaborative representation based approach for pattern classification. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 2950–2959 (2016)
5. Peng, Y., Li, L., Liu, S., Li, J., Wang, X.: Extended sparse representation based classification method for face recognition. Mach. Vis. Appl. 1–17 (2018)
6. Tang, X., Feng, G., Cai, J.: Weighted group sparse representation for undersampled face recognition. Neurocomputing 145(18), 402–415 (2014)
7. Georghiades, A.S., Belhumeur, P.N., Kriegman, D.J.: From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 643–660 (2001)
8. The Extended Yale Face Database B. http://vision.ucsd.edu/content/yale-face-database
Chapter 21
Research on an Irregular Pupil Positioning Method Yong Zhao, Shouming Zhang, Huan Lei, Jingqi Ma and Nan Wang
Abstract The traditional pupil positioning method is carried out under the assumption that the pupil boundary is a regular circle, and thus the rich texture features around the pupil boundary are eliminated, which directly affects iris recognition accuracy. To solve this problem, this paper proposes an irregular pupil positioning method that finds candidate pupil boundary points through preprocessing and then uses the least-squares method to accurately fit the pupil boundary. Finally, the algorithm is verified and analyzed on the CASIA database, with the traditional Hough transform method as a comparison. The experimental results show that the proposed algorithm is not only suitable for irregular pupil boundaries (including noncircular boundaries and concave and convex areas) but also performs better in positioning time and positioning accuracy.
21.1 Introduction

Biometrics has set off waves in the tech world, spread to other areas, and even changed the way people live. Biometric technology includes fingerprint recognition, face recognition, iris recognition, etc. Among them, fingerprint recognition and face recognition have been widely used in daily life, for example in fingerprint locks and face authentication at train stations. Iris recognition is the safest and most reliable biometric technology, but it has not yet been widely deployed. Therefore, in recent years, a large number of researchers have invested in iris recognition algorithms. The human iris is rich in texture features, which are essentially formed by the eighth month of pregnancy, remain stable for a long time, and do not change without external stimulation [1]. Iris positioning is the first and key step in
iris recognition; its quality directly affects the accuracy of iris recognition. The purpose of iris localization is to segment the annular iris region with its rich texture, so there are two tasks: locating the inner boundary of the iris (the pupil-iris junction) and locating the outer boundary of the iris (the sclera-iris junction). In recent years, as many researchers have joined the study of iris recognition, many iris localization algorithms have been proposed. Zhu Lijun et al. proposed using the Hough transform to improve the speed of inner-circle fine positioning [2]. Li et al. proposed a coarse-to-fine iris localization method, using the Hough transform and a calculus-based method to obtain the rough inner and outer boundaries of the iris [3]. Kumar et al. proposed using edge maps and circular Hough transforms for pupil boundary detection [4]. Tisse enhanced Daugman's approach to solve the problem of positioning the pupil center [5]. Liu et al. proposed an iris localization algorithm based on block search: a Hough algorithm based on edge detection coarsely locates the inner circle, and the block search method finely locates it [6]. However, these algorithms assume that the boundary of the iris is a regular circle, which eliminates the rich texture around the pupil boundary and seriously affects the accuracy of iris recognition; as the original iris map in Fig. 21.1a shows, the texture around the edge of the pupil is very rich. Aiming at this problem, this paper proposes a method to accurately fit the pupil boundary. First, the image is binarized according to the gray histogram and the boundary is detected by the improved Canny algorithm; the detected boundary sets are then filled to eliminate the influence of the specular spot in the pupil. An opening operation is performed to eliminate the influence of the eyelashes, and the pupil center is estimated by the projection method. Starting from the estimated pupil center, the points with the largest gradient change in the radial direction are found; these are the inner boundary points of the iris. Finally, these points are fitted by the nonlinear least-squares method, and the closed curve connecting them is the pupil boundary.
Fig. 21.1 a Original iris map; b grayscale histogram; c Binarization map; d eliminate the spot pattern; e eliminate eyelashes
21.2 Iris Positioning

21.2.1 Image Preprocessing

Since the eyelashes and the spots in the pupil greatly influence positioning accuracy, it is necessary to preprocess the images before iris positioning to eliminate these factors. Image preprocessing includes image binarization; the image edge gradient is detected by the Canny algorithm improved in this paper, and a filling method and an opening operation eliminate the influence of the pupil spot and the eyelashes, respectively. The whole preprocessing procedure is shown in Fig. 21.1. Since the gray value of the pupil differs significantly from that of the iris and is much lower than that of the other regions, the image can be binarized according to this characteristic to separate the pupil region. The binarization of an image consists of two steps: A. Draw the histogram of the image, taking the first trough point as the binarization threshold T, as shown in Fig. 21.1b; B. Using the trough value T obtained in step A as the threshold, binarize the image by the formula shown in Eq. (21.1); the binarized image is shown in Fig. 21.1c.

$B(x, y) = \begin{cases} 0, & h(x, y) \le T \\ 255, & h(x, y) > T \end{cases}$   (21.1)
where B(x, y) represents the binarized gray value and h(x, y) represents the actual gray value of the pixel (x, y). The Canny algorithm is a classic machine-vision edge detection algorithm that is more effective than other edge detectors, so we adapt it to iris edge detection. The traditional Canny detects edges in four directions: vertical, horizontal, 45°, and 135°. Since the eyelashes mostly produce gradient changes in the horizontal direction, the improved Canny algorithm keeps the vertical, 45°, and 135° gradient directions and eliminates the horizontal direction; this reduces the edge responses of the eyelashes and of many other non-pupil-boundary gradients. The final edge detection map is shown in Fig. 21.1d. The opening operation is mathematically a process of first eroding and then dilating. It removes isolated points, burrs, and bridges (small points connecting two regions), eliminates small objects, and smooths the boundaries of large objects without changing their area. This paper uses the opening operation to remove a small number of isolated eyelashes and eliminate their effect on iris positioning. The opening operation formula is given in Eq. (21.2), and the effect after the operation is shown in Fig. 21.1e:
$S = (X \ominus B) \oplus B$   (21.2)

In the formula, X represents the image before the opening operation, B represents a circular structuring element, and S represents the image after the opening operation. Image preprocessing serves the calibration of the pupil center point: good pretreatment eliminates the influence of non-pupil factors (eyelashes, spots, etc.) and extracts a clean pupil image, which greatly improves the accuracy of pupil positioning.
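The preprocessing chain just described (histogram-trough binarization, hole filling for the specular spots, and a circular-element opening) can be sketched in OpenCV as follows; the trough-search heuristic, kernel size, and the convention of keeping the pupil white are our own simplifying assumptions, not the authors' code:

```python
import cv2
import numpy as np

def preprocess_pupil(gray):
    """Binarize at the first histogram trough, fill specular holes, remove lashes."""
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    # crude first-trough search just after the dark pupil peak (illustrative heuristic)
    peak = int(np.argmax(hist[:100]))
    T = peak + int(np.argmin(hist[peak:peak + 80]))
    # Eq. (21.1) maps the pupil to 0; here the pupil is kept white for later processing
    binary = np.where(gray <= T, 255, 0).astype(np.uint8)
    # fill the specular spots inside the pupil via flood fill from the border
    filled = binary.copy()
    mask = np.zeros((gray.shape[0] + 2, gray.shape[1] + 2), np.uint8)
    cv2.floodFill(filled, mask, (0, 0), 255)
    binary = binary | cv2.bitwise_not(filled)
    # opening with a circular element removes isolated eyelash fragments
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```

The improved directional Canny step is omitted here for brevity; OpenCV's stock cv2.Canny does not expose per-direction suppression, so that modification would require a custom gradient stage.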
21.2.2 Pupil Center Point Positioning

After preprocessing, the eye image is transformed into a clean binarized pupil image without eyelashes and spots, so the estimate of the pupil center is closer to its actual value. There are two classic algorithms for estimating the pupil center point: the projection method and the moving window method. A. The basic principle of the projection method is to compute the row pixel sums in the X direction and the column pixel sums in the Y direction by horizontal and vertical projection, and to take the row and column with the largest sums as the estimated pupil position. The pupil center estimate is given by Eq. (21.3):

$x_0 = \arg\max_x \sum_{y=1}^{n} B(x, y), \qquad y_0 = \arg\max_y \sum_{x=1}^{m} B(x, y)$   (21.3)
where (x_0, y_0) is the estimated pupil center coordinate, m denotes the number of pixel columns in the image, n denotes the number of pixel rows, B(x, y) denotes the pixel value at coordinates (x, y), and the sums run over the n rows or m columns. The projection method must visit all pixels of the image, which lengthens the running time; to improve positioning speed, the moving window method is used to calibrate the pupil center. B. The basic principle of the moving window method is to take a square window window(x, y, size) of side length size whose elements are all 1, and convolve it with the image from left to right and top to bottom. The position where the convolution value is extremal is taken as the estimated position of the pupil center, as in Eq. (21.4):
Fig. 21.2 a One-by-one convolution diagram; b convolution map at different locations; c pupil center estimation map
$(x_p, y_p) = \arg\max_{(x, y)} \left[\operatorname{window}(x, y, \text{size}) * B(x, y)\right]$   (21.4)
Here window(x, y, size) denotes the moving window and the convolution value is the window response at pixel (x, y). Since the preprocessed image is binarized, only the pupil region has nonzero pixel values, yet this method scans the entire image pixel by pixel, and the scan values after the pupil are zero, which prolongs the positioning time. We therefore improve the method as follows: when the window enters the pupil area, the convolution value at the next moment is greater than that at the previous moment; when the window moves out of the pupil region, the convolution value at the next moment is smaller than that at the previous moment. Hence, the first time the current convolution value drops below the previous one, the previous value is the largest convolution value, and the window center at that moment is the estimated pupil center. In this way, only half of the image, or even a small part of it, needs to be convolved, which greatly shortens the running time. In Fig. 21.2, (a) shows the window convolving the image pixel by pixel, (b) marks the windows with the extremal convolution value, and (c) shows the finally determined pupil center, namely the center of the window.
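A sketch of the moving-window estimate with this early-stop rule follows. Since the preprocessed image keeps the pupil as the only nonzero region, the window response is taken to be maximal over the pupil, and the row-wise stopping criterion below is one loose reading of the rule described above, not the authors' exact implementation:

```python
import cv2
import numpy as np

def estimate_center(binary, size=31):
    """Moving-window pupil-center estimate with an early-stop scan.

    `binary` is the preprocessed image whose only nonzero region is the pupil,
    so the all-ones window response peaks when the window covers the pupil.
    """
    window = np.ones((size, size), np.float32)
    resp = cv2.filter2D(binary.astype(np.float32), cv2.CV_32F, window)
    best, best_pos = -1.0, (0, 0)
    for r in range(resp.shape[0]):                 # scan top to bottom
        row_best = float(resp[r].max())
        if row_best > best:
            best = row_best
            best_pos = (r, int(np.argmax(resp[r])))
        elif best > 0 and row_best < best:
            break                                  # response falling: pupil passed
    return best_pos
```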
21.2.3 Pupil Fine Positioning

Starting from the pupil center point already obtained, N rays are emitted in all directions, and along each ray the point with the largest gradient change is searched for and its position is saved, yielding N boundary points, as shown in Fig. 21.3a.
Fig. 21.3 a Pupil boundary point map; b pupil fine positioning map
By fitting the positional information of these points with the least-squares method, a closed figure close to the pupil boundary is obtained, and the pupil boundary can be accurately positioned; the positioning map is shown in Fig. 21.3b. The least-squares method is a statistical optimization technique whose goal is to minimize the sum of squared errors and find the optimal model, which here fits the obtained boundary point data. The least-squares method is applicable to a variety of complex object models and can directly provide a measure of the fitting error, achieving high fitting accuracy [7]. The target formula is Eq. (21.5):

$J(\theta) = \sum_{i=1}^{m} \left(f_\theta(x_i) - y_i\right)^2$   (21.5)

where $f_\theta(x_i)$ is the model (taken from the hypothesis space) and $y_i$ is the boundary point data value. In summary, the sum of squared distances between the data values and the fitted values is minimized.
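The paper does not specify the curve family used for the closed fit; one reasonable choice is to model the boundary radius r(θ) around the estimated center as a truncated Fourier series, in which case Eq. (21.5) reduces to a linear least-squares problem. A sketch under that assumption:

```python
import numpy as np

def fit_boundary(points, center, n_harmonics=4):
    """Fit r(theta) as a truncated Fourier series through the N boundary points.

    Returns a function evaluating the fitted radius at any angle, i.e. a smooth
    closed curve approximating the (possibly non-circular) pupil boundary.
    """
    dx = points[:, 0] - center[0]
    dy = points[:, 1] - center[1]
    theta = np.arctan2(dy, dx)
    r = np.hypot(dx, dy)
    # design matrix [1, cos(k*theta), sin(k*theta)] for k = 1..n_harmonics
    cols = [np.ones_like(theta)]
    for k in range(1, n_harmonics + 1):
        cols += [np.cos(k * theta), np.sin(k * theta)]
    Phi = np.stack(cols, axis=1)
    coef, *_ = np.linalg.lstsq(Phi, r, rcond=None)   # minimizes Eq. (21.5)

    def radius(t):
        basis = [np.ones_like(t)]
        for k in range(1, n_harmonics + 1):
            basis += [np.cos(k * t), np.sin(k * t)]
        return np.stack(basis, axis=-1) @ coef

    return radius
```

Because the basis is periodic, the fitted curve is automatically closed, which suits the concave and convex pupil boundaries the method targets.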
21.3 Experimental Design and Analysis

To verify the effectiveness of the proposed algorithm, this paper uses the iris database CASIA-Iris4.0-Lamp [8] of the Institute of Automation, Chinese Academy of Sciences. CASIA-Iris-Lamp was collected with a handheld iris sensor produced by OKI from the left and right eyes of 411 subjects, yielding 16,215 images of size 640 * 480 divided into 819 classes. During collection, a lamp near the subject was turned on or off to introduce more pupil variation, so CASIA-Iris-Lamp is well suited for the analysis and verification of this algorithm. In this paper, one left-eye and one right-eye iris image were taken from each of the 411 subjects, for a total of 822 iris images used in the experiments. To highlight the advantages of the proposed algorithm, a comparative experiment was designed and the Hough transform method was used to
Fig. 21.4 a Hough transform positioning map; b algorithm positioning map
experiment with the same database. Since the Hough transform is a classical circle-detection algorithm, using it for pupil positioning assumes that the pupil edge is a regular circle. The literature [9] is an improvement of the Hough transform that is superior to the traditional Hough transform in positioning accuracy; therefore, these two algorithms are used as comparison baselines for the proposed algorithm. To better demonstrate accuracy, the Euclidean distance is used as the evaluation standard for the detection algorithms, as shown in Eq. (21.6):

$\partial = \sqrt{(x - x_0)^2 + (y - y_0)^2}$   (21.6)
where ∂ denotes the precision, (x, y) denotes the row and column values of the detected pixel, and (x_0, y_0) denotes the row and column values of the standard pupil edge. Both the proposed algorithm and the Hough transform baseline were programmed in C++ on the VS2015 platform and run on an Intel second-generation Core i5-2450M 2.5 GHz dual-core processor with 8 GB of memory. According to the experimental comparison of the two algorithms shown in Fig. 21.4, the proposed algorithm fits the edges of irregular pupils (including concave, convex, and elliptical ones) better and can acquire more iris texture for iris recognition, so the accuracy is greatly improved. Figure 21.5 shows partial positioning results demonstrating that the method is applicable to irregular pupils (including noncircular boundaries and concave and convex points).
Fig. 21.5 Partial positioning results
According to the accuracy and positioning time of the three algorithms shown in Table 21.1, the accuracy of iris localization using the traditional Hough transform is 93.1%, with an average cost of 0.23 s per image. The accuracy of the method in the literature [9] is 97.5% with an average positioning time of 0.69 s per image, while the accuracy of the method in this paper is 95.6% with an average cost of 0.20 s per image. The proposed algorithm is not only more accurate than the Hough algorithm but also slightly faster; compared with the literature [9], it trades some accuracy for a much shorter positioning time.

Table 21.1 Test results

Inner boundary of iris positioning method   Average positioning time per image (s)   Accuracy (%)
Hough transform                             0.23                                     93.1
Literature [9]                              0.69                                     97.5
Method of this paper                        0.20                                     95.6
21.4 Conclusion

The traditional pupil positioning algorithm is based on the premise that the pupil boundary is a regular circle; however, a regular circle eliminates the rich texture features around the pupil boundary, which directly affects the accuracy of iris recognition. Therefore, this paper proposes an irregular pupil location algorithm that can accurately locate the pupil boundary. The experimental results show that the proposed algorithm can locate irregular pupil boundaries (including concave, convex, and elliptical ones) more accurately, thereby preserving richer iris texture, with improvements in both positioning time and positioning accuracy.
References

1. Daugman, J.: How iris recognition works (Chap. 25). In: Essential Guide to Image Processing, pp. 715–739 (2004)
2. Zhu, L., Yuan, Y.: Non-ideal iris localization based on sum-checking and edge detection template. Appl. Res. Comput. 35(06), 1879–1882 (2018)
3. Li, Z.: An iris recognition algorithm based on coarse and fine location. In: Proceedings of 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA 2017). IEEE, Xi'an Jiaotong-Liverpool University (2017)
4. Kumar, V., Asati, A., Gupta, A.: Accurate iris localization using edge map generation and adaptive circular Hough transform for less constrained iris images. Int. J. Electr. Comput. Eng. 6(4), 1637–1646 (2016)
5. Tisse, C., Martin, L., Torres, L., Robert, M.: Person identification technique using human iris recognition. In: Proceedings of ICVI'02, pp. 294–299 (2002)
6. Liu, S., Liu, Y., Zhu, X.: Iris location algorithm based on block search. Comput. Eng. Appl. (2017)
7. Tian, K.: Ellipse fitting algorithm based on least squares (2008)
8. CASIA Iris Image Database. http://biometrics.idealtest.org/findTotalDbByMode.do?mode=Iris. Accessed 26 Nov 2019
9. Wang, Y., Zhao, L., Liu, H.: An accurate method of iris localization. Foreign Electron. Meas. Technol. 36(11), 34–37 (2017)
Chapter 22
Adaptive Step Size of Firefly’s Fuzzy Clustering Image Segmentation Yangyang Hu and Zengli Liu
Abstract Aiming at the local optimum problem and the noise sensitivity of FCM-based image segmentation, an adaptive step-size firefly fuzzy kernel clustering method is proposed for image segmentation. First, the firefly algorithm is used to optimize the cluster centers of KFCM; then an adaptive step size is proposed to solve the problems caused by the firefly step size, preventing the clustering algorithm from falling into local optima due to an excessively long step. Finally, simulation experiments are carried out. The results show that the proposed algorithm improves segmentation precision, reduces mis-segmentation at image edges, and improves noise robustness.
22.1 Introduction

Image segmentation [1, 2] is an indispensable part of image understanding and an active research topic. Clustering is a static method of data analysis widely used in many fields, such as machine learning, artificial intelligence, pattern recognition, and image analysis. Among the many clustering algorithms are the k-means algorithm and the FCM algorithm. In image clustering, fuzzy c-means (FCM) clustering [3, 4] is also a hot topic, but FCM has obvious shortcomings. Aiming at the FCM algorithm's sensitivity to noise and its fatal defect of falling into local optima, this paper proposes a kernel fuzzy clustering algorithm based on an adaptive variable step-size firefly algorithm. Although FCM is a soft clustering algorithm, its result is greatly affected by the initial cluster centers, it is extremely sensitive to outliers, and it cannot guarantee the global optimum. The firefly algorithm is mainly used to optimize the objective function of the FCM algorithm; due to its characteristics, it can alleviate the fatal problem of the FCM algorithm falling into
local optima. The kernel function is used to replace the Euclidean distance in the FCM algorithm to overcome its sensitivity to noise.
22.2 Improved KFCM Clustering Algorithm Based on the Adaptive Variable Step Size Firefly Algorithm (VSSFAKFCM)

22.2.1 Firefly Algorithm

The firefly algorithm (FA) is a heuristic algorithm inspired by the behavior of fireflies in nature. At its core, each firefly in the swarm moves according to the brightness of the other fireflies. Professor Yang [5, 6] of Cambridge University proposed the firefly algorithm, whose rules are as follows: (1) fireflies are gender-neutral, so a firefly will be attracted to all other fireflies; (2) attractiveness is positively correlated with brightness; (3) if no firefly is brighter than a given firefly, it moves freely.
22.2.2 FCM Algorithm Based on a Kernel Function

In the kernel-based FCM algorithm, the main idea is to replace the Euclidean distance metric of the traditional FCM algorithm with a kernel-induced metric. The kernel fuzzy c-means clustering (KFCM) algorithm maps the samples of the input space to a high-dimensional feature space through the kernel function, so that the differences between samples increase, overcoming the processing difficulties of the traditional FCM algorithm. The objective function of the fuzzy kernel clustering algorithm (denoted KFCM-II) in the input space is defined as

$J_m(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \|x_k - v_i\|^2$   (22.1)

where c is the number of clusters, n is the number of samples, the $x_k$ form the data set, the $v_i$ are the cluster centers, $U = \{u_{ik}\}$, $V = (v_1, v_2, v_3, \ldots)$, and m > 1 is a constant. The constraint is

$\sum_{i=1}^{c} u_{ik} = 1, \quad \forall k = 1, 2, 3, \ldots, n$   (22.2)
22.2.3 The Improved Algorithm: VSSFAKFCM

Aiming at the sensitivity of the fuzzy c-means clustering algorithm to initial values and interference, and its tendency to fall into local optima, we exploit the firefly algorithm's good global optimization ability, strong local optimization ability, and fast convergence. To this end, this paper combines the firefly algorithm with the kernel-based fuzzy c-means clustering method into FAKFCM (a kernel fuzzy clustering method based on the firefly algorithm); the kernel function simultaneously overcomes the FCM's sensitivity to noise. In the standard FA, the step size is static and does not adapt over the course of the search. In general, a large step size gives the fireflies a large new exploration space but does not help convergence to the global optimum; a small step size has the opposite effect. Since the step size strongly affects convergence, it is beneficial to balance global exploration and local exploitation according to the current state of the search. To this end, we designed a firefly algorithm, VSSFA (variable step size firefly algorithm) [7], that controls the step size a through a nonlinear schedule. The step length is computed as

$a(t) = 0.4 / \left(1 + \exp\left(0.015\,(t - \text{max\_iter})/3\right)\right)$   (22.3)
where max_iter is the maximum number of iterations, set as needed. The objective function of the fuzzy clustering becomes the brightness of the fireflies, with the brightness function defined as

$I(V) = \dfrac{1}{\sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m \|\Phi(x_k) - \Phi(v_i)\|^2 + 1}$   (22.4)
The basic steps of the fuzzy kernel clustering algorithm based on the adaptive step-size firefly algorithm are as follows (a sketch of the core update follows the list):

Step 1 Initialize the parameters of the firefly algorithm and the KFCM algorithm; the number of clusters is set to 3.
Step 2 Use the adaptive step size of formula (22.3) in the firefly movement iteration.
Step 3 Use the firefly algorithm to optimize the kernel objective function of the fuzzy clustering.
Step 4 Obtain the optimal cluster centers.
Step 5 Perform image segmentation using the KFCM algorithm.
Step 6 Iteratively update the cluster centers and the fuzzy membership function.
Step 7 Display the returned image data.
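The core firefly update with the adaptive step of Eq. (22.3) can be sketched as follows; β0 and the light-absorption coefficient γ are standard FA parameters not specified in this chapter, so their values and all names here are illustrative:

```python
import numpy as np

def step_size(t, max_iter):
    """Adaptive step size of Eq. (22.3): large early (exploration), small late."""
    return 0.4 / (1.0 + np.exp(0.015 * (t - max_iter) / 3.0))

def firefly_update(pos, brightness, t, max_iter, beta0=1.0, gamma=1.0, rng=None):
    """One firefly iteration: each firefly moves toward every brighter one.

    `pos` is (n_fireflies, dim); each row encodes a candidate set of cluster
    centers (flattened). `brightness` is Eq. (22.4) evaluated per firefly.
    """
    rng = rng or np.random.default_rng()
    a = step_size(t, max_iter)
    new = pos.copy()
    for i in range(len(pos)):
        for j in range(len(pos)):
            if brightness[j] > brightness[i]:
                r2 = np.sum((pos[i] - pos[j]) ** 2)
                beta = beta0 * np.exp(-gamma * r2)   # attractiveness decays with distance
                new[i] += beta * (pos[j] - pos[i]) + a * (rng.random(pos.shape[1]) - 0.5)
    return new
```

After max_iter such updates, the brightest firefly's position is decoded into cluster centers and handed to the KFCM membership update (Steps 4-6).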
22.3 Experimental Analysis

To verify the feasibility of the algorithm, this paper conducts segmentation experiments on images with different degrees of noise and on natural images, and compares the experimental results of the FCM algorithm, the KFCM algorithm, the FAFCM algorithm, and the proposed algorithm. The experiments were run on Windows 10 with a 1.6 GHz CPU and 4 GB of memory, in the MATLAB 2016a environment.
22.3.1 Noiseless Segmentation Effect

Three common gray images, Lena, woman, and camera, were used to observe the segmentation results without noise. The experimental results show the difference between the improved algorithm and the original algorithms (Figs. 22.1, 22.2, 22.3 and 22.4).
22.3.2 Segmentation After Adding Noise

An appropriate amount of noise is added to the original images, specifically Gaussian noise with mean 0 and variance 0.004; from left to right, the images are Lena, woman, and camera, each with an equal amount of noise. The experimental inputs and results are shown in Figs. 22.5, 22.6, 22.7 and 22.8. Through the simulation experiments, it can be noticed that the improved algorithm has a strong filtering effect on noisy regions. After adding a moderate amount of Gaussian noise, the segmentation produced by the FCM algorithm is very poor,
Fig. 22.1 Original image
Fig. 22.2 FCM process
Fig. 22.3 KFCM process
Fig. 22.4 VSSFAKFCM process
Fig. 22.5 Gaussian noise with variance 0.004 added to Lena and camera
Fig. 22.6 FCM process
Fig. 22.7 KFCM process
Fig. 22.8 VSSFAKFCM process
and differs hugely from the noiseless case. Merely introducing the kernel function into the FCM objective already yields some denoising: the simulation images show a lot of noise for FCM without the kernel function, and still considerable, though reduced, noise for KFCM with the kernel function. After using VSSFAKFCM, the reduction of image noise is very obvious. Here, we use the partition coefficient V_pc [8] as the separation indicator to measure the segmentation effect. It is defined by Eq. (22.5); the larger the value, the better the segmentation.
$V_{pc} = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^2 / n$   (22.5)
Table 22.1 Separation coefficient V_pc of the image segmentation results of each method

Image    Parameter   FCM      KFCM     VSSFAKFCM
Lena     V_pc        0.7834   0.8191   0.9046
Camera   V_pc        0.7907   0.8844   0.9067
Table 22.2 Separation coefficient V_pc of the segmentation results after adding noise to the images

Image      Parameter   FCM      KFCM     VSSFAKFCM
Lena_n     V_pc        0.7516   0.8055   0.8912
Camera_n   V_pc        0.7803   0.854    0.8812
Tables 22.1 and 22.2 compare the V_pc values of the algorithms on the test images; each value is the average of 10 runs. It can be seen from Table 22.1 that the proposed algorithm obtains the best index value in the segmentation of each image, again showing that it is superior to the other comparison algorithms.
22.4 Conclusion

The VSSFAKFCM algorithm proposed in this paper combines the firefly algorithm with the standard FCM algorithm. First, the optimal solution found by the firefly algorithm serves as the initial cluster centers of the KFCM algorithm, and the membership degrees and cluster centers are then updated cyclically. The improved algorithm overcomes the FCM algorithm's tendency to fall into local optima and makes up for the traditional FCM algorithm's sensitivity to initial values and noise points. Experiments show that the VSSFAKFCM algorithm has a strong global search capability, is no longer sensitive to the initial partitioning, converges faster, and improves the clustering effect, with a strong filtering effect on noise. Although the algorithm solves some of the problems of the FCM algorithm, the spatial information of the image is not yet fully exploited.
References

1. Zhou, L.L., Jiang, F.: Review of image segmentation methods. J. Appl. Res. Comput. 7 (2017)
2. Li, X.C., Liu, H.K., Wang, F., Bai, C.Y.: Fuzzy clustering method in image segmentation. Chin. J. Image Graph. (2012)
3. Zhang, M., Yu, J.: Fuzzy partitional clustering algorithm. J. Softw. 15(6), 858–868 (2004)
4. Hu, X.G., Yan, S.: Image segmentation algorithm based on FCM clustering. J. Comput. Eng. Des. (2018)
5. Yang, X.S.: Nature-Inspired Metaheuristic Algorithms, 1st edn. Luniver Press, Frome (2008)
6. Yang, X.S.: Firefly algorithms for multimodal optimisation. In: Watanabe, O., Zeugmann, T. (eds.) Proceedings of the Fifth Symposium on Stochastic Algorithms, Foundations and Applications. Lecture Notes in Computer Science, vol. 5792, pp. 169–178 (2009)
7. Yu, S.H., Zhu, S.L., Ma, Y., Mao, D.M.: A variable step size firefly algorithm for numerical optimization. Appl. Math. Comput. 263, 214–220 (2015)
8. Qu, F.H.: Research and application of a class of fuzzy clustering algorithms. Jilin University (2009)
Chapter 23
Research on QR 2-D Code Graphics Correction Algorithms Based on Morphological Expansion Closure and Edge Detection

Liu Peng, Liu Wen, Li Qiang, Duan Min, Dai Yue, and Nian Yiying

Abstract As a widely used form of two-dimensional code, QR codes have been applied in many aspects of daily life, but the QR code images captured in practice often suffer from distortion because the label is attached to an irregular (for example, curved) surface or because of varying camera positions relative to the label. For QR code images with surface distortion, this paper improves the clarity of the images, and thus effectively improves the recognition rate, by adopting a morphological closure and edge detection algorithm and by improving the classic shape-function-based correction algorithm for QR code images.
23.1 Introduction

At present, due to the convenience of QR codes, more and more types of codes appear in people's daily lives. It is hoped that the QR code can deliver its full convenience across a wide range of applications, but for various reasons, captured QR code images are often unrecognizable, which greatly delays work. Since the QR code came out, image preprocessing algorithms for QR code symbols have therefore been a focus and difficulty for scholars at home and abroad. Vijayan Asari proposed a mathematical model using polynomial mapping to represent the mapping from the distorted image space to the corrected image space [1]. Similarly, Wang Kena et al. proposed a BP neural network that learns from input and output image data to summarize the correlation between distorted and undistorted images, thereby correcting distorted QR codes [2]. Zhou [3], Liu and Dong [4], and Park et al. [5] proposed establishing standard grid images; the distortion model uses deformation parameters estimated from the standard mesh to correct images captured under the same conditions. Ohbuchi et al. [6] and
Ming et al. [7] used the inverse perspective transform to correct the image. Lei et al. used an elliptical-arc mesh to correct QR code images marked directly on cylindrical surfaces [8]. He et al. proposed a method based on image sequences and image stitching, finally achieving the division and distortion correction of each part of the module [9]. Chen et al. proposed a geometric preprocessing algorithm based on affine-transformation correction of the finder and positioning patterns to handle distortion caused by tilted shooting angles [10]. Xu [11] proposed connecting key points on the edge contour of the two-dimensional code, fitting the key points into curves for the four edges, and computing the corresponding points on the code by calculus, addressing the general case where no linear transformation relationship holds. Ono et al. proposed adding auxiliary lines to aid recognition under distortion and occlusion, computing the relationship between positioning lines via iterative DP matching [12]. Shi proposed a transformation matrix built on three-dimensional perspective transformation and ellipse parameters, converting the 2D barcode pixel values from the 2D image plane into 3D image space, provided the curvature of the cylinder is not very large [13]. Liu Fengjie proposed adding an auxiliary rectangle parallel to the outer contour of the QR code, first correcting in the two-dimensional plane and then in three-dimensional space via a relatively efficient region restoration algorithm, finally obtaining the corrected image [14]. Lay et al. proposed a mathematical model based on a parametric cylindrical surface to correct QR code images [15]. The above algorithms can effectively alleviate the problem of image distortion.
23.2 Identification Process of QR Code

The identification process of a QR code [3] usually includes three steps. First, the captured color image is grayed to improve subsequent processing speed. Second, the image is denoised; cross-shaped median filtering is adopted, since the main noise interfering with QR codes is salt noise. The gray histogram and the iterative method are then used to select a proper threshold, and the QR code is binarized into a black barcode on a white background. Finally, the position detection patterns are confirmed, the code is located and rotated to the horizontal direction, and the code data are acquired for the subsequent decoding step.
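For reference, this three-step pipeline maps naturally onto OpenCV primitives; the sketch below uses OpenCV's built-in QR detector for the positioning and decoding stage (note that cv2.medianBlur uses a square window rather than the cross window discussed later), and the file name is hypothetical:

```python
import cv2

def read_qr(path):
    """Gray -> median denoise -> locate and decode; OpenCV's detector handles
    binarization and finding the three position detection patterns internally."""
    img = cv2.imread(path)                            # e.g. "qr.png" (hypothetical)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # graying
    denoised = cv2.medianBlur(gray, 3)                # median filtering
    text, points, _ = cv2.QRCodeDetector().detectAndDecode(denoised)
    return text if text else None
```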
23.2.1 Graying

The graying of an image refers to converting a multi-channel color image to a single-channel gray image. The color component in QR code images acquired by a mobile
phone has little effect on barcode recognition. Therefore, at the beginning, the acquired QR code images are converted to gray images.
23.2.2 Denoising

There are many denoising methods; median filtering is selected here because the median filter is a nonlinear smoothing filter that removes noise while keeping the details of the image and protecting its edges well. The shape and size of the filtering window have a certain influence on the filtering effect. Common median filter window shapes are the square, the cross, etc. As a general rule of thumb, the square window suits images of objects with long, slowly changing contours, and the cross window suits images of objects with sharp angles. In general, the window size is smaller than the minimum effective feature size in the image.
23.2.3 Binarization

Binarization refers to converting an image into one with only two colors, black and white. Among the many binarization methods, a common one is the threshold method: select a proper threshold so that every pixel falls into either the object or the background, generating the corresponding binary image. Let the original image be f(x, y); a characteristic threshold is found from the image according to a certain criterion, and the image is divided into two parts:

$g(x, y) = \begin{cases} b_0, & f(x, y) < T \\ b_1, & f(x, y) \ge T \end{cases}$   (23.1)

where b_0 is black and b_1 is white. T is determined by computing the median of the intensity range of the image. For the binarization of QR images, define b_0 = 1 and b_1 = 0, i.e.,

$g(x, y) = \begin{cases} 1, & f(x, y) < T \\ 0, & f(x, y) \ge T \end{cases}$   (23.2)
Depending on the sampling scope used to determine the threshold, binarization methods include the global threshold method and the local threshold method. This system adopts the median method.
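A minimal sketch of the global binarization of Eqs. (23.1)-(23.2), taking T as the midpoint of the intensity range as described above; this is an illustration, not the authors' code:

```python
import numpy as np

def binarize_qr(gray):
    """Global binarization per Eqs. (23.1)-(23.2): dark modules map to 1."""
    T = (int(gray.min()) + int(gray.max())) / 2.0   # median of the intensity range
    return np.where(gray < T, 1, 0).astype(np.uint8)
```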
23.2.4 Positioning

A QR code has three identically shaped position detection patterns. Before rotation, they lie in the upper-left, upper-right, and lower-left corners of the QR code. Together, these three position detection patterns form the finder pattern of the symbol.
23.3 Classic QR Code Image Correction Algorithm and Existing Problems

The polyharmonic deformation may be divided into barrel distortion and pincushion distortion. The greatest difference between these and trapezoidal distortion is that the original image cannot be restored directly by interpolating a transformation matrix. Zhou Jiejing et al. corrected the QR code through a function deformation method based on the shape function. The features of this algorithm are as follows: first, a geometric transformation changes the QR code into a square; then, the standard QR code image is rebuilt through scanning and sampling, without having to determine a correction model for each kind of image distortion. Chen Miaorong et al. used four-node and eight-node Pascal's triangle elements to correct distorted images, also achieving a good effect. However, some defects remain in the images obtained by the classic shape-function-based QR code correction algorithm, as shown in Figs. 23.1 and 23.2.
Fig. 23.1 Hyperbola distortion classic QR code correction algorithm procedure chart
Fig. 23.2 Barrel distortion classic QR code correction algorithm procedure chart
As shown in Figs. 23.1 [16] and 23.2 [17], images corrected with the classic shape-function-based QR code correction algorithm suffer from poor resolution, fuzzy boundaries, and other defects, and in severe cases recognition fails again.
23.4 QR Code Image Correction Algorithm Based on the Morphological Operation and Existing Problems

In a morphological operation, an appropriate structural element is first selected and used to traverse the image, computing according to specific rules. Many structural elements are available for selection; they differ mainly in shape and size. At present, common structural elements include the circle, cross, square, etc. Each structural element has a reference point; when the element traverses the image, the pixel coincident with the reference point is the point being operated on, and moving the element completes the operation over all pixels of the image. The selection of the structural element is key to the success of a morphological operation, since elements with different shapes or sizes produce different results. The selection should therefore follow the image features and, generally, two rules: the size of the structural element shall be smaller than the original image, and its geometric shape shall be simpler than the image to be operated on.
The best shape for the structural element is a convex polygon, because a convex polygon captures more useful information. Common convex shapes include the circle, cross, and square.
23.4.1 Binary Erosion and Dilation Operations

The erosion operation extracts key information and removes glitches, isolated pixels, and point sets smaller than the structural element; it removes protrusions at the edge of the image and disconnects tiny connections. Erosion shrinks the boundary of the region consisting of nonzero pixels in the binary image. In the erosion operation, the structural element B erodes A, written $A \ominus B$. The structural element B is moved row by row and column by column over image A; when all pixel values on A covered by B are 1, the position of B's reference point is set to 1, and otherwise it is set to 0. When B has traversed every pixel of A in this way, the erosion result is obtained. The erosion operation is defined as

$A \ominus B = \{x : B + x \subset A\} = \{x \mid (B)_x \subseteq A\}$   (23.3)
The dilation operation is mainly used to fill in cavities inside the image and concave regions at the edge, so as to expand the boundary of the image. Dilation grows the boundary of the region formed by pixels whose binary value is 1. In the dilation operation, the structural element B dilates A, written $A \oplus B$. The binary image is dilated as follows: first obtain the reflection $\hat{B}$ of the structural element B, then move the origin of $\hat{B}$ over all points of image A; the set of all points x for which $\hat{B}$ translated to x intersects A is the result of the dilation. The dilation formula is

$A \oplus B = \{x \mid (\hat{B})_x \cap A \ne \emptyset\}$   (23.4)
23.4.2 Binary Opening and Closing Operations

The binary morphological erosion and dilation operations are the most fundamental morphological operations, upon which the opening and closing operations are built. The closing operation is denoted by the symbol "•" and the opening operation by the symbol "◦".
For the closing operation, the image is first dilated and then eroded, written

$A \bullet B = (A \oplus B) \ominus B$   (23.5)
The closing operation fills in cavities smaller than the structural element and small continuous gaps, smoothing the edge of the image without obviously modifying its size. For the opening operation, the image is first eroded and then dilated, written

$A \circ B = (A \ominus B) \oplus B$   (23.6)
The opening operation may be used to eliminate small objects and separate objects at thin connections while smoothing the boundary of larger objects without obviously modifying their area.
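These four operations map directly onto OpenCV; the sketch below also previews the repeated erosion/dilation closure used later in Sect. 23.4.3. The input file name, kernel sizes, and iteration counts are illustrative assumptions:

```python
import cv2

img = cv2.imread("qr.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input path
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))

# Closing (Eq. 23.5): dilate then erode; fills small holes and gaps
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

# Opening (Eq. 23.6): erode then dilate; removes small isolated objects
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

# Repeated erosion then dilation, as used for the closure in Sect. 23.4.3
eroded = cv2.erode(binary, kernel, iterations=15)
closure = cv2.dilate(eroded, kernel, iterations=15)
```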
23.4.3 Improved Algorithm

In this section, the improved algorithm is described in detail. The block diagram of the improved algorithm is shown in Fig. 23.3. The steps are given below.

1. Gray and binarize. Grayscale and binarize the input image.
2. Calculate the minimum bounding rectangle. Find the minimum bounding rectangle (MBR) of the QR code symbol and use it as the correction reference.
3. Rotate MBR to horizontal. According to the inclination angle of the minimum bounding rectangle, rotate the barcode to horizontal via the coordinate transformation equation.
Fig. 23.3 The block diagram of the improved algorithm (input image → gray and binarize → calculate minimum bounding rectangle (MBR) → rotate MBR to horizontal → find out four vertexes of MBR → segment mesh and reconstruct the image → morphological grayscale dilation and erosion → edge detection using Canny operator → obtain the MBR → distortion correction via shape function → sampling and reconstruction)
4. Find out four vertexes of MBR. Find the four vertexes of the QR code, namely P = {P1, P2, P3, P4}: (1) find the center of the minimum bounding rectangle; (2) draw lines parallel to the x- and y-axes through the center, dividing the QR code symbol into four sections, and in each section find the black pixel farthest from the center, yielding the four vertexes of the QR code symbol (irrespective of the case where the gray value of the fourth vertex pixel is light colored).
5. Segment mesh and reconstruct the image. Perform mesh segmentation on the corrected image and reconstruct it: (1) find the finder pattern, count the pixels of the black and white modules, and determine the grid division ratio; (2) find the timing pattern, count the pixels of the black and white modules, and obtain the number of modules along each side of the QR code image; (3) create the mesh according to the grid division ratio, sample the image pixel at each intersection of the grid, and construct a bitmap where the binary digits "1" and "0" represent the dark and light pixels, respectively.
6. Morphological grayscale dilation and erosion. Apply morphological grayscale dilation and erosion to the reconstructed image: (1) binarize the input image; (2) obtain the morphological closure by first applying the erosion operation 15 times and then the dilation operation 15 times.
The criterion for selecting the numbers of erosion and dilation operations in the morphological closure is to obtain a closure of the small white blocks; the erosion and dilation counts are kept equal, which ensures that there is almost no deviation between the closure size and the QR code size.

7. Edge detection using Canny operator. (1) Noise elimination: generally, a Gaussian filter is used for edge-preserving smoothing. (2) Sobel filtering: a pair of convolution kernels is used:
$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$   (23.7)
The amplitude and direction are then calculated as

$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\left(\frac{G_y}{G_x}\right)$   (23.8)
(3) Non-maximum suppression: non-edge pixels are excluded at this step. (4) Hysteresis thresholding: the Canny detector uses two thresholds, one high and one low, to judge edges. If the pixel value at a given point exceeds the high threshold, the pixel is kept as an edge pixel; if it is below the low threshold, the point is excluded; if it lies between the two thresholds, it is kept only when it is connected to another pixel exceeding the high threshold.

8. Obtain the MBR. Compute the minimum bounding rectangle.
9. Distortion correction via shape function. Carry out the distortion correction using the shape function: (1) move the four vertexes of the QR symbol to the four vertexes of the minimum bounding rectangle using the 2D four-node Pascal's triangle element shape function; (2) find the center of each side of the shifted QR code image, namely M = {M1, M2, M3, M4}; (3) map the side centers M and the four vertexes P to the rectangle using the 2D eight-node Pascal's triangle element shape function; (4) after the shape-function transformation, apply an erosion operation to fill the gaps resulting from the transformation and stretching.
10. Sampling and reconstruction.
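Steps 6-8 can be sketched with OpenCV as follows; the kernel size and Canny hysteresis thresholds are illustrative assumptions, and cv2.minAreaRect supplies the minimum bounding rectangle and its four vertexes:

```python
import cv2

def qr_closure_and_mbr(binary):
    """Morphological closure (15x erode, 15x dilate), Canny edges, then the
    minimum bounding rectangle of the symbol."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    eroded = cv2.erode(binary, kernel, iterations=15)
    closure = cv2.dilate(eroded, kernel, iterations=15)
    edges = cv2.Canny(closure, 50, 150)              # low/high hysteresis thresholds
    pts = cv2.findNonZero(edges)
    rect = cv2.minAreaRect(pts)                      # (center, (w, h), angle)
    corners = cv2.boxPoints(rect)                    # four vertexes of the MBR
    return closure, rect, corners
```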
23.5 Testing Results

The algorithm procedure proposed in this paper was tested using Matlab on a PC (Intel quad-core 2.8 GHz processor, 4 GB memory). The qualitative and quantitative evaluations are given in the following subsections.
Fig. 23.4 Results obtained after the grayscale and binarization operations. Left: the original distorted QR code; right: the processed image
Fig. 23.5 Morphological closure obtained after the erosion and dilation operations
23.5.1 Qualitative Evaluation

The qualitative testing results are given in this section. Example figures from the various stages are shown below (the simulation uses the Python language, the PyCharm development tool, and the OpenCV library); see Figs. 23.4, 23.5, 23.6, 23.7, and 23.8 [17].
23.5.2 Quantitative Evaluation

To compare the proposed method with existing methods, 50 2D QR code images were collected: 20 under normal conditions, 15 with hyperbola distortion, and 15 with barrel distortion. The baseline methods are the default algorithm of the ZXing ("Zebra Crossing") barcode scanning library and the 2D code scanner tool of WeChat.
Fig. 23.6 Obtained bounding rectangle
Fig. 23.7 Results obtained after the transformation of shape function
Fig. 23.8 Filling operation made for the results of shape function transformation
Table 23.1 Recognition rate of the proposed method and compared methods

Method   Correctly recognized/Total   Recognition rate (%)
Ours     45/50                        90
ZXing    20/50                        40
WeChat   30/50                        60
The former is an open-source library that is publicly available online and is referred to as ZXing. The latter is an application developed by Tencent and is referred to as WeChat. A quantitative evaluation of the accuracy enhancement of 2D QR code recognition, compared to the other existing algorithms, is given in Table 23.1. It shows that the proposed method outperforms the other methods by a large margin.
23.6 Conclusion
This paper proposes an improved method to identify QR code images with surface distortion. In the proposed method, edge detection and morphological operations are first used to enhance the contrast of the image; then the shape function is utilized to correct the distortion, including hyperbolic distortion and barrel distortion; and finally, the standard barcode scanning process is used to identify the information of the QR code. Experiments on the collected QR code images demonstrate the effectiveness of the proposed method.
Acknowledgements The paper was supported by "The Fundamental Research Funds for the Central Universities" 562017Y-5285, Research on Construction and Key Technologies of Traceability System for Important Products.
References 1. Asari, K.V., Kumar, S., Radhakrishnan, D.: A new approach for nonlinear distortion correction in endoscopic images based on least squares estimation. IEEE Trans. Med. Imaging 18(4), 345–354 (1999) 2. Wang, K., Zou, B., Hu, W.: A distorted image correction method based on neural networks. J. Image Graph. (2005) 3. Zhou, J.: Study on two dimensional barcode recognition in inartificial conditions. Master's thesis, Xi'an Technology University (2003) 4. Liu, T.-Y., Dong, A.-H.: Precisely correct radial and oblique distortion of camera image. J. Image Graph. 12(10), 1935–1938 (2007) 5. Park, J., Byun, S.C., Lee, B.U.: Lens distortion correction using ideal image coordinates. IEEE Trans. Consum. Electron. 55(3), 987–991 (2009) 6. Ohbuchi, E., Hanaizumi, H., Hock, L.A.: Barcode readers using the camera device in mobile phones. In: 2004 International Conference on Cyberworlds. IEEE (2004)
7. Ming, A., Ma, H., Zhao, Q.: Correction technique to defocused and distorted QR barcode images. J. Comput.-Aided Des. Comput. Graph. 19, 1080–1084 (2007) 8. Lei, L., He, W., Zhang, W.: Distortion correction of data matrix symbol directly marked on cylinder surface. In: International Conference on Artificial Intelligence & Computational Intelligence. IEEE (2010) 9. He, W.P., Lin, Q.S., Wang, W.: Research on cylinder data matrix barcode recognition. In: 2012 AASRI Conference on Modeling, Identification and Control, vol. 3, pp. 319–327 (2012) 10. Chen, W., Yang, G., Zhang, G.: A simple and efficient image preprocessing for QR decode. In: The 2nd International Conference on Electronic & Mechanical Engineering and Information Technology, pp. 234–238 (2012) 11. Xu, Y.: Research and implementation of recognition algorithms for sketchy curve two-dimensional codes. Shanghai Fudan University (2013) 12. Ono, S., Kawakami, Y., Kawasaki, H.: A two-dimensional barcode with robust decoding against distortion and occlusion for automatic recognition of garbage bags. In: International Conference on Pattern Recognition. IEEE (2014) 13. Shi, Z.: Cylindrical QR code recognition method based on three-dimensional perspective transformation. Mod. Electron. Technol. 37(8) (2014) 14. Fengjie, L., Ming, C.: Recognition and realization of cylindrical QR code. Comput. Mod. 2, 110–112 (2015) 15. Lay, K.T., Wang, L.J., Wang, C.H.: Rectification of QR-code images using the parametric cylindrical surface model. In: International Symposium on Next-Generation Electronics. IEEE (2015) 16. Chen, M.R.: Research on QR bar code recognition method of agricultural product individual traceability based on android system. Zhejiang University (2013) 17. Liu, P., Duan, M., Liu, W., Wang, Y., Dai, Y.: Research on the graphic correction technology based on morphological dilation and form function QR codes. In: 2016 International Conference on Network and Information Systems for Computers (ICNISC). IEEE (2016)
Chapter 24
Some New Hermitian Self-orthogonal Codes Constructed on Quaternary Field Hongying Jiao, Jinguo Zhang, and Miaohua Liu
Abstract Self-orthogonal codes are linear codes that are the basis for constructing quantum codes. In this paper, Hermitian self-orthogonal codes with dual distance 3 over the field F_4 are studied. For two types of code length n, quaternary self-orthogonal codes with dual distance 3 are constructed by the method of recursion and combination. Quaternary quantum codes with good parameters are then obtained by the CSS construction method.
24.1 Introduction
The theory of quantum error-correcting codes is a branch of quantum information science, which can fight against decoherence in quantum communication systems and quantum computation. It has become a foundation for implementing quantum computation and quantum information processing. Communication security has attracted the attention of various departments; especially in today's highly informatized society, communication security is closely related to everyone, e.g., 3D data cryptography and security protocols, and high-performance computing for 3D data. Since 1995–1996, owing to the outstanding work of Shor [1] and Steane [2], many researchers have been working on quantum error-correcting codes. Codes with better parameters are constructed through self-orthogonal codes. In past developments, the primary methods have been the Calderbank–Shor–Steane (CSS) construction method [3] and the Steane construction method [4] applied to binary classical codes. In the literature [3, 5], Calderbank et al. discovered that classical self-orthogonal codes under an inner product can be used to construct binary quantum stabilizer error-correcting codes.
Later, many quantum error-correcting codes have been constructed from classical linear self-orthogonal codes with respect to the Euclidean or Hermitian inner product; see the literature [4, 6–9]. The study of quantum error-correcting codes over F_4 first appeared in the literature [5]. In [10], many self-orthogonal codes [24, 11, 8]_4 and dual codes [24, 13, 7]_4 were found; thus, the quantum code [[24, 2, 7]]_4 is obtained. However, there are relatively fewer research results on quaternary quantum codes than on binary quantum codes. In order to study quaternary quantum codes, it is necessary to study self-orthogonal codes over quaternary fields.
24.2 Preliminaries
In this paper, let the Galois field with four elements be F_4 = {0, 1, 2, 3}, where 1 + 2 = 3 = 2^2, 2^3 = 1, and the conjugate of an element is x̄ = x^2. Codes over F_4 are often called quaternary. Let F_4^n be the n-dimensional row vector space over F_4. A code C = [n, k]_4 is a k-dimensional subspace of F_4^n of length n; it is called a quaternary linear code. For x ∈ F_4^n, the number of nonzero coordinates of x is called the Hamming weight ωt(x). For x, y ∈ F_4^n, the Hamming distance p(x, y) is defined by p(x, y) = ωt(x − y). If P(C) = min{p(x, y) | x, y ∈ C, x ≠ y}, then P(C) is called the minimum distance of the linear code C, and C is denoted as C = [n, k, d]_4.

The Hermitian inner product of two vectors x = (t_1, t_2, ..., t_n) and y = (u_1, u_2, ..., u_n) from F_q^n is defined by

$$(x, y)_H = t_1\bar{u}_1 + t_2\bar{u}_2 + \cdots + t_n\bar{u}_n,$$

where ū_i = u_i^{√q} for u_i ∈ F_q. In particular, for q = 4, (x, y)_H = t_1 u_1^2 + t_2 u_2^2 + ⋯ + t_n u_n^2. In this paper, we consider only this inner product. If (x, y)_H = 0, then x and y are called orthogonal. For a code C = [n, k]_4, the code C^⊥H = {x ∈ F_4^n | (x, y)_H = 0, ∀y ∈ C} is called the dual code of C, and C^⊥H = [n, n − k]_4. If C ⊆ C^⊥H, then C is called a Hermitian self-orthogonal code.

Theorem 24.1 (CSS construction) If a classical linear code C = [n, k, d]_q satisfies C ⊇ C^⊥H, then a quantum stabilizer code exists with parameters [[n, 2k − n, d]]_q.
Let 1_n, 0_n, and 0_{m×n} denote the all-ones vector of length n, the zero vector of length n, and the zero matrix of size m × n, respectively.
24.3 Main Conclusions
Let C = [n, k]_4 be a linear code over F_4 and let G be its generator matrix. It is well known that a code C is Hermitian self-orthogonal if and only if G · G^H = 0. Next, we use first column vectors to construct the generator matrix of a Hermitian self-orthogonal code.

Definition 24.1 Let α be a column vector over F_4; if the first nonzero component of α is 1, then α is called a first column vector.

For k ≥ 2, denote N_k = (4^k − 1)/3; then N_{k+1}, N_k, and N_{k−1} satisfy N_{k+1} = 4N_k + 1 = 16N_{k−1} + 5. The number of first column vectors in F_4^k is exactly N_k. Next, we recursively construct the generator matrix G_{k,N_k} of the k-dimensional Simplex code from first column vectors. Let G_{2,5} be the following matrix:
$$G_{2,5} = \begin{bmatrix} 1 & 1 & 1 & 1 & 0 \\ 0 & 1 & 2 & 3 & 1 \end{bmatrix}.$$
For k = 3, we have G_{3,21} as

$$G_{3,21} = \begin{pmatrix} S_{3,0} & S_{3,1} \end{pmatrix}, \quad S_{3,0} = \begin{pmatrix} 0_5 \\ G_{2,5} \end{pmatrix}, \quad S_{3,1} = G_{3,16},$$

where

$$G_{3,16} = \begin{pmatrix} G_{2,4} & G_{2,4} & G_{2,4} & G_{2,4} \\ 0_4 & 1_4 & 2\cdot 1_4 & 3\cdot 1_4 \end{pmatrix}.$$
For k = 4, we have

$$G_{4,64} = \begin{pmatrix} S_{3,1} & S_{3,1} & S_{3,1} & S_{3,1} \\ 0_{16} & 1_{16} & 2\cdot 1_{16} & 3\cdot 1_{16} \end{pmatrix},$$

$$G_{4,85} = \left( \begin{matrix} 0_{2\times 5} \\ G_{2,5} \end{matrix} \;\; \begin{matrix} 0_{16} \\ G_{3,16} \end{matrix} \;\; G_{4,64} \right) = \begin{pmatrix} S_{4,0} & S_{4,1} & S_{4,2} & S_{4,3} & S_{4,4} & S_{4,5} \end{pmatrix}.$$

It is easy to check that S_{4,i} · S_{4,i}^H = 0. Next, for k ≥ 3, we construct G_{k+1,N_{k+1}} recursively:
$$G_{k+1,N_{k+1}} = \begin{pmatrix} S_{k+1,0} & S_{k+1,1} & S_{k+1,2} & \cdots & S_{k+1,N_{k-1}} \end{pmatrix},$$
where G_{k,N_k} = ( S_{k,0} \; S_{k,1} \; S_{k,2} \; \cdots \; S_{k,N_{k-2}} ). It is obvious that we can construct self-orthogonal codes of length n = 5 + 16m and n = 16m using submatrices of G_{k,N_k}.

Theorem 24.2 For any dimension k ≥ 3 and any length n between N_{k−1} and N_k, if n = 5 + 16m or n = 16m, there exists an [n, k]_4 Hermitian self-orthogonal code with dual distance 3.

Proof For k = 3, G_{3,21} = \left( \begin{matrix} 0_5 \\ G_{2,5} \end{matrix} \;\; G_{3,16} \right) = ( S_{3,0} \; S_{3,1} ), and it is easy to see that G_{3,21} · G_{3,21}^H = 0. For k = 4, from S_{4,1} and the blocks S_{4,i} of G_{4,85}, we can construct G_{4,32}, G_{4,48}, G_{4,64}, and G_{4,80}; deleting G_{4,32} and G_{4,48} from G_{4,85}, respectively, we get G_{4,53} and G_{4,37}. It is easy to verify that each of the above matrices multiplied by its conjugate transpose is 0, so the theorem is true for k = 4.

We generalize in terms of k: assume the theorem holds for k ≤ s (s ≥ 4); we now prove it for k = s + 1. Let M_s = N_s − 5; then M_s = 4M_{s−1} + 16, for n = 16m with M_s/16 ≤ m ≤ M_{s+1}/16. Assuming that for k = s we have G_{s,N_s} · G_{s,N_s}^H = 0, the submatrix G_{s,M_s} of G_{s,N_s} likewise satisfies G_{s,M_s} · G_{s,M_s}^H = 0.

For k = s + 1, write G_{s,M_s} = ( S_{s,1}, S_{s,2}, S_{s,3}, \ldots, S_{s,\frac{M_{s-1}}{4}+1} ), where S_{s,i} · S_{s,i}^H = 0. So

$$G_{s+1,M_{s+1}} = \begin{pmatrix} 0_{16} & S_{s,1} & \cdots & S_{s,M} & S_{s,1} & \cdots & S_{s,M} & S_{s,1} & \cdots & S_{s,M} & S_{s,1} & \cdots & S_{s,M} \\ S_{s,1} & 0_{16} & \cdots & 0_{16} & 1_{16} & \cdots & 1_{16} & 2\cdot 1_{16} & \cdots & 2\cdot 1_{16} & 3\cdot 1_{16} & \cdots & 3\cdot 1_{16} \end{pmatrix},$$

where S_{s,M} abbreviates the last block S_{s,\frac{M_{s-1}}{4}+1}. Since S_{s,i} · S_{s,i}^H = 0, any two columns in G_{s+1,N_{s+1}} are linearly independent, and it is easy to check that

$$\begin{pmatrix} 0_{16} \\ S_{s,1} \end{pmatrix} \cdot \begin{pmatrix} 0_{16} \\ S_{s,1} \end{pmatrix}^H = 0, \qquad \begin{pmatrix} S_{s,1} \\ 0_{16} \end{pmatrix} \cdot \begin{pmatrix} S_{s,1} \\ 0_{16} \end{pmatrix}^H = 0,$$

and

$$\begin{pmatrix} S_{s,2} \\ 0_{16} \end{pmatrix} \cdot \begin{pmatrix} S_{s,2} \\ 0_{16} \end{pmatrix}^H = 0, \;\ldots,\; \begin{pmatrix} S_{s,M} \\ 3\cdot 1_{16} \end{pmatrix} \cdot \begin{pmatrix} S_{s,M} \\ 3\cdot 1_{16} \end{pmatrix}^H = 0;$$

hence, G_{s+1,M_{s+1}} · G_{s+1,M_{s+1}}^H = 0, and G_{s+1,N_{s+1}} is a Hermitian self-orthogonal generator matrix. So the theorem holds.

Quantum codes with distance 3 over F_q can then be constructed directly by the CSS method.
Theorem 24.3 For any k ≥ 3 and N_{k−1} ≤ n ≤ N_k, if n = 16m or n = 16m + 5, there exists an [[n, n − 2k]]_4 quantum code with distance three.
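Taking the theorem at face value, the parameter sets it promises can be enumerated mechanically. The short sketch below computes N_k = (4^k − 1)/3 and lists the quantum code parameters [[n, n − 2k, 3]]_4 for the admissible lengths; the function names are illustrative.

```python
def N(k):
    """Number of first column vectors in dimension k: N_k = (4^k - 1) / 3."""
    return (4 ** k - 1) // 3

def quantum_code_params(k):
    """List the [[n, n - 2k, 3]]_4 parameters promised by the theorem for
    lengths N(k-1) <= n <= N(k) of the form 16m or 16m + 5."""
    return [(n, n - 2 * k, 3)
            for n in range(N(k - 1), N(k) + 1)
            if n % 16 == 0 or n % 16 == 5]

# For k = 4 (N_3 = 21, N_4 = 85) this yields codes such as [[32, 24, 3]]_4.
print(quantum_code_params(4))
```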
24.4 Conclusions
In this paper, a class of linear self-orthogonal codes with dual distance 3 over the quaternary field is constructed by recursion and combination. Simultaneously, quaternary quantum codes with good parameters are obtained by the CSS construction method. The construction of quantum codes over the quaternary field provides theoretical support for 3D data cryptography, security protocols, and high-performance computing for 3D data.
References 1. Shor, P.W.: Scheme for reducing decoherence in quantum computer memory. Phys. Rev. A 52, 2493–2496 (1995) 2. Steane, A.M.: Error correcting codes in quantum theory. Phys. Rev. Lett. 77, 793–797 (1996) 3. Calderbank, A.R., Rains, E.M., Shor, P.W., Sloane, N.J.A.: Quantum error correction and orthogonal geometry. Phys. Rev. Lett. 78, 405–409 (1997) 4. Steane, A.M.: Enlargement of Calderbank-Shor-Steane quantum codes. IEEE Trans. Inf. Theory 45, 2492–2495 (1999) 5. Calderbank, A.R., Rains, E.M., Shor, P.W., Sloane, N.J.A.: Quantum error correction via codes over GF(4). IEEE Trans. Inf. Theory 1369–1387 (1998) 6. Aly, S.A., Klappenecker, A., Sarvepalli, P.K.: On quantum and classical BCH codes. IEEE Trans. Inf. Theory 53(3), 1183–1188 (2007) 7. Kim, J.L.: New quantum error-correcting codes from Hermitian self-orthogonal codes over GF(4). In: Proceedings of the Sixth International Conference on Finite Fields and Applications, Oaxaca, Mexico, pp. 21–25 (2001). Springer, pp. 209–213 (2002) 8. Liang, F.: Self-orthogonal codes with dual distance three and quantum codes with distance three over. Quantum Inf. Process. 12, 3617–3623 (2013) 9. Chen, G., Li, R.: Ternary self-orthogonal codes of dual distance three and ternary quantum codes of distance three. Des. Codes Cryptogr. 69, 53–63 (2013) 10. Ketkar, A., Klappenecker, A., Kumar, S.: Nonbinary stabilizer codes over finite fields. IEEE Trans. Inf. Theory 52, 4892–4914 (2006)
Chapter 25
Graphic Data Analysis Model Oriented to Knowledge Base of Power Grid Data Center Liang Zhu, Lin Qiao, Li Bo Xu and Bi Qi Liu
Abstract The power grid data center is an important component of the power grid, and its knowledge base provides an accurate analysis of the relationships among power grid operation and maintenance data. The knowledge base is the cornerstone of the operation and maintenance work of the data center, and is also key to ensuring the core performance of the data center. Therefore, based on graph database technology, a knowledge base analysis model oriented to the characteristics of the power grid business is established through discrimination, control, and maintenance. It aims to accurately manage the power grid's IT resources, to achieve efficient control and management of a changing IT infrastructure, to meet the needs of IT services in an agile, efficient, and accurate way, and to ensure a flexible response to the development of the State Grid business.
25.1 Introduction
With the increasing scale of IT resources and the increasing complexity of business application architectures, it is very important to find information about the relationship between automated resource allocation and maintenance data [1]. Automatic discovery of resource allocation and automatic collection and mining of information system object information allow resources, attributes, relationships, and other configuration information to be planned, managed, and stored in a highly unified way at the enterprise level. The operation and maintenance teams can then accurately and timely grasp the dynamic resources of the data center, with more authenticity in monitoring and early warning, fault analysis and disposal, change control, risk assessment, work planning, and other fields. So the data analysis has value in improving the degree of information management and the capability of operation and maintenance support in the application of the knowledge base.
Compared to other databases, a graph database requires less memory, runs faster, updates data more easily, and gives a more intuitive expression. The algorithm also improves the computing efficiency of the knowledge base at the scale of millions or even billions of nodes. In this paper, several applications of a graph database in the data center knowledge base are studied, along with the association rule method for data management of the State Grid's database operation. The business logic of data center requirements and the modeling language of the graph database are closely linked. Through the point and edge rules designed in the modeling language, various data analysis models are formed according to the rules and relations, and graph data are used. This paper tries to use 3D graph database technology to further improve the query performance of the data center knowledge base.
25.2 Data Center Knowledge Base
The Data Center Knowledge Base (DCKB) refers to the set of rules applied in expert system design, together with the facts and data associated with those rules; all of them constitute a knowledge base. A knowledge base is based on an expert system and has intelligence [2]. Not all intelligent programs have a knowledge base; only knowledge-based systems do. The knowledge base structure is shown in Fig. 25.1.
Fig. 25.1 Knowledge base structure
(1) The knowledge data in the knowledge base is reconstructed into an organized form with a convenient and regular structure, according to its application direction characteristics, background characteristics (information obtained at the time), use characteristics, and attribute characteristics [3]. The pieces of knowledge are basically modular.
(2) The knowledge base is hierarchical. The most basic level is "fact knowledge"; the middle level is the language used to control the "facts" (with rules, processes, and other information); the final level is the "strategy", which takes the intermediate knowledge information as the control object.
Policies are also often thought of as further rules about the rules. So the general structure of the knowledge base is a hierarchy, which is determined by the nature of the knowledge itself. In the knowledge base, there are usually interdependent relationships between pieces of knowledge. Rules are the most typical and commonly used pieces of knowledge.
(3) There is a special form of knowledge in the knowledge base that does not belong only to a certain level (it may exist at any level): credibility (or trust, confidence measure, etc.). For a certain problem, the relevant facts, rules, and strategies can be labeled with credibility values. Thus, an augmented knowledge base is formed. There is no uncertainty measure in an ordinary database, because everything in database processing belongs to the "deterministic" type.
(4) There may also be a special part of the knowledge base that is commonly referred to as the typical method library. If the solution to some problems is positive and inevitable, it can be stored directly in the typical method library as part of the fairly positive solution. Storage in this sense constitutes another form of knowledge base. In this form, machine learning and reasoning are used only to select a specific part of a particular method library [4].
25.3 Graph Data Analysis Model
25.3.1 Data Model
Data is the record of information about objectively existing things, and a model is an expression of the rules of the world. The data analysis model described in this paper is an expression of various characteristics of data [5]. The data analysis model summarizes the various characteristic values of the whole system from the overall structure. The main contents of the data analysis model are the analysis structure, data actions, and data relationships (a minimal property-graph sketch of these three ingredients follows this list).
(1) Analysis structure: the specific types, information, and categories of the analysis model, as well as the correlation relationships between them, are elaborated.
(2) Data action: a data action in the data analysis model is an action mode between data, including various data operation actions; the selected analysis structure can be modified by these actions [6].
(3) Data relationship: a data relationship is mainly the contact information between data in the data analysis model, including the coexistence relationship between data. Data relationships are dynamic.
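The following minimal Python sketch shows one way the three ingredients above (analysis structure, data actions, and data relationships) could be represented in a property-graph style; all class names, node categories, and relation names are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Analysis structure: a typed entity with a category and attributes."""
    node_id: str
    category: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Edge:
    """Data relationship: a typed, dynamic link between two entities."""
    source: str
    target: str
    relation: str

class GraphModel:
    def __init__(self):
        self.nodes, self.edges = {}, []

    # Data actions: operations that modify the selected analysis structure.
    def add_node(self, node):
        self.nodes[node.node_id] = node

    def relate(self, source, target, relation):
        self.edges.append(Edge(source, target, relation))

    def neighbors(self, node_id, relation):
        return [e.target for e in self.edges
                if e.source == node_id and e.relation == relation]

g = GraphModel()
g.add_node(Node("db01", "database", {"engine": "hypothetical"}))
g.add_node(Node("srv01", "server"))
g.relate("db01", "srv01", "runs_on")
print(g.neighbors("db01", "runs_on"))  # ['srv01']
```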
25.3.2 Clustering Model for Running Data of IT Infrastructure
The first step in constructing the knowledge base of association rules for running data of IT infrastructure in the data center is to process and classify the increasingly complex and huge system operation data, such as alarms, faults, logs, and performance, and to construct a data set for running data of IT infrastructure. The originally disordered alarm, fault, log, and performance data are labeled and converted into labeled signature data. When transforming the data center running source data into system event data, it is necessary to use the category attributes in the running source data as the category of the corresponding system event data. Since the source data of the system have no category attribute, it is necessary to first label the source data of the original information system, such as alarms, faults, logs, and performance, and then convert the source data into corresponding system events based on the category of the data. The classification labeling of system operation data is divided into two processes: clustering and classification [7]. The clustering process completes the task of extracting the features of information system operation data categories and constructing the category feature knowledge base. The classification process completes the task of matching and labeling the running data without class labels according to the category feature knowledge base. Based on the category feature knowledge base, the alarm, fault, log, and performance data can be accurately classified, and the operation data clustering model of the IT infrastructure can be constructed to prepare for the subsequent construction of the association rule knowledge base. A hedged sketch of this two-step process is given below.
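The paper does not name a specific clustering algorithm, so the sketch below uses TF-IDF features with k-means purely as a stand-in to illustrate the two processes: clustering unlabeled running data into category features, then classifying a new record against those features. The sample records are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical raw operation records (alarm/fault/log/performance messages).
records = [
    "disk usage exceeds 90 percent on node a",
    "disk usage exceeds 95 percent on node b",
    "network link down on switch 3",
    "network link flapping on switch 7",
]

# Clustering process: extract category features from the unlabeled data.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(records)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# The fitted centroids act as a simple category feature knowledge base;
# classification process: label a new record by its nearest centroid.
new_record = vectorizer.transform(["disk usage exceeds 99 percent on node c"])
print(model.predict(new_record))  # expected: the cluster of the disk records
```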
25.3.3 Design and Analysis Model of Graph Database Rules in the Association Rules Knowledge Base
Aiming at the research and application of graph database technology in the data center, this paper studies the graph database modeling method for power grid data center operation data management [8]. The business logic of data center requirements and the modeling language of the graph database are closely linked. By designing point and edge rules in the modeling language, various data analysis models are formed according to rules and relationships; the characteristic of dynamic real-time update further improves the data query performance of the data center. As shown in Fig. 25.2, through the analysis of alarm, fault, log, and performance data of the basic software system, operating system, computer system, network system, and other aspects in the dynamic source database, this paper studies the inherent time sequences and causality of the data, finds the correlations, frequent patterns, and causal relationships among the data, and studies the Apriori algorithm, the FP-Growth algorithm, and other related association analysis algorithms. The association analysis algorithm suitable for the State Grid is selected to mine the association relationships. Then,
Fig. 25.2 Flow chart of knowledge base construction
the association rules discovered are filtered by manual verification and analysis, and the knowledge base of the association rules is constructed and updated. The knowledge base covers three layers from bottom to top: factual knowledge, rules, and strategies [9]. (A minimal Apriori sketch for the rule-mining step follows.)
(1) Analysis of Knowledge Base Rules
By extracting the collected data such as alarms, faults, logs, and performance, and using time series and causal methods to mine association rules from the complex and huge multi-source operation data, a knowledge base of association rules for system events can be constructed, which is then used for fault source location and fault prediction.
(2) Analysis of Knowledge Base Strategy
The highest level of the knowledge base is the "strategy", which is also often regarded as the rule of rules, for example, mature experience in solving certain faults. Taking the middle-level knowledge as the control object, it fills the knowledge base with effective fault location and fault prediction experience, constantly enriching and updating the knowledge base, so that operation and maintenance personnel can directly use similar fault experience.
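Since Apriori is named above as a candidate mining algorithm, the following is a minimal, self-contained Apriori sketch over hypothetical system-event transactions; it returns frequent itemsets, from which association rules would then be derived and manually verified as described above.

```python
def apriori(transactions, min_support):
    """Minimal Apriori: return frequent itemsets (frozensets) whose support
    is at least min_support; rules are derived from these itemsets later."""
    def support(itemset):
        return sum(itemset <= t for t in transactions) / len(transactions)

    singletons = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in singletons if support(s) >= min_support}
    frequent, k = {}, 1
    while current:
        frequent.update({s: support(s) for s in current})
        k += 1
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

# Hypothetical transactions of co-occurring system events.
events = [frozenset(t) for t in (
    {"cpu_alarm", "service_timeout"},
    {"cpu_alarm", "service_timeout", "disk_alarm"},
    {"disk_alarm", "io_fault"},
    {"cpu_alarm", "service_timeout"},
)]
print(apriori(events, min_support=0.5))
```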
25.4 Conclusion
This paper studies the formation of the knowledge base in the power grid data center, using graphical representation of data to deal with the knowledge base management issues of the power grid data center. It uses 3D graph database analysis modeling to model the power grid knowledge base with its complex multiple correlation models, exploiting the graph database's advantages in handling vast amounts of complex, multiply correlated data. Because of the advantages of graph database technology in dealing with massive data, compared with traditional database technology, it can greatly improve the efficiency of the database. It also provides a reference for further research on the application of graph data technology at home and abroad.
Acknowledgements First of all, I would like to thank my good colleague Xu Min, who provided me with a lot of help in the writing of this paper, so that I could better understand the application of graph data technology in the knowledge base. In addition, I would like to thank my company for giving me the basic conditions for continuous research. This paper was supported by the science and technology project of State Grid Corporation of China named "Research on Key Technologies of Data Center IT Infrastructure Dynamic Perception and Operation Optimization".
References 1. Wood, A.J., Wollenberg, B.F.: Power Generation Operation and Control, 2nd edn. Wiley, New York (1996) 2. Chen, X., Wang, Y., Yang, H.: NICSLU: an adaptive sparse matrix solver for parallel circuit simulation 32(2), 261–274 (2013) 3. Pothen, A., Toledo, S.: Elimination structures in scientific computing 1.1. In: Mehta, D., Sahni, S. (eds.) Handbook on Data Structures and Applications. Chapman and Hall/CRC (2001) 4. Booth, J.D., Thornquist, H.K.: Basker: a threaded sparse LU factorization utilizing hierarchical parallelism and data layouts (2016) 5. Chen, Y., Davis, T.A., Hager, W.W., Rajamanickam, S.: Algorithm 887: CHOLMOD, supernodal sparse cholesky factorization and update/downdate. ACM Trans. Math. Softw. 35(3), 22:1–22:14 (2008) 6. Schweppe, F., Wildes, J.: Power system static-state estimation, part I: exact model. IEEE Trans. Power Appar. Syst. PAS-89, 120–125 (1970) 7. Xie, L., Choi, D.-H., Kar, S., Poor, H.V.: Fully distributed state estimation for wide-area monitoring systems. IEEE Trans. Smart Grid 3(3), 1154–1169 (2012) 8. Abur, A., Tapadiya, P.: Parallel state estimation using multiprocessors. Electr. Power Syst. Res. 18, 67–73 (1990) 9. Chen, Y., Jin, S., Rice, M., Huang, Z.: Parallel state estimation assessment with practical data. In: Proceedings of IEEE PES GM, Vancouver, BC, Canada (2013)
Chapter 26
Eyeball Image Registration and Fusion Based on SIFT+RANSAC Algorithm Anqi Liu
Abstract At present, there are still many problems: the related technologies for registration and fusion of medical images are not mature enough; the time for feature extraction and matching is too long; the matching points are prone to redundancy; and the image fusion is prone to gaps or blurring. During medical image analysis, it is necessary to put together several images of the same patient for analysis, thereby obtaining comprehensive information about the patient and improving the level of medical diagnosis and treatment. To quantitatively analyze several different images, we must first solve the strict alignment problem of these images; this is what we call image registration. Image fusion technology combines various images to display their information in the same image, providing multi-data and multi-information images for clinical medical diagnosis. This makes it an application-critical technology, and an accurate and efficient image matching criterion is a key difficulty. Therefore, image registration and fusion technology are of great significance both in computer vision and in clinical medical diagnosis. Building on existing related techniques, this paper uses a medical image registration and fusion method based on the scale-invariant feature transform (SIFT) and random sample consensus (RANSAC) algorithms, which addresses the following technical problems of medical image registration and fusion: feature point extraction and matching take too long; matching needs correction; and the fused image is prone to gaps or blurring. The method of this paper is suitable for clinically detecting the disease information of the same patient, and more valuable information can be obtained through registration and fusion.
26.1 Introduction
The main task of medical image registration is to make the corresponding points of two images absolutely consistent in anatomical structure and spatial position by computing a certain spatial transformation. The target of registration is that all anatomical points on the template image and the test image, or at least all of the key points of diagnostic significance and in the surgical area, can be optimally matched [1]. In recent years, image registration methods based on feature extraction [2] have developed rapidly. Compared with region-based image registration algorithms [3], feature-based image registration methods have more powerful distinguishing ability and can achieve registration in near real time under image viewing angle changes, noise, or even occlusion. Image registration is a typical problem and technical difficulty in the field of image processing research. Its purpose is to compare or fuse images acquired under different conditions for the same object; for example, images may come from different acquisition devices, times, shooting angles, etc., and sometimes image registration is required for different objects. Specifically, for two images in an image dataset, one image is mapped onto the other by finding a spatial transformation such that points corresponding to the same position in the two images are in one-to-one correspondence, thereby achieving the purpose of image information fusion [4]. Zahra Hossein-Nejad and others proposed a mean-based adaptive RANSAC; their experiments confirm the superiority of the proposed method in comparison with classic ones in terms of the mismatch ratio, the total number of matches, and the true positive rate evaluation criteria [5]. This paper uses a method of medical image registration and fusion based on the SIFT [6] and RANSAC [5] algorithms, which addresses the following technical problems of medical image registration and fusion: feature point extraction and matching take too long, matching needs correction, and the fused image is prone to gaps or blurring.
26.2 Method
The SIFT algorithm is widely used for extracting image features [7]; it has scale invariance, illumination invariance, and rotation invariance. The RANSAC algorithm is used to identify the outliers contained in a set of data. In our method, it is used to eliminate wrong matching points, thus improving the matching accuracy. In medical image registration, the RANSAC algorithm is used to eliminate erroneous matching points [8]. The original SIFT algorithm easily generates false matching points, and setting the contrast threshold and removing points via the Hessian matrix is not enough; the RANSAC algorithm can better remove falsely matched points.
26.2.1 The SIFT Algorithm
The Gaussian scale space of an image is defined as a function L(x, y, σ), produced by convolving a variable-scale Gaussian G(x, y, σ) with the input image I(x, y), that is, L(x, y, σ) = G(x, y, σ) ∗ I(x, y). In the two-dimensional image space and the scale space, it is necessary to find feature points with good uniqueness and stability. Therefore, the difference of Gaussians (DoG) operator is defined; the DoG operator is an approximation of the normalized Laplacian of Gaussian (LoG) operator. From the DoG images, more ideal feature points can be obtained, that is, scale-invariant feature points; they contain more distinctive features, which help extract more information from the images. A feature point is a local extremum point in the Gaussian scale space, so the process of finding feature points is the process of finding extremum points. The SIFT algorithm removes unstable points by setting a contrast threshold and using the Hessian matrix. (A short sketch of the DoG construction is given below.)
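As an illustration of the scale-space construction above, the following Python/OpenCV sketch builds L(x, y, σ) for a geometric series of σ values and forms the DoG images; the number of scales, σ₀, and the file name are illustrative choices, not values from the paper.

```python
import cv2
import numpy as np

def dog_pyramid(image, num_scales=5, sigma0=1.6, k=2 ** 0.5):
    """Build L(x, y, sigma) for sigmas sigma0 * k^i and return the
    difference-of-Gaussian images D_i = L_{i+1} - L_i."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY).astype(np.float32)
    levels = [cv2.GaussianBlur(gray, (0, 0), sigma0 * k ** i)
              for i in range(num_scales)]
    return [levels[i + 1] - levels[i] for i in range(num_scales - 1)]

# Candidate keypoints are local extrema of this DoG stack across space and
# scale, later filtered by the contrast threshold and the Hessian criterion.
dogs = dog_pyramid(cv2.imread("retina.png"))  # hypothetical file name
```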
26.2.2 Image Registration
When the SIFT feature vectors of the two images have been generated, the Euclidean distance between feature vectors on the two images is used as the similarity metric. The feature points with the smallest distance are taken as initial matching points, and a match is accepted when the ratio of the nearest-neighbor distance to the second-nearest-neighbor distance is less than a certain threshold T. According to the threshold T, we can determine the matching points we need; image registration is visualized by connecting the matching points in the reference image and the moving image with lines. Regarding the selection of the ratio threshold: if this threshold is lowered, the number of SIFT matching points decreases, but the matches are more stable; if this threshold is increased, the number of SIFT matching points increases, and the number of mismatches increases correspondingly. Our goal is to choose a relatively stable number of matching points and determine the required ratio threshold. The optimal threshold calculated for the SIFT algorithm is 0.6; however, there are still many wrong match pairs for our eye dataset. (The matching step is sketched below.)
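The matching step can be sketched in Python with OpenCV as follows (the paper's experiments used MATLAB, so this is only an equivalent illustration, with hypothetical file names), applying the nearest/second-nearest ratio test with T = 0.6:

```python
import cv2

reference_gray = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
moving_gray = cv2.imread("moving.png", cv2.IMREAD_GRAYSCALE)

# SIFT keypoints and 128-dimensional descriptors for both images.
sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(reference_gray, None)
kp_mov, des_mov = sift.detectAndCompute(moving_gray, None)

# Euclidean-distance matching with the ratio test at threshold T = 0.6.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des_mov, des_ref, k=2)
        if m.distance < 0.6 * n.distance]
```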
26.2.3 RANSAC Algorithm and Our Method
From the test image to the reference image, four pairs of points are randomly selected to obtain a homography matrix H; the feature points in the test image are then projected onto the reference image by the matrix H, and the distances between the matched feature points in the reference image and the projected points from the test image are measured. First,
Fig. 26.1 The flow of the entire experiment
set a threshold: the feature points in the test image and the reference image meet a linear regression relationship through the homography matrix H, and the sum of the distances from the wrong matching point to the linear regression line is the determined threshold value. If it is less than the setting threshold, it is judged as an inner point; and if the number of internal points is larger than the setting threshold, it is judged as an available homography matrix, and then all the inner points are used again to calculating the homography matrix. The linear weighting is performed according to the homography matrix, the relationship between the two images is obtained by the linear weighting function, and finally the two images are merged into one image. The fused image is then modified. In order to make the fusion effect of the two images look more natural, the brightness or color adjustment of one of the reference images and the test image is performed with color or brightness information corresponding to the inner point. The adjusted image is the final result. Figure 26.1 shows the flow of the entire experiment.
26.3 Experiment and Analysis
This paper used MATLAB software to process the experimental data. The image dataset [9] consists of 129 retinal images, forming 134 image pairs; the retinal images are arranged in pairs. Figure 26.2 shows the experimental data and contains (a) and (b): the reference image is (a) and the test image is (b). We randomly selected one pair for our experiments and resized the images to 512 × 512 pixels (Fig. 26.2).
26.3.1 Parameter Settings
The main parameters of the SIFT-RANSAC method in this paper are as follows. Each SIFT feature point takes a 4 × 4 grid of sub-regions centered on the keypoint in its scale space, and each sub-region has eight orientation bins; therefore, there are 4 × 4 × 8 = 128 values, that is, a 128-dimensional SIFT feature vector. A match is accepted when the ratio of the Euclidean distance of the nearest neighbor to that of the second-nearest neighbor is less than the ratio threshold T.
Fig. 26.2 Experimental data
The value of T is 0.6. The RANSAC algorithm is applied to remove the erroneous matches: candidate point pairs are randomly selected from the moving image to the reference image, a robust homography is obtained by the RANSAC algorithm, and the projection transformation matrix obtained from the homography matrix is used for image fusion.
26.3.2 Experiment
In this paper, we randomly selected one set of data, performed experiments according to the set parameters, and compared the results with those of the traditional SIFT algorithm. The traditional SIFT algorithm processes a set of data in 0.32 s and finds 32 matching points, among which the number of incorrect matching pairs is large. The matching result of the method in this paper is shown in Fig. 26.3. We found 48 feature points in the reference image and 56 feature points in the moving image; the feature points conform to scale invariance, illumination invariance, and rotation invariance. Moreover, there are 15 matching pairs. According to the RANSAC algorithm, we calculated the homography matrix as

$$H = \begin{bmatrix} 0.6610 & 0.1398 & 51.3181 \\ -0.0496 & 1.0031 & 4.1505 \\ -0.0005 & 0.0004 & 1.0000 \end{bmatrix},$$

which is linearly weighted to obtain the relationship between the reference image and the test image. Finally, the two images are merged into one image, and the fusion result is shown in Fig. 26.4.
Fig. 26.3 The matching result of the method
Fig. 26.4 The fusion result
In order to get a better result, we adjust the brightness or color of one of the images using the corresponding inliers, and the adjusted image is the final result, as shown in Fig. 26.5. Obviously, the fusion effect of the adjusted image is better than before. The total time required for feature extraction, registration, mismatch removal, fusion, and adjustment is 3.37 s.
26.4 Conclusions
The method in this paper can quickly and effectively realize the complete process of feature extraction, matching, fusion, and correction for medical images. The extracted image features have scale invariance, illumination invariance, and rotation invariance.
Fig. 26.5 The final result
Moreover, the method of this paper is suitable for clinically detecting the disease information of the same patient, and more valuable information can be obtained through registration and fusion.
References 1. Zhao, H., Ming, L., Lingbin, B., et al.: Medical image registration based on feature points and Renyi mutual information. Chin. J. Comput. 38(6), 1212–1220 (2015) 2. Lan, S., Guo, Z., You, J.: Non-rigid medical image registration using image field in Demons algorithm. Pattern Recognit. Lett. 125, 98–104 (2019) 3. Hu, S.-B., Shao, P.: Improved nearest neighbor interpolators based on confidence region in medical image registration. Biomed. Signal Process. Control. 7, 525–536 (2012) 4. Jiao, D., Li, W., Ke, L., et al.: An overview of multi-modal medical image fusion. Neurocomputing 215, 3–20 (2016) 5. Hossein-Nejad, Z., Nasri, M.: A-RANSAC: adaptive random sample consensus method in multimodal retinal image registration. Biomed. Signal Process. Control. 45, 325–338 (2018) 6. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004) 7. Jia, Y., Xu, Z., Su, Z.: Mosaic of crop remote sensing images from UAV based on improved SIFT algorithm. Trans. Chin. Soc. Agric. Eng. 33(10), 123–129 (2017) 8. Vourvoulakis, J., Kalomiros, J., Lygouras, J.: FPGA accelerator for real-time SIFT matching with RANSAC support. Microprocess. Microsyst. 49, 105–116 (2017) 9. Hernandez-Matas, C., Zabulis, X., Argyros, A.A.: An experimental evaluation of the accuracy of keypoints-based retinal image registration. In: Hernandez-Matas, C., Argyros. A.A. (eds.) Conference 2017, EMBC, pp. 377–381. Jeju Island (2017)
Chapter 27
On Digitization of Chinese Calligraphy Zhiwei Zhu and Shi Lin
Abstract In order to digitize the handwriting process and handwriting marks of Chinese calligraphy, we propose a method which automatically extracts the handwriting effect and the Chinese brush state at the same time from a video of the handwriting process. The method uses various object detection algorithms to obtain the changing process of the handwriting marks and the Chinese brush. Convex hull detection, convex hull filling, and image subtraction algorithms are used to separate the images of the Chinese brush and the handwriting marks. Compared with the method that captures images of handwriting marks using a video camera placed under a half-transparent writing plane, which captures only the handwriting marks and not the brush states, our method captures the handwriting marks and brush states at the same time, although the digitization of the brush states is still under development.
27.1 Introduction
Chinese calligraphy is a treasure of traditional Chinese culture, and digitizing calligraphy is of great significance for carrying Chinese culture forward. The digitization of calligraphy includes the pen, the ink, and the paper, among which the digitization of the writing brush is crucial. Chu et al. [1] proposed installing an ultrasonic annunciator and a gyroscope on a real brush and installing three ultrasonic receivers on the display to receive the information transmitted by the ultrasonic annunciator; the position of the brush in three-dimensional space was then calculated to effectively simulate the brush behavior. Xu et al. [2] proposed using a virtual brush model to write and draw on virtual paper by moving the mouse directly. Although these studies basically realized the digitization of calligraphy from different perspectives, the following problems remained:
(1) It was impossible to obtain information about the handwriting marks and brush states at the same time;
(2) The intervention of the sensor device increased the weight of the brush, and creation in such an unnatural state lacks the visual quality and individual expression of calligraphy;
(3) The problem that the images of the Chinese brush and the handwriting marks were not separated during the writing process had not been solved, so specific dynamic writing details and related data could not be extracted.
In view of the fact that today's calligraphy digitization cannot simultaneously acquire the dynamic process of handwriting marks and brush states, and in response to the above problems, the main work of this paper was to use a shadowless lamp to shoot a calligraphy video, with the brush in a natural state, as the input for calligraphy digitization. Algorithms were used to process the video and images, and the handwriting marks and brush states during the writing process were obtained in detail. This method not only restored the calligrapher's creation in a natural state, but also showed its feasibility and effectiveness in the process of calligraphy digitization.
27.2 Handwriting Marks and Brush States Extraction
Most of the research on calligraphy digitization has been based on digital modeling of the brush or on sensing devices attached to the brush to quantify the pen movement process. Instead of starting from these research ideas, this paper carefully observed the creation process of calligraphy and found that the effect of the ink on the paper is close to the form of the calligraphy. Therefore, we proposed to record the information of the handwriting marks and the status of the brush, so as to restore the real form of calligraphy to the maximum extent and realize the digitization of calligraphy.
27.2.1 Input Design of Calligraphy Digitization
The input of calligraphy digitization was a video of calligraphy creation shot with an SLR camera, without pre-installing any sensing device on the writing brush. Considering that the video had to contain a clear brushstroke process, the video shooting environment required strict control of the light source. The core problem of light source control was that the brush cannot cast shadows on the paper during the writing process; the shadow problem added useless information to the extracted video image frames, as shown in Fig. 27.1. In the early stage of the experiment, two solutions were tried for the shadow problem during brush writing: (1) making a light box environment; and (2) a ring light environment driven by a stroboscopic power supply. After many experiments and comparisons of the treatment effects, the second solution was adopted to solve the brush shadow problem, as shown in Fig. 27.2.
Fig. 27.1 Shadows without light control. The left picture is a frame image from the source video, and the right picture is the result of the ViBe algorithm processing; the red rectangle marks the brush shadows
Fig. 27.2 Two light source control environments. The picture on the left is a light box environment made from a paper box and four LED lamps of equal power. The picture on the right is a ring-shaped light environment, which contains an SLR camera (Nikon digital camera), paper, brush, and ink
27.2.2 Calligraphy Video Preprocessing
Since the handwriting marks and the brush states were rendered in the same color and were connected to each other, it was difficult to distinguish the brush states from the handwriting marks. In order to obtain the complete states of the brush and the motion information of the individual handwriting marks, it was necessary to preprocess the video. The key step in preprocessing was to use the principle of foreground object detection and related algorithms to identify the region of interest in the video, achieving separation between the moving target and the background. Among the many motion foreground detection algorithms, after a large amount of code debugging and comparison of processing effects, the ViBe algorithm and the FrameDifference algorithm [3, 4] in the BGSLibrary developed by Andrews Sobral were finally selected to preprocess the input video, as shown in Fig. 27.3. Among them, the ViBe algorithm is a background modeling method, and BGSLibrary is a library that provides more than 30 video foreground extraction algorithms. The implementation of the ViBe algorithm mainly includes three steps: initialization of the background model, the foreground detection process, and the update strategy of the background model.
Fig. 27.3 The flowchart of calligraphy video preprocessing. The foreground motion goal is obtained after the ViBe algorithm binarizes the input video. The brush tip outline is obtained after the FrameDifference algorithm extracts the edge of the brush from the input video
However, the improvement of the ViBe algorithm in this paper was mainly reflected in a separation between the updating and output masks and an inhibition of propagation for some pixels in the updating mask; in particular, when dealing with the source input video, there was no smear. At the same time, given the continuity of the video sequence captured by the camera, if there is no moving target in the scene, consecutive frames change very weakly; if there is a moving target, there are significant changes between consecutive frames. Therefore, the FrameDifference algorithm in this paper mainly performed a differential operation on two or three consecutive frames: the pixels corresponding to different frames were subtracted to determine the absolute value of the grayscale difference, and when the absolute value exceeded a certain threshold, the pixel was judged to belong to a moving target, thus achieving the detection of the target. (A hedged sketch of this preprocessing is given below.)
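A hedged sketch of this preprocessing stage follows. ViBe itself is not part of core OpenCV (the paper uses the BGSLibrary implementation in C++), so OpenCV's MOG2 background subtractor stands in here for the background-modeling step, alongside a two-frame difference; the video file name and the threshold are illustrative.

```python
import cv2

cap = cv2.VideoCapture("calligraphy.mp4")        # hypothetical input video
bg_model = cv2.createBackgroundSubtractorMOG2()  # stand-in for ViBe
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Background modeling: binarized foreground (marks and brush together).
    foreground = bg_model.apply(frame)

    # Frame difference: |I_t - I_{t-1}|, thresholded to outline the brush.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)
    _, brush_outline = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    prev_gray = gray
```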
27.2.3 Calligraphy Video Image Frame Extraction and Image Subtraction
In order to obtain more details of the brush strokes in the writing process from the calligraphy video, it was necessary to extract image frames from the original video. After the original video was processed by the frame extraction algorithms, it was divided into a set of image sequences containing 462 frames. Panels (1)–(5) in Fig. 27.4 are the 199th, 239th, 309th, 374th, and 418th frame images of the source video, panels (6)–(10) are the frame images with handwriting marks and brushes together, and panels (11)–(15) are the frame images of the brushes. In order to get the frame images of the handwriting marks, the brush information needed to be eliminated from each frame image with handwriting marks and brush together. The subtraction operation of basic image arithmetic in OpenCV solves this problem: the frame images (11)–(15) in Fig. 27.4 are successively subtracted from the frame images (6)–(10), respectively, to obtain the frame images of the handwriting marks (16)–(20).
Fig. 27.4 Frame images of handwriting marks and brushes. (1)–(5), frame images of source video; (6)–(10), frame images with handwriting marks and brushes together; (11)–(15), frame images of brushes; (16)–(20), frame images of handwriting marks
It can be clearly seen from the frame images (16)–(20) that the separation of the handwriting marks and the brushes makes it more convenient to observe the dynamic change of the handwriting marks. The frame rate of all the pictures in Fig. 27.4 is 50 fps, and the resolution of all the pictures in Fig. 27.4 is 120 dpi. (A minimal sketch of the subtraction step follows.)
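The subtraction step can be sketched as follows; cv2.subtract performs saturated subtraction, so the brush region cancels to zero instead of wrapping around. The file names are hypothetical.

```python
import cv2

# One frame with handwriting marks and brush together, and the matching
# brush-only frame (after convex hull filling, Sect. 27.2.4).
marks_and_brush = cv2.imread("frame_marks_brush.png", cv2.IMREAD_GRAYSCALE)
brush_filled = cv2.imread("frame_brush_filled.png", cv2.IMREAD_GRAYSCALE)

# Removing the brush leaves only the handwriting marks for this frame.
marks_only = cv2.subtract(marks_and_brush, brush_filled)
cv2.imwrite("frame_marks_only.png", marks_only)
```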
27.2.4 Convex Hull Detection and Convex Hull Filling
The calligraphy video was processed by the ViBe algorithm and the frame extraction algorithm to obtain a set of binarized sequence images in which the handwriting marks and the brushes are connected. In order to further acquire the individual handwriting marks information, we found that image subtraction could solve the problem. However, when the FrameDifference algorithm performed edge extraction on the brush in the video and acquired the outline of the brush, it was found that the brush tip was hollow, which could not completely cancel the solid brush tip in the frame images with handwriting marks and brushes together obtained by the ViBe algorithm. Here, the Graham scan method was used to find the convex hull of the brush-tip outline [5]. After the convex hull detection was completed, the hole filling algorithm described by Gonzalez in digital image processing was improved to fill the convex hull [6], as shown in Fig. 27.5. Figure 27.6 shows a flowchart of the convex hull detection algorithm based on the Graham scanning method, and a code sketch of the hull detection and filling follows Fig. 27.5.
Fig. 27.5 The left pictures are frame images of brushes. (a)–(e), frame images of brush contours; (f)–(j), frame images of convex hull detection; (k)–(o), frame images of convex hull filling
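A minimal sketch of the hull detection and filling follows; OpenCV's convexHull is used here in place of a hand-written Graham scan (it computes the same convex hull), and the file names are hypothetical.

```python
import cv2
import numpy as np

# Binarized brush-tip outline produced by the frame-difference step.
brush_contour = cv2.imread("brush_contour.png", cv2.IMREAD_GRAYSCALE)
contours, _ = cv2.findContours(brush_contour, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
points = np.vstack(contours)  # all edge points of the hollow brush tip

# Convex hull of the outline, then filling, so that the solid tip can be
# cancelled by the image subtraction of Sect. 27.2.3.
hull = cv2.convexHull(points)
filled = np.zeros_like(brush_contour)
cv2.fillConvexPoly(filled, hull, 255)
cv2.imwrite("brush_filled.png", filled)
```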
27.3 Comparison with Other Methods
The experimental platform was built as a toolbox developed in Microsoft Visual Studio 2015 (VS2015), calling the third-party open-source library OpenCV 2.4.13 and using the C++ programming language to implement the related algorithm functions. The operating system was Microsoft Windows, and the host CPU was an Intel(R) Core(TM) i7-7500U with a frequency of 2.7 GHz. In order to verify the feasibility and effectiveness of the proposed method, we conducted related experiments on both the shooting instruments and the shooting perspective. In the later stage of the experiment, we used a Kinect camera, which can capture depth images of the pen and ink information. At the same time, in order to verify the rationality of the top-down shooting angle used in this paper, a comparative experiment with a bottom-up camera was designed: an A4-sized transparent acrylic plate was used as the writing plane, with the paper placed on top. The ink state of the brush during the writing process was reflected on the transparent plate in real time, and the camera shot the acrylic plate from below to get the dynamics of the handwriting marks. Compared with the source video frames, shown in panels (1)–(5) of Fig. 27.7, the experimental results showed that the images acquired by the depth camera contained
Fig. 27.6 The flowchart of convex hull detection algorithm based on the Graham scanning method
less pen and ink information, which was not conducive to extraction, as shown in panels (11)–(15) of Fig. 27.7. Meanwhile, in terms of shooting angle, although the bottom-up method extracted the information of the handwriting marks, the state of the brush and its details were blocked, resulting in the loss of a large amount of writing information, which was not conducive to the complete realization of calligraphy digitization, as shown in panels (6)–(10) of Fig. 27.7.
Fig. 27.7 Images captured in various ways. (1)–(5), images captured in this paper; (6)–(10), images captured using a video camera placed under the half-transparent writing plane; (11)–(15), depth images captured by a Kinect depth camera
27.4 Conclusion
Extracting Chinese handwriting marks and brush states from a video of a natural handwriting process is feasible for the digitization of Chinese calligraphy. In the research on digitizing the writing process of the brush, this paper realized the automatic extraction of the ink and its changing state from the brush writing. At the same time, the information on the change of the brush state during the whole writing process also lays a foundation for further calligraphy digitization.
References
1. Chu, N.S.-H.: An efficient brush model for physically-based 3D painting. IEEE CS Press, pp. 413–421 (2002)
2. Xu, S.H., Xu, C.F., Liu, Z.M.: Virtual brush model for electronic calligraphy and painting creation. Science in China Series E 34(12), 1359–1374 (2004)
3. Ding, Z., Lu, W.Z.: Moving target detection algorithm based on ViBe background modeling. Computer Systems 28(04), 183–187 (2019)
4. Lu, Z.X., Song, J., Jie, F.X.: Visual monitoring experiment system based on improved frame difference method. Laboratory Research and Exploration 37(09), 21–24, 28 (2018)
5. Wang, K.: Research on spatial point cloud structured algorithm based on Graham scanning algorithm. Modern Electronic Technology 41(14), 139–142, 146 (2018)
6. Gonzalez, R.C. (trans. Yu, Y.Z.): Digital Image Processing. Publishing House of Electronics Industry, Beijing, pp. 654–663 (2009)
Chapter 28
Augmented Reality Guidance System for Ultrasound-Guided Puncture Surgery Training
Zhaoxiang Guo, Hongfei Yu, Zhibao Qin, Wenjing Xiao, Yang Shen, and Meijun Lai
Abstract Ultrasound-guided renal biopsy is one of the most widely applicable diagnostic methods. The procedure involves ultrasound scanning of the lesion area and sampling using a puncture needle. The success rate of sampling depends largely on the accuracy of needle placement and requires a high degree of skill on the part of the operator. In this paper, we describe a visual puncture guidance system designed to increase the accuracy of needle puncture. Our solution is based on the Vuforia engine, which tracks the location of the puncture needle and the ultrasonic probe in space. Our system can reduce the difficulty of puncture by providing an intuitive way to display the visual track. The operator's field of view is augmented on the screen, supporting 3D spatial perception and eliminating the need to consult external monitors. Based on this prototype, our goal is to develop a complete system, reduce the training cost, and improve the overall precision of puncture, so as to improve the success rate of sampling.
28.1 Introduction
Navigation shows potential in the medical field by supporting guidance for various diseases [1]. In recent years, with technical breakthroughs, augmented reality technology has been widely used in hepatobiliary surgery [2], lung puncture [3], renal biopsy, and other operations [4]. Ultrasound-guided renal biopsy is a very common procedure. It involves scanning the area of the lesion with ultrasound imaging and then sampling it with a puncture needle. Generally, 2–3 samples are taken (no more than four samplings), and each sampling track is no more than 50 mm [6], ensuring minimal damage to the patient.
Z. Guo · H. Yu (B) · Z. Qin · W. Xiao · M. Lai Yunnan Key Laboratory of Opto-electronic Information Technology, Yunnan Normal University, Kunming 650500, China e-mail: [email protected] Y. Shen National Engineering Laboratory of Cyberlearning and Intelligent Technology, Beijing Normal University, Beijing, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_28
However, the accuracy of sampling depends on the exact location of the needle, to ensure that the desired target volume is captured. Current methods include percutaneous computed tomography, intraoperative ultrasound, or X-ray guidance [7, 8]. But these approaches require a high degree of skill and spatial imagination: doctors need to convert 2D images into 3D images in their own minds. The field of medical image guidance aims to solve this problem, and a great deal of work has proved its potential for clinicians [9]. Sharma K and Virmani J of Thapar University developed a decision support system for renal diseases; the system uses a small feature space to detect medical renal disease, containing only second-order GLCM statistical features calculated from raw kidney ultrasound images [10]. In another study, the diameter of the glomerulus and the thickness of the capsule were determined by binarization and other processing of the original RGB image [11]. Eman A et al. conducted a clinical examination of one hundred patients, with urine analysis and renal–urinary ultrasonography (US), to find a link between kidney disease and drinking water quality; SPARQL (SPARQL Protocol and RDF Query Language) was used for data mining to formalize the relationship between water quality and kidney disease [12]. This has led to several mixed reality and augmented reality systems that combine standard color video with medical imaging modalities (CT, MRI, US, gamma) for use in kidney puncture. In this paper, we propose an AR system based on the augmented reality engine (Qualcomm Vuforia) that allows the user to scan the organ with an ultrasonic probe, determine the needle entry path, compute the deviation between the actual needle insertion path and the correct needle entry path, and provide visual guidance during use. The puncture needle used in this work is a Precisa HS 16G. AR allows us to visualize ultrasound images and show the correct puncture path within the surgical field of view. This paper presents the results of conceptual research on the system using a puncture needle and a human model prosthesis. A visual system combining the correct puncture path with the actual puncture path is proposed, and future work is discussed.
28.2 Methods
28.2.1 Augmented Reality Guidance
The hardware part consists of surgical instruments and two-dimensional codes; the software part consists of a guidance system and graphics rendering. This article mainly concerns the design of the system and does not involve the underlying algorithms. The system block diagram is shown in Fig. 28.1. The Vuforia Engine is a software platform for creating augmented reality apps. Developers can easily add advanced computer vision functionality to any application, allowing it to recognize images and objects and interact with spaces in the real
world [12]. Inside it is a coordinate system that locates the physical environment relative to the camera, which allows us to register virtual objects with real objects. The system performs positioning mainly by recognizing identification images. The scale space of an image, L(x, y, σ), is defined as the convolution of a Gaussian function of varying scale, G(x, y, σ), with the original image I(x, y):

L(x, y, σ) = G(x, y, σ) ∗ I(x, y)   (28.1)
Here "∗" denotes the convolution operation, and the Gaussian kernel is

G(x, y, σ) = (1/(2πσ²)) e^(−((x − m/2)² + (y − n/2)²)/(2σ²))   (28.2)

where m and n denote the image width and height.
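As an illustration of Eqs. (28.1)–(28.2) — not of Vuforia's internal implementation — a Gaussian scale space can be built in a few lines of C++ with OpenCV; the function and parameter names are ours:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Build L(x, y, sigma) = G(x, y, sigma) * I(x, y) for a sequence of
// geometrically increasing scales, as in SIFT-style feature pipelines.
std::vector<cv::Mat> buildScaleSpace(const cv::Mat& image,
                                     int levels, double sigma0, double k) {
    std::vector<cv::Mat> scaleSpace;
    double sigma = sigma0;
    for (int i = 0; i < levels; ++i) {
        cv::Mat level;
        // Size(0, 0) lets OpenCV derive the kernel size from sigma.
        cv::GaussianBlur(image, level, cv::Size(0, 0), sigma);
        scaleSpace.push_back(level);
        sigma *= k;  // next scale: sigma, k*sigma, k^2*sigma, ...
    }
    return scaleSpace;
}
```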
We developed the application as a Universal Windows Platform (UWP) application using the Unity3D engine. The application's main function is to register virtual objects with their physical counterparts, for example by superimposing a virtual needle on top of a real one to show the user what is currently being tracked. It also allows the user to display the "correct needle entry path" for the needle, and then draws a visual guide showing the distance and angle offset between that track and the current track. Users interact with the app, eliminating the drawbacks of traditional training (puncturing animal organs) and reducing the risk of contamination. Figure 28.2 shows the system composition.
Fig. 28.1 Block diagram of system
Fig. 28.2 Schematic of the experimental setup showing the relative positions of the AR camera, virtual needle, and US probe components
28.2.2 Ambient Occlusion
In our system, we used a Logitech C922 Pro camera to track the identification markers attached to the puncture needle and the ultrasonic probe. The Vuforia engine can recognize up to five markers at once, but only two are used in this article (a body recognition marker will be added in the future). Three quick response (QR) codes were generated and used to calibrate the spatial locations of the virtual US probe and needle. Making the virtual and the real coincide is a challenge. First, we created the virtual puncture needle in Unity; second, the materials in Unity were adjusted to be transparent in order to superimpose the virtual puncture needle on the real one; finally, we added ambient occlusion by writing shader code. The shader script adds occlusion so that the real puncture needle and the virtual puncture needle coincide (Fig. 28.3). For the virtual ultrasonic probe, we added a trigger and real-time animation to simulate real ultrasonic operation; the probe displays the ultrasonic image and needle entry path only when it is in the correct position. For the virtual puncture needle, we added the injection route and an environmental mask to reduce the error as much as possible.
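Vuforia's tracking is closed source; purely as an illustration of the kind of marker-based pose estimation described above, the following C++ sketch uses OpenCV's ArUco contrib module (the dictionary choice, marker size, and calibrated camera inputs are our assumptions):

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/aruco.hpp>

// Estimate the pose of a fiducial marker attached to an instrument,
// analogous to the QR-based calibration of the probe and needle above.
bool trackInstrumentMarker(const cv::Mat& frame,
                           const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs,
                           float markerLengthMeters,
                           cv::Vec3d& rvec, cv::Vec3d& tvec) {
    cv::Ptr<cv::aruco::Dictionary> dict =
        cv::aruco::getPredefinedDictionary(cv::aruco::DICT_4X4_50);
    std::vector<int> ids;
    std::vector<std::vector<cv::Point2f>> corners;
    cv::aruco::detectMarkers(frame, dict, corners, ids);
    if (ids.empty()) return false;  // marker not visible in this frame
    std::vector<cv::Vec3d> rvecs, tvecs;
    cv::aruco::estimatePoseSingleMarkers(corners, markerLengthMeters,
                                         cameraMatrix, distCoeffs, rvecs, tvecs);
    rvec = rvecs[0];  // marker rotation in camera coordinates
    tvec = tvecs[0];  // marker translation: where to place the virtual model
    return true;
}
```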
28.3 Results
As shown in Fig. 28.4, the ultrasonic probe displays the ultrasonic image and the correct needle entry path. The built-in video display acts as a monitor on a static plane, avoiding the need to look at other monitors during training.
Fig. 28.3 After adding the shader script, the virtual needle is superimposed on the real needle
Fig. 28.4 Demonstration of the complete system, live tracking of the needle, and visual guide showing error with respect to the planned trajectory
Our application allows the operator to enter the correct needle track indicating the point of entry to the lesion. Based on these data, the system plots the trajectory and displays the offsets between the real-time trajectory and the target trajectory. Operators can watch the puncture along the real-time track together with the displayed offsets (Fig. 28.4).
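The paper does not give the offset formulas; a straightforward geometric sketch of the displayed quantities — the angle between the planned and tracked needle axes, and the distance from the needle tip to the planned trajectory — is:

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static double norm(const Vec3& a) { return std::sqrt(dot(a, a)); }

// Angular deviation (degrees) between planned and tracked needle axes.
double angleOffsetDeg(const Vec3& planned, const Vec3& tracked) {
    double c = dot(planned, tracked) / (norm(planned) * norm(tracked));
    if (c > 1.0) c = 1.0; if (c < -1.0) c = -1.0;  // clamp rounding error
    return std::acos(c) * 180.0 / 3.14159265358979323846;
}

// Distance from the needle tip to the planned trajectory, modeled as a
// line through `origin` with direction `dir`.
double distanceToTrajectory(const Vec3& tip, const Vec3& origin, const Vec3& dir) {
    Vec3 v{tip.x - origin.x, tip.y - origin.y, tip.z - origin.z};
    double t = dot(v, dir) / dot(dir, dir);                // projection parameter
    Vec3 p{v.x - t*dir.x, v.y - t*dir.y, v.z - t*dir.z};   // perpendicular part
    return norm(p);
}
```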
28.4 Conclusions and Future Work
In this proof-of-concept study, the potential of the AR technique for US biopsy guidance has been confirmed. By adding environmental occlusion, we successfully achieved the positioning of virtual surgical instruments and reduced the errors. While a visual guide superimposed on the needle lets the operator observe the needle's progress, transferring the ultrasound image to the plane component provides more useful information for real-time surgical training. Our system provides an intuitive way to present the needle's trajectory, thereby reducing the difficulty of puncture. Providing all the necessary data in the operator's field of vision can reduce the time and number of punctures, reduce the pain to patients, and increase the success rate of sampling. In the future, we will design experiments to quantify the effectiveness of training. Future developments may include navigation of the degree of penetration and scripts to quantify the depth of penetration (for example, through skin, fat, muscle, bone, and viscera), with trajectory records kept after the operation. In terms of positioning, we plan to add optical markers to increase the accuracy of puncture. In terms of display mode, we plan to release the system on HoloLens Pro in the future to increase the operator's sense of immersion. Once Microsoft releases the HoloLens Pro and a HoloLens Computer Vision API that allows direct access to raw sensor data, the limitations of our system will disappear. This will improve the applicability of the whole system. As discussed earlier, although this increases the volume of the system, it significantly improves its effect.
References
1. Van Oosterom, M.N., Van Der Poel, H.G., Navab, N., Van De Velde, C.J.H., Van Leeuwen, F.W.B.: Computer-assisted surgery: virtual- and augmented-reality displays for navigation during urological interventions. Curr. Opin. Urol. 28, 205–213 (2018)
2. Wilhelm, D., Marahrens, N.: Enhanced visualization: from intraoperative tissue differentiation to augmented reality. Visceral Med. (2018). https://doi.org/10.1159/000485940
3. Eliodoro, F., Giulia, F., Domiziana, S., Giacomo, L., Francesco, G.R.: Percutaneous low-dose CT-guided lung biopsy with an augmented reality navigation system: validation of the technique on 496 suspected lesions. Clin. Imaging 49, 101–105 (2018)
4. Tang, R., Ma, L.F., Rong, Z.X., Li, M.D., Zeng, J.P., Wang, X.D., Liao, H.E., Dong, J.H.: Augmented reality technology for preoperative planning and intraoperative navigation during hepatobiliary surgery: a review of current methods. Hepatobil. Pancr. Dis. Int. 17(02), 101–112 (2018)
5. Muthuppalaniappan, V.M., Blunden, M.J.: Renal biopsy. Medicine (United Kingdom) (2015). https://doi.org/10.1016/j.mpmed.2015.04.013
6. Chan, W., Xiaoqiong, W., Xianba, L., Shengzao, L., Chimeng, H.: Influencing factors of nosocomial infection in patients undergoing percutaneous renal biopsy under CT guidance. Chinese J. Nosocomiol. 11, 1669–1672 (2019)
7. Hospital, J.C.: Diagnostic value and safety of ultrasound-guided percutaneous renal biopsy. Nephrop. 10–13 (2019)
8. Sato, Y., Nakamoto, M., Tamaki, Y., Sasama, T.: Image guidance of breast cancer surgery using 3-D ultrasound images and augmented reality visualization. IEEE Trans. Med. Imaging 17, 681–693 (1998)
9. Sharma, K., Virmani, J.: A decision support system for classification of normal and medical renal disease using ultrasound images: a decision support system for medical renal diseases. Int. J. Ambient Comput. Intell. 8, 52–69 (2017)
10. Kotyk, T., Dey, N., Ashour, A.S., Balas-Timar, D., Chakraborty, S., Ashour, A.S.: Measurement of glomerulus diameter and Bowman's space width of renal albino rats. Comput. Methods Programs Biomed. 126, 143–153 (2016)
11. Alkhammash, E., Mohamed, W.S., Ashour, A.S., Dey, N.: Designing ontology for association between water quality and kidney diseases for medical decision support system. In: VI International Conference Industrial Engineering and Environmental Protection IIZS, Zrenjanin, Serbia (2016)
12. Vuforia™ API Reference. https://library.vuforia.com/content/vuforia-library/en/reference/unity/index.html
Chapter 29
Development of Subjective and Objective Assessment for the Brain Surgical Simulation—A Review Chengming Zhao, Hongfei Yu, Tao Liu, Yang Shen, and Yonghang Tai
Abstract With the improvement of medical means and the combination of augmented reality and virtual reality technology, the way brain tumor surgery is performed has been rewritten, and more and more research teams and institutions have proposed simulators to address this complex problem. Aiming at the brain tumor surgery simulators of recent years, this paper proposes a new evaluation method that comprehensively examines subjective and objective factors and combines them with the specific implementation steps of the surgery, in order to complete the surgical evaluation system that is currently insufficient.
29.1 Introduction
With the development of modern medicine, surgical training simulators based on Augmented Reality (AR) and Virtual Reality (VR) technology have been developing continuously over the past decades, helping surgeons face the challenges of various difficult surgical training tasks. Moreover, they provide novices with the opportunity to achieve satisfactory results in a safe and reliable environment. Research shows that VR-based simulation can play an important role in the acquisition of surgical skills [1]. To meet these needs, a variety of commercial simulators have been developed. The Congress of Neurological Surgeons established a Simulation Committee in 2010, which proposed the construction of simulations including vascular, skull, and spinal components [2]. The goal of the committee is to create VR and physical simulations that maximize the training standard of residents and standardize the evaluation of equipment.
C. Zhao · H. Yu (B) · T. Liu · Y. Tai Yunnan Key Laboratory of Opto-electronic Information Technology, Yunnan Normal University, Kunming 650500, China e-mail: [email protected] Y. Shen National Engineering Laboratory of Cyberlearning and Intelligent Technology, Beijing Normal University, Beijing, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_29
However, no similar assessment methods have been proposed for brain tumor surgical simulation. Brain tumors are abnormal growths in the cranial cavity, also known as intracranial tumors or brain cancer; they can originate from the brain, meninges, nerves, blood vessels, and brain appendages, and therefore deserve serious attention. Simulators for brain tumor surgery training have accordingly been proposed to solve the routine training problems of surgeons, so that the operation can be carried out quickly, efficiently, and accurately. To illustrate the comprehensive effect of a simulator, evaluation is indispensable: a series of procedures is needed to verify whether the simulator is helpful as a training application. A reliable simulator should be close to reality in both subjective and objective factors, which enables novices to train better on it; only when the level of experts is reached can they have the skill and confidence to improve the success rate in future real surgery. Although surgical development during residency is a key component of neurosurgery training, there are limited means to assess residents' surgical progress and skill acquisition. Many problems remain in evaluating the performance, skills, and training of resident doctors in the operating room and in quantifying surgical skills. The cumbersome surgical procedures for brain tumors have led to the absence of standardized tools for assessing physicians and performance in the operating room [3]. At present, different evaluation methods are still used across the simulators developed with VR and augmented reality technology. For each simulation device, a customized evaluation method has to be designed, which makes the evaluation methods lose universality and credibility. Table 29.1 shows the evaluation methods of tumor surgery at this stage, and Fig. 29.1 shows our simulation of a human brain skull. This paper reviews the implementation steps of brain tumor surgery and proposes a novel, complete evaluation method for the simulators at this stage. Starting from subjective and objective factors, participants complete a brain tumor resection on the specific simulator and fill in relevant questionnaires to give feedback on the experience. Performance indicators include the realism of the surgical instruments, the degree of simulation of the surgical environment, the use of skull clamps, the difficulty of scalp removal, the rotation and feed speed of the drill bit, the process of cutting the cranial bone flaps with a milling cutter, the haemostatic effect of electrocoagulation forceps, the volume of resected tissue, the length of the tool path, the force applied, etc. Comprehensive analysis of the experimental results can help to select different simulators for specific patient problems in order to achieve the best surgical results.
Table 29.1 Current evaluation methods

Number | Author | Year | Researchers | Evaluation methods
1 | Kersten-Oertel et al. [4] | 2014 | 13 normal subjects and 6 neurosurgeons | In two novice experiments and one expert experiment, subjects were asked to determine which of two displayed blood vessels was closer to them; at the end of the experiment, each participant scored the difficulty of the judgment
2 | Alotaibi et al. [5] | 2015 | 6 neurosurgeons, 6 junior residents, and 6 senior residents | Based on defined tactile and visual feedback, the length, cutting rate, and force ratio of tumor incision were analyzed on the NeuroTouch platform, and the success rate of the simulator was then analyzed by statistical principles
3 | Alzhrani [6] | 2015 | 33 participants: 17 neurosurgical specialists, 7 senior neurosurgical residents, and 9 primary neurosurgical residents | A reference standard level was defined; based on the surgeons' performance, a trimmed-mean procedure was used to exclude extreme scores. The brain tumor resection rate, volume resection force, total tip path length, and pedal activation frequency were analyzed
4 | Barthélemy et al. [1] | 2015 | 373 cases aged 20–60, randomly divided into two groups | Operation completion, control group, operation time, injury, mortality, and morbidity were investigated with random sampling and analyzed by chi-square test; data are reported as mean ± standard deviation
5 | Hadley et al. [7] | 2015 | 667 assessments and 6 neurosurgeons | A professional assessment scale divided into seven parts: respect and responsibility, time and action, instrument operation, instrument knowledge, smooth operation, auxiliary use, and procedure-specific knowledge
6 | Ribaupierre et al. [3] | 2015 | 7 neurosurgeons and 1 specialist | Time and accuracy were recorded, trainee performance was analyzed through indicators, and project time was recorded; experts scored trajectory accuracy according to the cross-dissection area and calculated a speed × accuracy performance index
7 | Meier et al. [8] | 2016 | 64 cell tumors, including 12-month follow-up images, analyzed and recorded by two scorers | The potential of longitudinal brain tumor volumetry was explored using an automatic brain tumor image analysis method; ground truth was obtained by comparing automatic volume estimation with surgical segmentation results
8 | Pelargos et al. [9] | 2017 | — | The authors used randomized controlled trials, clinical trials, and case analysis to evaluate the results
9 | Pfandler et al. [10] | 2017 | Six types of participants: 6 students, 15 residents, 5 junior doctors and researchers, 2 surgeons, and technicians | Using the quality tools of medical education research, the quality of the methods in the included studies was evaluated, with a sampled survey, data summary, and score analysis
10 | Mazur et al. [11] | 2018 | 6 junior residents, 6 senior residents, and 6 attending neurosurgeons | Systematic review and meta-analysis reporting methods were used; measured parameters included tumor resection volume, healthy brain resection volume, instrument path length, and surgical effect and efficiency, and performance across four courses was compared linearly to evaluate the learning curve
Fig. 29.1 Human skull model for the simulation
29.2 Methods
Assessment refers to the process of judging the value, correctness, feasibility, and desirability of things. In practice, it is necessary to clarify the purpose of the evaluation, determine the evaluation object and scope, and choose the appropriate type and method of evaluation, including in the process of instrument evaluation. Instrument evaluation is mainly divided into subjective evaluation and objective evaluation, so that the most comprehensive and effective evaluation of the instrument can be given from different angles [12].
29.2.1 Subjective Evaluation
After visiting and using the simulator, subjects were evaluated subjectively. In this test, 30 general medical students, 10 neurosurgeons, and 5 brain tumor surgery experts were selected for the questionnaire survey. The questionnaire adopted the Likert scale with 5-point scoring, and the measurement results were averaged. Figure 29.2 shows the current subjective and objective evaluation of the simulator, and Table 29.2 is the Likert questionnaire for subjective evaluation.
Fig. 29.2 Simulator evaluation methods at this stage (subjective evaluation — vision and touch: appearance and feel of the instrument, realism of the instruments, instrument handling; objective evaluation: problem-solving ability, problem-solving time, problem-solving success rate)
Table 29.2 Subjective questionnaire

Content of evaluation (score 1–5):
Q1: Sense of reality of the instrument
Q2: Picture accuracy
Q3: The authenticity of medical images
Q4: Degree of simulation of the operating environment
Q5: How easy it is to fix the skull
Q6: Feed speed when the bit is working
Q7: Punch force feedback
Q8: The degree of difficulty in excising the cranial bone flap
Q9: Haemostatic effect of the electric coagulation forceps
Q10: Effect of temperature control on the cranial bone
Q11: Whether the use of scissors is smooth
Q12: Ease of tumor extraction
Through the test, we collected the subjective evaluations of the equipment and instruments from the ordinary groups and the expert group; the results were calculated and analyzed using IBM SPSS 20.0, comprehensively investigating the age, occupation, gender, training level, handedness, and weekly time spent playing electronic games and musical instruments of the participants. According to statistical theory, when the P-value is less than 0.05, the difference in the data is significant. Figure 29.3 shows the human brain tumor model simulated by the simulator.
Fig. 29.3 Simulator simulating human brain tumor model
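For readers without SPSS, the group comparison behind such a P-value can be sketched in C++. The following computes Welch's t statistic for the mean Likert scores of two groups (the use of Welch's test here is our assumption; the t value must still be referred to the t distribution to obtain P, and each group is assumed to have at least two samples):

```cpp
#include <cmath>
#include <vector>

struct GroupStats { double mean, var, n; };

GroupStats stats(const std::vector<double>& x) {
    double m = 0.0;
    for (double v : x) m += v;
    m /= x.size();
    double s2 = 0.0;
    for (double v : x) s2 += (v - m) * (v - m);
    s2 /= (x.size() - 1);  // unbiased sample variance
    return {m, s2, (double)x.size()};
}

// Welch's t statistic for comparing mean Likert scores of two groups
// (e.g., medical students vs. brain tumor surgery experts).
double welchT(const std::vector<double>& a, const std::vector<double>& b) {
    GroupStats ga = stats(a), gb = stats(b);
    return (ga.mean - gb.mean) / std::sqrt(ga.var / ga.n + gb.var / gb.n);
}
```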
Table 29.3 Objective evaluation of the surgical procedure

Indicator | Medical students | Neurosurgeon | Brain oncologist | P
Electrocoagulation forceps to stop blood flow | | | |
Length of the hole entry path | | | |
Extraction rate of brain tumor | | | |
Whether damage to other tissues | | | |
Operation time | | | |
29.2.2 Objective Evaluation
In terms of objective evaluation, it is necessary to evaluate the haemostatic effect of the electric coagulation forceps, the operation of the drill bit, and the tumor extraction rate. First, bleeding: brain tumor surgery is accompanied by a large amount of bleeding, so the haemostatic effect of the electric coagulation forceps becomes particularly important. According to medical data, surgical blood loss for a normal male should be controlled within 800–1500 ml; otherwise there is a possibility of shock. The operator's goal is to maximize the removal of the tumor while minimizing bleeding, with blood loss measured in cubic centimeters [13]. Second, the cutting degree of the drill bit is evaluated: the cutting of the skull is quantified by the volume of removed bone chips, measured in cubic centimeters. Finally, the removal rate of the tumor is evaluated: the operator needs to maximize the removal of the brain tumor without removing the surrounding brain tissue. The extraction degree of the gelled and solid tumors is measured in cubic centimeters, and the measured results are compared with standard values to evaluate the operation effect of the simulator. Table 29.3 lists the objective evaluation methods, and Fig. 29.4 shows the simulation of drilling the skull with the drill bit.
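As an illustration of how a simulator might accumulate these objective indicators (the structure and field names are ours, not taken from any cited simulator):

```cpp
// Objective metrics logged during a simulated procedure; volumes in cm^3.
struct ProcedureLog {
    double bloodLossCm3;       // accumulated simulated blood loss (1 ml == 1 cm^3)
    double tumorRemovedCm3;    // resected tumor volume
    double tumorTotalCm3;      // total tumor volume in the model
    double healthyRemovedCm3;  // healthy tissue removed (ideally ~0)
};

// Tumor extraction rate: fraction of the tumor actually removed.
double extractionRate(const ProcedureLog& log) {
    return log.tumorTotalCm3 > 0.0 ? log.tumorRemovedCm3 / log.tumorTotalCm3 : 0.0;
}

// Check blood loss against the 800-1500 ml bound cited in the chapter;
// returning false flags a risk of shock.
bool bloodLossAcceptable(const ProcedureLog& log) {
    return log.bloodLossCm3 <= 1500.0;
}
```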
29.3 Discussion
Through the subjective evaluation of the experimental performance in the virtual environment, the effectiveness of the simulation is proved; this more intuitively represents how well the simulator reproduces the real situation and gives a fair and objective evaluation of the simulator from the perspectives of touch and vision. The simulator builds a highly reproducible, truly tangible surgical laboratory environment that allows subjects to participate in the surgical procedure more immersively and reduces the tension and pressure of operating.
Fig. 29.4 Cranium drilling experiment with the simulator
Table 29.4 Comprehensive evaluation table

Item | Score
The overall feeling of the simulator |
Ease of operation |
Visual and tactile realism |
Do you recommend this simulator as a test tool for brain tumor surgery |
Other |
After objectively evaluating the real operation process of the simulator, subjects can assess the usability of the simulator more intuitively and effectively through surgical operation. In addition, evaluation forms can be compared across experimenters at different levels to find problems in the operation and improve performance. Finally, a 5-point Likert form is adopted for a comprehensive evaluation of the whole simulator test process, so that the real effect of the simulator is revealed directly. Table 29.4 is the comprehensive evaluation table of the whole system after subjective and objective evaluation.
29.4 Conclusion The advantage of this evaluation system is that it proposes comprehensive subjective evaluation and objective evaluation methods for the few brain tumor surgery simulators at the present stage. The evaluation method relies on the principle of statistics, and
the subjects, ranging from ordinary medical students to brain tumor surgery experts, increase the universality of the experiment and make the data true and reliable. In addition, this evaluation method closely follows the operation steps of brain tumor surgery, starting from every detail, to accurately assess the fidelity of the simulator and improve the success rate of surgery. These evaluations allow us to focus on how neurosurgeons actually operate and to assess surgeons' expertise. Involving beginners as early as possible yields more effective information and reduces the errors made in real surgery. An excellent simulation system can make brain tumor surgery successful and effective.
References
1. Barthélemy, E.J., Melis, M., Gordon, E., Ullman, J.S., Germano, I.M.: Decompressive craniectomy for severe traumatic brain injury: a systematic review. World Neurosurg. 88, 411–420 (2016)
2. Chang, K.E., Zada, G.: Book review: comprehensive healthcare simulation: neurosurgery. Oper. Neurosurg. 17(1), E39–E40 (2019)
3. Ribaupierre, S.D., Armstrong, R., Noltie, D., Kramers, M., Eagleson, R.: VR and AR simulator for neurosurgical training. In: 2015 IEEE Virtual Reality Conference (VR 2015)—Proceedings, pp. 147–148 (2015)
4. Kersten-Oertel, M., Chen, S.J.S., Collins, D.L.: An evaluation of depth enhancing perceptual cues for vascular volume visualization in neurosurgery. IEEE Trans. Vis. Comput. Graph. 20(3), 391–403 (2014)
5. Alotaibi, F.E., et al.: Assessing bimanual performance in brain tumor resection with NeuroTouch, a virtual reality simulator. Oper. Neurosurg. 11(1), 89–98 (2015)
6. Alzhrani, G., et al.: Proficiency performance benchmarks for removal of simulated brain tumors using a virtual reality simulator NeuroTouch. J. Surg. Educ. 72(4), 685–696 (2015)
7. Hadley, C., Lam, S.K., Briceño, V., Luerssen, T.G., Jea, A.: Use of a formal assessment instrument for evaluation of resident operative skills in pediatric neurosurgery. J. Neurosurg. Pediatr. 16(5), 497–504 (2015)
8. Meier, R., et al.: Clinical evaluation of a fully-automatic segmentation method for longitudinal brain tumor volumetry. Sci. Rep. 6, 1–11 (2016)
9. Pelargos, P.E., et al.: Utilizing virtual and augmented reality for educational and clinical enhancements in neurosurgery. J. Clin. Neurosci. 35, 1–4 (2017)
10. Pfandler, M., Lazarovici, M., Stefan, P., Wucherer, P., Weigl, M.: Virtual reality-based simulators for spine surgery: a systematic review. Spine J. 17(9), 1352–1363 (2017)
11. Mazur, T., Mansour, T.R., Mugge, L., Medhkour, A.: Virtual reality-based simulators for cranial tumor surgery: a systematic review. World Neurosurg. 110, 414–422 (2018)
12. Wright, T.L.: Design and Evaluation of Neurosurgical Training Simulator (2018)
13. Latka, D., et al.: Biomedical Engineering and Neuroscience. 720, 1–10 (2018)
Chapter 30
Virtual Haptic Simulation for the VR-Based Biopsy Surgical Navigation Lin Xu, Chengming Zhao, and Licun Sun
Abstract A kidney biopsy is a pathological examination of a small amount of living kidney tissue taken from a patient's kidney by puncture or surgery. The traditional training method is to practice on animal kidneys, but with changing medical means and increasingly complex pathology, the current solution cannot meet increasingly complex surgical problems. Therefore, in order to simulate the process of needle insertion, a dynamic puncture biomechanical experimental architecture is proposed. To model the mechanical effect of insertion accurately, a dynamic force model is proposed for the first time, and the idea of continuous modeling of the percutaneous force is introduced. The proposed dynamic puncture model extends static puncture force modeling and simulates the actual procedure more accurately, providing a more realistic, reliable, and accurate basis for machine-assisted surgery.
30.1 Introduction
With the development of modern medicine, minimally invasive surgery (MIS) has gradually entered the public's view and become the first choice for surgeons and patients. MIS is characterized by smaller wounds, less pain for patients, short postoperative recovery time, and a low incidence of complications. In MIS, many clinical procedures involve percutaneous insertion therapy; in needle insertion procedures, the needle must be inserted into the human trunk to reach a target [1]. Kidney biopsy (also called kidney puncture) is a medical procedure that involves inserting a needle into soft tissue for testing. It is an invasive procedure in which a small sample of kidney tissue is removed from the human body for examination. In the earliest years, open surgery was the only approach to collect sample tissue from a live person.
L. Xu (B) · C. Zhao · L. Sun (B) Yunnan Normal University, Kunming, China e-mail: [email protected] L. Sun e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_30
With computer-assisted technology and medical imaging, the equipment has been gradually updated and improved, and real-time imaging guidance (ultrasound, CT scanning, and MRI) is widely utilized to assist the nephrologist during needle insertion, improving safety and reducing operation time. The puncturing process is generally divided into two parts, before and after the puncture [2]. Prior to the puncture, the patient lies flat. The surgeon injects a local anesthetic into the patient's skin and passes it from the subcutaneous tissue to the area surrounding the kidney. After these preparations, the surgeon performs the operation guided by real-time medical images. Real-time medical imaging provides important information for the surgical staff to analyze the condition effectively, develop treatment plans quickly, and evaluate the surgical effect reasonably. Throughout, we want the puncture insertion to be more accurate, take less operating time, and cause less damage to surrounding tissue [3]. At present, many scholars at home and abroad have studied the insertion mode of the needle in renal puncture and established different mechanical models based on extensive experimental data [4].
30.2 Materials and Method
30.2.1 Materials
In this paper, the kidney is used as the insertion object to simulate the actual renal percutaneous insertion procedure; because the mechanical properties of porcine kidneys are similar to those of human soft tissue, needle insertion into an ex vivo kidney can properly describe the behavior of needle insertion into the human body [5]. The kidney is one of the vital organs of the human body and the main part of the urinary system, playing an important role in the body's regulatory functions. In recent years, however, more and more people suffer from kidney diseases, such as kidney stones, nephritis, and nephrotic syndrome, most of which need to be diagnosed by needle biopsy [6]. As shown in Fig. 30.1, the kidney is mainly composed of the renal capsule, renal cortex, and renal medulla. Figure 30.2 shows the simulated process of renal puncture.
30.2.2 Experiment Design
As in many procedures, surgeons usually anesthetize the patient before percutaneous insertion and require the patient to hold his or her breath as far as possible during insertion toward the tissue target [7]. From this perspective, most researchers currently focus on the static puncture process, assuming that the experiment is conducted under ideal conditions.
Fig. 30.1 The frontal section of the right kidney (labels: fibrous capsule, renal cortex, major calyx, renal pelvis, renal medulla, renal column)
Fig. 30.2 The insertion force (left: initial data and average, right: mean-value modeling of force and depth)
However, it is inevitable that the organs move with respiration during the operation, which directly affects the accuracy of the insertion. To resolve this difficulty, this paper builds on previous research and considers the patient's slight breathing during the operation; the experimental equipment was designed and improved accordingly, and a dynamic puncture procedure is proposed in this work [8]. Before the experiment, the following points must be noted:
• The experimental equipment in this study must be placed on a strictly horizontal surface to avoid the impact of vibration on the accuracy of the experiment. The needle should be held vertically at a small distance above the soft tissue surface.
• The porcine kidneys obtained from the market must be used within 5 hours. According to medical data, the feed speed of an inserted puncture tip is between 0.1 and 10 mm/s; in this paper, a speed of 3 mm/s is used for stable insertion.
• To reduce experimental error, the insertion should be carried out in three different areas of the kidney. The data are then sorted and averaged, and the average value is taken as the analysis data.
Table 30.1 The fitting degree comparison of each mechanical model

Model | Fourier | Second polynomial | Nonlinear | Exponent
Fitting degree | 0.9993 | 0.9787 | 0.9183 | 0.9661
30.2.3 Insertion Force Modeling
In this paper, the experimental process is modeled according to the Fourier principle and the classical algorithms above. Table 30.1 shows the fitting degrees of the different mechanical models [9]. Analysis of the Fourier model shows that its fitting accuracy for the stiffness reaches 0.9993.
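The chapter does not print the fitted model explicitly; under the usual assumption that the insertion force F is expanded as an n-term Fourier series in the insertion depth d with fundamental frequency ω, the fitted form would be

F(d) = a0 + Σ_{k=1}^{n} [a_k cos(kωd) + b_k sin(kωd)]

where a0, a_k, and b_k are coefficients estimated from the measured force–depth data (this general form is our illustration, not a formula quoted from the paper).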
30.3 Results and Discussions
Through the Fourier model, the nonlinear least squares method was used to fit the insertion force. For the rigor of the experiment, we analyzed three different regions of the same individual, compared the trends of the insertion force, calculated the average value, and finally modeled it. Compared within the same individual, the data differences between regions are very small, so the same model can be used for modeling (Sun and Wu 2011). Table 30.2 shows the fitting degree of the mechanical model in each phase [10]. Like Gordon et al. (2015), the data were segmented and modeled in MATLAB. During the experiment, the number of Fourier terms should be adjusted according to the characteristics of the different regions and the requirements at hand, and finally the model parameters are estimated. The established Fourier model reflects the experimental data well, and the fitted curve shows that the fluctuation during needle insertion is also consistent with the experimental data. The feasibility of the Fourier model is further verified by the related experiments. In the following work, we performed puncture experiments on four different tissues: heart, lung, liver, and pig kidney. Three different regions were selected in each group of experiments, tested, and averaged. The Fourier model proposed in this paper was used to analyze, calculate, and model the experimental data. Tables 30.3, 30.4, 30.5, and 30.6 show the data collection and modeling results for each tissue type [11].

Table 30.2 The fitting degree of the liver mechanical model

Phase | 1 | 2 | 3 | 4 | 5 | Average
Fitting degree | 0.9993 | 0.9927 | 0.9998 | 0.9952 | 0.9953 | 0.9965
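The paper performs the fitting in MATLAB; as an illustration in the C++ used elsewhere in this volume, the coefficients of the Fourier series above can be estimated by ordinary linear least squares once ω is fixed (all names below are ours):

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Fit F(d) = a0 + sum_k [a_k cos(k*w*d) + b_k sin(k*w*d)] to measured
// force/depth samples; with w fixed the problem is linear, so the normal
// equations are solved here with naive Gaussian elimination.
std::vector<double> fitFourierForce(const std::vector<double>& depth,
                                    const std::vector<double>& force,
                                    int n, double w) {
    const int m = 2 * n + 1;  // unknowns: a0, a1..an, b1..bn
    std::vector<std::vector<double>> A(m, std::vector<double>(m + 1, 0.0));
    for (size_t s = 0; s < depth.size(); ++s) {
        std::vector<double> phi(m);
        phi[0] = 1.0;
        for (int k = 1; k <= n; ++k) {
            phi[k]     = std::cos(k * w * depth[s]);
            phi[n + k] = std::sin(k * w * depth[s]);
        }
        for (int i = 0; i < m; ++i) {  // accumulate X^T X and X^T y
            for (int j = 0; j < m; ++j) A[i][j] += phi[i] * phi[j];
            A[i][m] += phi[i] * force[s];
        }
    }
    for (int i = 0; i < m; ++i) {      // elimination with partial pivoting
        int p = i;
        for (int r = i + 1; r < m; ++r)
            if (std::fabs(A[r][i]) > std::fabs(A[p][i])) p = r;
        std::swap(A[i], A[p]);
        for (int r = i + 1; r < m; ++r) {
            double f = A[r][i] / A[i][i];
            for (int c = i; c <= m; ++c) A[r][c] -= f * A[i][c];
        }
    }
    std::vector<double> coef(m);
    for (int i = m - 1; i >= 0; --i) { // back substitution
        double v = A[i][m];
        for (int j = i + 1; j < m; ++j) v -= A[i][j] * coef[j];
        coef[i] = v / A[i][i];
    }
    return coef;                       // [a0, a1..an, b1..bn]
}
```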
Table 30.3 The liver mechanical model fitting

Phase | 1 | 2 | 3 | 4 | 5 | Average
Fitting degree | 0.9996 | 0.9988 | 0.9998 | 0.9994 | 0.9999 | 0.9995
Table 30.4 The lung mechanical model fitting

Phase | 1 | 2 | 3 | 4 | 5 | Average
Fitting degree | 0.9997 | 1 | 1 | 0.9979 | 0.9993 | 0.9994
Table 30.5 The heart mechanical model fitting

Phase | 1 | 2 | 3 | 4 | 5 | Average
Fitting degree | 1 | 0.9999 | 1 | 0.9310 | 0.9999 | 0.9862
Table 30.6 The pig kidney mechanical model fitting

Phase | 1 | 2 | 3 | 4 | 5 | Average
Fitting degree | 0.9997 | 0.9979 | 0.9999 | 0.9999 | 0.9996 | 0.9994
30.4 Conclusions and Future Works
In this paper, we propose a new mechanical modeling method based on the Fourier model and design a dynamic percutaneous biomechanical experimental structure for renal puncture surgery, which facilitates accurate modeling of the insertion force. The dynamic force model is proposed for the first time, and the idea of a continuous percutaneous force model is introduced. In future work, we will explore the algorithm in depth, combine it with experiments, and cooperate with hospitals to put the method described in this article into practical application, so as to make kidney puncture and other operations more efficient and safe.
Acknowledgements This research is funded by the NSFC: 61705192.
References
1. Carra, A., Avila-Vilchis, J.C.: Needle insertion modeling through several tissue layers. 1(1), 237–240 (2010)
2. Abolhassani, N., Patel, R., Moallem, M.: Control of soft tissue deformation during robotic needle insertion. Minim. Invasive Ther. Allied Technol. 15(3), 165–176 (2006)
3. Kobayashi, Y., Onishi, A., Watanabe, H., Hoshi, T., Kawamura, K.: Development of an integrated needle insertion system with image guidance and deformation simulation. Comput. Med. Imaging Graph. 34(1), 9–18 (2010)
4. Taylor, R.H., Stoianovici, D.: Medical robotics in computer-integrated surgery. IEEE Trans. Robot. Autom. 19(5), 765–781 (2003)
5. Dario, P., Hannaford, B., Menciassi, A.: Smart surgical tools and augmenting devices. IEEE Trans. Robot. Autom. 19(5), 782–792 (2003)
6. Brouwer, I., Ustin, J., Bentley, L., Sherman, A., Dhruv, N., Tendick, F.: Measuring in vivo animal soft tissue properties for haptic modeling in surgical simulation. In: Proceedings of Medicine Meets Virtual Reality, pp. 69–74 (2010)
7. Brown, J.D., Rosen, J., Kim, Y.S., Chang, L., Sinanan, M.N., Hannaford, B.: In-vivo and in-situ compressive properties of porcine abdominal soft tissues. In: Proceedings of Medicine Meets Virtual Reality, pp. 26–32 (2003)
8. Maurin, B., Barbe, L., Bayle, B., Zanne, P., Gangloff, J.: In vivo study of forces during needle insertions. In: Proceedings of the Scientific Workshop on Medical Robotics, Navigation and Visualization, Remagen, Germany, pp. 415–422 (2004)
9. Barbé, L., Bayle, B., Mathelin, M.D., Gangi, A.: Needle insertions modeling: identifiability and limitations. Biomed. Signal Process. Control 2(3), 191–198 (2007)
10. Menciassi, A., Eisenberg, A., Carrozza, M.C., Dario, P.: Force sensing microinstrument for measuring tissue properties and pulse in microsurgery. IEEE/ASME Trans. Mechatron. 8(1), 10–17 (2003)
11. Han, L., Noble, J.A., Burcher, M.: A novel ultrasound indentation system for measuring biomechanical properties of in vivo soft tissue. Ultrasound Med. Biol. 29(6), 813 (2003)
Chapter 31
An Experiment Process Driving Software Framework for Virtual–Reality Fusion Yanxing Liang, Yinghui Wang, and Yifei Jin
Abstract To address the poor adaptability of traditional software frameworks to the experimental processes of virtual–reality fusion experiments in high school, a workflow-driven software framework is proposed. The software framework is composed of three parts: multimodal input based on the adapter pattern, an experimental workflow engine based on a graph algorithm, and simulation presentation based on a scriptable render pipeline. This software framework not only effectively solves the problem of the diversity of simulation experiment processes in high school, but also serves as a reference for the design and development of simulation experiment software of the same type.
31.1 Introduction
In recent years, with the development of Virtual Reality (VR) technology, virtual simulation experiment software for high school has come into being. Such software can realize experiments that traditional experimental means cannot. However, the variability of each high school experiment is mainly reflected in the diversity of its process, which includes the correct basic process and the erroneous non-basic processes. The basic process refers to the process of "students operating experimental equipment, carrying out the experiment according to regulations, demonstrating the experimental phenomena, and obtaining the experimental results," while the processes under erroneous operations are more complex and diverse; they supplement the basic process and are the key to experimental inquiry, which is especially important in realizing virtual–reality fusion simulation experiment software. How to design a software framework that supports both kinds of processes and responds effectively to the changes of the various experimental processes in the virtual–reality fusion experiments in high school has become the key problem in designing and implementing virtual simulation experiment software.
Y. Liang (B) · Y. Wang Xi'an University of Technology, Shaanxi, China e-mail: [email protected] Y. Jin Beijing LeBu Educational Technology Co., Ltd., Beijing, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_31
Therefore, we propose a workflow-driven software framework to support virtual–reality fusion simulation experiments in high school. Compared with WISE [1], PhET [2], and other virtual experiment systems, our framework is more flexible in experimental process control and experimental step management and is well suited to virtual–reality fusion technology.
31.2 Research Status
Since the emergence of commercial office automation management systems in the early 1990s, workflow [3] theory, methods, and technologies have developed rapidly and been widely applied [4]. Although office automation systems support the dynamic modification of workflows, such workflows have limited functionality, and their scope is severely constrained by business processes. To solve this problem, flexible workflow technology has been developed and applied; the flexible workflow technology system has been enriched by the ADEPT2 project [5], the case-handling method [6], and the Declare method [7]. However, when combined with a software framework, this workflow technology still exists only as a component of the framework [8]: it does not penetrate the whole framework and control the logical behavior of the whole software. This leads to a narrow application of workflow technology, unable to adapt to business processes that, like the virtual simulation experiment process, have the process itself as their core, and unable to respond correctly to the various process variants. Many papers published in Science reflect the development of the field of virtual laboratory application research. For example, WISE [1] is a web-based scientific inquiry environment developed by the University of California, Berkeley, which aims to develop students' research and design abilities. The PhET [2] interactive simulation system, jointly developed by the University of British Columbia and the University of Colorado, has served users globally more than 360 million times in total. However, these virtual laboratories do not fully support the experimental process itself, so experiments can only follow the preset experimental templates, and flexible changes of the experimental process are not supported. To achieve scalability, the general strategy is to set extension points in the software framework; these are divided into interface extension points, function extension points, and data extension points [9]. To make workflow technology run through the whole software framework, the extension-point strategy is not sufficient. In addition, some typical workflow software frameworks have been developed in recent years, such as the service-oriented flexible software framework based on workflow management [8] and frameworks based on Petri nets and the agent pattern [10]. However, the integration and penetration of workflow as the core of the framework is still very weak. At present, the software framework based on
workflow is mainly oriented to the traditional commercial field, aiming at the optimization and improvement of data addition, deletion, retrieval, and modification, and is not suitable for the high school experimental operation process with the complex process characteristics described above.
31.3 Process-Driven Software Framework The whole workflow-driven software framework is divided into three parts: multimodal input based on adapter pattern, experimental workflow engine based on graph algorithm, and simulation presentation based on experimental workflow. The overall structure of the framework is shown in Fig. 31.1.
31.3.1 Adapter Mode-Based Multimodal Input
In a traditional software framework, the user's input modes and output terminals are fixed, such as PC, Android phone, iPhone, and electronic whiteboard. For virtual–reality fusion experiments in high school, users will also use VR and AR head-mounted displays and other devices in addition to traditional equipment. The adapter pattern is a software design pattern that allows the interface of an existing class to be used as another interface, typically to make the existing class work with other classes without modifying its source code. At the same time, it is necessary to judge, restrict, or guide the user's operations according to the real-time experimental operation, so as to realize process guidance and judgment scoring for the user's experiment. In the software framework of this paper, the multimodal input adaptation mechanism based on the adapter pattern, shown in Fig. 31.2, is proposed to meet the needs of various input devices. Each step of the user's operation of the lab equipment may trigger monitoring points present in the experimental workflow topology diagram. The operation triggers the monitoring point and passes the operation information back to the lab process engine. To meet the requirements of experimental process control, the engine selectively blocks user operations by transmitting state control commands to the interactive input management, according to the free exploration mode, practice mode, and examination mode of the high school virtual experiment. The input management system locks the specified user input and experimental equipment through the experimental status messages transmitted over the process control information interface and does not allow the user to operate them. With this lock-based control strategy, low coupling between the experimental evaluation guidance system and the input management system is ensured; at the same time, the input control logic is driven purely by state judgment, which ensures that it does not depend on specific experimental equipment and is universal.
Fig. 31.1 Process-driven software framework
Fig. 31.2 Adapter mode-based multimodal input
At the same time, the readability and maintainability of the code are improved from the perspective of the software framework, and the extensibility and adaptability of the software framework are also established.
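As an illustration of this design (type and method names are ours, not from the paper), the following C++ sketch shows how heterogeneous devices can be wrapped behind one adapter interface and locked by the process engine:

```cpp
#include <string>
#include <vector>

// Unified event the workflow engine consumes, whatever the device.
struct InputEvent { std::string device; std::string action; };

// Target interface: every input source adapts to this.
class IInputAdapter {
public:
    virtual ~IInputAdapter() = default;
    virtual std::vector<InputEvent> poll() = 0;
    void setLocked(bool locked) { locked_ = locked; }  // engine-driven lock
protected:
    bool locked_ = false;
};

// Adapter for a hypothetical VR controller SDK with its own API shape.
class VrControllerAdapter : public IInputAdapter {
public:
    std::vector<InputEvent> poll() override {
        if (locked_) return {};  // input blocked by the process engine
        // ...translate native SDK events into InputEvent objects here...
        return {{"vr-controller", "grab-beaker"}};
    }
};

// A mouse adapter exposes the same interface, so the engine stays
// device-agnostic across PC, whiteboard, VR, and AR inputs.
class MouseAdapter : public IInputAdapter {
public:
    std::vector<InputEvent> poll() override {
        if (locked_) return {};
        return {{"mouse", "click-switch"}};
    }
};
```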
31.3.2 Experimental Workflow Engine Based on Graph Algorithm
The experimental workflow engine of the experimental scoring system is the core control module of the high school virtual simulation experiment. In recent years, Xie Wending [11] and others put forward a scoring system for the high school experimental examination, but that system sets the scoring standard on a fixed experimental process, which is not flexible enough to meet the requirements of independent exploration in high school virtual simulation experiments under a virtual–reality fusion environment. Therefore, we propose an experimental workflow engine based on a graph algorithm, as shown in Fig. 31.3. It is responsible for the interaction and display process of the whole virtual experiment, for guidance and prompting in the experimental teaching mode, and for judgment and evaluation in the experimental examination mode. The experimental workflow engine is divided into five parts: experimental step configuration management, the experimental mode selector, the routing core of the experimental step process topology, the experimental process topology, and the experimental workflow engine state control interface. Experimental step configuration management is responsible for loading the experimental step configuration files corresponding to the experimental resources into the system according to the different experimental modes. The experimental mode selector distinguishes exploration mode, practice mode, and test mode. The exploration mode supports students in freely combining experimental equipment (DIY, Do It Yourself) and presents different experimental results with the support of the experimental simulation presentation module.
Fig. 31.3 Experimental workflow engine based on graph algorithm
The practice mode supports students in completing the experiment step by step, following prompts under the guidance of the standard experimental steps; it is essentially a standard experimental process, a specific case of the general process. In this special case, users are not allowed to skip experimental steps or to operate freely beyond them; students must follow the standard procedure when conducting the experiment. Within each step, the user is allowed to make erroneous operations, and each error produces corresponding prompt information that helps students understand which operations lead to wrong experimental phenomena and results, thereby forming correct experimental operation habits. The test mode allows students to start the experiment from any step without any prompting; the operations are recorded in the experimental workflow topology diagram and judged in real time. The process topology diagram and its routing core cooperate to maintain the process state of the whole experiment. The topology diagram uses a full link graph as the data structure and representation for the current lab. Unlike a simple state machine or behavior tree model, the experimental process needs to support skip-step execution across states. In particular, we divide the experimental process into four states: sequential execution, parallel execution, reversible execution, and skip-step execution, and each state node holds an ordered linked list of experimental operations. Each step of the user's operation of the experimental equipment corresponds to a specific monitoring point in the experimental state node. According to the parameters carried by the triggering message, a comparison is made in the process topology diagram in combination with the current step state; when a state transition condition is satisfied, the system transfers from one state to another, all operations up to that point are scored and settled during the transition, and corresponding comments are given. Finally, when students finish the experiment, the results are summarized into an experiment report for students and teachers. The state control interface is responsible for sending state control commands to
the interaction module and the simulation module, so as to supervise the whole experiment cycle. With the support of the above three modes, the experimental workflow engine can cover all experimental operations, experimental steps, and experimental results. Through the logic control of the experimental workflow engine, together with reverse control of user input and of the simulation results, the whole system functions around the experimental process, achieving a combination of diverse operating results for the simulation experiment.
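To make the topology routing concrete, the following minimal Python sketch models an experiment as a graph of state nodes with explicit transition conditions, including a skip-step edge. All names here (WorkflowEngine, the sample states and triggers) are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of a graph-based experiment workflow engine (illustrative only).
class StateNode:
    def __init__(self, name, monitoring_points=()):
        self.name = name
        # Ordered operation link list: monitoring points expected in this state.
        self.monitoring_points = list(monitoring_points)
        self.transitions = {}  # trigger -> next state name (may skip states)

class WorkflowEngine:
    def __init__(self):
        self.nodes = {}
        self.current = None
        self.log = []  # recorded operations, later summarized into a report

    def add_state(self, name, points=()):
        self.nodes[name] = StateNode(name, points)

    def add_transition(self, src, trigger, dst):
        self.nodes[src].transitions[trigger] = dst

    def handle_operation(self, trigger):
        """Compare the trigger against the topology; transfer state if satisfied."""
        self.log.append((self.current, trigger))
        nxt = self.nodes[self.current].transitions.get(trigger)
        if nxt is not None:
            self.current = nxt  # grading/settlement would happen on this transition
        return self.current

# Usage: a tiny topology with a skip-step edge from "setup" directly to "measure".
engine = WorkflowEngine()
for s in ("setup", "heat", "measure"):
    engine.add_state(s)
engine.add_transition("setup", "light_burner", "heat")
engine.add_transition("setup", "skip_to_measure", "measure")  # skip-step execution
engine.add_transition("heat", "read_thermometer", "measure")
engine.current = "setup"
print(engine.handle_operation("skip_to_measure"))  # -> measure
```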
31.3.3 Simulation Presentation Based on Scriptable Render Pipeline

The experimental workflow is the core of the whole virtual simulation experiment; it controls the experimental reactions and the appearance of experimental phenomena. In traditional virtual simulation experiments, the experimental phenomena are either produced by a fixed GPU rendering pipeline or scripted by developers beforehand. The user can only perform the experiment along the preset, unique standard operation workflow to obtain the corresponding experimental effect; if the user makes an error, the simulation engine cannot present the corresponding erroneous experimental effect. In the high school virtual simulation experiment based on the process-driven software framework, the experimental results are generated dynamically according to the experimental process and operations. A complete GPU rendering pipeline [12] includes vertex processing, fragment processing, rasterization, and frame buffering. A shader [12] is a computer program that controls the GPU rendering pipeline. It allows developers to use code to dynamically control the parameters that make up the final image, such as pixels, vertices, textures, locations, color, saturation, brightness, contrast, and so on. A simulation effect algorithm is usually located in the vertex-processing and fragment-processing units, and the experimental workflow control information controls the vertex processor and the fragment processor. In order to ensure overall control of the experimental simulation results, as shown in Fig. 31.4, we use shaders and scriptable render pipelines to add process control information to the rendering process. The user operates the experimental equipment; when two or more pieces of equipment react, they trigger a monitoring point of the experimental workflow. The experimental workflow engine dynamically computes the workflow according to the monitoring-point data and feeds back to the GPU rendering pipeline of the experimental simulation module through state control commands. According to the process state, the parameters of the experimental simulation algorithm governing the reaction between the equipment are adjusted to achieve different experimental results. If the precondition of the experiment is not met, even if the experimental equipment produces some experimental effect, the user cannot observe it and cannot draw
Fig. 31.4 Simulation presentation based on scriptable render pipeline
experimental conclusions; in this way, the experimental simulation effect is accurately controlled through the experimental process.
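As a hedged illustration of how workflow state might gate render parameters, the sketch below maps process states to simulation parameter sets that a shader could consume; the state names and parameter keys are invented for this example and do not come from the paper.

```python
# Illustrative mapping from experimental process state to shader parameters.
SHADER_PARAMS = {
    "precondition_unmet": {"reaction_intensity": 0.0, "visible": False},
    "reaction_running":   {"reaction_intensity": 0.8, "visible": True},
    "reaction_wrong":     {"reaction_intensity": 0.3, "visible": True},
}

def state_control_command(state):
    """Feed the process state back to the rendering side as parameter overrides."""
    return SHADER_PARAMS.get(state, SHADER_PARAMS["precondition_unmet"])

print(state_control_command("reaction_running"))
```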
31.4 Experimental Use-Case Analysis

Traditional experimental simulation software without experimental process driving usually organizes experimental process control by means of a game-style task system. In detail, the software divides the experimental steps into individual tasks; the user completes an experimental step upon reaching each goal and receives the game task reward, namely playback of the experimental effect animation corresponding to that step. Because the tasks execute linearly and the user must complete each prerequisite task to unlock the next, the user cannot skip steps, and the task system supports neither multi-task nor multi-state skipping. For example, consider a task set X containing the four tasks A, B, C, and D, and suppose that the step sequences "A-B-C-D" and "B-A-C-D" are both correct experimental procedures. To adapt the task system to such diverse experimental processes, developers must prepare a large number of task scripts, which we call task chains. A large number of task chains means that the same experimental step corresponds to multiple task chains, so the system cannot determine which task chain the user is following and ultimately cannot track user actions: the task chain is forced to break. Moreover, arbitrary combinations of experimental operations make it impossible for developers to enumerate all cases; as soon as one case is left out, the corresponding experimental operation produces no experimental result and cannot be fully simulated. More importantly, a large number of task chains bloats the system and reduces operating efficiency. In contrast, in the process-driven high school experimental software framework, the task chain structure is transformed into a full link graph. For the system,
Fig. 31.5 Experiment workflow topology of the crude salt extraction experiment
the full link graph contains the experimental state nodes, which hold the experimental operation decision information, and the directed state transitions between nodes correspond to the experimental examination points and the experimental process monitoring points. Taking the crude salt extraction experiment as an example, Fig. 31.5 shows a simplified diagram of the experiment workflow topology generated by the software framework. Comparing the two flowcharts shows that the software framework driven by the experimental process fits the standard experimental steps defined by the teaching and research teachers well, while also covering the various experimental states and skips that occur in the experiment. The software framework of this paper is therefore better suited to high school experimental simulation software.
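The contrast between enumerated task chains and a full link graph can be made concrete with a toy check: rather than storing every valid ordering as its own chain, a graph keeps only prerequisite edges and accepts any consistent ordering. The prerequisites below are illustrative assumptions.

```python
from itertools import permutations

# Prerequisite edges (illustrative): C requires A and B; D requires C.
PREREQS = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}

def is_valid_order(seq):
    done = set()
    for step in seq:
        if not PREREQS[step] <= done:  # some prerequisite not yet completed
            return False
        done.add(step)
    return True

valid = [s for s in permutations("ABCD") if is_valid_order(s)]
print(valid)  # [('A','B','C','D'), ('B','A','C','D')] -- two chains, one graph
```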
31.5 Conclusion

Aiming at the diversity of experimental process resources in high school, this paper proposes a software framework driven by the experimental process. The framework rests on a graph-based experimental process-driven engine, multimodal input based on the adapter pattern, and simulation presentation based on the experimental process, and it supports the whole course of the experiment with a monitoring-point trigger and state control mechanism. The software framework has been applied in the development of the National Key Research and Development Plan project "Open Experimental Teaching Application Platform and Demonstration", which demonstrates its feasibility and practicability. The framework not only provides bottom-level architectural support for virtual-reality fusion experiments in high school, but also offers a useful reference for other experimental simulation systems.
Acknowledgements Funded Projects: Supported by the National Key R&D Program (2018YFB1004905) and the National Natural Science Foundation of China (61872291, 61871320).
References
1. WISE Homepage. https://wise.berkeley.edu/. Last accessed 10 Jan 2019
2. PhET Homepage. https://phet.colorado.edu/. Last accessed 10 Jan 2019
3. Workflow Management Coalition: Workflow Management Coalition Terminology & Glossary. http://www.wfmc.org/standards/docs/TC-1011_term_glossary_v3.pdf. Last accessed 28 Jul 2019
4. Reichert, M., Rinderle, S., Kreher, U.: Adaptive process management with ADEPT2. In: International Conference on Data Engineering, vol. 2005, pp. 1113–1114. IEEE Computer Society Press (2005)
5. Van Der Aalst, W., Weske, M., Grunbauer, D.: Case handling: a new paradigm for business process support. Data Knowl. Eng. 53(2), 129–162 (2005)
6. Pesic, M., Schonenberg, M., Van Der Aalst, W.: Declare: full support for loosely-structured processes. In: The 11th IEEE International Enterprise Distributed Object Computing Conference, vol. 2007, pp. 289–298. IEEE Computer Society (2007)
7. Chen, G., Pan, R., Li, L.: Overview of workflow modeling technology and its research trend. Comput. Sci. 41(S1), 11–17 (2014)
8. Zhao, Y., Guo, H., Shang, L.: Design and implementation of a flexible software framework. Comput. Technol. Dev. 25(11), 93–98 (2015)
9. Hao, W., Ai, L., Wang, Y.: Implementation method of extension point of three-tier software framework. Comput. Appl. 29(9), 2541–2545 (2009)
10. Cabac, L., Haustermann, M., Mosteller, D.: Software development with Petri nets and agents: approach, frameworks and tool set. Sci. Comput. Program. 157, 56–70 (2018)
11. Xie, W.: Research on intelligent measurement and evaluation system of middle school students' physics experiment operating ability based on digital experiment platform. Suzhou Univ. J. 2015, 75–84 (2015)
12. Feng, L.: Introduction to Unity Shader, 2nd edn. People's Posts and Telecommunications Press, Beijing (2016)
Chapter 32
Text to Complicated Image Synthesis with Segmentation Information Guidance Zhiqiang Zhang, Yunye Zhang, Wenfa Liu, Wenxin Yu, Gang He, Ning Jiang, and Zhuo Yang
Abstract In this paper, we propose a novel method called Segmentation Information Guidance (SIG), in which additional segmentation information is added to guide the process of text-to-complicated-image synthesis. We demonstrate the effectiveness of the SIG model on the Microsoft Common Objects in Context (MSCOCO) dataset. The results show that images generated using the full segmentation image are more authentic and coherent than those generated without background.
32.1 Introduction

The automatic synthesis of images from text descriptions has been an active research topic in computer vision and has tremendous applications, including auxiliary structural design, scene restoration, etc. The task comprises two sub-tasks: natural language understanding and image synthesis. For natural language understanding, inspired by the success of natural language processing models such as Recurrent Neural Networks (RNN), especially Long Short-Term Memory (LSTM) [1] and the Gated Recurrent Unit (GRU) [2], text descriptions can be encoded into semantically meaningful vectors. For the second sub-task, the highly multimodal nature of the problem is the biggest hindrance. Recently, Generative Adversarial Networks (GAN) [3] have shown enormous potential for image synthesis and handle the multimodal problem well. However, owing to training instability, it is sometimes hard to obtain convincing results. To address this, Deep Convolutional Generative Adversarial Networks (DCGAN) [4], which combine GAN with Convolutional Neural Networks (CNN), have been applied to a series of datasets such as MSCOCO [5] and have shown compelling results.
In the past few years, notable advances have been made in the field of text-to-image synthesis. Reed et al. [6] proposed GAN-CLS, which successfully implements text-to-image generation. Nevertheless, GAN-CLS is limited to simple object datasets such as birds and flowers; it fails to generate plausible high-resolution and complicated images. Zhang et al. [7] proposed StackGAN, which uses two stages (Stack-I GAN and Stack-II GAN) to obtain high-resolution images and performs well on that goal. Another effective way to obtain high-resolution images is AttnGAN [8], which has achieved great success with attention mechanisms; however, it also fails on complicated images. In this paper, we propose a novel method that makes use of a segmentation image to guide text-to-image synthesis. As a teacher tells students where and what to draw, the segmentation image tells the text the basic outline information of the objects. We demonstrate the effectiveness of our method on the MSCOCO dataset and compare it with GAN-CLS, StackGAN, and AttnGAN. The comparison shows that our method generates better result images with more refined detail. We also demonstrate that using the full segmentation image yields better results than using the segmentation content alone without background. The rest of this paper is arranged as follows: Sect. 32.2 introduces the related techniques. Section 32.3 describes the proposal in detail, including the convolution model, encoding model, and text-to-image model. Section 32.4 shows the experimental results. Section 32.5 concludes our work.
32.2 Background

32.2.1 Text Coding Module

In order to encode text for semantic understanding efficiently, we use the skip-thoughts model [10], which can encode text into high-quality skip-thought vectors. The model utilizes an RNN encoder with GRU [2] activations, which iterates the recurrence relations in Eqs. (32.1)–(32.4):

$r^t = \sigma(W_r x^t + U_r h^{t-1})$  (32.1)

$z^t = \sigma(W_z x^t + U_z h^{t-1})$  (32.2)

$\bar{h}^t = \tanh(W x^t + U (r^t \odot h^{t-1}))$  (32.3)

$h^t = (1 - z^t) \odot h^{t-1} + z^t \odot \bar{h}^t$  (32.4)
where $r^t$ denotes the reset gate and $z^t$ the update gate, and $x^t$ denotes the input at time $t$; these belong to the GRU model. $h^t$ and $\bar{h}^t$ denote the memory and the proposed state update
at time $t$, respectively, and $\odot$ denotes the element-wise product. The $W$ ($W_r$ or $W_z$) and $U$ ($U_r$ or $U_z$) in the equations are the corresponding weight matrices.
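A minimal NumPy sketch of one GRU step implementing Eqs. (32.1)–(32.4); the dimensions and random weights are placeholders.

```python
import numpy as np

def gru_step(x_t, h_prev, Wr, Ur, Wz, Uz, W, U):
    """One GRU recurrence step, Eqs. (32.1)-(32.4)."""
    sigma = lambda a: 1.0 / (1.0 + np.exp(-a))
    r_t = sigma(Wr @ x_t + Ur @ h_prev)            # (32.1) reset gate
    z_t = sigma(Wz @ x_t + Uz @ h_prev)            # (32.2) update gate
    h_bar = np.tanh(W @ x_t + U @ (r_t * h_prev))  # (32.3) proposed update
    return (1.0 - z_t) * h_prev + z_t * h_bar      # (32.4) new state

d_in, d_h = 4, 3
rng = np.random.default_rng(0)
Wr, Wz, W = (rng.normal(size=(d_h, d_in)) for _ in range(3))
Ur, Uz, U = (rng.normal(size=(d_h, d_h)) for _ in range(3))
h = np.zeros(d_h)
h = gru_step(rng.normal(size=d_in), h, Wr, Ur, Wz, Uz, W, U)
print(h.shape)  # (3,)
```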
32.2.2 Generative Adversarial Networks

Generative adversarial networks learn the distribution of the original data through game theory. The networks are composed of a generator G and a discriminator D, which play a minimax game: the generator tries its best to generate fake images that confuse D, while the discriminator tries to distinguish real images from fake ones as reliably as possible. The process is formalized in Eq. (32.5):

$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_Z(z)}[\log(1 - D(G(z)))]$  (32.5)

where $x$ and $z$ denote the original image and the noise, respectively, and $P_{data}(x)$ and $P_Z(z)$ represent the distributions of the original data and the noise, respectively.
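A hedged PyTorch sketch of the minimax game of Eq. (32.5), using the Adam settings reported later in Sect. 32.3.3 (learning rate 0.0002, momentum 0.5); the tiny MLP shapes are placeholders, not the networks used in the paper.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))  # z -> fake x
D = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

for _ in range(3):                      # a few toy iterations
    x = torch.randn(64, 8)              # stand-in for real data
    z = torch.randn(64, 16)
    # Discriminator step: maximize V(D, G) of Eq. (32.5).
    d_loss = -(torch.log(D(x)) + torch.log(1 - D(G(z).detach()))).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: minimize log(1 - D(G(z))).
    g_loss = torch.log(1 - D(G(torch.randn(64, 16)))).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(float(d_loss), float(g_loss))
```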
32.3 Method

32.3.1 Segmentation Image Guidance

A difficult issue in synthesizing images conditioned on text descriptions is the highly multimodal nature of the problem. For example, the sentence "two baseball players standing on the field" can correspond to several different images that all conform to the semantics. GAN is a natural tool for this challenge, but only for simple images, whose shapes and sizes are fixed. By contrast, the shapes and sizes of complicated images are not fixed, so it is impossible to acquire good results merely by training on a substantial number of samples; additional information must be appended to enhance the quality of the results. We can regard the task of complicated image synthesis as painting. The first and most important step in painting is to know the basic contour information of the objects. Fortunately, this information can be obtained directly from segmentation images. The segmentation image marks the basic outline information of the objects in the image: as can be seen from Fig. 32.1, different objects in the segmentation image are marked by different colors. This kind of marking is an information mark of the general outline, which is exactly the guidance information we need. In our method, the outline information of segmentation images is first acquired through the AlexNet [9] structure, and we regard this information as a mentor that guides the
Fig. 32.1 The segmentation images and corresponding de-background results
entire generation process. More specifically, we adopt the output from the seventh layer of AlexNet as the segmentation information. Since the semantic information provided by the text cannot determine the specific shape and size of the object, the purpose of introducing the contour is to guide the text to generate images in the correct direction, just as a teacher guides students where and what to draw. After continuous iterative training that combines the semantic and outline information, a generator model that meets our expectations can be obtained.
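Assuming the "seventh layer" refers to the 4096-dimensional fc7 activations of AlexNet's classifier head (one plausible reading, not confirmed by the paper), the extraction could look like the following sketch; random weights are used so the snippet runs offline (swap in AlexNet_Weights.DEFAULT for pretrained features).

```python
import torch
from torchvision import models

# Hypothetical reading of "output from the seventh layer": the fc7 activations.
alexnet = models.alexnet(weights=None)  # random init; pretrained weights optional
alexnet.eval()

def seg_guidance(seg_image):            # seg_image: (N, 3, 224, 224) tensor
    with torch.no_grad():
        x = alexnet.features(seg_image)
        x = torch.flatten(alexnet.avgpool(x), 1)
        return alexnet.classifier[:6](x)  # up through fc7 + ReLU -> (N, 4096)

print(seg_guidance(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 4096])
```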
32.3.2 Network Structure

Our model includes three modules: (1) a segmentation image convolution module, (2) a text encoding module, and (3) a text-to-image synthesis module. First, the segmentation convolution module uses the AlexNet [9] structure to obtain convolution features; the output of the seventh layer is the segmentation information that we need. Next, the text encoding module utilizes the skip-thoughts model [10], which includes an RNN encoder with GRU [2] activations, to obtain high-quality text vectors. The skip-thoughts model encodes text into skip-thought vectors, which show strong performance in tasks such as semantic relatedness, paraphrase detection, image-sentence ranking, and question-type classification. Last, we adopt the combination of DCGAN and RNN as the basic structure for text-to-image synthesis, with the segmentation images added as a guide. The noise variables are combined with the text vectors as latent variables input to the network. During generation, the segmentation information mainly plays a guiding role, the text information is mainly used to generate images, and the noise z is randomly drawn from a normal distribution. After receiving the input, G generates a fake image through training. In our text-to-image synthesis module, "matched sentences" indicate that the text is consistent with the image it describes.
Fig. 32.2 The structure of our Segmentation Information Guidance (SIG) model
Correspondingly, "mismatched sentences" indicate inconsistency. There are three input situations for D: a fake image with matched sentences, a real image with matched sentences, and a real image with mismatched sentences. The main task of D is to identify these inputs: it needs to recognize that the first and third are wrong and the second is correct. The overall model is shown in Fig. 32.2.
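The three input situations for D can be written as a matching-aware discriminator loss in the style of GAN-CLS [6]; the sketch below shows only the loss terms, with D standing in for any network that scores an (image, sentence-embedding) pair, and the equal weighting of the two "wrong" terms follows [6] rather than this paper.

```python
import torch

def discriminator_loss(D, real_img, fake_img, matched_txt, mismatched_txt):
    """Three situations: fake + matched (wrong), real + matched (right),
    real + mismatched (wrong)."""
    eps = 1e-8
    s_real  = D(real_img, matched_txt)      # should score high
    s_fake  = D(fake_img, matched_txt)      # should score low
    s_wrong = D(real_img, mismatched_txt)   # should score low
    return -(torch.log(s_real + eps)
             + 0.5 * (torch.log(1 - s_fake + eps)
                      + torch.log(1 - s_wrong + eps))).mean()

# Toy usage with a placeholder scoring function standing in for D.
D = lambda img, txt: torch.sigmoid(img.flatten(1).mean(1) + txt.flatten(1).mean(1))
loss = discriminator_loss(D, torch.randn(4, 8), torch.randn(4, 8),
                          torch.randn(4, 6), torch.randn(4, 6))
print(float(loss))
```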
32.3.3 Training

The first step in training is text encoding, which requires all text descriptions to be encoded into corresponding text vectors. The next step is to process the segmentation images to obtain the segmentation guidance information we need, after which the text-to-image model uses these inputs to learn image synthesis. Of course, the key to generating high-quality results is the guidance information. All images in the dataset are converted to 64 × 64 size, each image corresponds to five text descriptions, and the segmentation images are downloaded directly from the official website. For the AlexNet structure, the stddev we use is 0.1. In the convolution layers, the convolution kernel sizes are 11, 5, 3, 3, and 3, with corresponding strides 4, 1, 1, 1, and 1; the pooling kernel size is 3, and the strides of the three pooling layers are 2. For training the text-to-image model, we use an initial learning rate of 0.0002 and Adam [11] optimization with momentum 0.5, with a minibatch size of 64, training for 300 epochs.
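Under the stated kernels and strides (padding is not specified in the paper, and the channel widths below are the classic AlexNet values, assumed here for illustration), the convolution stack could be sketched as:

```python
import torch
import torch.nn as nn

# AlexNet-style convolution stack with the stated kernels/strides (sketch;
# channel counts 96/256/384/384/256 are assumed, padding unspecified -> none).
convs = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, stride=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, stride=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, stride=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, stride=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
)
print(convs(torch.zeros(1, 3, 227, 227)).shape)  # torch.Size([1, 256, 2, 2])
```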
32.4 Experimental Results

The experimental results comprise two parts: a horizontal contrast and a vertical contrast on the MSCOCO dataset. The horizontal contrast compares our methods (SIG-IMG and SIG-INFO) with GAN-CLS [6], StackGAN [7], and AttnGAN [8]. The vertical contrast compares our two methods against each other.
32.4.1 Horizontal Contrast

We evaluated our method on the MSCOCO dataset and compared its results with those of GAN-CLS, StackGAN, and AttnGAN by subjective evaluation. First, we extract the foreground results from the segmentation images (shown in Fig. 32.1). The specific comparison results are shown in Fig. 32.3. As can be seen, the background generated by each method is relatively correct. In object generation, however, GAN-CLS only renders the relevant background information well, and StackGAN and AttnGAN also perform poorly on objects. By contrast, our results are more realistic in content. The second proposed method, called SIG-INFO, uses the segmentation information without background. We also compared it with GAN-CLS, StackGAN, and AttnGAN; the comparison results are shown in Fig. 32.4. The results of SIG-INFO remain better than those of the other methods in terms of the overall content of the generated image; the other methods perform well only on the background, while their overall content is still poor. The horizontal comparison thus shows the effectiveness of our method in generating object content.
32.4.2 Vertical Contrast

The segmentation image contains not only the guidance information we want but also background information. In order to determine the role of the background information, a second experiment is conducted using the segmentation information without background. For background removal, the main idea is to locate the segmentation area and then turn all other, non-segmentation regions white. Region location is achieved by searching for the black border around the segmentation region.
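A minimal NumPy sketch of the described background removal: locate the segmentation region via its near-black border, then whiten everything outside it. The darkness threshold is an assumption.

```python
import numpy as np

def remove_background(seg, black_thresh=30):
    """Whiten all pixels outside the bounding box of near-black border pixels."""
    dark = (seg < black_thresh).all(axis=2)   # near-black pixels, shape (H, W)
    ys, xs = np.nonzero(dark)
    out = np.full_like(seg, 255)              # start all-white
    if ys.size:                               # copy region inside the border
        y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
        out[y0:y1 + 1, x0:x1 + 1] = seg[y0:y1 + 1, x0:x1 + 1]
    return out

img = np.full((64, 64, 3), 200, np.uint8)
img[16:48, 16:48] = 0                         # toy "black border" region
print(remove_background(img).mean())
```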
Fig. 32.3 The comparison results among GAN-CLS, StackGAN, AttnGAN, and SIG-IMG (rows: GT, GAN-CLS, StackGAN, AttnGAN, SIG-IMG; captions: "a man in a wet suit riding on a surfboard", "a baseball player hitting a baseball with a bat at home plate", "two skiers with backpacks head up a mountain", "two men pose for a photo on skiis")
We compare the results of these two methods in Fig. 32.5. The results of SIG-IMG have better backgrounds than those of SIG-INFO and are smoother, with better authenticity and coherence. The comparison shows that the background information plays an active role in the process of synthesizing the image.
Fig. 32.4 The comparison results among GAN-CLS, StackGAN, AttnGAN, and SIG-INFO (captions: "group of people exercising on the beach", "a woman sitting on a surfboard in the ocean", "a person who is skiing down a snowy hill", "a crowd of skiers gather around the top of a snowy slope")
32.5 Conclusion

In this paper, we propose a new method named SIG to generate complicated images based on text descriptions. Compared with other methods such as GAN-CLS, StackGAN, and AttnGAN, the results of our method are more reliable in terms of overall content. Our results on the MSCOCO dataset also show that the background information in the segmentation image plays an active role in the process of text-to-image synthesis. In the future, we will draw on the architectures of StackGAN and AttnGAN to generate high-resolution image results.
Fig. 32.5 The comparison results between our SIG-IMG and SIG-INFO methods (rows: GT, SIG-IMG, SIG-INFO; captions: "a man riding a wave on top of a surfboard", "a person on skis standing on a snow covered slope", "a young man balances on a surfboard with one hand on the wave", "there is a group of people standing on the beach")
Acknowledgements This research was supported by 2018GZ0517, 2019YFS0146, and 2019YFS0155 from the Sichuan Provincial Science and Technology Department; 2018KF003 from the State Key Laboratory of ASIC & System; No. 61907009 from the National Natural Science Foundation of China; No. 2018A030313802 from the Natural Science Foundation of Guangdong Province; and Nos. 2017B010110007 and 2017B010110015 from the Science and Technology Planning Project of Guangdong Province.
References
1. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
2. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR abs/1412.3555 (2014)
3. Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
4. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: International Conference on Learning Representations (2016)
5. Lin, T., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: European Conference on Computer Vision, pp. 740–755 (2014)
6. Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. In: International Conference on Machine Learning, pp. 1060–1069 (2016)
7. Zhang, H., Xu, T., Li, H., Zhang, S., Huang, X., Wang, X., Metaxas, D.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In: International Conference on Computer Vision, pp. 5908–5916 (2017)
8. Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., He, X.: AttnGAN: fine-grained text to image generation with attentional generative adversarial networks. In: Conference on Computer Vision and Pattern Recognition, pp. 1316–1324 (2018)
9. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
10. Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R., Torralba, A., Urtasun, R., Fidler, S.: Skip-thought vectors. In: Advances in Neural Information Processing Systems, pp. 3294–3302 (2015)
11. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2014)
Chapter 33
Application of VR Technology in Presentation of Campus Architecture Animation—A Case Study of City College of WUST

Yu Qian

Abstract Taking City College of WUST (Wuhan University of Science and Technology) as an example, this study explores and practices the application of VR technology in the design of campus architecture video animation. In addition to collecting and collating the research background and the domestic and overseas research status, this paper analyzes the role of campus architecture animation in university promotional videos and sets out the design concept of the campus architecture animation of City College of WUST. On this basis, the design idea, technical path, and design process are clarified, and the sub-camera (storyboard) design is decomposed. Meanwhile, by laying out the general process of making 3D virtual campus animation, this paper summarizes the steps of scene modeling, lighting layout, material and mapping, animation production, and rendering output, and illustrates them with scenes such as the campus bird's-eye view animation so as to elaborate the technical path. This study conducts a positive exploration of the forms and prospects of applying campus architecture animation in campus promotional videos.
33.1 Introduction

33.1.1 Background

The progress of computer science has popularized the application of VR technology in social life. As a technical means of creating and experiencing a virtual world, the simulation of 3D dynamic scenes produces a strong sense of immersion. An early application of VR technology in the education field is the construction of the virtual campus, which includes the simulation of the campus environment, campus resources, and
school planning. The construction of the virtual campus is also the direction in which future campus digitalization will develop, so the study of virtual campus roaming is of great significance to the design and development of virtual campuses. By providing a more sensory form of expression for campus publicity, planning, and management, virtual campus roaming allows users to experience campus services. It is an ideal platform for campus digitalization, because it greatly enhances the school's reputation and improves modern campus management [1].
33.1.2 Domestic and Overseas Research Status

Computer technology has developed so rapidly that people can create almost any object that exists in reality or imagination through 3D modeling software, which makes its applications nearly ubiquitous.

(1) Overseas Research Status

Overseas research on and application of virtual roaming technology started relatively early, and the technology has long been popularized in schools. As presented in Status of Domestic and Overseas Research on VR Technology, the journal article published by Jiang Xuezhi and Li Zhonghua in 2004, and Development Status, Policy and Implications of American VR Technology, published by Wang Jianmei, Zhang Xu, et al. in 2010, overseas research on VR technology already covered many aspects, including environmental simulation, human-computer interaction, graphics processing, and artificial intelligence. At present, VR technology is widely used in the education field in the United States; the information construction of American university campuses has achieved remarkable results, covering the library network, school management, teaching activities, scientific research activities, and students' daily life.

(2) Domestic Research Status

Compared with foreign countries, VR technology started late in China, with scarce applications in constructing virtual campus roaming. In recent years, however, many Chinese universities have created relatively mature virtual campus systems based on this technology. The emergence of the virtual campus system enables users to view all the landscapes and structures on campus while staying indoors, and this technology will gradually enter each of China's universities; the research in this paper therefore has strong practical significance [2]. For example, Wuhan University of Technology has developed a system that completes panoramic roaming through actual photos, which has been applied to campus roaming (see Fig. 33.1). From the construction of virtual campus roaming in many domestic universities, it is obvious that virtual campus roaming is becoming more popular
Fig. 33.1 Panorama of Wuhan University of Technology Campus
in China. Moreover, instead of being limited to a single type, various universities adopt different methods to construct virtual campus roaming so as to meet different needs.
33.1.3 Subject Significance and Research Value

Combined with film-making techniques, campus architecture animation is a scene roaming of campus space through continuous dynamic frames, covering interior and exterior spaces, road planning, garden design, and so on. Such 3D visual representation creates a sense of immersion. On the one hand, campus architecture animation can serve as the basis of virtual campus roaming and facilitates the construction of a digital campus platform. On the other hand, thanks to its realistic and artistic expression, it has become an effective means of planning, designing, and promoting the campus image [3]. With increasingly fierce competition for students, the campus promotional video has become an important means of expanding a university's reputation and attracting high-quality students. In terms of manifestation, most promotional videos adopt real-life filming, so their contents and techniques share many similarities. As a new form of expression, architecture animation can produce very realistic effects in model making and in material and color presentation. Together with great freedom of the lens, the use of special effects, and the contrast of light and shadow, the imagery can achieve a strong artistic appeal and sci-fi sensibility that cannot be realized through location shooting; architecture animation is particularly advantageous for manifesting large scenes. With the continuous advancement of VR technology, the role of architecture animation in campus promotional videos will become more prominent. The purpose of this study is to explore and practice the application of VR technology in the design of campus architecture video animation, taking City College of WUST as an example, and then to actively explore the application forms and prospects of campus architecture animation in campus promotional videos.
33.2 Design Conception of City College of WUST Campus Architecture Animation

Taking City College of WUST as an example and combining 3D and post-effect software, this study presents a 3D virtual scene of the campus environment, edited and synthesized through the montage method. Meanwhile, the campus culture, geographical location, and school-running characteristics are manifested, and a positive exploration is performed in applying campus scene animation to campus promotional video design.
33.2.1 Design Ideas

Ranging from the university gate to the greenway of East Lake, the campus architecture animation of City College of WUST takes the campus road as its path and students' daily study as its main clue. By distinguishing spaces, it embodies the beautiful campus scenery and rich campus culture. This work aims to build and display the 3D virtual effect of City College of WUST and to present the campus in the form of a promotional video, so that society, parents, and students can understand the campus environment of City College of WUST more intuitively. Since many scenes are involved, the campus architecture animation is produced by dividing it into different scenarios, which are then edited and synthesized through montage techniques in temporal and spatial sequence. At the same time, the virtual simulation scenes and location shooting are integrated naturally so that the virtual and the real complement each other. Apart from exploiting the strengths of VR technology in expressing sci-fi and special effects, this combined method also integrates the realism and affinity of real-life images, reflecting better innovation.
33.2.2 Technological Path

The campus architecture animation of City College of WUST is mainly produced with the 3ds Max software, covering the bird's-eye view, gate, teaching building, library, training center, canteen, Swan Lake, Sakura Avenue, etc. Supported by post-editing software such as Premiere and After Effects, the campus animation video is completed.
33.2.3 Design Process

The design process of this campus architecture animation can be divided into the following stages:

(1) Solution Design: determining the theme, presentation, scripting, sub-camera, and labor division of making the animation;
(2) Model Making: low-poly scene models, high-poly scene models, and integration of sub-camera scenes;
(3) Animation Design: creating camera animation, character animation, and special effects;
(4) Effect Output: scene material settings, light settings, sequence frames, and render output;
(5) Post-synthesis: soundtrack selection, post effects of sub-camera sequence frames, video clip output, and video handover test.
33.3 3ds Max (Three-Dimensional Studio Max)-Based Realization of 3D Virtual Campus Animation Roaming

33.3.1 General Process of Making 3D Virtual Campus Animation

(1) Data Collection and Compilation

Data collection is the first step in making architecture animation. Since campus architecture animation involves many objects, an on-the-spot survey is necessary to record the external dimensions, shape, number of floors, relative position, and color of each building, so that the scene design basically matches the actual situation. In addition, for relatively complex models and textures, it is necessary to follow the principle of overall filming before local shooting. Data collection is therefore very significant.

(2) Scene Modeling

There are various architecture models and modeling methods, and the same model can be realized in different ways depending on individual proficiency and operational complexity. Here, basic modeling, modified modeling, composite modeling, and mapping modeling are adopted [4].

(3) Lighting

The lighting in a 3D scene is decisive for the tone and atmosphere of the scene, because it can give the scene deeper and wider imaginative space and show rich levels. The
default lighting settings in the 3ds Max system cannot meet the needs of architecture animation. Therefore, while ensuring realistic effects and taking rendering speed into account, standard light sources such as floodlights, spotlights, and parallel lights are selected to highlight the three-dimensionality of the campus.

(4) Material and Map

The standard materials in 3ds Max include parameters for diffuse color, highlight color, luminosity, opacity, reflectivity, and refraction. These can create most of the basic materials needed by architecture models and achieve efficient rendering. A model with texture must be matched with a map; the most commonly used map is the "bitmap".

(5) Animation Making

The animation is mainly realized in two ways. One is the motion of the model itself, such as the motion of cars in the scene or people walking; the other is the change of scene with the motion of the camera or its target point. The animation can be completed by setting key frames or path constraints. Relatively speaking, path constraints allow the model, camera, or target point to perform more complex arbitrary displacement movements, and they are easier than key-frame recording [5].

(6) Render Output, Synthesis, and Virtual Roaming

After completing the model, material, lighting, and animation, one can render a single frame to observe the effect and output directly after confirmation. The output format can be either a video or an image sequence. Because the scenes are complex and contain many models, rendering campus architecture animation occupies much memory, may halt or suspend the system, and usually takes a long time. In order to avoid repeated rendering caused by such problems, it is recommended to use image-sequence output [6].
33.3.2 Production of Campus Bird's-Eye View Animation

The quality of the campus architecture 3D model directly affects the effect of the architecture animation. Therefore, in the process of model making, it is necessary to display the campus terrain and landscape as realistically as possible. The 3D modeling in this project mainly builds the models of architecture, roads, greening, and other facilities (Fig. 33.2).

(1) Creation of Terrain

Firstly, using the campus CAD drawing of City College of WUST combined with the topographic map in Google Maps, a bird's-eye view terrain is created. During this process, the roads, grassland, buildings, and related locations are outlined through Edit Spline so as to extrude the terrain thickness.
Fig. 33.2 Sub-camera of Bird’s-eye view animation
(2) Creation of Architecture

Due to the large number of buildings in the bird's-eye view, all the teaching buildings are completed with low-poly models to reduce scene complexity. Firstly, photos of all the buildings are taken; the captured images are color-corrected in Photoshop and cut into 256 × 256 and 512 × 512 bitmaps; finally, the textures are mapped through Seamless Texture. In the terrain map, the horizontal outlines of the buildings are selected to generate low-poly models, and a multi-dimensional sub-object material is specified to assign the wall material to the low-poly architecture models.

(3) Creation of Trees and Plants

Based on the Forest and MultiScatter plug-ins, all the trees in the bird's-eye view are generated at their specified locations; the locations of street trees are sketched, and the street trees are created using Forest.

(4) Camera Animation

A target camera with a focal length of 28 mm is created and positioned in the top and front views; an arc is then drawn in the top view and adjusted to a suitable position as the trajectory of the camera motion, with the camera motion controlled by the path-constraint tool. The animation lasts about 13 s. Opening the curve editor and selecting the camera, the curve is adjusted so that the camera moves at a uniform speed.

(5) Material and Light

Materials are assigned to the buildings in the scene using multi-dimensional sub-objects, and materials are specified for all the trees. A VRay sunlight is created in the top view and its parameters adjusted.

(6) Render and Post-production

The output is set to *.rpf format, with the Z-depth, material ID, and object ID options checked for render output. In Adobe After Effects, the rendered sequence frames are imported, fog and bird's-eye-view special effects are added through 3D channels, and the result is output as bird's-eye view.mp4.
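Outside 3ds Max, the uniform-speed camera move along the arc can be modeled as sampling the path at a constant rate per frame; the sketch below computes per-frame positions for the roughly 13 s shot, with the 30 fps frame rate and all coordinates being assumptions made for illustration.

```python
import numpy as np

def arc_path(center, radius, theta0, theta1, duration_s=13.0, fps=30):
    """Camera positions at uniform speed along a circular arc (a constant
    angular step gives constant arc length on a circle)."""
    n = int(duration_s * fps)
    theta = np.linspace(theta0, theta1, n)
    x = center[0] + radius * np.cos(theta)
    y = center[1] + radius * np.sin(theta)
    z = np.full(n, center[2])
    return np.stack([x, y, z], axis=1)  # (n, 3) positions, one per frame

positions = arc_path(center=(0.0, 0.0, 120.0), radius=300.0,
                     theta0=0.0, theta1=np.pi / 2)
print(positions.shape)  # (390, 3)
```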
33.3.3 Production of Other Scene Animation

The following presents the effects of the other scene animations; the production process of each specific scene is not repeated. With a total duration of about 19 s, the Sakura Avenue animation consists of three sub-cameras: the first and third are solid-modeled 3ds Max scenes, and the second is a synthesis with location shooting (Fig. 33.3). With a total duration of about 25 s, the Swan Lake animation consists of four sub-cameras: the first, second, and third are solid-modeled 3ds Max scenes, and the fourth is a synthesis with location shooting (Fig. 33.4). With a total duration of about 9 s, the dandelion animation consists of two sub-cameras: the first is a solid-modeled 3ds Max scene, and the second is a synthesis with location shooting (Fig. 33.5).
Fig. 33.3 Sub-camera of Sakura avenue animation
Fig. 33.4 Sub-camera of Swan Lake animation
Fig. 33.5 Sub-camera of Dandelion animation
33.4 Conclusions

On the basis of making the 3D virtual campus animation roaming of City College of WUST, this paper discusses the theory and technical methods for realizing 3D virtual campus promotional animation based on 3ds Max, After Effects, Premiere Pro, and other software. In addition, it describes the ways and techniques for 3D modeling, material and mapping, lighting, animation settings, and post-synthesis in the process of achieving 3D virtual campus animation roaming. The work has good practical value and broad application prospects in many fields, such as education and training, medical treatment, scientific research, tourism, and games [7]. The development of VR and AR technologies has brought new means of presenting campus architecture animation videos. Through multi-sensory integration and human-computer interaction, users are motivated to actively experience the visual impact generated by VR technology. Because 2D drawings can hardly represent buildings in the real world vividly, and ordinary real-life images can hardly achieve breakthroughs in visual effects owing to the limitations of the objective environment, people need more intuitive and more visual forms of expression, and virtual reality technology is well suited to this need. Through virtual reality technology and architectural animation technology, people can feel the style and charm of a whole building more clearly [8]. Of course, technology is only a means; a good campus video mainly explores the connotation of campus culture and the appeal of artistic expression. This design exploration aims to provide ideas for the design of future campus promotional videos.
References
1. Zhou, B.: Application and research of virtual reality roaming. China Academy of Art, 5–10 (2016)
2. Wei, W., Ma, J.: Application of 3ds Max in virtual campus architecture animation. J. Anqing Teachers (JCR-SCI) (3), 45–48 (2016)
3. Shen, S.: Application of VR technology in architectural display—3ds Max-based campus architecture animation research. Industr. Design 5, 147–148 (2015)
4. Liang, Y.W.: Design and production of campus architecture animation. Henan Sci. Technol. (5), 35 (2017)
5. Li, J.Z.: Realization of virtual campus of Zhengzhou Information Engineering Vocational College based on 3ds Max. Modern Market. 5, 81–84 (2017)
6. Hu, J.S., Kang, J.R.: 3ds Max-based realization of 3D campus roaming map. Geometr. Spatial Informat. Technol. 5, 5–10 (2018)
7. Sun, M.Y.: Application of digital media technology in architectural animation. Research on Transmission Competence, vol. 205 (2017)
8. Sun, M.Y.: Application research of virtual reality technology in interactive architectural animation. Nanjing Arts Institute, 9–10 (2018)
Chapter 34
Design of Video Mosaic System Fei Yan, Wei-Qi Liu, Yin-Ping Liu, Bao-Yi Lu, and Zhen-Shen Song
Abstract With the development of embedded technology and the popularity of ultra-high-definition displays, people's requirements for display quality are increasing. This paper designs a four-way high-definition video mosaic system. The system takes four channels of high-definition 1080P images as input; after image data buffering, frame synchronization, multi-channel mosaic, and other processing, it can output a 3840 × 2160@60 Hz ultra-high-definition video image in real time through a DP1.2 interface. The system adopts an FPGA chip of type XC7K325TFFG900-2; with four DDR3 memory chips, it can process multi-channel image data at no less than 24 Gb/s in real time. After the program is mapped to the chip, the look-up-table resource consumption is 25.96%, the BRAM resource consumption is 40.45%, and the total dynamic power consumption is less than 5 W. The experimental results show that the system supports continuous video acquisition and real-time playback and has broad application prospects in video surveillance, medical consultation, multimedia display, and other fields. At the same time, the hardware system is expected to be extended to the acquisition, transmission, and processing of three-dimensional stereo images.
34.1 Introduction

Usually, computers use common Digital Visual Interface (DVI), Video Graphics Array (VGA), and similar interfaces as outputs [1], which can also be used for front-end image acquisition in video processors. Interfaces of this type occupy a large bit width and many pins of the main control chip, and their image transmission bandwidth is generally insufficient: Dual-Link DVI or DVI 2.0 can only support image resolutions up to 2560 × 1600 pixels. Moreover, the wire length has a strong influence, so the video signal cannot be transmitted very far. With the development of image display technology, image clarity has risen from 720P to 1080P and on to ultra-high-definition 4K (3840 × 2160 pixels) UHD (Ultra-High Definition), and the transmitted content is becoming ever richer. 4K display technology has been widely used in multimedia, industry, virtual reality, and other fields [2, 3], with various implementation schemes. Murakami et al. implement efficient 4K UHD video coding and decoding based on an SoC hybrid storage architecture [4]. Jang et al. use a VLSI framework to accelerate the capture and playback of 4K UHD audio and video data [5], and Chen performs image detail enhancement on the same framework [6]. In recent years, the use of FPGAs (Field-Programmable Gate Arrays) for high-speed data acquisition, image processing, and terminal display has been growing [7–9], owing to the FPGA's high-speed parallel processing ability and flexible configuration. As image sharpness improves, the requirements on high-speed image caching technology keep rising; in video processing, high-speed image caching is urgently needed to meet the real-time storage requirements of ultra-high-resolution images. Traditional SDRAM (Synchronous Dynamic Random-Access Memory) can no longer meet the real-time storage requirements of ultra-high-definition images, whereas DDR3 (Double Data Rate 3) memory is widely used in high-speed data storage scenarios because of its high speed, large capacity, and strong anti-interference ability. Meanwhile, incorporating image processing, recognition, feature extraction, and other algorithms [10–13] into the hardware platform so that they run efficiently in embedded systems has become a hot research direction. It can be seen that the acquisition and processing of UHD video on an FPGA framework has become a development trend in recent years; this design also adopts this scheme for four-way HD (High-Definition) video acquisition and single 4K UHD display.
Fig. 34.1 System design block diagram (four 1920 × 1080 video sources → SDI IN1–IN4 → FPGA control card with MCU, DDR3, clock, configuration, and power → DP OUT → 3840 × 2160 4K monitor)
34.2 Overall Framework

The overall design block diagram of the system is shown in Fig. 34.1. It consists of three parts: the video source devices, the FPGA control card, and the 4K UHD DP output. The input ports of the FPGA card accept four channels of high-definition video; after synchronous buffering and splicing in the FPGA, a 4K ultra-high-resolution image is generated and sent to the 4K monitor, so four channels of real-time video from different sources can be displayed simultaneously. The resolution of each high-definition video channel is 1920 × 1080. The main control chip of the system is the XC7K325TFFG900-2 FPGA of the Xilinx Kintex-7 series. The four DDR3 SDRAMs used as the image buffer are MT41J128M16JT-12 devices; each DDR3 has a 128M × 16 bit capacity and a 16-bit data bus. DDR3 read-write control uses the MIG (Memory Interface Generator) controller scheme. Because the core of this system lies in the flexible operation of DDR3, the logic implementation focuses on video storage and control and analyzes the data-stream bandwidth in detail.
34.3 Logical Realization

The logic design block diagram of the FPGA system is shown in Fig. 34.2. It mainly consists of a frame detection module, a DDR3 writing control module, a DDR3 reading control module, a MIG controller based on AXI4 (Advanced eXtensible Interface), a user-interface-to-AXI4 conversion module, a timing generator module, and an image mosaic module.
296
HD × 4 IN
F. Yan et al.
frame detection timing generator
UHD OUT
image mosaic
Writing Control writing-address control writing FIFO Reading Control reading-address control
user interface to AXI4 interface
MIG controller based on AXI4
DDR3 ×4
reading FIFO
Fig. 34.2 Systems logical design block diagram
34.3.1 Design of MIG Controller

The first consideration in controller design is the memory bandwidth, which mainly depends on the data bandwidth of the stored images. The image data bandwidth is calculated by Eq. (34.1):

$B = I_w \times I_h \times F \times B_{pp} \times 2$  (34.1)

where $B$ denotes the bandwidth of the controller, $I_w$ the effective width of the image, $I_h$ the effective height of the image, $F$ the frame rate, and $B_{pp}$ the bit width of a pixel; the factor 2 accounts for reading and writing. The memory bandwidth is calculated by Eq. (34.2):

$B_{DDR} = W_{data} \times f$  (34.2)

where $B_{DDR}$ is the memory bandwidth, $W_{data}$ the DDR3 data bus width, and $f$ the DDR3 working frequency. The system buffers four channels of high-definition input and one channel of ultra-high-definition output; the output image is 3840 × 2160 pixels at a frame rate of 60 Hz with an 8-bit-per-channel pixel format (24 bits per pixel). From Eq. (34.1), the image data bandwidth is about 23.89 Gb/s. Assuming 80% memory bandwidth utilization, the DDR3 working frequency is found to be about 470 MHz. Because DDR3 transmits data on both clock edges, setting the MIG controller working frequency to 333.33 MHz (meaning the DDR3 runs at an equivalent 666.66 MHz data rate) meets the design requirements.
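The figures in this section can be reproduced directly from Eqs. (34.1) and (34.2); the only assumption is that the "8-bit pixel format" means 8 bits per color channel (24 bits per pixel), which is the reading consistent with the stated 23.89 Gb/s.

```python
# Reproducing the bandwidth budget of Eqs. (34.1) and (34.2).
Iw, Ih, F = 3840, 2160, 60      # output resolution and frame rate
Bpp = 24                        # 8 bits per channel x 3 channels (assumption)
B = Iw * Ih * F * Bpp * 2       # x2 for read + write, Eq. (34.1)
print(f"image bandwidth: {B / 1e9:.2f} Gb/s")          # ~23.89 Gb/s

W_data = 4 * 16                 # four DDR3 chips, 16-bit bus each
required = B / 0.80             # assume 80% memory-bandwidth utilization
f = required / W_data           # Eq. (34.2) solved for f
print(f"required DDR3 data rate: {f / 1e6:.0f} MHz")   # ~467, i.e. ~470
```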
34.3.2 Interface Design of MIG Controller

The system adopts the MIG controller scheme based on the AXI4 interface. The AXI4 interface mode supports burst reads and writes of 512 bits up to 256 beats, which facilitates the caching of high-speed image data. However, in view of the complexity of AXI4 read-write operations, the AXI4 interface of the MIG controller is wrapped by conversion logic; the improved interface separates reading from writing, which is convenient for the user. The user-interface transition state machine includes eight states, such as IDLE, WR_CMD, and MEM_WRITE; the relationships between the states are shown in Fig. 34.3. IDLE is the starting, waiting state. When the wr_req (write-request) signal is received, the machine enters the WR_CMD state; when the wr_ack (write-acknowledge) signal is received, it enters the MEM_WRITE state, otherwise it keeps its previous state. WRITE_END is entered when a wr_data_finish signal is received. RD_CMD is entered when the rd_req (read-request) signal is received, and MEM_READ when the rd_ack (read-acknowledge) signal is received. When the rd_data_finish signal
Fig. 34.3 Flowchart of DDR3 read-write control (IDLE → WR_CMD → MEM_WRITE → WRITE_END or IDLE → RD_CMD → MEM_READ → READ_END, both converging to END; transitions gated by wr_req/rd_req, wr_ack/rd_ack, and wr_data_finish/rd_data_finish)
Fig. 34.4 DDR3 read-write simulation (writing-data and reading-data waveforms)
is received, the machine enters the READ_END state. On reaching the end of reading or writing, it automatically enters the END state and then jumps back to IDLE.
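For illustration, a behavioral Python model of the eight-state user-interface state machine (the real design is FPGA logic, so this is only a software sketch of the transition table):

```python
# Behavioral model of the user-interface transition state machine.
TRANSITIONS = {
    ("IDLE", "wr_req"): "WR_CMD",
    ("IDLE", "rd_req"): "RD_CMD",
    ("WR_CMD", "wr_ack"): "MEM_WRITE",
    ("MEM_WRITE", "wr_data_finish"): "WRITE_END",
    ("RD_CMD", "rd_ack"): "MEM_READ",
    ("MEM_READ", "rd_data_finish"): "READ_END",
    ("WRITE_END", None): "END",
    ("READ_END", None): "END",
    ("END", None): "IDLE",
}

def step(state, signal=None):
    """Advance one state; stay put if the awaited signal has not arrived."""
    return TRANSITIONS.get((state, signal), state)

s = "IDLE"
for sig in ["wr_req", "wr_ack", "wr_data_finish", None, None]:
    s = step(s, sig)
    print(s)   # WR_CMD, MEM_WRITE, WRITE_END, END, IDLE
```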
34.3.3 Design of DDR3 Read-Write Control

The user-interface data bus is 256 bits wide and the address bus 28 bits (denoted B[27:0]). B[27:26] divides the memory into four banks, two of which are used in ping-pong fashion for the alternating read and write of successive image frames. B[25:5] is the image storage area for caching valid image data, and B[4:0] is padded with zeros to ensure that addresses do not overlap within a burst read or write. In the DDR3 read-write simulation, a test module generates read-write instructions and write data, and compares the read-back data with the written data; the comparison result is output on a signal that goes high on a read-write error and stays low otherwise. From the simulation diagram, it can be seen that the user interface alternates reads and writes with a burst length of 128: after a packet of data is written to a starting address, a packet is read back from that address, and after each read-write pair the address advances by the burst length of 128. The read-write simulation is shown in Fig. 34.4.
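The 28-bit address layout can be illustrated by composing its fields in software, following the described widths (B[27:26] bank select, B[25:5] image offset, B[4:0] zero padding); the helper name is hypothetical.

```python
def ddr3_address(bank, image_offset):
    """Compose a 28-bit DDR3 user address: B[27:26] bank select,
    B[25:5] image storage offset, B[4:0] zeroed for burst alignment."""
    assert 0 <= bank < 4 and 0 <= image_offset < (1 << 21)
    return (bank << 26) | (image_offset << 5)

# Ping-pong between two banks frame by frame, as described above.
for frame in range(2):
    addr = ddr3_address(bank=frame % 2, image_offset=0)
    print(f"frame {frame}: 0x{addr:07X}")
```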
34.3.4 Design of Sequence Generator
The frame detection module determines the upper 2 bits of the write address. First, a 2-bit counter is defined. When a rising edge of the synchronization signal is
picked up, the counter increments by 1, and its value indicates the bank to which the current image is being written. The image output adopts 4-pixel interface timing: VS (Vertical Synchronization) is the video field synchronization signal, HS (Horizontal Synchronization) is the video line synchronization signal, and DE (Data Enable) marks valid video data. Image timing comprises vertical and horizontal timing. Vertical timing has four parts: field synchronization (V_S), field front porch (V_F), field back porch (V_B), and field active period (V_A). Horizontal timing likewise has line synchronization (H_S), line front porch (H_F), line back porch (H_B), and line active period (H_A). The DE signal is high during the active pixels of each line. The timing parameters of the 4-pixel interface for a 3840 × 2160@60 Hz image are shown in Table 34.1. As shown in Fig. 34.1, the first lines of Video1 and Video2 are spliced into the first line of the 4K image, and so on up to line 1080; the first lines of Video3 and Video4 are spliced into line 1081, and so on up to line 2160. The resulting image has a resolution of 3840 × 2160.
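The sketch below derives DE from the Table 34.1 parameters, assuming the conventional ordering of sync, back porch, active period, and front porch within each line and frame; the helper name is illustrative.

```python
# DE generation from the Table 34.1 parameters (4-pixel interface).
H_S, H_B, H_A, H_F = 22, 74, 960, 44      # pixel clocks, H_T = 1100
V_S, V_B, V_A, V_F = 10, 72, 2160, 8      # lines, V_T = 2250
H_T, V_T = H_S + H_B + H_A + H_F, V_S + V_B + V_A + V_F

def de(line: int, pixel: int) -> bool:
    """DE is high only inside the active window of the active lines."""
    line_active = V_S + V_B <= line < V_S + V_B + V_A
    pixel_active = H_S + H_B <= pixel < H_S + H_B + H_A
    return line_active and pixel_active

assert H_T == 1100 and V_T == 2250        # totals match Table 34.1
```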
34.4 Experimental Result
34.4.1 Four-Way High-Definition Image Read–Write Simulation
To speed up the simulation while still verifying the system functions, the input images used in simulation are 1920 × 12@60 Hz (12 rows by 1920 columns, 60 frames per second), so the expected output image is 3840 × 24@60 Hz (24 rows by 3840 columns, 60 frames per second). Because the output is delivered sequentially over a 4-pixel interface, the effective line width is 960. The simulation diagram is shown in Fig. 34.5: when valid images are input, the mosaic image is output.
34.4.2 Result
The system uses four channels of 3G-SDI input; each input image is converted into standard VESA format, and the streams are synchronized, buffered, and spliced by the system. The output adopts a DP 1.2 interface. The core control of the system is based on an XC7K325TFFG900-2 FPGA. The SDI signal source comes from an HDMI-to-SDI converter, and a 4K UHD display is used for verification. The DP core is configured with a MicroBlaze soft core to output the 3840 × 2160@60 Hz mode. The display results are shown in Fig. 34.6. The final output image is delivered over the 4-pixel interface in 8-bit pixel format as a UHD image with a resolution of 3840 × 2160 and a refresh rate of 60 Hz.
Table 34.1 Timing parameters of the 4-pixel interface for 3840 × 2160@60 Hz images (P = pixel clocks, L = lines; H_T = total pixels per line, V_T = total lines per frame)

Parameter   Value
H_S         22P
H_B         74P
H_A         960P
H_F         44P
H_T         1100P
V_S         10L
V_B         72L
V_A         2160L
V_F         8L
V_T         2250L
Fig. 34.5 Simulation of four channels of 1920 × 12@60 Hz input and one channel of 3840 × 24@60 Hz output
Fig. 34.6 System experimental environment and results
34.5 Conclusion
The system takes four HD images as input and outputs one UHD image in real time through a DP 1.2 interface. In board-level verification, the SDI signal source comes from an HDMI-to-SDI converter, and the four image channels are buffered, synchronized, and stitched. The core chip of the verification system is an XC7K325TFFG900-2. LUT (Look-Up Table) consumption accounts for 25.96% of the FPGA's total resources. Because the design makes heavy use of on-chip BRAM, BRAM consumption reaches 40.45%; the FIFOs (First In, First Out) used for DDR3 reads and writes also draw on this resource. After evaluation, the static power consumption of the FPGA is 0.221 W, about 5% of the total, while the dynamic power consumption at run time is 3.146 W, about 67%; the high-speed serial interfaces account for the remaining 28%. The static power consumption of the system is therefore very low, and total power consumption in normal operation is below 5 W. The experimental results show that the system supports continuous video playback and suits real-time high-definition display applications such as video surveillance, medical consultation, and multimedia display. The hardware system is also expected to find use in the acquisition, transmission, and processing of three-dimensional stereo images.
Acknowledgements This work was supported by the Natural Science Foundation of China (61605083), the Natural Science Foundation of Jiangsu Province of China (BK20150903) and the Natural Science Foundation of Jiangsu Higher Education Institutions of China (16KJB510023).
Chapter 35
Exploring the Formalization of Brain Cognition in the Age of Artificial Intelligence Yang Libo
Abstract In contemporary society, where the Internet has nearly reached saturation, artificial intelligence has gradually become an important proposition for the development of science and technology. Traditionally, people's understanding of the brain has been confined to a purely human point of view; in the era of artificial intelligence, however, robots are numerous and people deal with artificial intelligence across a wide range of daily life. In this context, the current human understanding of the brain will gradually be superseded as artificial intelligence technology advances. The problem of brain cognition was already widely recognized and hotly debated by scientists as early as 1956. Today, as science and technology mature, the problem of formalizing brain cognition is being redefined by the arrival of the artificial intelligence era. This article opens the relevant discussion and hopes to contribute to the in-depth study of the formalization of brain cognition in China.
35.1 Introduction
Human enthusiasm for brain exploration keeps increasing, as seen in the US BRAIN Initiative launched under President Obama, the brain projects of Europe and Japan, and the China Brain proposal of Li Yanhong (Robin Li) [1]. In January 2019, the US Department of Commerce officially implemented export controls on 14 classes of representative technologies, 7 of which relate to artificial intelligence. Priscila Chuhuaicura, of the Research Centre for Dental Sciences (CICO) at the Dental School of Universidad de La Frontera, Temuco, Chile, studied the relationship between chewing, cerebral blood flow, and cognitive function in adults [2]. Human society has now begun to enter the age of intelligence. For this rapidly arriving era, China's exploration of the human brain is still in its infancy, and most of the research is an introduction to
Y. Libo (B) Guangdong University of Science & Technology, Dongguan 523083, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_35
relevant foreign research, with a long way still to go before a comprehensive understanding is reached. In March 2016, the national 13th Five-Year Plan (2016–2020) listed "Brain Science and Brain Research" as a "Science and Technology Innovation 2030 Major Project". In July 2017, the State Council issued the "New Generation Artificial Intelligence Development Plan," which raised artificial intelligence to an important national development strategy [3]. This paper explores the formalization of brain cognition against the background of artificial intelligence, in order to better promote the future development of artificial intelligence. The human brain is the most complex organ in the world; its structure and function are shown in Fig. 35.1 [4].

Fig. 35.1 Human brain structure

Research on brain science involves several interdisciplinary subjects, such as artificial intelligence and the life sciences, and the approaches to formalizing brain cognition span multiple fields at the same time. The level of research on brain cognition will largely determine the extent of development in the many areas related to it, such as artificial intelligence. In turn, the advent of the artificial intelligence era will affect and change people's understanding of the cognitive forms of the brain. The two promote each other and are inseparable. This article explores the issues of this field in depth.
35.2 Explaining the Mystery of Brain Cognition
Understanding brain cognition is very complicated, and there are two ways to study it. The first is to study brain cognition from the field of neurology; this is the popular approach in academia and in national brain projects (Japan, China, the USA). The second is to study brain cognition from the physiological point of view. The instincts of the human brain include forgetting, repelling, and refusing to accept, instincts familiar to the IT community. When people understand one thing, they cannot fully understand another at the same time; that is, awareness is not
enough. When it comes to specific things, the question is whether they can evoke specific memories in the human brain; this is the formalization of brain cognition. The human eye perceives the world three-dimensionally because people have two eyes: when we observe things, the brain analyzes the parallax displacement between the two eyes and distinguishes the distances of objects, producing a strong sense of depth. The capacity of brain cognition is very large, and the brain can be described through functional structure diagrams, but human brain cognition cannot be reproduced no matter how many years are spent. The phrenology popular in the eighteenth century tried to establish a link between cognition and the 26 bones of the skull, but it was later proved to be pseudoscience [4]. It was brain surgery on patients with epilepsy that first led the scientific community to link the cranial nerves to cognitive behavior: it was found that when brain waves of different neural regions execute the same oscillating command, the patient convulses. More recently, researchers have mapped 116 and even 252 functional areas of the human brain. The human brain weighs about 2 kg, most of it concentrated in the various regions of the cerebrum and cerebellum in various topologies. The brain is composed mainly of nerve cells, each of which has a number of outwardly protruding parts called dendrites and axons. Brain cognition means perceiving external things through the human brain and forming an imagination of them, and it also encodes and expresses perception and memory; its refinement is related to the development and growth of the human brain. For example, seeing a person is fundamentally different from knowing a person: different parts of the brain are at work. For adults, the part of the cerebral cortex with memory function is stated to account for some 22 km², and the brain map is not necessarily representative [4].
35.3 Analyzing the Methods of Studying Brain Cognition
35.3.1 Cognitive Neurological Methods
There have been studies in neurobiology that take cells from psychiatric patients, induce them into pluripotent stem cells, and then, through complex neurobiological techniques, turn them into brain cells, growing a cerebral cortex in a culture dish; finally, from these brain cells, neurons and glial cells are created [5]. What this experiment essentially manifests is the method of cognitive neurology. In current cognitive neuroscience, biological neural circuits and large-scale neural networks have become hot research topics. Therefore, to deepen the study of the formalization of brain cognition in the present era, we must first start from this cognitive neurological approach. However, using this method to conduct related research in brain
science runs into many obstacles, chief among them the question of what scale to choose for formalization. Once the scale is reduced to a certain level, the microstructure becomes more and more complicated, so formal research on brain cognition becomes harder. The world to be studied is "infinite": there is always something more microscopic within the microscopic, from the neural network to the regional map of the brain and on down to the gene. Each time the scale of the study is refined, the difficulty of the study increases accordingly.
35.3.2 Physical Methods of Brain Cognition
The deepening of physics in the field of human brain research has driven the development of many disciplines and technologies. Not only have computers and the Internet depended on this human-brain physics, but the future development of artificial intelligence and of the formalization of brain cognition will also rely heavily on it, because the laws of the external physical world hold everywhere inside the world of the human brain as well. It can be said that the human brain is a scaled-down universe.
35.4 Analyzing the Social Attributes of Brain Cognition
To grasp the research process of brain cognition formalization as a whole, multiple aspects must be combined, including not only the study of biological attributes but also the study of human social attributes. This is the question that philosophy and psychology have long debated: does the environment determine human behavior, or do genes [6]? We call the study of biological brains formalization; to dig deeper into the formalization of human brain cognition, however, we must first focus on the social attributes of brain cognition. The questions include, but are not limited to: how does the human brain obtain information from the outside world, and how does it reflect the objective world? How does the human brain infer the unknown from the known? And by what means does the human brain achieve innovation? Research such as this, together with research on human emotions, expressions, aggression, and so on, forms an important component of the formal research process of human brain cognition. To carry out these studies in depth, it is necessary to resort to contemporary neuroimaging techniques that rely on the development of physics, such as nuclear magnetic resonance and signal analysis, which greatly improve the convenience of studying the human brain. In the past, when technology had not reached its current level, researchers could only study the human brain through craniotomy; with this series of neurophysical methods, research on the human brain is often easy
to achieve. This has greatly promoted the development of formal research on brain cognition. Human cognition is achieved by the brain; to make machines as cognitive as humans, researchers in the artificial intelligence era explore the differences between humans and computers in processing information. To make a real breakthrough in the study of brain cognition, breakthroughs must be made in essentials, and this in turn relies on the study of the language system of the human brain. This systematic study consists of three parts. The first is human memory: human memory implements multiple learning and memory functions through at least five brain functional systems, including repetitive priming effects, perceptual priming effects, the semantic system, working memory, and contextual memory, as well as proficiency learning, perceptual learning, and semantic learning. The second is human computational-cognitive ability: the ability to assist, understand, make decisions, and discover [7]; parallel and serial methods of information processing; cognitive activities based on experience and knowledge; and the ability to adapt to and interact with the environment through inspiration and epiphany. Cognitive computing systems, represented by IBM's Watson, perform real-time computing and analysis of big data, achieve self-learning, begin to possess human-brain-like abilities, and have been successfully applied in medicine, finance, and customer service [8]. Wu Guozheng of Xi'an Jiaotong University and others developed a "brain-controlled rehabilitation robot system based on visual- and auditory-induced motor nerve channel reconstruction." Research on these parts is also key to the development of contemporary artificial intelligence and, likewise, a key part of advancing the formal study of brain cognition.
35.5 Conclusion
From the above discussion, we can see that in the era of artificial intelligence, the study of the formalization of human brain cognition is urgently needed. The human brain is a complex and sophisticated machine, so its study must draw on a variety of methods from multiple perspectives. This paper has explained the research methods of cognitive neurology and of human-brain physics. Only on this basis can the research depth of the various parts, such as the social attributes of the human brain, be continuously improved and the scope of research extended, so that the relevant application techniques can achieve breakthrough progress rather than stagnate.
References 1. Li, D.Y.: Exploring the formalization of brain cognition in the age of artificial intelligence. China Informatization Weekly (2015) 2. Chuhuaicura, P.: Mastication as a protective factor of the cognitive decline in adults: a qualitative systematic review. Int. Dent. J. 3, 334–340 (2019) 3. Gao, Q.Q.: Research on Audio-Visual Information Fusion Method Based on Deep Learning. Hebei University of Technology (2016) 4. Yang, T.L., Xin, F., Lei, X.: Gender differences in human brain structure and function: evidence from brain imaging research. Prog. Psychol. Sci. 4, 571–581 (2015) 5. Li, D.Y.: Formalization of brain cognition—talking from R&D machine driving brain. Sci. Technol. Rev. 24, 125–125 (2015) 6. Huang, Q.: Teaching research based on network optimization in artificial intelligence age. Computers 01, 235–236 (2019) 7. Gong, Y.H.: Will artificial intelligence eventually transcend human intelligence? Based on the basic principles of machine learning and human brain cognition. People’s Forum Academic Frontiers (2018) 8. Li, Y., Pan, J., Long, J., et al.: Multimodal BCIs: target detection, multidimensional control, and awareness evaluation in patients with disorder of consciousness. Proc. IEEE 2, 332–352 (2016)
Chapter 36
On the New Challenges Faced by Library Book Acquisition in the Age of “Internet +” Dong Na
Abstract This study analyzes, from the perspective of the "Internet +" era, the dilemma faced by the library industry in China in the field of book acquisition: the acquisition process is cumbersome and rigid, and the channel resources for acquisition are relatively fixed. In response to these problems, we seek solutions through "Internet +" thinking, discussing cloud technology, effective reader-interaction mechanisms, and streamlined workflows in book acquisition, which provides a useful reference for improving the quality and efficiency of library procurement.
36.1 Introduction
A library's collection is very large, and it is an important channel for students, the general public, and scientific researchers to access, compare, and learn from all kinds of materials. The French government's support for books and reading activities is unparalleled: according to the European Times, the National Book Center under the French Ministry of Culture receives an annual grant of about 42 million euros to support book publishing practitioners in all respects [1]. The American library community has proposed the PDA (patron-driven acquisition) and DDA (demand-driven acquisition) modes, which incorporate readers into the purchasing decision and buy, on their behalf, the books they need most [2]. In China, "Internet + book acquisition" is still at the exploratory stage: there are many platforms but little interoperability, paper books lack visualization, materials are updated ever faster, and readers' demands are diversifying, which makes the weaknesses of the traditional acquisition mode ever more prominent [3]. Therefore, we must adjust the acquisition mode in time, actively embrace "Internet +" thinking, and establish a faster, more accurate, and more effective book acquisition mechanism.
D. Na (B) Guangdong University of Science & Technology, Dongguan 523083, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_36
36.2 The "Internet +" Challenges for Library Book Acquisition
36.2.1 The Drawbacks of the Traditional Acquisition Mode Are Highlighted
As times develop, new books appear constantly and libraries must introduce them without interruption, yet the traditional acquisition mode is quite limited. The traditional library acquisition workflow is shown in Fig. 36.1. Firstly, book purchasing is aimless: there is no clear goal for the titles to be introduced or the categories to be enriched. Secondly, the traditional acquisition process is cumbersome: purchasing staff must liaise with booksellers and library leadership, and in doing so tend to ignore what readers really want [4]; the time needed to complete an acquisition is correspondingly long. In the "Internet +" era, this long procurement cycle and aimless purchasing model have been widely criticized. For example, when university students need to borrow new examination materials, delayed procurement means the new books arrive late, which affects students' actual needs and highlights the shortcomings of traditional acquisition.
Fig. 36.1 Traditional library book acquisition workflow
36.2.2 The Resources of the Traditional Acquisition Mode Are Scarce
Because the cycle is long and the procedures complicated, traditional acquisition work is not accurate enough in its selection of book categories, titles, and editions. The suppliers who cooperate with libraries have grown complacent, and the updating of books and materials is relatively slow; some libraries have not introduced new books for several years. As a result, the practicality and timeliness of the library's books and periodicals are greatly reduced, and its ability to serve the broad readership is greatly weakened.
36.2.3 The Traditional Acquisition Mode Is Not Connected with New "Internet +" Technology
The traditional library acquisition mode is still dominated by paper-based office work; the computer is merely auxiliary office equipment and plays no larger role in purchasing. The library's use of other new technologies and devices is likewise rudimentary. For example, when the library finalizes the acquisition list and communicates with readers, it relies mainly on e-mails and text messages, which contrasts with modern habits of WeChat and community-based communication and, to some extent, reduces the communication efficiency between libraries and readers. In addition, some libraries use online recommendation platforms to draw up their procurement lists, for example recommended or hot titles from forums, post bars, and similar platforms. Such online postings fill gaps in the traditional acquisition mode to a certain extent, but their professionalism and authority fluctuate greatly and may fail to match readers' needs accurately.
36.3 The Inspiration "Internet +" Brings to Library Book Acquisition
36.3.1 Cloud Big Data in the Service of Library Book Acquisition
Comprehensively and accurately collecting, analyzing, and judging reader-demand information and industry information has so far been impossible, and this is an urgent problem for libraries. In the "Internet +" era, the docking of libraries with cloud big data systems becomes possible. Through increasingly mature cloud technology and cloud data processing, a library can collect information from a far wider target group and process it quickly through cloud computing, obtaining intuitive data results that provide an effective reference for its acquisition business. In fact, many libraries at home and abroad have already established complete cloud big data acquisition platforms, built on professional cloud computing and cloud storage service companies. With such a platform, the library can interact with a wider range of readers for more flexible and timely information. On the one hand, this cloud acquisition mechanism requires the library or a third-party service organization to plan the technical architecture, complete the platform's functions, and optimize the user experience; on the other hand, readers can connect to the library's big data through code-scanning devices and user portals, account registration, library website login, big data recommendation platform registration, and other methods [5]. In this way, the library can directly use the big data system to summarize reader feedback, track current book supply and hot sales in the book market, and share information with libraries in other regions, institutions, or schools. This brings great convenience to the library's acquisition work.
36.3.2 Achieving Reader-Centered Acquisition Through "Internet +" Technology
The library conducts acquisition essentially to serve readers, because the essence of the library is to collect high-quality materials from various industries and fields and make them conveniently available. In the traditional purchasing mode, however, communication between the library and its readers is not smooth, so the purpose of purchasing is determined unilaterally and may not meet readers' needs. Now the "Internet +" era gives libraries the conditions to improve their service level. The e-book acquisition process of a university library under "Internet +" thinking is shown in Fig. 36.2. The library should rely on technical channels to
Fig. 36.2 University library e-book acquisition process
gradually move toward "focusing on readers' needs." For example, it can establish a normalized mechanism for tracking and feeding back reader demand, so as to continuously correct and optimize the acquisition list [6]; accurately classify and match the reading habits and appeals of target groups to make acquisition more scientific and precise; and, in addition, clear the communication path between library and reader. It is necessary to fully understand and refine readers' needs, and to comprehensively collect and track feedback on the book types, prices, specifications, editions, and publishing houses that readers care about. The library can set up electronic questionnaires around these elements of reader concern, initiate electronic voting, implement bibliographic scoring, or establish a promotion mechanism for recommended bibliographic details, and it can support readers in registering and logging in with accounts such as QQ, WeChat, and mobile phone numbers, encouraging them to fill in personal details so that the relevant content can be collected.
36.3.3 Streamlining and Optimizing the Traditional Acquisition Workflow
"Internet +" provides sufficient technical support for the library's acquisition work. To achieve this transformation smoothly, however, what is needed is the active
adjustment and positive change of the library in its management concepts, acquisition systems, and processes. The library should update its concept of acquisition and boldly revise the acquisition workflow and the related regulations. To remove the redundant links of traditional procurement, it must make full use of the Internet's convenient information transmission and communication mechanisms to streamline the process, for example the collection, analysis, and reporting of readers' intentions, the online submission of acquisition lists, and the leadership approval mechanism. Only by completing acquisition quickly and effectively can the library truly integrate into the "Internet +" era and realize the successful transformation of its functions.
36.4 Conclusion
The "Internet +" background requires libraries to transform early, and the transformation of acquisition work is only one key point in the library's overall transformation. Libraries should fully recognize the shortcomings of traditional procurement methods and use the convenience of Internet technology and mature technical systems to optimize the acquisition mechanism and its functions. In this way they can calmly cope with the challenges brought by the progress of the times and the development of science and technology, and truly play the role of a library.
References 1. Lugya, F.K.: User-friendly libraries for active teaching and learning. Information and Learning Science 119, 275–294 (2018) 2. https://www.xzbu.com/4/view-7664192.htm. Accessed 29 April 2019 3. Zheng, Y.P.: Discussion on the mode of "Internet + book interview" and its development strategy. Library Research 9, 46–51 (2011) 4. Liao, Sh.R., Wang, Z., Yi, S., et al.: Challenges and countermeasures of university libraries in the internet age. Education and Teaching Forum 52, 12–13 (2016) 5. Ge, Y.: On the document acquisition model of public libraries in the digital age. Western Library Forum 3, 42–44 (2019) 6. Liu, G.Q.: On the new challenges faced by library books in the era of "Internet +". Journal of Hubei Normal University (Philosophy and Social Sciences Edition) 1, 111–114 (2018)
Chapter 37
FSIS: Fast and Seamless Image Stitching of High-Resolution Dunhuang Murals Ming Chen, Xudong Zhao and Duanqing Xu
Abstract Digital processing of Dunhuang murals is important for the permanent and faithful preservation of precious cultural-relic information, and image stitching is one of the key and challenging steps in obtaining complete murals at high resolution. Gradient-domain methods, which require solving a sparse linear system arising from the Poisson equation, are widely used in image stitching; however, solving the Poisson equations is time-consuming and resource-intensive, especially when the image is large. To address these problems, a fast and seamless image stitching method called FSIS, implemented on a graphics processing unit (GPU) architecture, is proposed to improve the stitching efficiency for high-resolution Dunhuang murals. Specifically, the original image is first down-sampled into low-resolution border pixels images (BPIs) using quadrilateral grids; next, the Poisson equations are constructed and solved on the BPIs; finally, the result image is obtained by interpolation. These steps reduce the time and memory consumed in solving the Poisson equations. Experimental results show the efficiency of the proposed method.
37.1 Introduction
Dunhuang murals are precious material for studying Buddhist art, but preserving them is difficult. For the permanent and faithful preservation of these cultural relics, digital processing is used so that people can appreciate the murals through digital products. In digital processing, because the murals are large, a complete, high-quality mural image cannot be obtained by a single photographic capture. Fortunately, multiple high-resolution images of different mural areas can be acquired and then stitched into a complete mural. How to stitch these images quickly and seamlessly is therefore a major challenge.
M. Chen (B) · X. Zhao · D. Xu Zhejiang University, No. 38 Zheda Road, Hangzhou, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 180, https://doi.org/10.1007/978-981-15-3867-4_37
Gradient-domain techniques [1–6] are widely applied in image processing, but all of them require solving a sparse linear system of the Poisson equation, which is time-consuming and resource-intensive, especially for large images such as gigapixel images. Various approaches address this problem. Agarwala [1] described a hierarchical approach using quadtrees to reduce the scale of the problem and improve the efficiency of image compositing, but the method cannot handle very large images. Kazhdan and Hoppe [2] proposed a streaming multigrid method for solving large global linear systems, but their implementation could not run in parallel. It is therefore necessary to make full use of resources and reduce execution time through parallel operation, especially when processing large, high-resolution images such as Dunhuang murals; this is the problem addressed in this paper.
Inspired by [1, 7], we aim to reduce the time and resources consumed in solving the Poisson equation for seamless stitching of high-resolution Dunhuang murals. In this paper, a fast and seamless image stitching method called FSIS is proposed. We assume that all the input images of different mural areas have already been stitched into a panorama, which we call the original high-resolution image (or simply the original image). The FSIS method has three main steps:
• Iteratively down-sampling the original high-resolution image into low-resolution border pixels images (BPIs) using quadrilateral grids, and constructing a complete binary tree called the DST to record BPI information such as image sizes and locations on disk.
• Constructing and solving the Poisson equations on the BPIs; specifically, the BPIs used in this step are the leaf nodes of the DST, and the V-cycle multigrid method is used for solving.
• Obtaining the final result image by interpolation from the solutions on the BPIs, combining Poisson equation interpolation and bilinear interpolation for both speed and accuracy.
The proposed FSIS method is implemented on the GPU architecture. Experiments on real high-resolution Dunhuang murals show the efficiency of the method, which generates a complete seamless image with less time and memory consumption.
The remainder of this paper is organized as follows. Related work is described in Sect. 37.2. Section 37.3 introduces the FSIS method in detail. Section 37.4 presents experimental results validating the efficiency of the proposed method, and Sect. 37.5 concludes the paper.
37.2 Related Work
The purpose of image stitching is to combine multiple input images into a complete image in a seamless manner. However, processing each pixel takes considerable time and resources, especially for large, high-resolution images, so image stitching algorithms remain a hot research topic. Many studies [8–15] have focused on improving the quality and performance of image stitching. Hejazifar and Khotanlou [8] proposed a fast and robust seam estimation method (FARSE), simple to implement, that estimates seams quickly and robustly; using gray-weighted distance and a gradient-domain region of differences, FARSE achieves good performance. Shi et al. [9] presented an improved parallax image stitching algorithm using feature blocks (PIFB), which divides each image into feature blocks with an improved fuzzy C-means algorithm to improve the accuracy and speed of stitching. Fu et al. [14] improved the computational speed of stitching with a new way of determining the feature-search area. Jeong and Jun [15] focused on seam finding and reduced computational complexity and time through downscaling and a cost function. In this paper, we likewise study how to reduce time and resource consumption in image stitching, specifically for large, high-resolution images. Other work [16–18] has studied image stitching in concrete applications. Shi et al. [16] studied a stitching algorithm that compresses star-map data and obtains a complete star map by constructing an aggregated star map. Zhang et al. [17] applied image stitching to obtain a panorama of a corroded surface from multiple scanned microscopic corrosion images. Here, Dunhuang murals are used as the example for studying seamless image stitching, a challenging task because the murals are large and of high resolution.
37.3 FSIS Method
37.3.1 Overview of the FSIS Method
The purpose of this paper is to obtain seamless, high-resolution Dunhuang murals by image stitching in an efficient way. The FSIS method addresses this by solving a large sparse linear system of the Poisson equation; an overview is shown in Fig. 37.1. The original image is the input. First, we iteratively down-sample the original image into the low-resolution BPIs, constructing the DST as the BPIs are generated. Next, we construct the Poisson equations of the leaf-BPIs, i.e., all the leaf nodes of the DST. After solving the Poisson equations, the final result image is obtained by interpolation from bottom to top of the DST.
Fig. 37.1 The overview of the FSIS method
37.3.2 Down-Sampling the Original Image and Constructing the DST
Because the original image is large and of high resolution, loading the full image is very costly in time and memory, so it must be down-sampled into low-resolution images. In the FSIS method, quadrilateral grids are used: the original image is divided into grid cells, and two BPIs are generated, one in the horizontal direction and one in the vertical direction, called the HBPI and VBPI, respectively. Concretely, in the horizontal direction the inner pixels and vertical border pixels of each cell are discarded and the horizontal border pixels are reserved to form the HBPI; similarly, only the vertical border pixels of each cell are reserved to form the VBPI. If the BPIs are still not small enough, the same down-sampling is applied to them again. To organize the many BPIs generated in this process, a complete binary tree called the DST is constructed, whose structure is shown in Fig. 37.1. The root of the DST is the original image, and the child nodes HBPI and VBPI are the BPIs generated from the corresponding parent node. The DST nodes record BPI information such as image sizes, image formats, and locations on disk; the leaf nodes are called leaf-BPIs. Two parameters must be specified in this process: the size of the quadrilateral grid cells and the end condition of the iteration (a code sketch follows the list below).
• The size of the quadrilateral grids. This parameter is one of the key factors deciding the accuracy of the FSIS method. On the one hand, a large cell size means the generated BPIs are coarser after only the border pixels are reserved, which decreases accuracy; in other words, the smaller the cells, the more accurate the method. On the other hand, small cells greatly increase the
time consumed in the following steps. Moreover, because the FSIS method is implemented on the GPU, performance is best when the cell size is 2^p × 2^q (3 ≤ p, q ≤ 10). The choice is a tradeoff between time and accuracy and can be made according to the size of the original image.
• The end condition of the iterative process. This parameter determines, to some extent, the time and resources consumed by the FSIS method. Many iterations generate many BPIs, all of which must be solved in the following steps at considerable cost in time and resources, so the end condition must be specified. It can be set by limiting the maximum size of the leaf-BPIs: when the leaf-BPIs are smaller than this maximum, down-sampling stops.
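The following numpy sketch illustrates one down-sampling step and the DST bookkeeping under the stated parameters; the function names and the dictionary-based tree are illustrative, not the actual GPU implementation.

```python
import numpy as np

def border_indices(n: int, g: int):
    """First and last index of every grid cell of size g along one axis."""
    idx = set()
    for start in range(0, n, g):
        idx.update((start, min(start + g - 1, n - 1)))
    return sorted(idx)

def down_sample(img: np.ndarray, gh: int, gw: int):
    """One FSIS step: keep horizontal (HBPI) / vertical (VBPI) cell borders."""
    hbpi = img[border_indices(img.shape[0], gh), :]   # border rows only
    vbpi = img[:, border_indices(img.shape[1], gw)]   # border columns only
    return hbpi, vbpi

def build_dst(img, gh=32, gw=32, limit=3_000_000):
    """Complete binary tree of BPIs; recursion stops at the 3 MP limit."""
    node = {"shape": img.shape, "children": []}
    if img.shape[0] * img.shape[1] > limit:
        node["children"] = [build_dst(b, gh, gw, limit)
                            for b in down_sample(img, gh, gw)]
    return node
```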
37.3.3 Constructing and Solving Poisson Equations on the Leaf-BPIs of the DST
Based on the leaf-BPIs of the DST, we construct the Poisson equations. The basic form of the two-dimensional Poisson equation is

$\frac{\partial^2 u(x, y)}{\partial x^2} + \frac{\partial^2 u(x, y)}{\partial y^2} = \varphi(x, y)$   (37.1)
For a discrete digital image, the gradient of image $I$ can be represented as $\nabla I_{i,j} = (I_{i+1,j} - I_{i,j},\; I_{i,j+1} - I_{i,j})$, and a given gradient field can be represented as $\vec{g}_{i,j} = (g^x_{i,j},\, g^y_{i,j})$. The goal is to minimize the difference between the two [2]:

$\min \lVert \nabla I_{i,j} - \vec{g}_{i,j} \rVert$   (37.2)

This is equivalent to constructing and solving the linear system $Ax = b$. The five-point Laplacian stencil is used, so the Laplacian operator at pixel $P_{i,j}$ is

$L(P_{i,j}) = P_{i-1,j} + P_{i,j-1} + P_{i+1,j} + P_{i,j+1} - 4P_{i,j}$   (37.3)
In this stage, Neumann boundary conditions are adopted. Because the color channels of the image are independent, the Poisson equations of each channel are solved separately and the results combined. The V-cycle multigrid method [19], which consists of restriction and prolongation phases, is applied. In the restriction phase, the linear system is relaxed on the finer grid and the residual error is restricted to the next coarser grid, iterating until the grid is
the coarsest. In the prolongation phase, the error correction is interpolated from the coarser grid back to the finer grid, finally yielding the solution of the linear system on the leaf-BPIs.
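To make the restriction and prolongation phases concrete, here is a compact numpy sketch of one V-cycle for the five-point system; it uses injection restriction, nearest-neighbor prolongation, and, for brevity, periodic boundaries via np.roll, whereas the method above adopts Neumann boundary conditions.

```python
import numpy as np

def smooth(u, f, iters=3, omega=0.8):
    """Damped Jacobi relaxation for the five-point system L(u) = f."""
    for _ in range(iters):
        nb = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
              np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u = (1 - omega) * u + omega * (nb - f) / 4.0
    return u

def residual(u, f):
    """r = f - L(u), with L(u) = sum of the 4 neighbors minus 4u."""
    nb = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
          np.roll(u, 1, 1) + np.roll(u, -1, 1))
    return f - (nb - 4.0 * u)

def v_cycle(u, f):
    """One V-cycle: pre-smooth, restrict, recurse, prolong, post-smooth."""
    u = smooth(u, f)
    if min(u.shape) > 4:
        r = 4.0 * residual(u, f)[::2, ::2]        # injection + h -> 2h scaling
        e = v_cycle(np.zeros_like(r), r)          # coarse-grid error equation
        e_fine = np.kron(e, np.ones((2, 2)))[:u.shape[0], :u.shape[1]]
        u = u + e_fine                            # nearest-neighbor prolongation
    return smooth(u, f)
```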
37.3.4 Obtaining the Final Result Image by Interpolation
After the Poisson equations of the leaf-BPIs are solved, the last step of the FSIS method is to obtain the final result image by interpolation from bottom to top of the DST. Because the border pixels of the quadrilateral grid cells were reserved when generating the BPIs, only the inner pixels need to be evaluated. Two approaches are available: Poisson equation interpolation [20] and bilinear interpolation. Poisson equation interpolation repeats the Poisson-solving process described above. Bilinear interpolation evaluates each inner pixel from the differences between the computed BPI values and the ground-truth values of the original image at the border pixels in the four directions around that pixel. Comparing the two, Poisson equation interpolation is time-consuming but highly accurate, while bilinear interpolation is fast but somewhat less accurate. As the DST construction shows, the interpolation accuracy of the nodes, especially the leaf nodes, strongly influences the quality of the final image: the impact decreases from bottom to top, while the size of the BPIs increases from top to bottom, and the grid cells are independent and can be processed in parallel. Therefore the two interpolations are combined. Poisson equation interpolation is used whenever the parent of an HBPI/VBPI pair is not the root node, which preserves accuracy while exploiting parallelism; when the parent is the root node, bilinear interpolation is used to accelerate generation of the final result image. The overall process of the FSIS method is summarized in the block diagram of Fig. 37.2.
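One way to realize the four-direction blend described above is sketched below for a single channel: the offsets between the solved and ground-truth values on a cell's border are spread into the interior so that the border values are reproduced exactly. The function and argument names are illustrative, not the paper's actual implementation.

```python
import numpy as np

def fill_cell(orig_cell: np.ndarray, solved: np.ndarray) -> np.ndarray:
    """Fill one grid cell by blending the border offsets inward.

    orig_cell: ground-truth pixels of the cell from the original image.
    solved: Poisson-solved values on the same cell, valid on its border.
    """
    h, w = orig_cell.shape
    d = solved.astype(float) - orig_cell              # offsets, exact on border
    y = np.linspace(0.0, 1.0, h)[:, None]
    x = np.linspace(0.0, 1.0, w)[None, :]
    ly = (1 - y) * d[0, :] + y * d[-1, :]             # top/bottom rows
    lx = (1 - x) * d[:, :1] + x * d[:, -1:]           # left/right columns
    b = ((1 - x) * (1 - y) * d[0, 0] + x * (1 - y) * d[0, -1]
         + (1 - x) * y * d[-1, 0] + x * y * d[-1, -1])  # corner bilinear term
    return orig_cell + (lx + ly - b)                  # borders reproduced exactly
```

Subtracting the corner term b keeps the blend consistent where the four directions overlap, so the filled cell matches the solved values along its entire border.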
37.4 Experimental Results
37.4.1 Experimental Settings
Two parameters described in Sect. 37.3.2 must be specified for the FSIS method. The first is the size of the quadrilateral grid cells; to balance time and accuracy and exploit the computing power of the GPU, the cell size is set to 32 × 32 when the image is small and 64 × 64 when the image is large.
Fig. 37.2 The overall process of the proposed FSIS method
The second parameter is the end condition of the iterative process; in the experiments, the maximum size of the leaf-BPIs is set to 3 MP (megapixels). For performance comparison with the quadtree (QT) method of Agarwala [1] and the streaming multigrid (SM) method of Kazhdan and Hoppe [2], four images of different sizes from [2] are used: Beynac, Rainier, Edinburgh, and Redrock, of 12 MP, 23 MP, 50 MP, and 87 MP, respectively.
37.4.2 Performance Comparison
We compare the performance of the FSIS, SM, and QT methods. The average per-pixel root mean square error (RMSE) is measured on the 8-bit red channel, and the comparison is shown in Fig. 37.3. The results for Edinburgh and Redrock are not shown because the QT method cannot obtain the ground-truth solution on large images. The quadrilateral grid cell size is set to 32 × 32 in this comparative experiment. As Fig. 37.3 shows, although the RMSE of the FSIS method is higher than that of the SM method, the error is still low enough. The comparisons of peak memory usage and execution time are shown in Figs. 37.4 and 37.5, respectively. As Fig. 37.4 shows, the peak memory usage of the FSIS method is relatively high when the image is small (Beynac and Rainier) but drops significantly when the image is large (Edinburgh and Redrock). The likely cause is the number of iterations in the down-sampling process: a small image is down-sampled only once, while a larger image is down-sampled twice or more, so that only the generated BPIs are stored in memory and then processed.
Fig. 37.3 RMSE comparison of FSIS, SM, and QT among four different size images
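For reference, the RMSE metric plotted in Fig. 37.3 can be computed as in the following sketch; the function name is illustrative.

```python
import numpy as np

def red_channel_rmse(result: np.ndarray, truth: np.ndarray) -> float:
    """Average per-pixel RMSE on the 8-bit red channel (channel 0)."""
    diff = result[..., 0].astype(float) - truth[..., 0].astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))
```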
Fig. 37.4 Peak memory usage comparison
Fig. 37.5 Execution time comparison
More down-sampling iterations generate more BPIs of lower resolution, which reduces memory usage compared with the other two methods; it also means more BPIs must be solved, so execution time increases but remains below that of the other methods, as Fig. 37.5 shows. In summary, although the FSIS method is somewhat less accurate, it can process high-resolution images with less memory and time consumption.
37.4.3 Seamless Image Stitching of Dunhuang Murals
One of the Dunhuang murals, the Mount Wutai mural, is used as an example to validate the efficiency of the proposed FSIS method.
Fig. 37.6 Seamless image stitching of part of Mount Wutai mural
The original image size is 351 MP, and the quadrilateral grid cell size is set to 64 × 64 in this experiment. Figure 37.6 shows the performance of the FSIS method: the left and right images, taken at the same location in the original image, are compared, and the obvious seam in the left image is eliminated in the right image. The peak memory usage and execution time are 98 MB and 392 s, respectively. In summary, the proposed FSIS method stitches high-resolution Dunhuang murals quickly and seamlessly.
37.5 Conclusions
The large size and high resolution of Dunhuang murals pose a great challenge for efficient seamless image stitching. In this paper, a fast and seamless image stitching method called FSIS, implemented on a GPU architecture, is proposed to improve stitching efficiency. The FSIS method has three steps: down-sampling the original image and constructing the DST using quadrilateral grids; constructing and solving the Poisson equations on the leaf-BPIs of the DST; and obtaining the final result image by interpolation. Experimental results validate the efficiency of the proposed method.
References 1. Agarwala, A.: Efficient gradient-domain compositing using quadtrees. ACM Trans. Graph. 26(3), 94 (2007) 2. Kazhdan, M., Hoppe, H.: Streaming multigrid for gradient-domain operations on large images. ACM Trans. Graph. 27(3), 1–10 (2008)
3. McCann, J., Pollard, N.S.: Real-time gradient-domain painting. Proc. Siggraph 27(3), 1–7 (2008) 4. Jin, Z., Jin, W., Li, L., Han, Z., Xia, W.: Multiscale infrared and visible image fusion using gradient domain guided image filtering. Infrared Phys. Technol. 89, 8–19 (2018) 5. Wang, R., Yang, H., Pan, Z., Huang, B., Hou, G.: Screen content image quality assessment with edge features in gradient domain. IEEE Access PP(99), 1–1 (2019) 6. Facciolo, G., Sadek, R., Bugeau, A., Caselles, V.: Temporally consistent gradient domain video editing. In: Energy Minimization Methods in Computer Vision and Pattern Recognition. Lecture Notes in Computer Science, vol. 6819, pp. 59–73 (2011) 7. Kopf, J., Cohen, M.F., Lischinski, D., Uyttendaele, M.: Joint bilateral upsampling. ACM Trans. Graph. 26(3), 96 (2007) 8. Hejazifar, H., Khotanlou, H.: Fast and robust seam estimation to seamless image stitching. Signal Image Video Process. 12(5), 885–893 (2018) 9. Shi, H.T., Guo, L., Tan, S., Li, G., Sun, J.: Improved parallax image stitching algorithm based on feature block. Symmetry 11(3), 348 (2019) 10. Qu, Z., Wang, T.F., An, S.Q., Liu, L.: Image seamless stitching and straightening based on the image block. IET Image Process. 12(8), 1361–1369 (2018) 11. Herrmann, C., Wang, C., Bowen, R.S., Keyder, E., Krainin, M., Liu, C., Zabih, R.: Robust image stitching with multiple registrations. In: ECCV, pp. 53–69 (2018) 12. Chen, K., Tu, J.M., Xiang, B.B., Li, L., Yao, J.: Multiple combined constraints for image stitching. CoRR abs/1809.06706 (2018) 13. Abdukholikov, M., Whangbo, T.: Fast image stitching method for handling dynamic object problems in panoramic images. TIIS 11(11), 5419–5434 (2017) 14. Fu, Q.C., Wang, H.Y.: A fast image stitching algorithm based on SURF. In: ICCC, pp. 1–4 (2017) 15. Jeong, J., Jun, K.: A novel seam finding method using downscaling and cost for image stitching. J. Sens. 5258473, 1–8 (2016) 16. Shi, Q., Dongmei, Z., Yun, D.: The image stitching algorithm based on aggregated star groups. Signal Image Video Process. 13(2), 227–235 (2019) 17. Zhang, S.X., Li, Z.L., Yang, C., Xu, Y.Z., Zhou, J.Y.: Application of image stitching method in corrosion morphology analysis. J. Electron. Imaging 28(1), 013045 (2019) 18. Hugl, S., Eckardt, F., Lexow, J.G., Majdani, O., Lenarz, T., Rau, T.S.: Increasing the resolution of morphological 3D image data sets through image stitching: application to the temporal bone. CMBBE Imaging Vis. 5(6), 438–445 (2017) 19. Bramble, J.H., Goldstein, C.I., Pasciak, J.E.: Analysis of V-cycle multigrid algorithms for forms defined by numerical quadrature. SIAM J. Sci. Comput. 15(3), 566–576 (1994) 20. Pérez, P., Gangnet, M., Blake, A.: Poisson image editing. ACM Trans. Graph. 22(3), 313–318 (2003)
Chapter 38
Design of the Control System of 532/940 nm Double-Wavelength High-Power Laser Therapeutic Instrument

Ningning Dong, Jinjiang Cui and Jiangen Xu

Abstract The control system of a 532/940 nm double-wavelength high-power laser therapeutic instrument is designed to realize safe and stable operation. The laser power supply module, main controller module and interface circuit module are studied. An FPGA is used to realize the digital control of the constant current source, the TEC drive, the digital PID operation and the temperature control of the double-wavelength laser. An MCU is responsible for parsing the RS232 protocol. The main controller module and interface circuit module are designed around an STM32F microprocessor to realize the display and storage of relevant data, the touch screen drive, and the response and control of the interface. The control software is written with Keil, and the whole system is tested. Experimental results show that the deviation between the output power and the set power is less than 2%, which meets the requirement for stable output of the double-wavelength high-power laser therapeutic instrument and ensures that the laser therapeutic system satisfies the safety standards for medical devices.
38.1 Introduction

With the continuous development of laser technology, the application of lasers in medicine has become increasingly extensive [1–5]. Doctors use lasers of different wavelengths and powers to treat patients according to their lesions. Laser light with a wavelength of 532 nm is well absorbed by hemoglobin, and its longer pulse width is close to the thermal relaxation time of bright red nevi (port-wine stains) and telangiectasia in
adults. Therefore, this laser is most suitable for the treatment of superficial vascular diseases. The 940 nm semiconductor laser has the deepest penetration: it is the infrared wavelength most strongly absorbed by hemoglobin and is also well absorbed by water. Compared with other laser wavelengths, it is absorbed less by melanin and is therefore more suitable for deep lesions.

At present, most high-power laser therapeutic instruments in clinical use are single-wavelength, such as the Dornier Medilas D series for varicose veins and the 1470 nm semiconductor laser therapeutic instrument of Biolitec (Germany) for prostate treatment. A single-wavelength laser therapy instrument cannot fully meet the needs of complex treatment, and using two separate instruments is cumbersome and expensive. At present, most existing double-wavelength laser therapeutic devices are low-power lasers. For example, the 650/830 nm double-wavelength semiconductor laser therapeutic device jointly developed by the Computer Department of Tianjin Science and Engineering College and the Biomedical Engineering Department of Tianjin Medical University has a maximum clinical output of only 1 W, and the 635/808 nm double-wavelength multi-channel laser acupuncture instrument developed independently by the Institute of Biomedical Engineering, Chinese Academy of Medical Sciences, has a maximum output of only 0.1 W.

In this paper, a control system for a 532/940 nm double-wavelength high-power laser therapeutic instrument is designed. The paper discusses the key technologies of the control system: the system power supply; FPGA-based digital control of the constant current source; the TEC (Thermoelectric Cooler) driver with digital PID hybrid temperature control; the main controller module built on an STM32F series microprocessor with the ARM Cortex-M3 architecture; and the control software written with Keil. Experiments show that the double-wavelength laser therapy technology combines the characteristics of short-wave and medium-to-long-wave lasers: it can accurately target both superficial blood vessels and deep, large-diameter vessels, satisfies the system's functional requirements, delivers stable high-power laser output, and ensures the safety of laser therapy.

Existing laser control systems mainly control single-wavelength lasers, and reported double-wavelength control systems are limited to low-power laser output. The present system adopts digital PID control of high-power thermoelectric coolers to regulate temperature accurately enough for full-power laser output, and uses a system power module that meets medical standards to supply each driver, temperature control unit and control unit, giving the whole machine high immunity to voltage fluctuations, transients and surges on the AC supply. Digital control of the constant current source is realized with an FPGA, and the driving current is accurately controlled through the main control and human-computer interaction of the STM32F microprocessor, realizing stable output of the double-wavelength high-power laser.
38.2 Composition and Working Principle of the Control System of the Double-Wavelength High-Power Laser Therapeutic Instrument

The high-power semiconductor laser module is driven by a high-power laser power module [6–10]. The 532 nm laser uses semiconductor end-pumping technology with an 808 nm laser as the pump; a radiator keeps the working temperature of the laser in an appropriate range to preserve the lifetime of the semiconductor laser, and a temperature control device regulates the frequency-doubling crystal to ensure the most efficient 532 nm output. The 532 nm laser passes through a fiber coupling device and enters the optical fiber; both wavelengths are then transmitted through the fiber to the treatment handpiece, which produces the specified spot at the specified working distance. The design parameters are: laser wavelengths 532/940 nm; output power 0–5 W at 532 nm and 0–30 W at 940 nm; output spot 1.4 mm.

Figure 38.1 is a block diagram of the principle and structure of the control system of the double-wavelength high-power laser therapeutic instrument. It is mainly composed of the system power supply, the laser power supply module, the main controller and the interface circuit. The system power supply mainly powers the 940 nm laser driver module, the 532 nm laser driver module, the main controller and the touch screen. The laser power supply module includes the 940 nm semiconductor laser system driver, the 532 nm semiconductor-laser-pumped solid-state system driver, the temperature control circuit and the control circuit.
Fig. 38.1 Principle and structure diagram of the double-wavelength high-power laser therapy control system (block diagram: system power supply module; laser power module with 532 nm and 940 nm laser driver power modules; main controller module linked over RS232; interface circuit with touch screen, foot switch, key switch and emergency stop switch; 940 nm LD module; 808 nm LD driver circuit; 532 nm resonator)
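To make the design envelope above concrete, the following is a minimal illustrative sketch, not taken from the chapter, of how the stated output limits could be encoded and enforced on the main controller; the structure and all identifiers are hypothetical.

```c
#include <stdint.h>

/* Design limits from the text: 532 nm channel 0-5 W, 940 nm channel 0-30 W.
   The struct and clamp function are illustrative, not the authors' firmware. */
typedef struct {
    uint16_t wavelength_nm;   /* 532 or 940 */
    float    max_power_w;     /* upper power limit of the channel */
} laser_channel_t;

static const laser_channel_t channels[2] = {
    { 532u,  5.0f },
    { 940u, 30.0f },
};

/* Clamp a user-requested setpoint to the channel's allowed range. */
static float clamp_setpoint_w(const laser_channel_t *ch, float request_w)
{
    if (request_w < 0.0f)            return 0.0f;
    if (request_w > ch->max_power_w) return ch->max_power_w;
    return request_w;
}
```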
The laser power supply module is responsible for ensuring the normal operation of the 940 nm semiconductor laser system and the 532 nm semiconductor-laser-pumped solid-state system. The RS232 protocol is used to exchange laser working parameters, operation mode, alarm signals and other information with the main controller. Within the laser power supply module, the MCU is responsible for parsing the RS232 protocol, while the FPGA generates the digital control signal of the constant current source and the drive signal of the TEC and performs the digital PID operation, among other functions. The main controller controls the output of the LD (Laser Diode) module according to the user's needs and treatment requirements so that it works stably; it also handles the display and storage of relevant data, drives the touch screen, responds to touch screen signals and controls the indicator lamps. The interface circuit mainly consists of the key switch, emergency stop switch, foot switch, and touch screen.
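The chapter does not specify the RS232 frame layout the MCU parses, so the following is only a hedged sketch of the kind of byte-wise frame parser such an MCU might run; the frame format (start byte, command, 16-bit payload, checksum) and all names are assumptions for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical 5-byte frame: 0xAA | cmd | payload_hi | payload_lo | checksum.
   The instrument's real protocol is not given in the chapter. */
#define FRAME_START 0xAAu

typedef struct {
    uint8_t  cmd;      /* e.g. set current, set mode, query alarm */
    uint16_t payload;  /* command argument */
} frame_t;

/* Feed received bytes one at a time; returns true when a valid
   frame has been assembled into *out. */
bool rs232_feed(uint8_t byte, frame_t *out)
{
    static uint8_t buf[5];
    static uint8_t idx = 0;

    if (idx == 0 && byte != FRAME_START)
        return false;               /* wait for the start byte */

    buf[idx++] = byte;
    if (idx < 5)
        return false;               /* frame not complete yet */

    idx = 0;
    uint8_t sum = (uint8_t)(buf[0] + buf[1] + buf[2] + buf[3]);
    if (sum != buf[4])
        return false;               /* checksum mismatch: drop frame */

    out->cmd     = buf[1];
    out->payload = (uint16_t)((buf[2] << 8) | buf[3]);
    return true;
}
```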
38.3 Software and Hardware Design

38.3.1 Laser Module

The 940 nm laser uses a 40 W 940 nm semiconductor laser medical module from Dilas, Germany. Its threshold current is less than 10 A; at a working current of 55 A, the output power is 40 W and the voltage is less than 1.8 V. The module has a fiber output with a matching core diameter of 200 µm and a numerical aperture of 0.22, and provides red-light indication, optical power monitoring via a photodiode (PD) and fiber connection monitoring.

The high-power intracavity frequency-doubled 532 nm laser is mainly composed of the control system, driving power supply, semiconductor pump source, optical resonator, optical coupling system and treatment arm. The driving power supply powers the pump module; the pump source emits 808 nm semiconductor laser light, which is absorbed by the Nd:YVO4 laser crystal to establish 1064 nm oscillation within the optical resonator, and the nonlinear LBO crystal converts this into 532 nm output. The 532 nm laser and the 650 nm indicating light are coupled together into a 400 µm multimode fiber through the optical coupling system, and the focusing system finally forms a treatment spot 1.4 mm in diameter at the treatment handpiece. The control system regulates the output current of the driving power supply to modulate the 532 nm power, switches the 650 nm indicating light on and off, and simultaneously controls the cooling currents of TEC1 and TEC2 to achieve accurate temperature control of the pump source and resonator.
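The TEC temperature regulation described above relies on a digital PID operation computed in the FPGA. As a language-neutral illustration of the control law only, here is a minimal sketch of one discrete PID update in C; gains, limits and names are assumed and do not reflect the authors' FPGA implementation.

```c
/* One update of a discrete PID loop driving a TEC cooling current.
   Illustrative only: gains, limits and names are assumptions. */
typedef struct {
    float kp, ki, kd;     /* PID gains */
    float integral;       /* accumulated error */
    float prev_error;     /* error from the previous sample */
    float out_min, out_max;
} pid_t;

float pid_step(pid_t *p, float setpoint_c, float measured_c, float dt_s)
{
    float error = setpoint_c - measured_c;

    p->integral += error * dt_s;
    float derivative = (error - p->prev_error) / dt_s;
    p->prev_error = error;

    float u = p->kp * error + p->ki * p->integral + p->kd * derivative;

    /* Clamp to the TEC driver's current range (simple anti-windup:
       undo the integral step that pushed the output past the limit). */
    if (u > p->out_max) { u = p->out_max; p->integral -= error * dt_s; }
    if (u < p->out_min) { u = p->out_min; p->integral -= error * dt_s; }
    return u;  /* TEC current command */
}
```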
Table 38.1 Main performance of the medical switching power supply

Parameter          Description                Value
Insulation         Between input and output   ≥4000 V AC
                   Input to shell             ≥1500 V AC
Leakage current    250 V AC, 60 Hz, 25 °C     ≤300 µA
38.3.2 System Power Supply Module

The system power supply provides power to the 940 nm laser drive power module, the 532 nm laser drive power module, the main controller and the touch screen, converting 220 V AC into 5 V and 24 V DC. A mature, stable medical switching power supply available on the market is used so that the whole machine meets medical electrical standards; the Excelsys XVD switching power supply is selected. Its main performance parameters, shown in Table 38.1, meet the requirements of GB 9706.1-2007 "Medical Electrical Equipment Part 1: General Requirements for Safety". The power supply has six slots, can output six voltages simultaneously, and can also parallel some of its modules to increase the output power. Since there are two lasers, each with a maximum driving current of no more than 36 A, and the maximum output current of a single 5 V module is 40 A, one 5 V 40 A module is assigned to each laser. Since the TEC drive of the temperature control circuit can reach 20 V at 15 A, the 24 V rail must supply 20 A in total, provided either by a 24 V 20 A module or by two 24 V 10 A modules in parallel. The touch screen is rated at 8 W, and together with the 24 V fans and the digital circuits of the laser drive power supplies the total power is below 80 W, so the smallest module, 24 V 5 A, is selected for these loads. In summary, a five-slot configuration of the power supply meets the power requirements of the system.
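As a quick sanity check of the module selection above, the following throwaway C snippet reproduces the power-budget arithmetic; the figures come from the text, while the structure and names are ours.

```c
#include <stdio.h>

/* Reproduces the power-budget reasoning of Sect. 38.3.2.
   Figures are from the text; the check itself is illustrative. */
int main(void)
{
    /* 5 V rail: one 40 A module per laser, each laser draws <= 36 A. */
    const float laser_max_a = 36.0f;
    const float mod5v_max_a = 40.0f;
    printf("5 V module per laser ok: %s\n",
           laser_max_a <= mod5v_max_a ? "yes" : "no");

    /* 24 V rail for TEC drive: up to 20 V x 15 A = 300 W,
       covered by 24 V x 20 A = 480 W (one 20 A module or 2 x 10 A). */
    const float tec_max_w = 20.0f * 15.0f;
    const float rail24_w  = 24.0f * 20.0f;
    printf("TEC budget ok: %s (%.0f W of %.0f W)\n",
           tec_max_w <= rail24_w ? "yes" : "no", tec_max_w, rail24_w);

    /* Auxiliary 24 V loads (touch screen 8 W, fans, digital) < 80 W,
       covered by the smallest module, 24 V x 5 A = 120 W. */
    const float aux_max_w  = 80.0f;
    const float mod24v5a_w = 24.0f * 5.0f;
    printf("Aux budget ok: %s (%.0f W of %.0f W)\n",
           aux_max_w <= mod24v5a_w ? "yes" : "no", aux_max_w, mod24v5a_w);
    return 0;
}
```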
38.3.3 Laser Power Supply Module

The basic structure of the laser power supply module is shown in Fig. 38.2. It includes the 940 nm semiconductor laser system driver, the 532 nm semiconductor-laser-pumped solid-state system driver, the temperature control circuit and the control circuit.
Fig. 38.2 Basic structure diagram of the laser power supply module (block diagram: constant current source of the laser drive with LD drive power supply output; control circuit interacting with the main control circuit; temperature control circuit with temperature feedback)
The module is responsible for ensuring the normal operation of the 940 nm semiconductor laser system and the 532 nm semiconductor-laser-pumped solid-state system. The RS232 protocol is used to exchange laser parameters, operation mode, alarm signals and other information with the main controller. Within the module, the MCU is responsible for parsing the RS232 protocol, while the FPGA generates the digital control signal of the constant current source and the drive signal of the TEC and performs the digital PID operation, among other functions. The driving specifications of the 940 nm semiconductor laser system and the 532 nm semiconductor-laser-pumped solid-state system are: voltage 0–4 V, self-adaptive; current 0–50 A, continuously adjustable; peak current ripple: