Smart Innovation, Systems and Technologies 179
Roumen Kountchev Srikanta Patnaik Junsheng Shi Margarita N. Favorskaya Editors
Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology Methods and Algorithms, Proceedings of IC3DIT 2019, Volume 1
Smart Innovation, Systems and Technologies Volume 179
Series Editors Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-sea, UK Lakhmi C. Jain, Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology Sydney, Sydney, NSW, Australia
The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas is particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/8767
Editors

Roumen Kountchev, Technical University of Sofia, Sofia, Bulgaria
Srikanta Patnaik, Department of Computer Science and Engineering, SOA University, Bhubaneswar, Odisha, India
Junsheng Shi, Department of Electro-Optical Engineering, Yunnan Normal University, Yunnan, China
Margarita N. Favorskaya, Informatics and Computer Techniques, Reshetnev Siberian State University of Science and Technology, Krasnoyarsk, Russia
ISSN 2190-3018 ISSN 2190-3026 (electronic) Smart Innovation, Systems and Technologies ISBN 978-981-15-3862-9 ISBN 978-981-15-3863-6 (eBook) https://doi.org/10.1007/978-981-15-3863-6 © Springer Nature Singapore Pte Ltd. 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
International Program Committee
Honorary Chair: Prof. Lakhmi C. Jain
General Chairs: Prof. Dr. Srikanta Patnaik, Prof. Dr. Junsheng Shi, Prof. Dr. Roumen Kountchev
Organizing Chair: Prof. Yonghang Tai
International Advisory Chair: S. R. Dr. Roumiana Kountcheva
Founder of IRNet China: Silai Zhou
Co-founder of IRNet China: Er. Bin Hu
TPC Members

Prof. Jian Wang, Kunming University of Science and Technology, China
Prof. Xilong Qu, Hunan University of Finance and Economics, China
Dr. Victor Chang, Teesside University, Middlesbrough, UK
Prof. Vladicescu Popentiu, Florin, City University, UK
Prof. Guangzhi Qu, Oakland University, USA
Prof. Dr. Zhengtao Yu, Kunming University of Science and Technology
Prof. V. S. S. Yadavalli, University of Pretoria, South Africa
Prof. Bruno Apolloni, Università degli Studi di Milano, Italy
Prof. Harry Bouwman, Delft University of Technology, Netherlands
Prof. Shyi-Ming Chen, National Taiwan University of Science and Technology
Prof. Yahaya Coulibaly, University Technology Malaysia, Malaysia
Prof. Ing Kong, RMIT University, Australia
Prof. Gerald Midgley, Centre for Systems Studies, University of Hull, UK
Prof. Khubaib Ahmed, Hamdard University, Pakistan
Prof. Moustafa Mohammed Eissa, Faculty of Engineering, Helwan University, Egypt
Dr. Xilang Tang, Air Force Engineering University, China
Dr. Yangjun Gao, Air Force Engineering University, China
Dr. Fernando Boronat Seguí, Universitat Politecnica de Valencia, Spain
Dr. Alexandros Fragkiadakis, Institute of Computer Science (FORTH-ICS), Greece
Dr. Cristina Alcaraz, University of Malaga, Spain
Dr. Mohamed Atef, Assiut University, Egypt
Dr. Weilin Wang, University of Georgia, USA
Dr. Bensafi Abd-Ei-Hamid, World Islamic Sciences and Education University, Jordan
Dr. Yudi Gondokaryono, Institute of Teknologi Bandung, Indonesia
Dr. Hadi Arabshahi, Ferdowsi University of Mashhad, Iran
Dr. Qian Lv, Western Digital, USA
Dr. Alojz Poredo, University of Ljubljana, Slovenia
Dr. Mohamed F. El-Santawy, Cairo University, Egypt
Dr. Tongpin Liu, University of Massachusetts Amherst, USA
Dr. Seema Verma, Banasthali University, India
Sponsors

Yunnan Normal University
IRNet International Academy Communication Center

Interscience Research Network (IRNet) International Academy Communication Center is an independent, non-profit academic institute. It provides scholars and researchers of science, engineering, and technology all over the world with a professional, informational, and intelligent academic platform and promotes academic research, communication, and international cooperation.
Preface
The annual International Conference on 3D Imaging Technology (IC3DIT 2019) took place on August 15–18, 2019, in Kunming, Yunnan, China. A key aspect of this conference was the strong mixture of academia and industry. IC3DIT 2019 provided a forum that brought together researchers and academics as well as practitioners from industry to meet and exchange ideas and recent research work on all aspects of images, their applications, and other related areas. The conference established an effective platform for institutions and industries to share ideas and to present the works of scientists, engineers, educators, and students from different countries. The main topics of the conference papers presented in Vol. 1 are 3D Image Representation, 3D Image Technology, 3D Image Graphics and Computing, 3D Information Technology, and related areas. Volume 1 of the proceedings presents new approaches, methods, and algorithms for 3D image and data representation, analysis, and computing, and new technologies related to the development of 3D cameras and displays. The volume comprises 55 chapters, which can be divided into the following groups:

1. Chapters 1–8 are aimed at 3D data processing in areas related to: thermal analysis of anisotropic materials in the 3D field; shape retrieval for 3D models; numerical conformal mapping; video summarization; multi-objective optimization of engineering projects in the 3D field; 3D path planning; parallel magnetic resonance imaging reconstruction; and third-order tensor representation with low computational complexity;

2. Chapters 9–19 are related to neural networks (NN): spectral reconstruction through a Bayesian regulation NN; loose hand gesture recognition using a convolutional NN (CNN); multi-focus image fusion based on NSST and BPNN; classification of 3D remote sensing images; a deep learning algorithm based on CNN; object recognition based on hard negative mining and CNN; intelligent identification of power equipment based on deep learning; Chinese color name identification based on CNN; convolution computation for FPGA-based CNN; voice mark recognition based on an optimized BPNN; and image super-resolution via deep convolutional dual up-scaling branches;
3. Chapters 20–31 are aimed at the development and implementation of new algorithms in various application areas: human heart model rendering; segmentation of renal cyst images; a fast template matching algorithm; infrared and visible image fusion; an improved k-means algorithm; a new AdaBoost algorithm for handwritten digit recognition; fuzzy c-means based on an improved artificial bee colony algorithm; airplane trajectory planning; a method for removing image noise; recovery of signals corrupted by sparse impulsive noise; improved Criminisi image inpainting; and image restoration based on the Criminisi algorithm;

4. Chapters 32–43 are in the area of stereo-photography, computer vision (CV), feature extraction, watermarking, and cloud computing (CC): stereo image synthesis based on oblique UAV photography; production of stereoscopic UAV aerial photography based on optical flow; an intelligent feeding system based on CV for indoor breeding locations; real-time detection of substation equipment labels based on CV; model construction based on the Sobel and snake algorithms; parent block classification for fractal image coding; improved local binary pattern feature extraction; QR codes for watermarking; a software watermarking algorithm; image watermarking with multilevel embedding using various transforms; a VR technology-based teaching cloud platform; and secure keyword query based on CC;

5. Chapters 44–55 are in the area of new technologies related to video cameras and 3D displays: 3D computational imaging with a single liquid crystal lens camera; application of finite element analysis in the calibration of a non-contact displacement sensor; obstacle detection by a vehicle fish-eye camera; an auto-stereoscopic 3D display system; the effect of pixel shape on auto-stereoscopic display crosstalk; a directional backlight using blue LEDs for auto-stereoscopic 3D display; time-multiplexed stereoscopic display technology; an eye localization system for auto-stereoscopic display; a 2D–3D auto-stereoscopic switchable display; an optimized image dehazing algorithm; a crosstalk-suppression lookup table; and an algorithm for blind separation of speech and background music.

All chapters of the book were reviewed and passed the plagiarism check. The editors express their thanks to IRNet for the excellent organization, coordination, and support. We also thank the sponsors, the Organizing Committee members, the International Program Committee members of IC3DIT 2019, and the authors for the hard work needed to prepare this book. The book will be useful for researchers, students, and lecturers who study or work in the area of contemporary 3D imaging.
Roumen Kountchev, Sofia, Bulgaria
Srikanta Patnaik, Bhubaneswar, India
Junsheng Shi, Yunnan, China
Margarita N. Favorskaya, Krasnoyarsk, Russia

The Editors
Contents
1. Study Optimization of Thermal Analysis of Anisotropic Materials in 3D Field Based on One-Dimensional Heat Conduction Equation Model (Ying He, Rui Huang, and Gefei Li)
2. Shape Retrieval for 3D Models Based on MRF (Qingbin Li and Junxiao Xue)
3. Numerical Conformal Mapping Based on Improved Hermitian and Skew-Hermitian Splitting Method (Peng Wan, Yibin Lu, Yingzi Wang, Shengnan Tang, and Dean Wu)
4. Hierarchical Clustering-Based Video Summarization (Fengsui Wang, Zhengnan Liu, Haiying Cheng, Linjun Fu, Jingang Chen, Qisheng Wang, Furong Liu, and Chao Han)
5. Research on Multi-objective Optimization Problem of Engineering Project in 3D Field Based on Improved Particle Swarm Optimization Algorithm (Jianqiang Zhao, Subei Li, and Gefei Li)
6. 3D Path Planning Based on Improved Particle Swarm Optimization Algorithm (Yihu Wang and Siming Wang)
7. Improving Parallel Magnetic Resonance Imaging Reconstruction Using Nonlinear Time Series Analysis (Xiaoyan Wang, Zhengzhou An, Jing Zhou, and Yuchou Chang)
8. Low Computational Complexity Third-Order Tensor Representation Through Inverse Spectrum Pyramid (Roumen Kountchev and Roumiana Kountcheva)
9. Spectral Reconstruction Based on Bayesian Regulation Neural Network (Dongdong Gong, Jie Feng, Wenjing Xiao, and Shan Sun)
10. Loose Hand Gesture Recognition Using CNN (Chen-Ming Chang and Din-Chang Tseng)
11. Multi-focus Image Fusion Method Based on NSST and BP Neural Network (Xiaosheng Huang, Ruwei Zi, and Feilong Li)
12. Classification of 3D Remote Sensing Images Through Dimensionality Reduction and Semantic Segmentation Network (Daoji Li, Chuan Zhao, Donghang Yu, Baoming Zhang, and Zhou Yuan)
13. Research on Deep Learning Algorithm and Application Based on Convolutional Neural Network (Mei Guo, Min Xiao, and Fang Yu)
14. Object Attribute Recognition Based on Hard Negative Mining and Convolutional Neural Network (Ming Lei and Fang Liu)
15. Research on Intelligent Identification Method of Power Equipment Based on Deep Learning (Zhimin He, Lin Peng, Min Xu, Gang Wang, Hai Yu, Xingchuan Bao, and Zhansheng Hou)
16. Chinese Color Name Identification from Images Using a Convolutional Neural Network (Lin Shi and Wei Zhao)
17. Research and Design of a Key Technology for Accelerating Convolution Computation for FPGA-Based CNN (Liang Ying, Li Keli, Bi Fanghong, Zhang Kun, and Yang Jun)
18. Research on Voice Mark Recognition Algorithms Based on Optimized BP Neural Network (Kong Yanfeng, Jia Lianxing, and Zhang Xiyong)
19. Single Image Super Resolution via Deep Convolutional Dual Upscaling Branches with Different Focus (Zuzhong Liang, Hui Liu, Zhenhong Shang, and Runxin Li)
20. Human Heart Model Rendering Based on BRDF Algorithm (Wenjing Xiao, Zhibao Qin, Zhaoxiang Guo, Haoyuan Shi, and Yonghang Tai)
21. Level Set Segmentation Algorithm for Cyst Images with Intensity Inhomogeneity (Jinli Fang, Yibin Lu, Yingzi Wang, and Dean Wu)
22. A Robust and Fast Template Matching Algorithm (Guoxi Chen, Changyou Li, and Wensu Xu)
23. Infrared and Visible Image Fusion Algorithm Based on Threshold Segmentation (Wang Yumei, Li Xiaoming, Li Congyong, Wang Dong, Cheng Long, and Zheng Chen)
24. Improved K-Means Algorithm for Optimizing Initial Centers (Jianming Liu, Lili Xu, Zhenna Zhang, and Xuemei Zhen)
25. An Improved AdaBoost Algorithm for Handwriting Digits Recognition (Chang Liu, Rize Jin, Liang Wu, Ziyang Liu, and Mingming Guo)
26. FCM Based on Improved Artificial Bee Colony Algorithm (An-Xin Ye and Yong-Xian Jin)
27. Airplane Trajectory Planning Using Artificial Immune Algorithm (Lifeng Liu)
28. A New Method for Removing Image Noise (Suhua Wang, Zhiqiang Ma, and Rujuan Wang)
29. Sparse Impulsive Noise Corrupted Compressed Signal Recovery Using Laplace Noise Density (Hongjie Wan and Haiyun Zhang)
30. An Improved Criminisi's Image Inpainting Algorithm for Priority and Matching Criteria (Yao Li, Junsheng Shi, Feiyan Cheng, Rui Xiao, and Dan Tao)
31. An Improved Image Restoration Algorithm Based on the Criminisi Algorithm and Least Squares Method (Jia Xia, Feiyan Cheng, and Chao Li)
32. Research on Stereo Image Synthesis Based on Oblique Photography of UAV (Zheng Jinji and Wang Yuanqin)
33. Research on Production of Stereoscopic UAV Aerial Photography Based on Optical Flow Image Migration Technology (Xv Chen, Xi-Cai Li, Bang-Peng Xiao, and Yuan-Qing Wang)
34. Design of Intelligent Feeding System Based on Computer Vision for Indoor Breeding Fixed Locations (Tao Chi and Yunjian Pang)
35. A Real-Time Detection Method of Substation Equipment Label Based on Computer Vision (Lin Li, Ya bo Zhao, and Yu Li)
36. Target Model Construct Method Based on Sobel and Snake Algorithm (Xiao Guang Li, Qi Ming Dong, and Yu Hang Tan)
37. Parent Block Classification of Fractal Image Coding Algorithm Based on 'Shizi' Feature (Li Wang and Zengli Liu)
38. An Improved LBP Feature Extraction Algorithm (Yongpeng Lin, Jinmin Zhang, and Siming Wang)
39. QR Code Digital Watermarking Algorithm Based on Two-Dimensional Logistic Chaotic Map (Xiangjuan Ran and De Li)
40. A Software Watermarking Algorithm Based on Instruction Encoding and Resource Section Format (Zhen Chen and De Li)
41. High-Security Image Watermarking with Multilevel Embedding Using Various Image Transform Techniques (Ch. Hima Bindu)
42. The Research and Design of Virtual Reality Technology-Based Teaching Cloud Platform (Dacan Li, Yuanyuan Gong, Dezheng Li, Qingpei Huang, and Guicai Feng)
43. Research on Secure Keyword Query Based on Cloud Computing (Ying Ren, Huawei Li, Yanhong Zhang, and Weiling Wang)
44. Three-Dimensional Computational Imaging with One Liquid Crystal Lens Camera and Refocusing from Extended Depth of Field Intensity and Depth Image Pair (Xiaoxi Chen, Yalei Zhang, Jiacheng Ma, Liming Zheng, and Mao Ye)
45. Application of Finite Element Analysis in Calibration of Non-contact Magnetic Piezoelectric Type of Displacement Sensor (Jin Gao)
46. Approaching Obstacle Detection by a Vehicle Fisheye Camera (Yu Pu Shi, Hai Ping Wei, and Hong Fei Yu)
47. Autostereoscopic 3D Display System Based on Lenticular Lens and Quantum-Dot Film (Xue-Ling Li, Bin Xu, Qin-Qin Wu, and Yuan-Qing Wang)
48. The Effect of the Pixel Shape to the Autostereoscopic Display Crosstalk (Sheng-zhi Qiang and Yuan-qing Wang)
49. A New Directional Backlight Using Blue Light LED Design for Autostereoscopic 3D Displays (Guangjun Yin and Yuanqing Wang)
50. Time-Multiplexed Stereoscopic Display Technology Based on Side-Dynamic Areal Backlight (Chen Gangwei, Xu Bin, Bao Yucheng, and Wang Yuanqing)
51. A Robust Eye Localization System for Autostereoscopic Display Using a Multiple Camera (Li Xicai, Liu Xuanyi, Zheng Jinji, Xiao Bangpeng, Chen Xu, and Wang Yuanqing)
52. 2D-3D Autostereoscopic Switchable Display Based on Multi-distance Dynamic Directional Backlight (Bin Xu, Xueling Li, and Yuanqing Wang)
53. An Optimization Image Dehazing Algorithm Based on Dark Channel Prior (Yu Fu and Zengli Liu)
54. A Crosstalk-Suppression Lookup Table Based on Gray-Level Pair Measurement (Jian Wang, Po Yang, Liwei Gu, Yong Liu, and Xiaohua Li)
55. A New Algorithm for Blind Separation of Speech and Background Music (Chao Xiao and Haiyan Quan)

Author Index
About the Editors
Prof. Dr. Roumen Kountchev, D.Sc. works at the Faculty of Telecommunications, Department of Radio Communications and Video Technologies, Technical University of Sofia, Bulgaria. He has published 341 papers in journals and conference proceedings, 15 books and 46 book chapters, and holds 20 patents. He is a member of the Euro Mediterranean Academy of Arts and Sciences; President of the Bulgarian Association for Pattern Recognition (member of IAPR); and editorial board member of the IJBST Journal Group, the International Journal of Reasoning-based Intelligent Systems, and the international journal Broad Research in Artificial Intelligence and Neuroscience.

Prof. Dr. Srikanta Patnaik works at the Department of Computer Science and Engineering, Faculty of Engineering and Technology, SOA University, Bhubaneswar, India. He has published over 100 papers in international journals and conference proceedings, 2 textbooks and 32 edited volumes. He is the editor-in-chief of the International Journal of Information and Communication Technology, the International Journal of Computational Vision and Robotics (Inderscience Publishing House) and the book series Modeling and Optimization in Science and Technology (Springer), Advances in Computer and Electrical Engineering, and Advances in Medical Technologies and Clinical Practices (IGI-Global).

Prof. Dr. Junsheng Shi is the Dean of the School of Physics and Electronic Information, Yunnan Normal University. He is a member of the China Illuminating Engineering Society Image Technology Specialized Committee, the China Optical Technology Optical Society Professional Committee, and the Chinese Society of Image and Graphics Technical Committee on stereoscopic imaging. He also contributes to the journals Optical Engineering and the Journal of Display Technology. In 2004 and 2008, he received awards for scientific and technological progress and natural science in Yunnan Province. In the last five years, he has completed two major projects and published over 50 papers.
Prof. Dr. Margarita N. Favorskaya is the Head of the Department of Informatics and Computer Techniques at Reshetnev Siberian State University of Science and Technology, RF. She is an IPC member and has chaired invited sessions at over 30 international conferences. She is a reviewer for various journals: Neurocomputing, Knowledge Engineering and Soft Data Paradigms, Pattern Recognition Letters, and Engineering Applications of Artificial Intelligence, and an associate editor of the Intelligent Decision Technologies Journal, the International Journal of Knowledge-Based Intelligent Engineering Systems and the International Journal of Reasoning-based Intelligent Systems. She is also a reviewer and book editor for Springer.
Chapter 1
Study Optimization of Thermal Analysis of Anisotropic Materials in 3D Field Based on One-Dimensional Heat Conduction Equation Model

Ying He, Rui Huang, and Gefei Li

Abstract Taking the optimization design of working uniforms for high-temperature operation as an example, this paper studies the optimization of thermal analysis of anisotropic materials in the 3D field. Firstly, a one-dimensional heat conduction equation model is established by integrating partial differential equations and separating the variables. Then thermal resistance analysis and the finite difference method are used to simulate the temperature values. Finally, the Tikhonov regularization method is selected to regularize the inversion method, and the generalized cross-validation criterion is used to determine the regularization parameters. The results show that this method is of great significance in optimizing the thickness design of anisotropic materials.
1.1 Introduction

Thermal analysis of anisotropic materials is very important in practical engineering, especially in the 3D field. All kinds of uncertainty in structural material characteristics, geometric parameters, and loading caused by measurement error, manufacturing level, and environmental conditions will lead to uncertainty in structural performance and response results. In high-temperature engineering, high temperature results in anisotropy of most engineering materials. If the above uncertainty factors are ignored, the results obtained by traditional deterministic optimization methods may deviate from the required optimal performance. Research on working uniforms for high-temperature operation in the 3D field can not only improve the safety of high-temperature operations but also improve engineering efficiency, which is an important measure to vigorously develop the high-tech industry and accelerate industrial transformation.

Y. He (B) · R. Huang, School of Economics, Xuzhou University of Technology, Xuzhou 221111, China, e-mail: [email protected]
G. Li, Department of Mathematics, University of California, Davis 95616, USA

© Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_1
In recent years, heat transfer models of thermal protective clothing have mainly been divided into single-layer and multi-layer models. For the single-layer model, researchers mainly studied the effects of the physical properties of the single-layer protective fabric, the radiant heat of the external flame on the fabric, and the thickness of the air layer between the fabric and the skin. Gibson [1] proposed a single-layer porous-media heat and mass transfer model, and Chitrphiromsri [2] proposed a heat and humidity coupling model for the interior of porous-media protective clothing. For the multi-layer model, Mell proposed a heat transfer model between layers of fabric incorporating heat conduction and heat radiation. Mercer [3] established a multilayer dynamic heat transfer model containing phase change materials. Lu Linzhen [4] established a heat transfer model of multi-layer thermal protective clothing. In this paper, two different radiation terms, heat conduction and bidirectional radiative heat transfer, are considered to establish a one-dimensional heat conduction equation model. The finite difference method and an inversion method are used to optimize the shell and heat insulation model of the three-layer fabric.
1.2 Heat Transfer Mode and Process Analysis

There are three types of heat transfer: conduction, convection, and radiation. In conduction, the material layers remain relatively static, and only the thermal motion of microscopic particles occurs; heat conduction is the transfer of heat energy between layers of materials due to this motion. Let φ be the heat transferred per unit time (the heat flow), A the heat transfer area, and Q the heat flux through that area. According to Fourier's law,

$$Q = \frac{\phi}{A} \quad (1.1)$$
According to Eq. (1.1), the heat flux through a one-dimensional plane wall can be expressed as follows:

$$Q = -\lambda \frac{dt}{dx} \quad (1.2)$$
where t stands for temperature, x for the coordinate along the direction of heat flow, and λ for the thermal conductivity. Take the general radiative heat transfer phenomenon as an example for analysis, as shown in Fig. 1.1. The figure shows an object with surface area A, external temperature T1, and emissivity r, placed in a box with an external temperature of T2. If ϑ is the radiant heat, S1 is the radiative area between the two objects, and ε1 is the emissivity between the two objects, the heat of radiant heat transfer between them can be calculated according to the following formula:
Fig. 1.1 General radiative heat transfer diagram
$$\vartheta = \varepsilon_1 S_1 \sigma \left(T_1^4 - T_2^4\right) \quad (1.3)$$
where σ is the Stefan–Boltzmann constant, also known as the blackbody radiation constant, with a value of 5.67 × 10⁻⁸ W/(m²·K⁴). The formula shows that the power radiated per unit area per unit time is proportional to the fourth power of the absolute temperature. ε is the emissivity (blackness), which is usually less than 1.
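As a quick illustration, Eq. (1.3) can be evaluated directly. The short Python sketch below is not part of the original study; the emissivity, area, and temperatures are assumed example values.

```python
# Sketch of Eq. (1.3): net radiative heat transfer between two bodies.
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W/(m^2 * K^4)

def radiative_heat(eps1: float, s1: float, t1: float, t2: float) -> float:
    """Net radiated power (W) between surfaces at absolute temperatures t1, t2 (K)."""
    return eps1 * s1 * SIGMA * (t1**4 - t2**4)

# Example (assumed values): a 1 m^2 surface at 65 degC facing a 37 degC body,
# effective emissivity 0.9.
print(radiative_heat(0.9, 1.0, 65 + 273.15, 37 + 273.15))  # roughly 195 W
```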
1.3 Optimization Model of Protective Clothing Design for High-Temperature Engineering Work

1.3.1 The Establishment of the Model and the Proposal of the Hypothesis Conditions

In a high-temperature environment, taking the human body temperature as 37 °C, the design of the following kinds of high-temperature engineering protective clothing under different external conditions is optimized by measuring the temperature outside the human skin (data source: simulation experiment data at high temperature):

1. Set the optimal thickness of layer II when the thickness of layer IV is 5.5 mm and the ambient temperature is 65 °C, so that the outer skin temperature of the human body stays within 47 °C after working for 60 min, and the time above 44 °C is controlled within 5 min;
2. Determine the optimal thicknesses of layers II and IV at an ambient temperature of 80 °C, so that the outer skin temperature of the human body stays within 47 °C after half an hour of operation, and the time above 44 °C is controlled within 5 min.
1.3.2 Optimization Model Based on Inversion Method

We abstract the problem of temperature change calculation into the inverse problem of the heat conduction equation and use the inversion method [5] to obtain the optimal thickness of the second layer. The temperature distribution on the outer skin of the dummy can be expressed as follows:

$$T = f(x, y, z, t) \quad (1.4)$$

Suppose T is the temperature at any point between the outer part of the dummy and the ambient air layer; λ is the thermal conductivity of each material; ρ is the density of each material; c is the specific heat of the material; t is time; and x, y, and z are the spatial coordinates of any point. Then the heat conduction equation can be expressed as follows:

$$\lambda(z)\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} + \frac{\partial^2 T}{\partial z^2}\right) = \rho(z)c(z)\frac{\partial T}{\partial t} \quad (1.5)$$
The initial condition is the temperature distribution of the outer layers of the dummy at time t = t₀ = 0, where μ(z) is the temperature distribution at time t = 0. It is assumed that the temperature of the four layers of material between the dummy and the external air layer at the initial moment is the room temperature of 25 °C, that the temperature of the external environment is 65 °C, and that the temperature is uniformly distributed. Then,

$$T|_{t=0} = \mu(z) \quad (1.6)$$

where 5.4 mm ≤ z ≤ 15.6 mm + l. After one-dimensional simplification of Eq. (1.5), if the thickness of layer II is l, the heat conduction equation becomes

$$\lambda(z)\frac{\partial^2 T}{\partial z^2} - \rho(z)c(z)\frac{\partial T}{\partial t} = 0 \quad (1.7)$$

where T|_{t=0} = μ(z), 5.4 mm ≤ z ≤ 15.6 mm + l, and 0 s < t ≤ 3600 s.
ϕ(t) is the temperature change function of each layer. Finally, Eq. (1.7) is used to calculate the functional equations of temperature variation (Y_I–Y_IV) of each layer over time:

Y_I = (4.3e−25)x⁷ + (−9.1e−21)x⁶ + (8.0e−17)x⁵ + (−3.7e−13)x⁴ + (9.5e−10)x³ + (−1.4e−06)x² + 0.001x + 73.99   (1.8)

Y_II = (1.8e−24)x⁷ + (−3.9e−20)x⁶ + (3.4e−16)x⁵ + (−1.5e−12)x⁴ + (4.0e−09)x³ + (−5.8e−06)x² + 0.004x + 70.75   (1.9)

Y_III = (7.9e−24)x⁷ + (−1.7e−19)x⁶ + (1.5e−15)x⁵ + (−6.7e−12)x⁴ + (1.7e−08)x³ + (−2.5e−05)x² + 0.019x + 56.49   (1.10)

Y_IV = (1.7e−23)x⁷ + (−3.5e−19)x⁶ + (3.1e−15)x⁵ + (−1.4e−11)x⁴ + (3.7e−08)x³ + (−5.3e−05)x² + 0.040x + 36.15   (1.11)
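The fitted curves (1.8)–(1.11) are ordinary polynomials in the time variable x and can be evaluated directly, for example with NumPy. The sketch below transcribes the coefficients of Y_IV from Eq. (1.11); the other three layers are handled identically. It is an illustration, not part of the original study.

```python
# Evaluate the fitted temperature curve (1.11) with NumPy.
# Coefficients are listed highest degree first; x is time in seconds.
import numpy as np

Y_IV = [1.7e-23, -3.5e-19, 3.1e-15, -1.4e-11, 3.7e-08, -5.3e-05, 0.040, 36.15]

t = np.arange(0, 3601)            # one hour of operation, 1 s sampling
temp_iv = np.polyval(Y_IV, t)     # temperature of layer IV over time
print(temp_iv[3300], temp_iv[3600])  # values at the 55th and 60th minute
```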
According to the thermal resistance analysis, the heat flow between air layer IV and the skin of the dummy is positively correlated with the skin surface temperature of the dummy:

$$-\lambda_0 \left.\frac{\partial T}{\partial z}\right|_{z=0} = h_0\left(u|_{z=0} - T_r\right) \quad (1.12)$$

For the skin surface temperature of the dummy, the second boundary condition is used:

$$\left.\frac{\partial T}{\partial z}\right|_{z=l} = \phi(t) \quad (1.13)$$

where h₀ is the heat exchange coefficient between layer IV and the dummy, T_r is the internal temperature of the dummy, and φ(t) is the function describing the change of the skin surface temperature of the dummy with time. Under the premise that the ambient temperature is 65 °C and the thicknesses of layers I, III, and IV are known (the values are shown in Table 1.1), two additional conditions are added according to the experimental conditions in order to invert the thickness of material layer II.
Table 1.1 Numerical table of thickness

Material layer | Thickness (mm)
Layer I        | 0.6
Layer III      | 3.6
Layer IV       | 5.5
(1) When working for 60 min, the temperature on the outside of the skin of the dummy should not exceed 47 °C; that is, at the 60th minute (the 3600th second),

φ(3600) ≤ 47   (1.14)

(2) When working for 60 min, the time during which the lateral skin temperature of the dummy exceeds 44 °C must be no more than 5 min; that is, within at least the first 55 min the outer skin temperature of the dummy cannot exceed 44 °C, so at the 55th minute (the 3300th second),

φ(3300) ≤ 44   (1.15)

Based on the two additional conditions above:

$$\begin{cases} \phi(3300) \le 44 \\ \phi(3600) \le 47 \end{cases} \quad (1.16)$$
The heat conduction model above constitutes the forward problem of calculating the temperature of the material layers of the dummy's external protective clothing; together with the additional conditions (1.14)–(1.16), it constitutes the inversion problem. As is usual for the inversion of partial differential equations, the information obtained from repeated forward simulations is used to modify the initial value continuously and thus approach the true value. Therefore, before inverting the partial differential equation, we must first carry out the forward calculation. In this paper, the finite difference method is used to numerically simulate the temperature of the external surface of the skin of the dummy [6]. The discrete difference scheme is as follows:

$$\begin{cases} T_{i,j+1} = \dfrac{\tau}{h^2\rho_i c_i}\lambda_{i+1}T_{i+1,j} + \left(1 - \dfrac{\tau}{h^2\rho_i c_i}\lambda_{i+1} - \dfrac{\tau}{h^2\rho_i c_i}\lambda_i\right)T_{i,j} + \dfrac{\tau}{h^2\rho_i c_i}\lambda_i T_{i-1,j} \\[2mm] T_{1,j} = \dfrac{h a_0}{\lambda_0}I(j\tau) - \dfrac{h h_0 - \lambda_0}{\lambda_0}T_{0,j} + \dfrac{h h_0 \theta_a}{\lambda_0} \\[2mm] T_{i,0} = \varphi(ih), \qquad T_{n+1,j} - T_{n,j} = \phi(j\tau) \end{cases} \quad (1.17)$$

where T_{i,j} represents the temperature value of layer i at moment j; l represents the thickness of the material layer; τ represents the time sampling interval (1 s is taken here); and λᵢ, ρᵢ, cᵢ represent the parameter values of layer i, 0 ≤ i ≤ 4. From the forward simulation, the surface skin temperature of the dummy, T_{n,j}, can be obtained.
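The explicit update in Eq. (1.17) can be sketched in a few lines of Python. The listing below is a simplified single-layer illustration, not the authors' implementation: the material constants (lam, rho, c), the grid resolution, and the boundary handling (fixed ambient temperature on the outer face, an insulated inner face) are all assumptions; the real model assigns the four-layer parameters per node and uses the convective condition (1.12).

```python
import numpy as np

def forward_simulate(thickness, n=20, t_end=3600.0, tau=0.1,
                     lam=0.1, rho=300.0, c=1400.0, t_env=65.0, t_init=25.0):
    """Explicit finite-difference solution of Eq. (1.7) on one homogeneous layer
    (a toy stand-in for the four-layer garment); returns the skin-side temperature."""
    h = thickness / n                      # spatial step
    r = tau * lam / (h * h * rho * c)      # scheme coefficient from Eq. (1.17)
    assert r <= 0.5, "explicit scheme is unstable for this step size"
    T = np.full(n + 1, t_init)
    for _ in range(int(t_end / tau)):
        T[1:-1] += r * (T[2:] - 2.0 * T[1:-1] + T[:-2])
        T[0] = t_env                       # outer face held at ambient temperature
        T[-1] = T[-2]                      # simplified insulated inner boundary
    return T[-1]

print(forward_simulate(0.0103))            # e.g. a 10.3 mm layer
```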
If this temperature meets the requirements of condition (1.16), then the thickness of layer II used in the forward modeling is the optimal thickness to be solved. If the temperature cannot meet the requirements, a new thickness of layer II is obtained by the iterative method and substituted back into the forward simulation, yielding a new skin surface temperature of the dummy. The specific inversion steps are as follows [7] (a code sketch of this loop is given below, after Eq. (1.18)):

Step 1: Set the initial thickness of garment layer II as d₀, and estimate the approximate thickness range of layer II as [a, b].
Step 2: Substitute the endpoints of the range, a and b, into the difference scheme (1.17) to calculate the temperature of the surface skin of the dummy.
Step 3: Compare the temperatures T¹_{n,j}, T²_{n,j} with condition (1.16). If the conditions are met, the range contains the optimal thickness of layer II. If a or b does not meet the condition, the dichotomy (bisection) is used to split the range: a new thickness range [a₁, b₁] consists of the new endpoint and the other endpoint satisfying the condition. Return to Step 2.

We use the forward modeling method above to continuously simulate the numerical values and gradually approach the true value by combining them with the additional conditions. In the inversion calculation, slight changes in some data can directly produce huge fluctuations. Therefore, the Tikhonov regularization method is adopted in this paper to regularize the inversion. [a, b] is the thickness range of layer II, and L is the regularization operator for the thickness of layer II. The regularized solution is

$$x_a = \min\left\{\|Ax - b\|_2^2 + a^2\|Lx\|_2^2\right\} \quad (1.18)$$
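Steps 1–3 above amount to a bisection search over the layer II thickness, with the forward solver as the feasibility test. The following is a minimal sketch, reusing forward_simulate() from the previous listing and assuming an initial bracket [a, b] whose upper end is feasible; with the toy single-layer physics it is only illustrative, but with the real multi-layer parameters the loop brackets the smallest feasible thickness.

```python
def acceptable(thickness):
    # condition (1.16): skin-side temperature limits after 55 and 60 minutes
    return (forward_simulate(thickness, t_end=3300.0) <= 44.0 and
            forward_simulate(thickness, t_end=3600.0) <= 47.0)

def invert_thickness(a=0.005, b=0.025, tol=1e-4):
    # assumed bracket: a too thin, b thick enough; shrink toward the minimum
    while b - a > tol:
        mid = 0.5 * (a + b)
        if acceptable(mid):
            b = mid          # mid is feasible: try a thinner layer
        else:
            a = mid          # mid fails: the thickness must be larger
    return b

print(invert_thickness())
```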
For the regularization parameter a, this paper selects the generalized cross-validation criterion to determine the regularization parameter [8], with the formula

$$a_G = \arg\min \frac{\|Ax_a - b\|_2^2}{\operatorname{trace}\left(I - AA^{I}\right)} \quad (1.19)$$
Finally, the optimal thickness of layer II obtained by the inversion method is 10.3 mm.
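For completeness, the sketch below shows Tikhonov regularization (1.18) with the parameter chosen by generalized cross-validation in the spirit of (1.19). The matrix A, the right-hand side b, and the candidate parameter grid are synthetic placeholders, and L is taken as the identity; none of this reproduces the paper's actual data.

```python
import numpy as np

def tikhonov(A, b, alpha):
    # x_a = argmin ||Ax - b||^2 + alpha^2 ||x||^2 (L = identity here)
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha**2 * np.eye(n), A.T @ b)

def gcv_score(A, b, alpha):
    # influence (hat) matrix mapping b to the fitted values A x_a
    A_hat = A @ np.linalg.solve(A.T @ A + alpha**2 * np.eye(A.shape[1]), A.T)
    resid = A @ tikhonov(A, b, alpha) - b
    denom = np.trace(np.eye(A.shape[0]) - A_hat)
    return (resid @ resid) / denom**2      # standard GCV squares the trace term

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))              # placeholder forward operator
b = rng.normal(size=50)                    # placeholder data
alphas = np.logspace(-4, 1, 30)
best = min(alphas, key=lambda a: gcv_score(A, b, a))
print(best, tikhonov(A, b, best)[:3])
```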
1.4 Conclusion

Building on existing research results and the practical requirements of thermal analysis of anisotropic materials in the 3D field, this paper addresses the optimization of high-temperature protective clothing in engineering. The thermal conductivity of each layer is used to calculate the heat flux density, and partial differential equations are used to determine the relationships between the various factors, which helps reduce errors and improve precision. Through scientific modeling and overall consideration of heat conduction and the bidirectional radiative heat transfer of two different items, the optimization of the three-layer fabric shell (shell, waterproof layer, and insulating layer) and the thermal insulation layer model is not only effectively applied to the optimization of the thermal analysis of anisotropic materials in the 3D field, but is also crucial to the healthy and orderly development of the manufacturing industry.
References

1. Gibson, P.: Multiphase heat and mass transfer through hygroscopic porous media with applications to clothing materials. Fiber, 183–194 (1996)
2. Chitrphiromsri, P., Kuznetsov, A.V.: Modeling heat and moisture transport in firefighter protective clothing during flash fire exposure. Heat Mass Transf., 206–215 (2005)
3. Mercer, G.N., Sidhu, H.S.: Mathematical modelling of the effect of fire exposure on a new type of protective clothing. Aust. N. Z. Ind. Appl. Math. J., 289–305 (2007)
4. Lu, L.Z.: Heat Transfer Model and Parameter Optimal Decision of Multilayer Thermal Protective Clothing. Zhejiang University of Science and Technology (2018)
5. Cui, H.M.: Inverse Calculation of Nonlinear Heat Conduction Equation. Zhejiang University, Hangzhou (2011)
6. Zhao, H.J., Sui, Z., Zhang, P.J.: Optimal thickness of flat roof based on temperature field inversion algorithm. J. Jilin Univ. 27(01), 90–98 (2009)
7. Zhou, M.: Numerical Inversion of a Class of Thermal Conductivity. Zhejiang University, Hangzhou (2007)
8. Xiao, T.Y.: Numerical Solution of Inverse Problem. Science Press, Beijing (2003)
Chapter 2
Shape Retrieval for 3D Models Based on MRF

Qingbin Li and Junxiao Xue
Abstract Recent advances in the modeling, digitization, and visualization of three-dimensional shapes have led to a surge in the number of available three-dimensional models. Therefore, three-dimensional retrieval technology has become very necessary. This paper introduces a content-based 3D model retrieval method. We propose a unified framework to deal with the complex mesh structure of three-dimensional models, with unary potentials describing local similarity and higher-order potentials describing spatial consistency. A three-dimensional SURF extension is proposed, which describes a three-dimensional shape as a set of locally rotation- and scale-invariant points. Effective indexing and approximate optimization techniques are also used to speed up MRF inference.
2.1 Introduction With the rapid development of information technology, especially the rapid development of 3D technology and the growing popularity of acquisition and capture devices, 3D model has been widely used in the industry and our life, such as enterprise product design, film and television animation, role design and environmental design in the online games, etc. It takes a lot of energy and time to build high quality 3D models. Retrieving 3D models that meet users’ needs in massive 3D data and realizing resource reuse can greatly improve the efficiency of 3D modeling, industrial product design, and game design. 3D model retrieval is a hot research topic in the field of information technology. It is of great significance to promote the development of technology and industry such as 3D printing, intelligent recognition, digital media, and so on.
Q. Li Department of Mathematics, Zhengzhou University of Aeronautics, Zhengzhou 450046, China J. Xue (B) School of Software, Zhengzhou University, Zhengzhou 450002, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_2
3D model retrieval has always been an important research topic in computer graphics and computational geometry. In the past, researchers have made extensive efforts to construct effective 3D shape retrieval algorithms [1]. These methods can be roughly divided into two categories: sketch-based [2, 3] and content-based [4].

The sketch-based retrieval method projects the 3D model to obtain its 2D projection images, selects several 2D images from different viewpoints, and uses the characteristics of these 2D images to describe the 3D model. If two 3D models are similar from all viewpoints, they can be considered similar [5, 6]. This method greatly reduces the complexity of feature extraction and similarity measurement by reducing the 3D model from 3D space to 2D space. It is easy to extract features and to generate indexes, and the retrieval algorithm has good stability [7]. However, in the process of dimensionality reduction from 3D to 2D, it is impossible to describe 3D models accurately and comprehensively, and the retrieval accuracy is poor. In order to reduce the loss of 3D structure information, additional constraints are often needed, which inevitably results in a large amount of redundant data and makes the algorithm face challenges in speed and scalability.

The content-based retrieval method is performed by submitting original 3D model files as input conditions. The method mainly includes two sub-processes: feature extraction and similarity calculation. Feature extraction is the first and most important step, which mainly extracts shape features from the spatial geometric information of the model. Similarity calculation compares the characteristics of the input model with all the models in the model library according to some criterion, sorts the similarity from large to small, and returns the retrieval results to users. A shape feature that can appropriately describe the geometric information of 3D models is an important problem for the content-based retrieval method. To solve this problem, many geometric features have been designed to capture the local geometry of given models, such as MeshHOG [8], HKS (Heat Kernel Signature) [9, 10], ISC (Intrinsic Shape Context) [11], and PFH (Persistent Feature Histograms) [12]. To address the sensitivity of these local descriptors to model noise [13], researchers have proposed methods using higher-level topological features [14], or methods aggregating low-level features into intermediate representations, such as an extended bag-of-words model [15].

Although great progress has been made in the design and extraction of 3D model features, these methods are highly dependent on the quality of the 3D model and are often sensitive to model noise. In addition, the existing methods have not explicitly addressed the challenges of model incompleteness.

Given the large amount of accumulated 3D data, there is a growing need to develop automatic techniques that facilitate efficient search, manipulation, and recognition of 3D data. By not losing the depth dimension, 3D information from sensors has much less ambiguity than 2D photos, for example preserving physical size and distance from the camera. This helps bring new opportunities to the computer vision field. In addition, industrial prototypes for self-driving vehicles based on 3D sensors have been manufactured, which is extremely challenging, if not impossible, with 2D vision alone.
The main challenge of 3D model retrieval is the structure of 3D data. In two-dimensional images, sensor units with a grid structure capture and generate dense data: every pixel of a two-dimensional image (except boundary pixels) has adjacent points above, below, left, and right, so many efficient processing techniques can be applied. However, 3D data is composed of sparse vertices and edges, which makes techniques based on dense data structures unable to meet application requirements. A typical (2D) search engine contains one or more feature representations, index structures, and a spatial-consistency-based sorting module. The sparse structure of 3D data makes many components of existing retrieval engines unsuitable. For example, DoG, SIFT, SURF, and other 2D feature representation methods based on convolution and gradients cannot be applied to 3D data.
data. For every pixel of a two-dimensional image (except boundary points), there are adjacent points on its top, bottom, left, and right (except boundary pixels), so many efficient processing techniques can be applied. However, 3D data is composed of sparse vertices and edges, which makes the technology based on dense data structure unable to meet the application requirements. A typical (2D) search engine contains one or more feature representations, index structures, and spatial consistency-based sorting modules. The sparse structure of 3D data makes many components of the existing retrieval engine unsuitable. For example, DoG, SIFT, SURF, and other 2D feature representation methods based on convolution and gradient cannot be applied to 3D data. 3D data pose new challenges, especially for those captured from low-cost sensors. One such challenge is that the output from these consumer level sensors is extremely noisy and unreliable. This causes significant performance degeneration of existing 3D indexing and understanding algorithms, and thus requires additional robustness in the algorithm design. The second challenge is that the captured model from 3D sensors is typically incomplete, in contrast with the assumption of traditional 3D content analysis, which is that the model is manually designed and thus usually complete. Such incompleteness usually originates from the self-occlusion of the objects. However, the existing methods cannot solve these problems. We present a new content-based method for 3D models retrieval based on MRF (Markov Random Field) in this paper. An unified framework is proposed to address the unique mesh structure of the 3D models, with the unary potential that describes local similarity, and higher-order potential for the spatial consistency. A threedimensional surface extension is proposed, which describes the three-dimensional graph as a set of local rotation and scale invariant points. Effective indexing and approximate optimization techniques are also used to speed up MRF reasoning. Random forests are used to estimate approximate similarity effectively, so as to determine the one-dimensional potential.
2.2 Feature Descriptor

The main challenge of 3D retrieval is the structure of 3D data. We consider the information provided by approximate retrieval results based on local regions to solve the problem of incomplete 3D model data, while taking into account the consistency of retrieval results across different local regions to deal with model noise. Considering that a 3D model is composed of interrelated parts, and that different features in the feature set at the same scale/level reflect different geometric attributes of the object (such as local curvature, degree of local deformation, etc.), a robust feature descriptor of the 3D model is constructed by fusing features at the same level and exploiting the constraints between local features at different levels.

In this paper, we propose a method to extend the popular SURF features to three dimensions, in contrast to previous methods which focus only on overall shape and appearance. To solve the 3D shape retrieval problem, we divide
a set of shapes into two independent sets: training data and query data. Each three-dimensional shape is represented as a set of vertices and triangles. The extraction of the 3D feature points (the detector) is as follows. First, we voxelize a shape into a volumetric 3D cube using the intersection of faces with the grid bins, after which each shape is uniformly scaled to fit the cube with 40 bins along each axis. Next, we compute a saliency measure S for each grid bin and several scales σ. Each octave is further subdivided into a constant number of scale levels. It is important that the difference between neighboring scale levels in each octave is higher than in the previous octave. Thus, the algorithm can focus on details in the finer scale levels (octaves), because the neighboring scale levels will not change as much as for coarse scale levels. 3D SURF was performed over three octaves. Feature points correspond to the local extrema of the Hessian filter responses. We define S as follows:

$$S(\mathbf{x}, \sigma) = |H(\mathbf{x}, \sigma)| = \begin{vmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) & L_{xz}(\mathbf{x}, \sigma) \\ L_{yx}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) & L_{yz}(\mathbf{x}, \sigma) \\ L_{zx}(\mathbf{x}, \sigma) & L_{zy}(\mathbf{x}, \sigma) & L_{zz}(\mathbf{x}, \sigma) \end{vmatrix} \quad (2.1)$$
where L_xx is the second partial derivative in the x direction, L_yy the second partial derivative in the y direction, L_zz the second partial derivative in the z direction, and L_xy, L_yz, L_zx are the mixed second partial derivatives. This means that, unlike the two-dimensional SURF case, a positive value of S does not guarantee that all the eigenvalues of H have the same sign. Therefore, not only blob-like signals can be detected, but also saddle points. In addition, these detections occur at local maxima and minima; therefore, they can be found even far from the surface of the object. Finally, a non-maximum suppression method is used to extract features from the volume.
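The following is a minimal sketch of the saliency measure (2.1) on a voxel grid, using Gaussian derivative filters from SciPy. The 40-bin resolution follows the text, but the voxelized shape here is a random placeholder; a full detector would add the scale-space octaves and the non-maximum suppression described above.

```python
# Saliency S of Eq. (2.1): determinant of the 3x3 Hessian of a
# Gaussian-smoothed volume, evaluated at a single scale sigma.
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_det_saliency(vol, sigma):
    H = np.empty(vol.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            order = [0, 0, 0]
            order[i] += 1
            order[j] += 1            # e.g. [2,0,0] -> L_xx, [1,1,0] -> L_xy
            H[..., i, j] = gaussian_filter(vol, sigma, order=order)
    return np.linalg.det(H)          # |H(x, sigma)| at every voxel

vol = (np.random.rand(40, 40, 40) > 0.95).astype(float)  # placeholder voxelization
S = hessian_det_saliency(vol, sigma=2.0)
# keep local extrema of S as candidate 3D interest points (non-max suppression)
```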
2.3 Framework of Retrieval

2.3.1 Notations

Suppose the database contains N 3D mesh models {M_n} for n = 1, 2, …, N, where n is the model index. We use a user-captured model M_q, which is incomplete and noisy, as the query. We aim to produce a ranked list of the 3D models in the database, in which a model more similar to the query gets a higher rank score. In 2D recognition and scene understanding approaches, MRFs are widely used because their inter-node potentials can effectively encapsulate the expected label smoothness in a neighborhood. This property makes MRFs look promising here because of the robustness they provide against sensor noise. From another perspective, an MRF
is also sufficiently flexible to apply to non-grid structures, with various definitions of cliques and corresponding potential functions. In this paper, we construct a cross-domain MRF model with a vertex set and an edge set. In fact, the vertex set of the MRF model contains two types of virtual vertices: 3D hyper-nodes and 2D hyper-pixels. Among them, the 3D point clouds are obtained by smooth segmentation, and the 2D reference images are divided based on mean-shift clustering.
2.3.2 Formulation

The sparse characteristics of 3D model data and the data noise and incompleteness caused by sampling equipment are the key problems to be solved. Statistical methods based on random fields can consider the relationships between spatial nodes, and different levels of features can be defined by corresponding potential functions, so they can be applied to noisy 3D data retrieval efficiently and flexibly. The essence of the statistical method is to model the 3D model from a statistical point of view: the local feature information of each node is regarded as a random component with a certain probability distribution. A Markov random field can effectively describe the spatial information of a 3D model. We use a unary potential function to describe the local similarity estimate of each node, higher-order potential functions to describe the geometric structure of a node's neighborhood, and a probabilistic graphical model to describe the three-dimensional data characteristics.

With regard to the graph (V, ε), x = {x_i}_{i=1}^N and y = {y_i}_{i=1}^N are the associated random variables. We aim to maximize the joint distribution prob(y|x). Suppose that this joint distribution obeys the Markov property. We then decompose the distribution into two kinds of terms: a term prob_u defined on each vertex, and pairwise terms prob_p defined on each pair of connected vertices,

$$\mathrm{prob}(y|x) = \prod_{v_i \in V} \mathrm{prob}_u(y_i|x_i)^{\lambda} \prod_{(v_i, v_j) \in \varepsilon} \mathrm{prob}_p\left(y_i, y_j|x_i, x_j\right)^{1-\lambda} \quad (2.2)$$
where λ ∈ [0, 1] is a parameter that balances the weights of the two terms. If λ is closer to 0, the score of the shape similarity is higher. Otherwise, if λ is closer to 1, the score of the geometric consistency is higher. A graphical illustration of the nodes and potential functions is shown in Fig. 2.1.
Fig. 2.1 Framework
2.3.3 Potential Function

To avoid possible numerical problems in the optimization process, we introduce a potential function ψ(y|x) that is the negative logarithm of the joint probability, and is thus to be minimized:

$$\begin{aligned} y^* &= \arg\max_y \mathrm{prob}(y|x) \\ &= \arg\min_y \left(-\log \mathrm{prob}(y|x)\right) \\ &= \arg\min_y \left(\lambda \sum_{v_i \in V} \psi_u(y_i|x_i) + (1-\lambda) \sum_{(v_i, v_j) \in \varepsilon} \psi_p\left(y_i, y_j|x_i, x_j\right)\right) \quad (2.3) \end{aligned}$$
Similarly, the potential function is decomposed into two terms. The unary terms provide a powerful similarity score estimation, considering only the local shape of a single 3D patch, i.e., shape similarity. The pairwise terms aim to further refine the scores by enforcing geometric consistency between adjacent patches. Through the combination of these two terms, our method can manage cross-domain partial matching with the unary term alone, yet remains insensitive to model noise and incompleteness because of the geometric consistency embedded in the pairwise terms. With our potential function design, the final objective ψ(y|x) is a quadratic function, so we use gradient descent to find a local minimum efficiently. A natural concern with this formulation is scalability, especially considering that the optimization in Eq. (2.3) may involve hundreds of variables with thousands of dimensions. However, as we will soon show, by exploiting the sparsity of the problem and using discriminative random forests, such MRF inference can be very efficient and can scale to large data sets.
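To make the optimization concrete, the sketch below assumes the simplest quadratic instantiation of Eq. (2.3): a unary term pulling each score y_i toward a local similarity estimate mu_i, and a pairwise term smoothing scores across edges. The scores, the edge list, and the step size are illustrative placeholders; the actual potentials in the paper encode richer shape and geometry cues. Because the objective is quadratic, plain gradient descent converges to the minimum.

```python
import numpy as np

def mrf_energy(y, mu, edges, lam=0.5):
    unary = np.sum((y - mu) ** 2)                          # psi_u terms
    pairwise = sum((y[i] - y[j]) ** 2 for i, j in edges)   # psi_p terms
    return lam * unary + (1.0 - lam) * pairwise

def gradient_step(y, mu, edges, lam=0.5, lr=0.1):
    g = 2.0 * lam * (y - mu)
    for i, j in edges:
        g[i] += 2.0 * (1.0 - lam) * (y[i] - y[j])
        g[j] += 2.0 * (1.0 - lam) * (y[j] - y[i])
    return y - lr * g

mu = np.array([0.9, 0.8, 0.2, 0.85])   # per-patch similarity estimates (assumed)
edges = [(0, 1), (1, 3), (2, 3)]       # adjacency of 3D patches (assumed)
y = mu.copy()
for _ in range(100):
    y = gradient_step(y, mu, edges)
print(y, mrf_energy(y, mu, edges))
```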
2 Shape Retrieval for 3D Models Based on MRF
15
2.4 Spatial Consistency Test Existing methods of spatial consistency checking are still insufficient to meet the challenges of low-quality models collected by consumer sensors. In this paper, we take the Markov random field as the core to construct the overall framework of spatial consistency checking. We use a random forest method to compute the ranked scores $\{\mu(x_i)\}_{i=1}^{|V|}$ of the 3D models in the database based on the input query. The regression process consists of two steps: training and testing. Compared with previous methods, this scheme uses abundant geometric information (instead of traditional pairwise spatial relation checking) to compensate for model noise and incompleteness.
2.4.1 Training The training data contains all the features extracted from the 3D models of the database as input, and the features of the query model as discrete responses. For the random forest, the training of each decision tree recursively partitions the data using a standard information-gain criterion with a linear classifier. Finally, each leaf node in a decision tree obtains a score vector $P_L = (p_l^1, \ldots, p_l^n)$ that measures the frequency with which a specific 3D model falls on that leaf node.
2.4.2 Testing First, we traverse each trained decision tree of the random forest from the root node to a leaf node, and average the scores of the retrieved leaf nodes. The similarity scores between the query model and the database models are thus approximated by regression. Compared with the traditional method of calculating similarity scores by a precise feature-matching algorithm, the regression approach captures the underlying distribution of the features with a discriminative decision model, so it deals more robustly with the consistency-test challenges caused by model noise.
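The train/test regression described above can be mimicked with scikit-learn: each tree's leaves store the class (database model) frequencies of the training samples that reached them, and `predict_proba` averages these leaf score vectors over the forest. The feature dimension, labels, and forest size below are placeholder assumptions, not values from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder training set: local patch features labeled by the
# database model (0..9) they were extracted from.
X_train = rng.normal(size=(2000, 64))
y_train = rng.integers(0, 10, size=2000)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Testing: features of a query model; averaging the leaf score vectors
# over all trees yields one ranked similarity score per database model.
X_query = rng.normal(size=(50, 64))
scores = forest.predict_proba(X_query).mean(axis=0)
ranking = np.argsort(scores)[::-1]
print("ranked database models:", ranking)
```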
2.4.3 Index Based on Random Forest An important part of a visual search engine is the index, which maps the input model to a limited set of candidate models; hence, the time-consuming ranking does not need to be carried out over the whole database. Although we do not have an explicit
index module in the pipeline, the proposed random forest acts as the index, because the forest training is designed to maximize the information gain at the child nodes (i.e., to minimize the entropy of the distribution over source models). This optimization goal effectively makes the leaf distributions sparser as the tree depth increases during training, so that each leaf node contains the scores of only a few three-dimensional models. Thus, in the test phase, when we collect the similarity scores from the root to the leaves, most of the vector entries are virtually zero; this filters out most of the models and allows us to rank only the few nonzero terms.
2.4.4 Measure Similarity We find that when there are one or more trees in the random forest, the CAD model with the maximum similarity score is usually the one most similar to the query. But for the other CAD models, a higher similarity score does not necessarily mean greater similarity to the query. This is reasonable because a decision tree, as a classifier, only tries to ensure that the most similar model receives the highest score; it does not guarantee that the second most similar model receives the second-highest score or a similar ordering. Therefore, a single tree is not suitable for retrieval scenarios. However, as the number of trees in the forest increases, the regression similarity score improves gradually; that is, with more trees, the regression similarity score becomes a better indicator of human-perceived similarity. This can be interpreted as an ensemble effect: every decision tree makes a hard classification, and combining a large number of trees trained on different parts of the data yields a total probability that provides a usable similarity measure.
2.5 Experiments We ran the experiments on a PC (Intel(R) Core(TM) i3 CPU, 2.3 GHz, 4 GB memory) running Windows 7. We plot the precision-recall curves for all the methods in Fig. 2.2, which further confirms the clear performance gain of the proposed method. Our experiments are of two types: (1) the 3D models in the database are captured by a 3D scanner, using the SHREC 09 partial-model track, which includes incomplete and noisy models; (2) the second data set contains query models captured by Microsoft Kinect sensors, as used in SHREC 13. The results clearly demonstrate the superior performance of the proposed method. Precision-recall curves are given for different dimensions of the feature vectors, and the corresponding parameters are tuned to find the best resolution of the feature vectors. Then, by comparing different variants of the method, we show which settings are the best choice. Finally, we compare different distance measures. After choosing the best representation (dimensions and parameter settings), we compare the best representations of all our methods with each other. Then, we compare our method with the latest
Fig. 2.2 Precision-recall curves using the L1 norm as distance metric
technology. As shown in Fig. 2.2, we observe L1 norm convergence when using a reclassified set.
2.6 Conclusions In this paper, we present a new content-based shape retrieval method for 3D models obtained by acquisition tools. We consider 3D models which are noisy and incomplete. A retrieval formulation based on MRF is constructed. Additionally, we define a unary potential function to describe the local similarity of the model, and a higher-order potential function to measure the spatial consistency. Therefore, the presented method can solve the challenging problems caused by the noise of models obtained by low-cost acquisition tools.
References 1. Funkhouser, T., Min, P., Kazhdan, M., Chen, J., Halderman, A., Dobkin, D., Jacobs, D.: A search engine for 3D models. ACM Trans. Graph. 22(1), 83–105 (2003) 2. Yoon, S.M., Scherer, M., Schreck, T., Kuijper, A.: Sketch-based 3D model retrieval using diffusion tensor fields of suggestive contours. In: Proceedings of ACM Multimedia, pp. 193–200 (2010)
3. Shao. T., Xu, W., Yin, K.K., Wang, J., Zhou, K., Guo, B.: Discriminative sketch-based 3D model retrieval via robust shape matching. In: Computer Graphics Forum (PG), 2011–2020 (2011) 4. Zaharescu, A., Boyer, E., Varanasi, K., Horaud, R.: Surface feature detection and description with applications to mesh matching. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 373–380 (2009) 5. Sun, J., Ovsjanikov, M., Guibas, L.: A concise and provably informative multi-scale signature based on heat diffusion. Comput. Graph. Forum 28(5), 1383–1392 (2009) 6. Bronstein, M., Kokkinos, I.: Scale-invariant heat kernel signatures for non-rigid shape recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1704–1711 (2010) 7. Kokkinos, I., Bronstein, M.M., Litman, R., Bronstein, A.M.: Intrinsic shape context descriptors for deformable shapes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 159–166 (2012) 8. Rusu, R.B., Blodow, N., Beetz, M.: Fast point feature histograms (FPFH) for 3D registration. In: Proceedings of the International Conference on Robotics and Automation (ICRA), pp. 3212– 3217. IEEE (2009) 9. Dutagaci, H., Godil, A., et al.: SHREC 09 track: querying with partial models. In: Proceedings of Eurographics 3DOR, pp. 69–76 (2009) 10. Tung, T., Matsuyama, T.: Topology dictionary for 3D video understanding. IEEE Trans. Pattern Anal. Mach. Intell. 34, 1645–1657 (2012) 11. Huang, P., Hilton, A., Starck, J.: Shape similarity for 3D video sequences of people. Int. J. Comput. Vision 89(2–3), 362–381 (2010) 12. Bronstein, M., Bronstein, M.M., Guibas, L.J., Ovsjanikov, M.: Shape Google: geometric words and expressions for invariant shape retrieval. ACM Trans. Graph. 30(1) (2011) 13. Pickup, D., Sun, X., et al.: SHREC 14 track: shape retrieval of nonrigid 3D human models. In: EG 3DOR (2014) 14. Wang, J., Kumar, S., Chang, S.F.: Semi-supervised hashing for large-scale search. IEEE Trans. Pattern Anal. Mach. Intell. 34(12), 2393–2406 (2012) 15. Knopp, J., Prasad, M., Willems, G., Timofte, R., Gool, L.J.V.: Hough transform and 3D SURF for robust three dimensional classification. In: Proceedings of the ECCV (2010)
Chapter 3
Numerical Conformal Mapping Based on Improved Hermitian and Skew-Hermitian Splitting Method Peng Wan, Yibin Lu, Yingzi Wang, Shengnan Tang, and Dean Wu
Abstract In this paper, we propose a new method for computing the numerical conformal mapping function. In order to obtain highly accurate numerical results, we reduce the calculation of the charge quantities to an improved Hermitian and skew-Hermitian splitting (HSS) method based on the (k, j)-Padé iteration. Experimental examples show that the proposed method has high precision.
3.1 Introduction Conformal mapping plays an important role in fluid mechanics, electrostatic fields, image processing, and many other practical problems. However, obtaining the analytic solution in most engineering problems is very complicated or even impossible. Numerical conformal mapping based on the charge simulation method was proposed in Refs. [1–3]. In this paper, the Hermitian and skew-Hermitian splitting method based on the (k, j)-Padé iteration method is used to solve for the charge quantities, where the Hermitian and skew-Hermitian splitting method was proposed in Refs. [4, 5] and the (k, j)-Padé iteration method was introduced in Refs. [6, 7]. In the numerical experiments, we ran several groups of experiments with different numbers of charge points and realized the conformal image mapping.
P. Wan · Y. Lu (B) · S. Tang Faculty of Science, Kunming University of Science and Technology, Kunming 650500, China e-mail: [email protected] Y. Wang Computer Center, Kunming University of Science and Technology, Kunming 650500, China D. Wu School of Mathematical Sciences, University of Electronic Science and Technology, Chengdu 611731, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_3
3.2 Numerical Conformal Mapping Based on the Charge Simulation Method Consider mapping the domain D exterior to the Jordan curves $C_1, C_2, \ldots, C_n$ in the z-plane onto a w-plane containing horizontal slits. Assume that $z = 0$ is in the domain D, under the normalization conditions $f(\infty) = \infty$, $f'(\infty) = 1$, $a_0 = 0$. The conformal mapping maps the boundary curves $C_1, C_2, \ldots, C_n$ onto the horizontal slits; $w = f_v(z)$ can be written as

$$f_v(z) = z + i \left( g_v(z) + i h_v(z) \right) \qquad (3.1)$$

where $g_v$ and $h_v$ are a pair of conjugate harmonic functions. Based on the charge simulation method, $g_v(z) + i h_v(z)$ can be approximated by

$$G_v(z) + i H_v(z) = Q_0 + \sum_{l=1}^{n} \sum_{i=1}^{N_l} Q_{li} \log (z - \zeta_{li}) \qquad (3.2)$$
The $(N_1 + N_2 + \cdots + N_n)$-dimensional constrained equations are obtained:

$$\sum_{l=1}^{n} \sum_{i=1}^{N_l - 1} \tilde{Q}_{li} \log \left| \frac{z_{mj} - \zeta_{li}}{z_{mj} - \zeta_{li+1}} \right| - V_m = -y_{mj}, \quad z_{mj} \in C_m, \; j = 1, 2, \ldots, N_m, \; m = 1, 2, \ldots, n \qquad (3.3)$$

where $\tilde{Q}_{li} = \sum_{k=1}^{i} Q_{lk}$, $N_l$ is the number of charge points within $C_l$, the constraint points $z_{mj}$ $(j = 1, 2, \ldots, N_m)$ are distributed on the boundary $C_m$, and $V_m$ is the approximate value of the slit position, that is, the position of the slits after the conformal mapping. The charge quantities calculated from Eq. (3.3) are used to approximate $F_v$.
3.3 Improved HSS Method Based on (k, j)-Padé Iteration Method The constraint equation can be rewritten as

$$Ax = b \qquad (3.4)$$

where $A \in \mathbb{R}^{(N_1 + N_2 + \cdots + N_n) \times (N_1 + N_2 + \cdots + N_n)}$, $x, b \in \mathbb{R}^{N_1 + N_2 + \cdots + N_n}$, and $N_l$ is the number of charge points within $C_l$ $(l = 1, 2, \ldots, n)$. If the number of charge points is too large, the coefficient matrix becomes ill-conditioned, so special methods are needed to solve for the charge quantities.
First of all, multiplying both sides of Eq. (3.4) by $\omega_i K^{-1} A^T$ and splitting $A^T A = H + S$ according to the literature [4], the iterative formula is obtained:

$$x_{k+\frac{1}{2}} = \left( E - \omega_1 K^{-1} (M_1 - N_1) \right) x_k + \omega_1 K^{-1} A^T b$$
$$x_{k+1} = \left( E - \omega_2 K^{-1} (M_2 - N_2) \right) x_{k+\frac{1}{2}} + \omega_2 K^{-1} A^T b$$

where $M_1 = \alpha I + H$, $N_1 = \alpha I - S$, $M_2 = \alpha I + S$, $N_2 = \alpha I - H$,

$$H = \frac{1}{2} \left( (A^T A)^* + A^T A \right), \quad S = \frac{1}{2} \left( (A^T A)^* - A^T A \right), \quad \alpha = \sqrt{\lambda_{\max}(H) \, \lambda_{\min}(H)}.$$

According to the literature [6, 7],

$$\frac{1 - |x|}{t(x)} = \frac{1}{q(x)} \left( q(x) - |x| h(x) \right) = r(x) \qquad (3.5)$$

then we obtain

$$q_1 \left( x_{k+\frac{1}{2}} - x_k \right) = \omega_1 h_1 K^{-1} \left( A^T b - (M_1 - N_1) x_k \right)$$
$$q_2 \left( x_{k+1} - x_{k+\frac{1}{2}} \right) = \omega_2 h_2 K^{-1} \left( A^T b - (M_2 - N_2) x_{k+\frac{1}{2}} \right).$$

The convergence of the improved HSS algorithm:

Theorem 1 $r_{k,j}(x) = \frac{p_{k,j}(x)}{q_{k,j}(x)}$ is a rational function, where $p_{k,j}(x) = \sum_{i=0}^{j} \binom{j}{i} \binom{k+j}{i}^{-1} \frac{x^i}{i!}$ and $q_{k,j}(x) = p_{k,j}(-x)$. If $k > j$, $x \in \mathbb{R}_-$, $\omega > 0$, the improved HSS method based on the (k, j)-Padé iteration method is convergent.

Proof When $k > j$, $x \in \mathbb{R}_-$, $\omega > 0$, then $r_{k,j} < 1$. Since $M_i - N_i$ is a positive definite matrix, all its eigenvalues are greater than 0, so we have $\lambda(-\omega_i (M_i - N_i)) < 0$ and

$$\rho(r) = \max_{\lambda \in \lambda(-\omega_i (M_i - N_i))} \frac{1 - |\lambda|}{t(\lambda)} = \max_{\lambda \in \lambda(-\omega_i (M_i - N_i))} |r(\lambda)| < 1.$$

$\rho(r) < 1$ means the iterative formula converges.
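As a rough illustration of the splitting idea (not the authors' exact improved scheme — here we take K = I, equal relaxation parameters, and apply the classical HSS half-steps directly to a small positive definite test system, which is placeholder data):

```python
import numpy as np

def hss_solve(A, b, n_iter=200):
    """Classical two-half-step HSS iteration for Ax = b, A positive definite.

    A is split into its Hermitian part H and skew-Hermitian part S:
        (alpha*I + H) x_{k+1/2} = (alpha*I - S) x_k       + b
        (alpha*I + S) x_{k+1}   = (alpha*I - H) x_{k+1/2} + b
    """
    H = 0.5 * (A + A.T)                     # Hermitian (symmetric) part
    S = 0.5 * (A - A.T)                     # skew-Hermitian part
    eig = np.linalg.eigvalsh(H)
    alpha = np.sqrt(eig.max() * eig.min())  # classical optimal parameter
    I = np.eye(A.shape[0])
    x = np.zeros(A.shape[0])
    for _ in range(n_iter):
        x = np.linalg.solve(alpha * I + H, (alpha * I - S) @ x + b)
        x = np.linalg.solve(alpha * I + S, (alpha * I - H) @ x + b)
    return x

# Non-symmetric positive definite test system.
A = np.array([[4.0, 1.0], [-1.0, 3.0]])
b = np.array([1.0, 2.0])
print(hss_solve(A, b), np.linalg.solve(A, b))
```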
3.4 Numerical Examples In this section, we have implemented the algorithm in Sect. 3.3 following Ref. [8]. All numerical examples were performed on a 2.80 GHz processor under Microsoft Windows using MATLAB 2016b. According to the literature [9, 10], the following error formula can be obtained:

$$E_l^V = |V_l - V_l^d| \qquad (3.6)$$

where $V_l$ and $V_l^d$ are the positions of the horizontal slits, $V_l^d$ being computed with twice as many charge points as $V_l$.

Example 1 Exterior of the orange-shaped domain. The boundary: $|z^2 - 1| = a^2$, $a > 1$, $z = x + iy$. Parametric equation of the boundary:

$$\rho(t) = \sqrt{\cos(2t) + \sqrt{\cos^2(2t) + a^4 - 1}}.$$

The constraint points: $z_j = \rho\!\left(\frac{2\pi}{N}(j-1)\right) \exp\!\left(i \frac{2\pi}{N}(j-1)\right)$, $j = 1, 2, \ldots, N$. The charge points: $\zeta_j = z_j + iq \left( z_{j+1} - z_{j-1} \right)$, $j = 1, 2, \ldots, N$, where $z_0 = z_N$, $z_{N+1} = z_1$. Figure 3.1 shows the distribution of the charge points ('+') and constraint points (diamonds). Figure 3.2 shows the conformal mapping of Fig. 3.1 by the improved algorithm. From Fig. 3.3 we can see that our method has an obvious advantage in solving the ill-conditioned constraint equation, both in accuracy and in stability.
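For concreteness, a small sketch of this point placement (N, a, and the offset parameter q are placeholder choices) might look like:

```python
import numpy as np

def orange_domain_points(N=64, a=1.2, q=0.5):
    """Place constraint points z_j on the boundary rho(t)*e^{it} and
    charge points zeta_j = z_j + i*q*(z_{j+1} - z_{j-1})."""
    t = 2 * np.pi * np.arange(N) / N
    rho = np.sqrt(np.cos(2 * t) + np.sqrt(np.cos(2 * t) ** 2 + a ** 4 - 1))
    z = rho * np.exp(1j * t)                              # constraint points
    zeta = z + 1j * q * (np.roll(z, -1) - np.roll(z, 1))  # charge points
    return z, zeta

z, zeta = orange_domain_points()
print(z[:3], zeta[:3])
```

Here `np.roll` supplies the periodic indexing $z_0 = z_N$, $z_{N+1} = z_1$.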
Fig. 3.1 Problem domain
Fig. 3.2 Conformal mapping of Fig. 3.1 by Improved algorithm
Fig. 3.3 Error $E_l^V$
Fig. 3.4 Condition number of constraint equation
Figure 3.4 shows the condition number of the constraint equations as the number of charge points changes. Figure 3.5 is the original image, without pixels in the orange-shaped domain. Figure 3.6 is the result of the numerical conformal mapping of Fig. 3.5. Fig. 3.5 The original image
Fig. 3.6 Conformal mapping of Fig. 3.5
3.5 Conclusions The improved HSS method is proposed to solve for the charge quantities. The results of the numerical experiments show that our method can, to a great extent, improve the accuracy of the horizontal slit positions. In the future, we will try to use high-precision numerical conformal mapping to study medical images. Acknowledgements This paper is supported by the National Natural Science Foundation of China (Grant No. 11461037).
References 1. Amano, K., Okano, D., Ogata, H., Sugihara, M.: Numerical conformal mappings of unbounded multiply-connected domains using the charge simulation method. Alumni Revue Du Cercle Des Alumni De La Fondation Universitaire 26(1), 35–51 (2003) 2. Amano, K.: A charge simulation method for numerical conformal mapping onto circular and radial slit domains. SIAM J. Sci. Comput. 19(4), 1169–1187 (1998) 3. Okano, D., Ogata, H., Amano, K.: Numerical conformal mappings of bounded multiply connected domains by the charge simulation method. J. Comput. Appl. Math. 159(1), 109–117 (2003) 4. Bai, Z.Z., Golub, G., Li, C.K.: Convergence properties of preconditioned Hermitian and skew-Hermitian splitting methods for non-Hermitian positive semidefinite matrices. Math. Comput. 76(257), 287–298 (2007) 5. Bai, Z.Z., Golub, G.H., Ng, M.K.: On inexact Hermitian and skew-Hermitian splitting methods for non-Hermitian positive definite linear systems. Linear Algebra Appl. 428(2), 413–440 (2008) 6. Kirsche, A., Böckmann, C.: Rational approximations for ill-conditioned equation systems. Appl. Math. Comput. 171(1), 385–397 (2005) 7. Kirsche, A., Böckmann, C.: Padé iteration method for regularization. Appl. Math. Comput. 180(2), 648–663 (2006) 8. Mukherjee, A., Dey, N.: Smart Computing with Open Source Platforms. CRC Press (2019) 9. Amano, K., Okano, D., Ogata, H., Okamoto, T.: Numerical conformal mapping onto the unit disk with concentric circular slits by the charge simulation method. Trans. Inf. Process. Soc. Jpn. 41(1145), 41–49 (2000) 10. Rossi, H.: The local maximum modulus principle. Ann. Math. 72(1), 1–11 (1960)
Chapter 4
Hierarchical Clustering-Based Video Summarization Fengsui Wang, Zhengnan Liu, Haiying Cheng, Linjun Fu, Jingang Chen, Qisheng Wang, Furong Liu, and Chao Han
Abstract In order to improve the quality of key frames and generate video summaries that are closer to the user's viewing habits, a video summarization key frame extraction method based on hierarchical clustering is proposed. First, the color and texture features in the video images are extracted and fused. Second, the initial clustering parameters are obtained by hierarchical clustering, and K-means clustering is employed to optimize the initial results and group similar frames. Finally, the frame closest to each cluster center is selected as a key frame. The experimental results show that the proposed method achieves an average precision of 0.71 for the extracted key frames, an average recall of 0.76, and an average F-score of 0.73, which is higher than the other tested methods and effectively improves the overall quality of the video summary. It also generates video key frames that are closer to human-made summaries.
4.1 Introduction With the rapid development of Internet technology, multimedia video has been integrated into every aspect of people's lives, and massive amounts of video are uploaded and downloaded on the network every day. Technology brings convenience to life, but the storage and retrieval of large-capacity video data have become a pressure that cannot be underestimated. Video summarization technology can alleviate this pressure [1]: a video summary presents the important information of the original video to the user in a more concise and efficient form, so that the user can quickly retrieve useful information [2]. Key frame extraction is one of the core tasks of video summarization technology. It is divided into the following four categories. The first type is key frame extraction based on motion information [3], using the optical flow method F. Wang (B) · Z. Liu · H. Cheng · L. Fu · J. Chen · Q. Wang · F. Liu · C. Han School of Electrical Engineering, Anhui Polytechnic University, Wuhu 241000, China e-mail: [email protected] Key Laboratory of Advanced Perception and Intelligent Control of High-End Equipment, Ministry of Education, Wuhu 241000, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_4
to calculate the amount of motion in a shot and setting the frames at its local minima as key frames. This algorithm can select a number of key frames appropriate to the amplitude of shot variation. However, the motion calculation process of this method is especially complicated, and the local minima are difficult to judge accurately. The second type is key frame extraction based on shot boundaries [4], which divides the original video into several short segments and then sets the first and last frames of each as key frames. This method is simple to operate but has poor applicability, especially when shots switch frequently. The third category is key frame extraction based on visual content [5], which selects frames with dramatic changes in video content as key frames. The key frames extracted by this method conform to the original video content, but it generates too much redundancy if shots are switched frequently. The fourth type is key frame extraction based on clustering [6, 7], in which the frame closest to each cluster center is selected as a key frame. The key frames extracted by this method work well, but the selection of the initial parameters is crucial, as it directly affects the quality of the clustering.
4.2 Feature Extraction 4.2.1 Color Feature The HSV color model expresses colors more intuitively than the RGB color model [8, 9]: its separation of brightness, color, and vividness facilitates comparison between colors, so it is more popular. In practice, most of the color information initially acquired is in RGB format, so the RGB space needs to be converted to the HSV space. The three components of the HSV color space are then quantized non-uniformly [10]. A 72-dimensional feature vector G is constructed by combining the quantized color components as G_i = 9H + 3S + V, G ∈ {0, 1, …, 71}. The HSV color space is thus divided into many small color intervals, each of which is a bin of the color histogram, so we get a one-dimensional color histogram of 72 bins.
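A minimal sketch of this 72-bin histogram follows, assuming H, S, and V are quantized to 8, 3, and 3 levels respectively so that G = 9H + 3S + V stays in [0, 71]; the uniform quantization thresholds are placeholders, since the paper's exact non-uniform intervals are not reproduced here.

```python
import cv2
import numpy as np

def hsv_72bin_histogram(bgr_image):
    """Quantize HSV to 8/3/3 levels and build the 72-bin color histogram
    G = 9H + 3S + V described in the text."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    # Placeholder uniform quantization: OpenCV gives H in [0,179], S,V in [0,255].
    hq = np.minimum(h // 23, 7)            # 8 hue levels (0..7)
    sq = np.minimum(s // 86, 2)            # 3 saturation levels (0..2)
    vq = np.minimum(v // 86, 2)            # 3 value levels (0..2)
    g = 9 * hq.astype(int) + 3 * sq.astype(int) + vq.astype(int)  # in [0, 71]
    hist = np.bincount(g.ravel(), minlength=72).astype(float)
    return hist / hist.sum()               # normalized 72-bin histogram

frame = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
print(hsv_72bin_histogram(frame).shape)    # (72,)
```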
4.2.2 Texture Feature Texture features can be characterized by the Local Binary Pattern (LBP). This paper fuses LBP operators with the color features. An LBP operator with P sampling points has 2^P binary patterns, so the number of binary patterns increases exponentially as P increases linearly. Therefore, this paper uses the equivalent (uniform) dimensionality-reduction mode of the LBP operator to reduce the amount of computation. In the equivalent mode of the LBP operator, if a binary pattern contains at most two transitions from 1 to 0 or from 0 to 1, the pattern is assigned to the equivalent mode class. By using this method, the number of
patterns is reduced from the original 2^P to P(P − 1) + 2, which greatly reduces the amount of calculation of the algorithm. After these two steps, the feature vectors of the color and LBP operators are G_i and V_i, respectively, and the fused feature is M_i = [G_i, V_i]. The new feature is used to describe the image, so that the image information is expressed more fully.
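A sketch of the uniform-LBP texture histogram and the feature fusion using scikit-image follows; P and R are placeholder parameter choices. The 'nri_uniform' method maps the P(P−1)+2 uniform patterns to distinct labels and pools all remaining patterns into one extra bin.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def fused_feature(gray_image, color_hist, P=8, R=1):
    """Concatenate the 72-bin color histogram G_i with a uniform-LBP
    texture histogram V_i to obtain the fused feature M_i = [G_i, V_i]."""
    lbp = local_binary_pattern(gray_image, P, R, method="nri_uniform")
    n_bins = P * (P - 1) + 3   # P(P-1)+2 uniform patterns + one catch-all bin
    v, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins))
    v = v.astype(float) / v.sum()
    return np.concatenate([color_hist, v])

gray = np.random.randint(0, 256, (120, 160)).astype(np.uint8)
color_hist = np.ones(72) / 72.0            # placeholder G_i
print(fused_feature(gray, color_hist).shape)  # (72 + 59,) = (131,)
```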
4.3 The Proposed Algorithm 4.3.1 Hierarchical Clustering Method The proposed algorithm adopts the "bottom-up" agglomerative method of hierarchical clustering. First, a termination condition is determined: the proposed algorithm uses the number of clusters K as the termination condition, where K is determined from the Euclidean distances between the color histograms of adjacent frames; this effectively reduces the blindness of the K value selection. After the termination condition is obtained, the hierarchical clustering proceeds as follows (a code sketch is given after the list): Step 1 Regard each data item to be clustered as a singleton class. Step 2 Measure the Euclidean distance between each pair of classes. Step 3 Merge the two classes with the smallest distance into one class according to the Step 2 measurements. Step 4 After merging, measure the distances between every two classes again. Step 5 Repeat Steps 3 and 4 until the number of clusters is K; complete the clustering, calculate the average of each cluster's samples, and obtain the K cluster centers, i.e., C = {C_1, C_2, …, C_K}.
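A compact sketch of these steps with SciPy (the feature matrix and K below are placeholders):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def hierarchical_stage(features, K):
    """Agglomerative clustering cut at K clusters; returns the labels and
    the per-cluster mean vectors used later as K-means initial centers."""
    Z = linkage(features, method="average", metric="euclidean")
    labels = fcluster(Z, t=K, criterion="maxclust")
    centers = np.vstack([features[labels == k].mean(axis=0)
                         for k in np.unique(labels)])
    return labels, centers

features = np.random.rand(300, 131)   # fused M_i vectors, placeholder
labels, centers = hierarchical_stage(features, K=6)
print(centers.shape)                  # (6, 131)
```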
4.3.2 Optimize Initial Clustering Results to Extract Video Summary Key Frames After the hierarchical clustering finishes, the obtained cluster centers and K value are used as the input of the K-means method. This solves the problem that K-means cannot preset its initial parameters, and also reduces the number of K-means iterations, which improves the accuracy and stability of the clustering. Next, the K-means algorithm is used to extract the key frames as follows (see the sketch after this list): Step 1 The average value of each cluster's samples in the hierarchical clustering is taken as the initial cluster center, and the obtained K value is taken as the initial number of clusters.
Step 2 Measure the distance from each frame to the cluster centers and redistribute all objects according to the minimum-distance criterion. Step 3 Recalculate the cluster centers of each class. Step 4 Repeat Steps 2 and 3 until the sample objects of each class remain stable. Step 5 Output the video frame closest to each cluster center as a key frame.
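The refinement stage can be sketched with scikit-learn, seeding K-means with the centers from the hierarchical stage and then picking the frame nearest each final center as a key frame (all data below are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin

def refine_and_pick_keyframes(features, init_centers):
    """K-means seeded with hierarchical-clustering centers; the key frames
    are the frames closest to the final cluster centers."""
    km = KMeans(n_clusters=len(init_centers), init=init_centers, n_init=1)
    km.fit(features)
    keyframe_idx = pairwise_distances_argmin(km.cluster_centers_, features)
    return sorted(keyframe_idx.tolist())

features = np.random.rand(300, 131)
init_centers = features[np.random.choice(300, 6, replace=False)]
print(refine_and_pick_keyframes(features, init_centers))
```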
4.4 Experimental Results and Discussion 4.4.1 Evaluation Standard Since the video summarization field has not yet formed a unified evaluation standard, the proposed algorithm uses three objective measures—precision (Precision = N_matched/N_AS), recall (Recall = N_matched/N_US), and F-score (F-score = 2 × Precision × Recall/(Precision + Recall))—to measure the quality of the extracted key frames [11], where N_matched is the number of automatic key frames matched to the user summary, N_AS is the number of frames in the automatic summary, and N_US is the number of frames in the user summary.
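For reference, the three measures in code form (a trivial sketch; the matching count N_matched is assumed computed elsewhere):

```python
def summary_scores(n_matched, n_auto, n_user):
    """Precision, recall, and F-score for a key-frame summary."""
    precision = n_matched / n_auto
    recall = n_matched / n_user
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

print(summary_scores(n_matched=10, n_auto=14, n_user=13))
```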
4.4.2 Experimental Results In order to verify the validity of the proposed algorithm, four videos were selected from the Open Video Database. The dataset provides five user summaries for each video as ground truth. We selected four videos with different scenes—video 22, video 23, video 25, and video 63—for key frame extraction, and calculated precision, recall, and F-score by averaging over the five user summaries. The values are compared directly with the related algorithms; for video 25 we also compare with the key frame results published in recent years by the Random Walk on Hypergraph for Video Summarization (RWH) [2] and Clustering Histogram Texture (CHT) [12] algorithms. In this work, we compare our method with five other methods—Open Video Project (OV) [13], Delaunay Triangulation (DT) [14], STIll and MOving Video Storyboard (STIMO) [15], Video SUMMarization 1 (VSUMM 1) [16], and Video SUMMarization 2 (VSUMM 2) [16]—and show the results in Table 4.1. First of all, consider the F-score: it is a comprehensive evaluation of the quality of the video summary, reflecting its overall performance. The proposed algorithm has the highest F-score values on video 22 and video 25, reaching 0.68 and 0.84, respectively. On video 22, the proposed algorithm is 7, 50, 15, 4, and 4% higher than the OV, DT, STIMO, VSUMM 1, and VSUMM 2 algorithms; the advantage is more obvious on video 25, where it exceeds the OV, DT, STIMO, VSUMM 1, VSUMM 2, RWH, and CHT algorithms by 42, 62, 27, 10, 10, 18, and 8%, respectively. In other words, the proposed method outperforms the other algorithms in terms of F-score.
Table 4.1. Comparison of accuracy, recall and F-score results for different videos

| Videos  | Methods  | Precision | Recall | F-score |
|---------|----------|-----------|--------|---------|
| Video22 | OV       | 0.56      | 0.67   | 0.61    |
|         | DT       | 0.20      | 0.17   | 0.18    |
|         | STIMO    | 0.43      | 0.70   | 0.53    |
|         | VSUMM1   | 0.65      | 0.63   | 0.64    |
|         | VSUMM2   | 0.65      | 0.63   | 0.64    |
|         | Proposed | 0.70      | 0.66   | 0.68    |
| Video23 | OV       | 0.47      | 0.61   | 0.53    |
|         | DT       | 0.66      | 0.55   | 0.60    |
|         | STIMO    | 0.57      | 0.47   | 0.52    |
|         | VSUMM1   | 0.48      | 0.86   | 0.61    |
|         | VSUMM2   | 0.57      | 0.84   | 0.68    |
|         | Proposed | 0.60      | 0.77   | 0.67    |
| Video25 | OV       | 0.72      | 0.29   | 0.42    |
|         | DT       | 0.45      | 0.15   | 0.22    |
|         | STIMO    | 0.87      | 0.43   | 0.57    |
|         | VSUMM1   | 0.82      | 0.67   | 0.74    |
|         | VSUMM2   | 0.87      | 0.64   | 0.74    |
|         | RWH      | 0.91      | 0.52   | 0.66    |
|         | CHT      | 0.80      | 0.72   | 0.76    |
|         | Proposed | 0.85      | 0.84   | 0.84    |
| Video63 | OV       | 0.63      | 0.53   | 0.58    |
|         | DT       | 0.77      | 0.64   | 0.70    |
|         | STIMO    | 0.51      | 0.50   | 0.51    |
|         | VSUMM1   | 0.70      | 0.87   | 0.78    |
|         | VSUMM2   | 0.77      | 0.64   | 0.70    |
|         | Proposed | 0.70      | 0.78   | 0.73    |
| Average | OV       | 0.60      | 0.53   | 0.54    |
|         | DT       | 0.52      | 0.38   | 0.43    |
|         | STIMO    | 0.60      | 0.53   | 0.53    |
|         | VSUMM1   | 0.66      | 0.76   | 0.69    |
|         | VSUMM2   | 0.72      | 0.69   | 0.69    |
|         | Proposed | 0.71      | 0.76   | 0.73    |
Secondly, consider the recall: it reflects how well the key frames selected by the algorithm match the ground truth, that is, the accuracy of the algorithm. As can be seen from the data above, the recall of the proposed algorithm is best on video 25, reaching 0.84, far superior to the other five methods; its recall is also near the top on the other three scene videos. Finally, the precision of the proposed algorithm is 0.70, 0.60, 0.85, and 0.70 on the four videos. Apart from the leading position on video 22, it does not show a particularly obvious advantage on the other videos: the more key frames an algorithm generates, the easier it is to reduce the precision. On the other hand, it can be seen from video 25 that the RWH algorithm attains a precision of 0.91 by generating fewer key frames, which is 6% higher than the proposed algorithm and far ahead of the other six algorithms; but its recall is only 0.52, greatly reducing its F-score. From this comparison, it can be concluded that an appropriate number of key frames should be generated in order to strike a balance between precision and recall, so that the F-score rises overall and higher-quality key frames are extracted. Combining the three evaluation indicators of precision, recall, and F-score on different scene videos, the proposed algorithm extracts more reasonable key frames and provides an effective basis for video summarization. To further visualize the difference between the key frames selected by the five users and the summaries generated by the various algorithms, Fig. 4.1 lists the results of the OV, DT, STIMO, VSUMM 1, VSUMM 2, RWH, CHT, and our algorithm on video 25. It can be seen from Fig. 4.1 that the key frame sequence generated by the proposed algorithm on video 25 has the most suitable length: it basically covers the summaries chosen by the five users and, compared with the other algorithms, best characterizes the main content of the video, making it more convenient for users to quickly understand it. In general, VSUMM 1 and the proposed algorithm produce moderately sized summaries and deliver valuable information from the video to the user with better results.
4.5 Conclusion In this paper, we present a feature extraction method for producing video summaries. The color feature and the LBP operator are firstly fused to describe the image. Next, the hierarchical clustering method is used to acquire the initial clustering result, and the K-means method optimizes the initial result. The frame closest to the clustering center is defined as the key frame. The tests show that the key frame extracted by the proposed algorithm has high accuracy and good F-score, which can improve the overall performance of the video summary.
Fig. 4.1. Comparison of key frame results on video 25: rows show the five user summaries (USER 1–5) and the OV, DT, STIMO, VSUMM 1, VSUMM 2, RWH, CHT, and Proposed algorithms
Acknowledgements This work was supported by the Natural Science Foundation of Anhui Province under Grant No. 1708085MF154, and the Major Program of Scientific Research of Anhui Province, China (KJ2019A0162, KJ2015A071).
References 1. Wang, J., Jiang, X., Sun, T.: Review of video abstraction. J. Image Graph. 19(12), 1685–1695 (2014) 2. Ji, Z., Fan, S.: Video summarization with random walk on hypergraph. J. Chin. Comput. Syst. 38(11), 2535–2540 (2017)
3. Fuentes, J., Ruiz, J., Rendon, J.: Salient point tracking for key frames selection in outdoor image sequences. IEEE Lat. Am. Trans. 14(5), 2461–2469 (2016) 4. Mendi, E., Bayrak, C.: Shot boundary detection and key frame extraction using salient region detection and structural similarity. In: Proceedings of the 48th Annual Southeast Regional Conference, pp. 66–68. ACM, America (2010) 5. Chugh, I., Gupta, R., Kumar, R.: Techniques for key frame extraction: shot segmentation and feature trajectory computation. In: 2016 6th International Conference on Cloud System and Big Data Engineering (Confluence), pp. 463–466. IEEE, America (2016) 6. Jiang, W., Fei, M., Song, Z.: New fusional framework combining sparse selection and clustering for key frame extraction. IET Comput. Vision 10(4), 280–288 (2016) 7. Kishor, D., Venkateswarlu, N.: A novel hybridization of expectation-maximization and Kmeans algorithms for better clustering performance. Int. J. Ambient. Comput. Intell. 7(2), 47–74 (2016) 8. Zhai, Y., Chen, L.: Feature fusion and discriminative null space for person re-identification. J. Signal Process. 34(4), 476–485 (2018) 9. Si, R., Zhang, M.: Research on video key frame extraction based on k-means clustering algorithm. Mod. Comput. 2016(20), 59–63 (2016) 10. Zong, Z., Gong, Q.: Key frame extraction based on dynamic color histogram and fast wavelet histogram. In: IEEE International Conference on Information and Automation, pp. 183–188. IEEE, America (2017) 11. Hore, S., Chakraborty, S., Chatterjee, S.: An integrated interactive technique for image segmentation using stack based seeded region growing and thresholding. Int. J. Electr. Comput. Eng. 6(6), 2773–2780 (2016) 12. Wang, R., Hu, J., Yang, J.: Video key frame selection based on mapping and clustering. J. Image Graph. 21(12), 1652–1661 (2016) 13. Dementhon, D., Kobla, V., Doermann, D.: Video summarization by curve simplification. In: ACM International Conference on Multimedia, pp. 211–218. ACM, America (1999) 14. Mundur, P., Rao, Y., Yesha, Y.: Keyframe-based video summarization using Delaunay clustering. Int. J. Digit. Libr. 6(2), 219–232 (2006) 15. Furini, M., Geraci, F., Montangero, M.: STIMO: STILL and Moving video storyboard for the web scenario. Multimed. Tools Appl. 46(1), 47–69 (2010) 16. Avila, S., Lopes, A., Luz, A.: VSUMM: a mechanism designed to produce static video summaries and a novel evaluation method. Pattern Recognit. Lett. 32(1), 56–68 (2011)
Chapter 5
Research on Multi-objective Optimization Problem of Engineering Project in 3D Field Based on Improved Particle Swarm Optimization Algorithm Jianqiang Zhao, Subei Li, and Gefei Li Abstract For traditional projects, three-objective optimization in the 3D field (3D imaging technology applications include smart home, autopilot, security monitoring, etc.) cannot meet actual needs. By adding the two goals of safety and environmental impact, a comprehensive multi-objective optimization system balancing project duration, cost, quality, environment, and safety is formed. An improved particle swarm optimization algorithm (IPSOA) is obtained by using the average evaluation value to dynamically adjust the step factor, which ensures the diversity of the particles and prevents the PSO from falling into local optima. Finally, the IPSOA and PSO algorithms were applied to the multi-objective optimization of projects. The results show that the IPSOA is faster and more precise, and that its results for solving the multi-objective optimization problem of projects in the 3D field are reliable and feasible.
5.1 Introduction The main direction of research on multi-objective optimization of traditional projects in the 3D field lies in the balanced optimization of time, cost, and quality. 3D imaging technology for portraits and objects has been increasingly integrated into intelligent hardware, with very broad applications including smart home, medical health, gaming and entertainment, autonomous driving, security monitoring, etc. These areas often involve engineering project optimization. Basic approaches to the multi-objective optimization problem can be summarized into three kinds. First, the multi-objective optimization problem is converted directly into a single-objective optimization problem, using methods such as the aggregation function, the constraint method, the efficacy coefficient method, and the evaluation function method. Second, the multi-objective optimization problem is transformed into a series of single-objective optimization J. Zhao (B) · S. Li School of Mathematics and Statistics, Xuzhou University of Technology, Xuzhou 221018, China e-mail: [email protected] G. Li Department of Mathematics, University of California, Davis, CA 95616, USA © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_5
problems which are solved step by step; typical methods include the stratified sequence method. The third type is the multi-objective comprehensive model, an intelligent heuristic algorithm that, unlike the above two approaches, takes into account the complex linear or nonlinear relationships among duration, cost, and quality. The first two methods essentially transform the multi-objective optimization problem into a single-objective problem; the result varies with how the target weights are set, which is too subjective and has certain limitations. The intelligent heuristic algorithm does not blindly assume linear or nonlinear functional relationships among the three objectives or an ordering of the target weights, and so better reflects the objective facts. In recent years, many scholars have applied swarm intelligence algorithms to multi-objective project optimization problems: for example, Angeline [1] proposed an improved PSO algorithm, Bai [2] proposed a PSO algorithm based on simulated annealing, and Lv and Hou [3] proposed an adaptive mutation variant of the PSO algorithm. Further references in this field can be found in [4–11]. The PSO algorithm iterates by moving toward the optimal positions, so it has the disadvantages that its coverage of the search space is incomplete and it easily falls into local optima. To avoid these defects, this paper presents an improved particle swarm algorithm based on dynamically adjusting the step factor according to the average evaluation value, named the improved particle swarm optimization algorithm (IPSOA). This method uses the information in the average evaluation value of all particles in a given dimension to control the step factor, which improves the ability of the population to jump out of local optima, so that it can search the entire space.
5.2 IPSO Algorithms When the particle swarm optimization algorithm searches the search space, each iteration produces each particle's own optimal position $p_i$ and the population's optimal position $p_g$. This selection causes the potential information present in poorer particles to be ignored, even though they may be closer than $p_{g,d}$ to the theoretical optimal position in some dimension; they also have the potential to evolve into a better solution, and the information they contain may be better than the current best position. Therefore, while considering the optimal particles, this paper uses the influence of the other individuals to improve the population's ability to jump out of local optima by introducing a step factor. The evaluation value of the population on the d-th dimension is defined as

$$\mathrm{avg}_d = \frac{1}{n} \sum_{i=1}^{n} f(p_{i,d}) \qquad (5.1)$$
where n is the number of particles, $p_{i,d}$ is the position of the i-th particle in the d-th dimension, $\mathrm{avg}_d$ is the evaluation value of the population in the d-th dimension, and $f(p_{i,d})$ is the objective function value of the i-th particle at position $p_{i,d}$. The performance difference of particle i in dimension d is defined as

$$\Delta_{i,d} = f(p_{i,d}) - \mathrm{avg}_d \qquad (5.2)$$
A step factor can be introduced to adjust the particle's next-step velocity. Assume that the minimum value currently found by the population in the d-th dimension is $f(p_{g,d})$, and define the step factor $u_{i,d}$ as

$$u_{i,d} = \begin{cases} u_{\min} + \left( f(p_{i,d}) - f(p_{g,d}) \right) \times \dfrac{u_{\max} - u_{\min}}{\mathrm{avg}_d - f(p_{g,d})}, & \Delta_{i,d} < 0 \\[6pt] u_{\max}, & \text{else} \end{cases} \qquad (5.3)$$
where the adjustment range $[u_{\min}, u_{\max}]$ of the step factor $u_{i,d}$ is set from experience. In Eq. (5.3), we adjust the step factor $u_{i,d}$ according to the change of the average evaluation value, giving the algorithm adaptive behavior, making full use of the particles' information, and monitoring the algorithm more comprehensively. When $\Delta_{i,d} < 0$, particle i performs better in dimension d. The particle trajectory is adjusted toward the optimal position, and the trend of the evaluation value affects the flight speed of the particle: the flight speed is reduced at this time so that the algorithm performs an accurate search. The distance to the optimal solution is therefore reduced; the search is no longer blind, and the local search ability of the algorithm is enhanced. When $\Delta_{i,d} > 0$, the performance of particle i in dimension d is declining and its flight path is not ideal; the flight speed needs to be increased so that the particle can expand the search space, and the global search ability of the algorithm is enhanced. The position update equation for the particle becomes

$$p_{i,d}^{t+1} = p_{i,d}^{t} + u_{i,d}^{t} \times v_{i,d}^{t} \qquad (5.4)$$

where t is the time step, $p_{i,d}^{t+1}$ is the position of the i-th particle in the d-th dimension at time t + 1, and $v_{i,d}^{t}$ is the velocity of the i-th particle in the d-th dimension at time t. The step factor $u_{i,d}^{t}$ makes the particle search adaptively, so that the algorithm can find the optimal solution quickly. At the same time, using $\Delta_{i,d}$ as a trigger function collects information from all particles, ensuring the diversity of the population.
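A minimal sketch of this adaptive position update for one dimension, following Eqs. (5.1)–(5.4); the bounds u_min and u_max and the toy objective are placeholder assumptions:

```python
import numpy as np

def ipsoa_position_update(p, v, f, u_min=0.4, u_max=0.9):
    """One IPSOA position update over a single dimension.

    p, v : arrays of particle positions and velocities (one dimension)
    f    : objective function evaluated element-wise on positions
    """
    fp = f(p)
    avg_d = fp.mean()                    # Eq. (5.1)
    delta = fp - avg_d                   # Eq. (5.2)
    f_best = fp.min()                    # best (minimum) value found
    u = np.full_like(p, u_max)
    better = delta < 0                   # Eq. (5.3): slow down good particles
    u[better] = u_min + (fp[better] - f_best) * (u_max - u_min) / (avg_d - f_best)
    return p + u * v                     # Eq. (5.4)

f = lambda x: (x - 3.0) ** 2             # placeholder objective
p = np.random.uniform(-5, 5, 30)
v = np.random.uniform(-1, 1, 30)
print(ipsoa_position_update(p, v, f)[:5])
```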
5.3 Model The goals of comprehensive project management optimization in the 3D field are a short construction period, low cost, high quality, small environmental impact, and high
safety. Based on the theoretical analysis of the management goals in the above section, we set up a balanced optimization model of duration–cost–quality–environment–safety. The multi-objective optimization model is
$$
\begin{cases}
\min T = \sum\limits_{i \in A} t_i \\
\min C = \sum\limits_{i \in A} \left[ c_{iN} + \alpha_i \left( t_i - t_{iN} \right)^2 \right] \\
\max Q = \sum\limits_{i \in A} w_i q_i = \sum\limits_{i \in A} w_i \left[ q_{i\max} - \left( t_i - t_{iL} \right)^2 \right] \\
\min G = \sum\limits_{i \in A} \eta_i g_i = \sum\limits_{i \in A} \eta_i \times \dfrac{t_i}{t_{iC}} \times e_i \\
\max S = s_n^{out} = 1 - \prod\limits_{j_1 = 1}^{j_m} \left( 1 - s_{j_1}^{in} \times s_n \right)
\end{cases} \qquad (5.5)
$$

$$
d_T = \frac{T_{\max} - T}{T_{\max} - T_{\min}}, \quad
d_C = \frac{C_{\max} - C}{C_{\max} - C_{\min}}, \quad
d_Q = \frac{Q - Q_{\min}}{Q_{\max} - Q_{\min}}, \quad
d_E = \frac{E_{\max} - E}{E_{\max} - E_{\min}}, \quad
d_S = \frac{S - S_{\min}}{S_{\max} - S_{\min}}, \quad
\max D = d_T \, d_C \, d_Q \, d_E \, d_S \qquad (5.6)
$$

$$
\text{s.t.}
\begin{cases}
\sum\limits_{i \in A} w_i = 1, \quad \sum\limits_{i \in A} \eta_i = 1 \\
1 - \prod\limits_{j_1 = 1}^{j_m} \left( 1 - s_{j_1}^{in} \times s_n \right) \geq S_{\min} \\
t_{iC} \leq t_i \leq t_{iL} \\
- X_i + X_j - t_i \geq 0 \\
0 \leq q_i \leq 1 \\
0 \leq s_i \leq 1 \\
w_i > 0, \; \eta_i > 0, \; X_i \geq 0, \; X_j \geq 0
\end{cases} \qquad (5.7)
$$
in S is security level index s in j1 , . . . , s jm are security level indexes of j1 , . . . , jm . Equations (5.5) and (5.6) indicate multi-objective optimization model of time, cost, quality, environmental impact, and safety, in pursuit of short duration, low cost,
high level quality, minimal environmental impact, and a high level of safety index for constraints; Eq. (5.7) indicates the optimization of short duration, low cost, high quality, low environmental impact, safety, in pursuit of the cost of the project, quality, schedule, and environmental impact for constraints.
5.4 Engineering Calculation and the Result Analysis An engineering instance in the 3D field contains eight activities; its construction network diagram is shown in Fig. 5.1. The project requires a duration of no more than 140 days, a total project cost of less than 30 million Yuan, a quality level not lower than 0.93, an environmental impact value of less than 0.7, and a safety level not lower than 0.8, with μ = 5%. The logical relationships between the project activities and their parameters are shown in Table 5.1, in which $t_{iC}$, $t_{iN}$, and $t_{iL}$ are in days; $c_{iN}$ is in RMB ten thousand; and $\alpha_i$, $q_{iC}$, $\omega_i$, and $\eta_i$ are dimensionless, with $\alpha_i$, $\omega_i$, and $\eta_i$ based on expert scoring. The influence factors of the various activities on the environmental conditions of the project are shown in Table 5.2. The project requires a reasonable allocation of the duration of each activity, so that the construction period meets the requirements of the project contract, achieving the goal of "shortest construction period, lowest cost, best quality, minimal environmental impact, and highest safety level." Fig. 5.1. Engineering network plan
Table 5.1. Related parameters of each activity

| Activity | t_iC | t_iN | t_iL | c_iN | α_i  | q_iC   | ω_i    | η_i    | p_oi | p_iC  | p_iL |
|----------|------|------|------|------|------|--------|--------|--------|------|-------|------|
| A        | 23   | 30   | 34   | 390  | 1.68 | 0.9033 | 0.1339 | 0.1326 | 0.15 | 0.05  | 0.90 |
| B        | 21   | 28   | 31   | 312  | 3.37 | 0.9156 | 0.1588 | 0.1663 | 0.15 | 0.02  | 0.89 |
| C        | 20   | 26   | 33   | 390  | 3.14 | 0.8889 | 0.1703 | 0.1570 | 0.12 | 0.10  | 0.85 |
| D        | 15   | 22   | 29   | 240  | 0.56 | 0.8789 | 0.1495 | 0.1510 | 0.10 | 0.04  | 0.90 |
| E        | 16   | 21   | 27   | 228  | 3.00 | 0.9233 | 0.1624 | 0.1593 | 0.05 | 0.008 | 0.92 |
| F        | 22   | 28   | 35   | 330  | 3.53 | 0.8956 | 0.1484 | 0.1585 | 0.08 | 0.12  | 0.85 |
| G        | 21   | 21   | 28   | 216  | 1.96 | 0.8700 | 0.1757 | 0.1617 | 0.13 | 0.05  | 0.90 |
| H        | 25   | 25   | 30   | 240  | 3.98 | 0.8878 | 0.1657 | 0.1749 | 0.11 | 0.09  | 0.95 |
Table 5.2. Influence factor and influence degree of environment

| Activity | Noise pollution | Air pollution | Greenhouse | Acid rain | Water pollution | Photochemical pollution | Construction waste |
|----------|-----------------|---------------|------------|-----------|-----------------|-------------------------|--------------------|
| A        | Moderate        | Mild          | Mild       | Severe    | Mild            | None                    | Moderate           |
| B        | None            | Mild          | None       | Mild      | None            | Mild                    | Mild               |
| C        | None            | Moderate      | None       | Severe    | None            | None                    | Moderate           |
| D        | Moderate        | Severe        | Moderate   | None      | None            | None                    | Severe             |
| E        | Mild            | None          | Severe     | Moderate  | Moderate        | Mild                    | None               |
| F        | Moderate        | Severe        | Moderate   | Mild      | Mild            | None                    | Severe             |
| G        | Moderate        | Moderate      | Mild       | None      | None            | None                    | None               |
| H        | Severe          | None          | Moderate   | None      | Mild            | None                    | Mild               |
Table 5.3. Coefficient of environmental pollution

| Activity | A     | B     | C    | D     | E     | F     | G    | H     |
|----------|-------|-------|------|-------|-------|-------|------|-------|
| e_i      | 0.475 | 0.165 | 0.35 | 0.405 | 0.405 | 0.535 | 0.12 | 0.263 |
According to the parameters of the engineering case and the multi-objective comprehensive optimization model constructed above, we can build the optimization model of this actual engineering case. Among them, the environmental pollution coefficients of the activities are obtained from the influence-factor scores of each activity in Table 5.2 and are listed in Table 5.3. According to the engineering network plan of Fig. 5.1 and the known conditions, using the improved particle swarm algorithm on each single-objective optimization model, we get the following results: T_max = 159, T_min = 101; C_max = 3251, C_min = 2346; Q_max = 0.9973, Q_min = 0.9023; E_max = 0.765, E_min = 0.413; S_max = 0.996, S_min = 0.722. The multi-objective optimization model takes the duration of each activity as the decision variable X. Because the engineering case consists of 8 activities, X is an 8-dimensional vector. Let the population size be 100 and the number of iterations 200 generations; the position and velocity of each particle are 8-dimensional column vectors. The detailed solution steps are as follows: Step 1 Initialize the position and velocity of the particle swarm. $p_{k,8}^{t}$ is the position of the k-th particle (dimension 8) at the t-th iteration, k = 1, 2, …, 100, t = 1, 2, …, 200. Initialize $p_{k,8}^{0}$, $v_{k,8}^{0}$, and $u_{k,8}^{0}$ so that they meet the longest and shortest time limits of each activity. Step 2 Calculate the particle fitness. Minimize the fitness function, i.e., min D = −(d_T d_C d_Q d_E d_S). Calculate the fitness value fitness_i of each initial particle, the individual optimal value Pdbest_i and optimal location Pd_i, and the group optimal value Pgbest_i and optimal location Pg_i. Step 3 Calculate the adaptive step factor $u_{i,d}$. According to Eq. (5.1), calculate the average evaluation value $\mathrm{avg}_d$ of the population on the d-th search
space, and obtain the difference $\Delta_{i,d}$ in the evaluation value of particle i in the d-th dimension. Then adjust the step factor $u_{i,d}$ of the particle according to Eqs. (5.2) and (5.3);
Step 4 Update the particles. According to the change of the step factor, update the velocity and position of each dimension of each particle according to Eqs. (5.3) and (5.4) to generate a new population;
Step 5 Evaluate the particles. Calculate the fitness value of each particle. If the current fitness value of a particle is better than Pdbest_i, replace Pdbest_i with this particle; then find the best value Pgbest among all the particles, and if its evaluation value is better, replace Pgbest with this particle;
Step 6 Update the location of the optimal value. Initialize around the optimal position obtained, to avoid the algorithm falling into a local optimum;
Step 7 Check the algorithm termination condition. If the current number of iterations reaches the preset maximum iteration number or the error reaches the minimum error, the algorithm terminates and outputs the optimal solution; otherwise go to Step 2 and enter the next iteration.
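To make the fitness evaluation of Step 2 concrete, the sketch below computes the composite satisfaction degree D for a candidate duration vector; the extreme values come from the single-objective runs quoted above, while the objective model itself is reduced to a placeholder stub, since the full Eqs. (5.5)–(5.7) depend on all the case data.

```python
import numpy as np

# Extreme values from the single-objective runs reported in the text.
T_RANGE = (101.0, 159.0)
C_RANGE = (2346.0, 3251.0)
Q_RANGE = (0.9023, 0.9973)
E_RANGE = (0.413, 0.765)
S_RANGE = (0.722, 0.996)

def satisfaction(value, lo, hi, maximize):
    """Satisfaction degree d in [0, 1] in the spirit of Eq. (5.6)."""
    return (value - lo) / (hi - lo) if maximize else (hi - value) / (hi - lo)

def fitness_D(t, objectives):
    """Negative composite objective -D = -(d_T d_C d_Q d_E d_S), to minimize.

    `objectives` maps a duration vector t to (T, C, Q, E, S); here it is
    a placeholder stub standing in for Eqs. (5.5)-(5.7).
    """
    T, C, Q, E, S = objectives(t)
    d = [satisfaction(T, *T_RANGE, maximize=False),
         satisfaction(C, *C_RANGE, maximize=False),
         satisfaction(Q, *Q_RANGE, maximize=True),
         satisfaction(E, *E_RANGE, maximize=False),
         satisfaction(S, *S_RANGE, maximize=True)]
    return -float(np.prod(d))

stub = lambda t: (t.sum() * 0.7, 2600.0, 0.95, 0.5, 0.9)  # placeholder model
print(fitness_D(np.array([28, 21, 25, 22, 17, 28, 20, 26], float), stub))
```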
We use the IPSOA (the proposed algorithm) and the basic particle swarm algorithm to solve this example. A comparison of the evolutionary processes of the two algorithms is shown in Fig. 5.2. It is not difficult to see from the figure that the improved particle swarm optimization algorithm is superior to the common particle swarm optimization algorithm in terms of evolution speed and convergence precision. Therefore, it can be concluded
Fig. 5.2. Evolutionary process of different algorithms on multi-objective optimization
Table 5.4. Case project's different solutions and their corresponding target values

| The duration of each activity | Plan a | Plan b | Plan c |
|-------------------------------|--------|--------|--------|
| A                             | 28     | 23     | 29     |
| B                             | 21     | 21     | 22     |
| C                             | 25     | 23     | 26     |
| D                             | 22     | 19     | 19     |
| E                             | 17     | 17     | 20     |
| F                             | 28     | 24     | 25     |
| G                             | 20     | 19     | 21     |
| H                             | 26     | 22     | 25     |
| I                             | 123    | 110    | 116    |
| J                             | 2548   | 2984   | 2598   |
| K                             | 0.931  | 0.956  | 0.933  |
| L                             | 0.512  | 0.475  | 0.491  |
| M                             | 0.875  | 0.923  | 0.958  |
that the improved particle swarm optimization algorithm proposed in this paper has physical significance and practical value. A series of optimization schemes is obtained by the improved particle swarm optimization algorithm. Limited by space, Table 5.4 lists part of the solutions; each group can be used as an alternative plan for the case project.
5.5 Conclusions Based on existing theoretical knowledge and the actual situation of project problems in the 3D field, this paper has given a detailed theoretical analysis, modeled the multi-objective problem scientifically, improved the PSO optimization algorithm, and finally realized comprehensive multi-objective optimization. The multi-objective optimization model and the improved particle swarm algorithm proposed in this paper have wide applicability, not only for balanced project optimization in production, but also in the field of manufacturing production management. Acknowledgements This research was supported by the Jiangsu Planned Projects for Postdoctoral Research Funds No. 1601076B and Xuzhou University of Technology Research Funds No. XKY2018120.
References 1. Angeline, P.J.: Evolutionary optimization versus particle swarm optimization: philosophy and performance differences. In: International Conference on Evolutionary Programming, pp. 601–610. Springer, Berlin (1998) 2. Bai, Q.: Analysis of particle swarm optimization algorithm. Comput. Inf. Sci. 3(1), 180–192 (2010) 3. Lv, Z., Hou, Z.R.: Adaptive mutation particle swarm optimization algorithm. J. Electron. 32(1), 416–420 (2004) 4. Shrivastava, R., Singh, S., Dubey, G.C.: Multi-objective optimization of time cost quality quantity using multi colony ant algorithm. Int. J. Contemp. Math. Sci. 7(16), 773–784 (2012) 5. Ham, D., Hajimiri, A.: Concepts and methods in optimization of integrated LC VCOs. IEEE J. Solid-State Circuits 36(6), 896–909 (2001) 6. O'Neill, S., Curran, K.: The core aspects of search engine optimisation necessary to move up the ranking. Int. J. Ambient. Comput. Intell. (IJACI) 3(4), 62–70 (2011) 7. Wang, D., Li, Z., Cao, L., Balas, V.E., Dey, N., Ashour, A.S., McCauley, P., Dimitra, S.P., Shi, F.: Image fusion incorporating parameter estimation optimized Gaussian mixture model and fuzzy weighted evaluation system: a case study in time-series plantar pressure data set. IEEE Sens. J. 17(5), 1407–1420 (2016) 8. Chatterjee, S., Sarkar, S., Hore, S., Dey, N., Ashour, A.S., Balas, V.E.: Particle swarm optimization trained neural network for structural failure prediction of multistoried RC buildings. Neural Comput. Appl. 28(8), 2005–2016 (2017) 9. Dey, N., Ashour, A., Beagum, S., Pistola, D., Gospodinov, M., Gospodinova, E.: Parameter optimization for local polynomial approximation based intersection confidence interval filter using genetic algorithm: an application for brain MRI image de-noising. J. Imaging 1(1), 60–84 (2015) 10. Jagatheesan, K., Anand, B., Samanta, S., Dey, N., Ashour, A.S., Balas, V.E.: Particle swarm optimisation-based parameters optimisation of PID controller for load frequency control of multi-area reheat thermal power systems. Int. J. Adv. Intell. Parad. 9(5), 464–489 (2017) 11. Ashour, A.S., Beagum, S., Dey, N., Ashour, A.S., Pistolla, D.S., Nguyen, G.N., Le, D.N., Shi, F.: Light microscopy image de-noising using optimized LPA-ICI filter. Neural Comput. Appl. 29(12), 1517–1533 (2018)
Chapter 6
3D Path Planning Based on Improved Particle Swarm Optimization Algorithm Yihu Wang and Siming Wang
Abstract Aiming at the problem that the particle swarm optimization (PSO) algorithm easily falls into local optima, and drawing on the local-search advantages of the bacterial foraging optimization (BFO) algorithm, this paper introduces the chemotaxis and dispersal operations of the bacterial foraging algorithm into the PSO algorithm to obtain a hybrid algorithm, which is then applied to 3D path planning. The simulation results show that the hybrid algorithm effectively remedies the PSO algorithm's tendency to fall into local optima, improves the optimization efficiency and accuracy of the algorithm, and shows good performance in 3D path planning.
6.1 Introduction In recent years, drone technology has developed rapidly; in particular, rotary-wing drones have been widely used in disaster prevention and relief, agricultural operations, power-line inspection, and other fields. Path planning is an important branch of this research field. At present, the commonly used path planning algorithms include the A* (A-Star) algorithm [1], the genetic algorithm, the ant colony algorithm, the particle swarm algorithm [2], etc. The particle swarm optimization algorithm is widely used because it is easy to implement and has good optimization performance. When the particle swarm optimization algorithm is applied to path planning, its rules are simple and easy to implement, and its convergence speed is fast: a feasible solution can be found quickly. But because it easily falls into local optima, the feasible path found is not necessarily the optimal path. The literature [3] proposes an improved particle swarm optimization algorithm based on two kinds of chaotic maps; the simulation results show that it can solve well the traditional PSO algorithm's tendency to fall into local optima. The literature Y. Wang (B) · S. Wang School of Automatic and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_6
[4, 5] proposed path planning methods based on the Levy-PSO algorithm and on chaos optimization combined with the basic PSO algorithm for practical industrial problems, respectively, and applied them successfully. Aiming at the shortcomings of the current PSO algorithm in Unmanned Aerial Vehicle (UAV) path planning, this paper proposes a hybrid path planning algorithm based on the PSO and BFO algorithms, using the BFO algorithm's strong local search capability to reduce the possibility of the algorithm falling into local optima [6]; the dispersal operation of BFO also helps the algorithm jump out of local optima. On this basis, this paper uses the improved PSO algorithm based on the BFO algorithm to globally plan the path of a UAV in a 3D obstacle environment, and verifies the effectiveness and feasibility of the hybrid algorithm through MATLAB simulation.
6.2 Path Planning Environment Modeling and Fitness Function 6.2.1 Environmental Modeling The research object of this paper is the three-dimensional path planning of UAV in the known environment. This paper refers to the method of literature [7], which simulates the urban building group to establish a three-dimensional elevation flight environment model, the mathematical model is expressed as z 1 (x, y) = h i , r < Ri z 2 (x, y) = Ri2 + r 2 , r < Ri r = (x − xi )2 + (y − yi )2
(6.1)
where z 1 (x, y) is the height information of the building, z 2 (x, y) is the boundary surface of the no-fly zone, (xi, yi ) is the coordinates of the center point of the i-th obstacle, h i is the height of the building, and Ri is the radius of the i-th obstacle. Integrate elevation information of buildings with no-fly zone information. The specific information fusion model is z(x, y) = max(z 1 (x, y), z 2 (x, y))
(6.2)
6 3D Path Planning Based on Improved Particle …
47
6.2.2 Fitness Function The fitness function is the basis for evaluating the quality of the generated path, and is also the basis for the evolutionary iteration of the population, which determines the efficiency and quality of the algorithm. In this paper, the fitness function comprehensively considers the length cost, the elevation cost, and the obstacle risk cost of the path. Suppose there are m paths, each path consists of n nodes, and there are i spherical and cylindrical obstacles in the environment. The path length cost is calculated as follows: Tm =
n
x j+1 − x j
2
2 2 + y j+1 − y j + z j+1 − z j
(6.3)
j=0
the Tm represents sum of the distances between all adjacent nodes in the m-th path, and x j , y j , z j is the j-th node of the path. Introduce the dangerous cost of obstacles and keep the path and obstacles at a certain distance. It is calculated as follows, for the k-th (k = 1, 2, …, g) obstacles, find the distance from the center of the obstacles to each path segment lik , then the k . minimum distance from the path to the obstacle L k = min l1k , . . . , lik , . . . , ln−1 Then the obstacle risk cost of the m-th path is
danm =
⎧ i ⎨
⎩ k=1 ∞,
L k , L k > rk (k = 1, 2, . . . , i)
(6.4)
others
where rk is the radius of spherical or cylindrical obstacles. The path length cost is calculated as follows: Cm =
n−1
(h i − h i+1 ).
(6.5)
k=1
The elevation cost is the sum of the height differences between each adjacent node of the track, where h i represents the height of the i-th node of the path. Integrate Tm , danm , and Cm to obtain the path fitness function as follow: f = w1 · Tm + w2 · danm + w3 Cm In the formula, w1 , w2 , and w3 are weight coefficients in the range [0, 1].
(6.6)
48
Y. Wang and S. Wang
6.3 Improved PSO Algorithm 6.3.1 Basic PSO Algorithm Suppose there is a particle population of size m with an n-dimensional space search area, recorded as X = [x1 , . . . xi , . . . , xm ]T , where the position of the i-th particle is expressed as xi = [xi1 , xi2 , . . . , xin ]T , the speed is the distance that the particle moves in each iteration, expressed as vi = [vi1 , vi2 , . . . , vin ]T , the individual extremum of the particle which has the best fitness expressed as Pi = [ pi1 , pi2 , . . . , pin ]T , and T Pg = pg1 , pg2 , . . . , pgn represents the global extremum which express the best fitness position for the entire population. The optimization problem feasible solution can be represented by the position information of each particle. Each iteration of the particle updates its position and speed according to the following formula:
k k+1 k k k k vid + c2 η pgd = wvid + c1 ε pid − xid − x gd k+1 k+1 k = xid + vid xid
.
(6.7)
In the formula, 1 ≤ i ≤ m, 1 ≤ d ≤ n, vik is the speed of the k-th iteration k is the velocity component of the d-th dimension of vik , xik is the of particle i, vid k is the velocity component of the d-th position of particle i at the k-th iteration, xid k k k is dimension of xi , pi is the individual extremum of particle i at the k-th iteration, pid k k the component of the d-th dimension of pi , pg is the global extremum of the particle k is the component of the d-th dimension of pgk , w is the swarm at the k-th iteration, pgd inertia weight, c1 , c2 is the learning factor, ε and η are random numbers uniformly distributed in the range [0, 1].
6.3.2 BFO Algorithm In this paper, the chemotactic operation and migration operation of the bacterial foraging algorithm are introduced into the PSO algorithm. The directional operation of bacteria is divided into two forms of movement and flipping. The steps of the chemotaxis operation are as follows: first, the bacteria swim a distance of one step unit in a random direction, and then calculates the position fitness, if the fitness is inferior to the previous position, select a random direction to flip. P(i, j + 1, k, l) = P(i, j, k, l) + C(i)ϕ(i).
(6.8)
In the formula, P(i, j + 1, k, l) is the position after j-th chemotactic operation, the k-th replication operation and the 1-th migration operation, C(i) is the unit step
6 3D Path Planning Based on Improved Particle …
49
size at which the bacteria perform a chemotaxis operation to swim forward. ϕ(i) indicate a random heading direction selected after flipping When any bacteria complete the corresponding breeding activities, give them the probability of migration Ped , each bacterial individual randomly generates a random number Rand on the [0, 1] region. If Rand < Ped , the individual will be initialized in the optimization interval, which is equivalent to the effect of achieving random migration. Instead, it turns to traverse the next individual until the end of the entire population traversal.
6.3.3 Hybrid Algorithm Although the standard PSO algorithm has superior optimization efficiency and convergence speed, it cannot balance the convergence speed, global search ability, and local search ability, especially in the late stage of the algorithm, it lacks the ability to jump out of local optimum. And often because the particle velocity is too large in the exploration, the particles easily fly through the local region where the optimal solution is located without finding the optimal value, thus falling into local optimum. Aiming at the above problems of PSO algorithm, this paper intends to improve the PSO by using the BFO. The essence of the chemotactic process of the bacterial foraging algorithm is that the bacteria search for the neighborhood of its location in the solution space, which direction is determined by the fitness function. The introduction of the chemotaxis process will enhance the local search ability of PSO, which can improve the problem that particles in PSO miss the better solution region due to excessive speed during the search process, and introduce migration operation on the particles when the particles fall into local optimum, which can help the algorithm jump out of the local optimum. The specific steps of the algorithm are as follows. Extract the environment information, the coordinates of the starting point and the target point, and initialize the position, velocity, individual extremum, and global extremum of the particle population. According to the formula (6.7) to update the velocity and position of the particle, calculate the individual fitness with formula (6.6), update the individual extremum and the global extremum, if the new global extremum is obtained at k-th iteration pgk ! = pgk−1 , add the chemotaxis operator for local optimization, initialize the bacterial population with the current position of the particle P(1, 1, 1, 1) = xik . Then the particles are moved and flipped and according to formula (6.8) but only perform chemotaxis and the fitness is calculated with formula(6.6). If the superiority is found in the individual neighborhood f (P(i, j, 1, 1)) < f pik , the solution updates the individual extremum, otherwise it jumps out after reaching a certain number of chemotaxis.
50
Y. Wang and S. Wang
(1) Determine whether the algorithm enters the local optimum, and if so, implement a migration operation on a part of the particles with poor fitness, otherwise proceed to the next step. Whether the condition under which the calculation is stopped is satisfied, if yes, output the result, otherwise it is judged whether the maximum number of iterations is reached, if yes, output the result, otherwise return to Step (2).
6.4 Simulation Verification In order to verify the effectiveness of the algorithm, the algorithm is simulated by MATLAB. For the standard PSO algorithm, number of particle populations N = 40, inertia weight coefficient apply linear decreasing strategy w = wmax − (wmax − wmin )(i/genmax ), where wmax = 0.9, wmin = 0.4, learning factor c1 = c2 = 2.0, the maximum number of iterations genmax = 1000. For the hybrid algorithms, number of particle populations N = 40, wmax = 0.9, wmin = 0.4, learning factor c1 = c2 = 2.0, maximum number of chemotaxis of bacterial chemotactic operators Nc = 40, number of swims Ns = 3, swimming step C = 0.001R (R is the width of the optimization interval), probability of migration Ped = 0.25, the number of algorithm iterations is 1000, the starting point of the path is set to (5,10,0), the end point is set to (70,50,5). Figures 6.1 and 6.2 show the path planning results of the traditional PSO algorithm and the hybrid algorithm in the 3D elevation map environment. The iteration number and optimal fitness curve of the two algorithms are shown in Fig. 6.3. As can be seen from Figs. 6.1, 6.2, and 6.3, the PSO algorithm converges quickly in path planning and has basically converged after the iteration number of 100, but the PSO algorithm falls into local optimum and did not find the optimal path. 80 70 60
0 0
80 60
10
40 Y 20
30
40
X
20 50
60
70
80
0
Y
50 20 15 10 5
40 30 20 10 0 0
10
20
30
40
X
Fig. 6.1. PSO algorithm path planning result
50
60
70
80
6 3D Path Planning Based on Improved Particle …
51 80 70 60
Z
80
Y
50 20 15 10 5 0 0
30
60 10
20
40 Y 20
30
40
X
10
20 50
60
70
80
40
0
0
0
10
20
30
40
50
60
70
80
X
Fig. 6.2. Hybrid algorithm path planning result 700 PSO BFO-PSO
Optimal fitness value
600
500
400 300
200
100
0
200
600
400
800
1000
Iteration
Fig. 6.3. Itertive curves of PSO and hybrid algorithm
The hybrid algorithm proposed in this paper inherits the advantage of fast convergence of the PSO algorithm, and because of the introduction of the bacterial chemotaxis operator, the path of better fitness value relative to the PSO algorithm is found at the beginning of the algorithm. It finds the path of better fitness value relative to PSO algorithm in the early stage of the algorithm, and because of the introduction of the migration operation, the algorithm still has the best ability to jump out of local optimum in the later stages of iteration. The fitness value of the path planning result, the path length, and smoothness are better than standard PSO algorithm.
52
Y. Wang and S. Wang
6.5 Conclusion Taking the UAV as the research object, the chemotaxis and migration operation of BFO algorithm are introduced into the PSO algorithm, and a hybrid path planning algorithm with global search ability and fast convergence is obtained. In the experiment, the PSO algorithm and the hybrid algorithm were used to perform global path planning for the UAV. Simulation studies show that compared with the traditional PSO algorithm, the hybrid algorithm can effectively find the optimal path and has the characteristics of fast convergence. At the same time, the probability of the algorithm falling into an infinite loop is very small, and it has the ability to jump out of local optimum, which can effectively avoid falling into local optimum. It has certain reference and feasibility in the application of UAV path planning.
References 1. Le, A.V.: Modified a-star algorithm for efficient coverage path planning in tetris inspired selfreconfigurable robot with integrated laser sensor. Sensors 18(8) (2018) 2. Evan Krell, F., Alaa Sheta, S.: Collision-free autonomous robot navigation in unknown environments utilizing PSO for path planning. J. Artif. Intell. Soft Comput. Res. 9(4), 267–282 (2019) 3. Li, M.: Research of Path Planning for Welding Robots Based on Hybrid Discrete Particle Swarm Optimization Algorithm. South China University of Technology (2014) 4. Wang, X., Yan, Y., Gu, S.: Welding robot path planning based on Levy-PSO. Control. Decis. 32(2), 373–376 (2017) 5. Lang, X., Liu, C.: Path planning for on machine verification system based on hybrid particle swarm optimization algorithm. Foreign Electron. Meas. Technol. 34(12), 30–34 (2015) 6. Yang, P., Sun, Y.: Particle swarm optimization based on chemotaxis operation of bacterial foraging algorithm. Appl. Res. Comput. 28(10), 3640–3642 (2011) 7. Jia, G.: Research on Three-Dimensional Path Planning of UAV Based on Genetic Algorithm and Sparse A* Algorithm. Nanjing University of Posts and Telecommunications (2017)
Chapter 7
Improving Parallel Magnetic Resonance Imaging Reconstruction Using Nonlinear Time Series Analysis Xiaoyan Wang, Zhengzhou An, Jing Zhou, and Yuchou Chang
Abstract Linear model is generally used in parallel magnetic resonance image (pMRI) reconstruction. Data acquired from multiple coils are learned and fitted for predicting and reconstructing missing k-space signal. Without sampling full k-space data, MRI speed is therefore accelerated and clinical scan cost can be reduced. However, due to noise and outliers existing multiple coil data, reconstructed image is deteriorated by noise and aliasing artifacts. To reduce noise and artifacts, some complicated models may remove noise and outliers in raw coil data and then predict missing data accurately. In this paper, a nonlinear time series analysis model was proposed from system identification and analysis perspective. The conventional pMRI reconstruction was formulated as a system identification task, but the proposed nonlinear time series analysis identifies model structured with removing noise and outliers. Experimental results demonstrated that the proposed model outperformed the conventional method. Noise and outliers were suppressed with high quality of the cardiac and brain applications.
7.1 Introduction Magnetic Resonance Imaging (MRI) [1–7] has some advantages including no ionizing radiation, multi-angle oblique imaging, and multi-parameter imaging. It has important applications in clinical diagnosis. However, longer scanning time leads to X. Wang · J. Zhou (B) School of Physics and Electronic Engineering, Yuxi Normal University, Yuxi 653100, Yunnan, China e-mail: [email protected] Z. An School of Mathematics and Information Technology, Yuxi Normal University, Yuxi 653100, Yunnan, China Y. Chang Computer Science and Engineering Technology Department, University of Houston-Downtown, Houston, TX 77002, USA © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_7
53
54
X. Wang et al.
magnetic resonance scanning susceptible to motion, and therefore image quality is deteriorated by artifacts. Furthermore, since time resolution of dynamic MRI is low, scan time is long, and cost is high, which limits more applications of MRI. In order to speed up imaging speed, parallel imaging (PI) has been used in recent years. Missing k-space data acquired below Nyquist sampling rate is reconstructed [8, 9], and therefore accelerates imaging speed. The main advantages of parallel imaging lie in the significant reduction in image acquisition time. The acceleration factor (R), such as R = 2, is proportional to the image acquisition time. Parallel MRI reconstructed image is noisy, since parallel imaging technique reduces imaging time. Therefore, the SNR of a parallel imaging sequence is always lower than an imaging sequence with fully sampled data. The parallel imaging technique utilizes the different distributions of the phased array coil sensitivity and performs spatial coding to reduce the required number of k-space acquisition data, so as to achieve the purpose of improving the imaging speed. The most widely used PI techniques include Sensitivity Encoding (SENSE) technology [10] and Generalized Auto-calibrating Partially Parallel Acquisitions (GRAPPA) [11]. SENSE is an image domain-based reconstruction algorithm. Coil sensitivity information needs to be known in advance. The GRAPPA technique is a reconstruction algorithm based on the k-space domain. It satisfies the frequency required by Nyquist’s sampling law to acquire center k-space data, also known as Auto-Calibration Signal (ACS). The missing k-space data is filled with data interpolation by other channels’ acquired data. Finally, the channels are combined to obtain the final unfolded image. ARMA model is auto-regressive model (AR model) and sliding average model (MA model). ARMA model is designed to explain the intrinsic autocorrelation of event sequences to predict the future. Among the three parameter models, the AR model has been widely used, since the parameter calculation process of the AR model is a linear equation and is relatively simple. The MA model generally requires a large number of parameters. Although the ARMA model requires the least number of parameters, the parameter estimation algorithm is solved by a nonlinear equation group, and its operation is much more complicated than the AR model. In this paper, the nonlinearity and recursion items are studied for accurate system analysis and prediction. Parallel MRI reconstruction is improved by suppressing noise and outliers with the proposed nonlinear system identification and analysis model.
7.2 Background Over the past decade, MRI imaging technology has developed rapidly. Imaging speed has become faster, and imaging quality has become better. Among many imaging methods, multi-coil parallel MRI has become an active research field. The latest MRI equipment can provide over 100 independent receiving channels, and the imaging time can be reduced to 1/8–1/2 of the traditional MRI imaging time. The acquisition
7 Improving Parallel Magnetic Resonance Imaging Reconstruction …
55
of MRI raw data is performed in the spatial frequency domain (k-space). The kspace data can be reconstructed to get the MRI image for diagnosis. However, the data scanning method is different in the imaging process, and therefore the corresponding image reconstruction algorithm also is different. All factors affecting the SNR of nonparallel imaging sequences also affect parallel imaging sequences. They include coil sensitivity, magnetic field strength, phase encoding steps, magnet hardware, the average number of signals, imaging tissue, pulse sequence, voxel size, receiver bandwidth, and so forth. In addition, the SNR of the parallel imaging sequence is further reduced by two other factors, as shown in the following equation [10] SNR SNR SNR P1 = √ SNR P I = √ g R g R
(7.1)
where R is the reduction factor, and g is geometry factor (g-factor). R indicates that the expected signal-to-noise ratio loss is caused by the reduced scan time (reduction factor R), because only 1/R lines in k-space are acquired. This has a similar effect on reducing the number of phase encoding steps or the average number of signals in nonparallel imaging. The geometric factor g is unique to parallel imaging. The g depends on the number of times each point is repeatedly acquired and the corresponding difference in coil sensitivity. The g-factor is not a constant value but varies with the position of the image. The g-factor depends on some factor including (1) the number and position of the surface coils; (2) the coil configurations; (3) the imaging plane; (4) the phase encoding direction in the scanning plane; (5) the locations of the voxels in the imaging area. It is related to the number, size, and direction of the surface coil elements. A time series [12–14] is a set of values that are numerically or observable values of a variable or indicator that exists in the natural or social science. They are in the order at the same or different intervals. It is the data formed by the state of a certain phenomenon or several phenomena at different moments, reflecting the evolution of the phenomenon, and the relationship between phenomena. The time series can also be the phenomenon that several related phenomena are at a certain point in time. A set of data sorted in a certain order, reflecting the intrinsic numerical relationship between related phenomena under certain time and place conditions. However, for some systems in practical problems, mathematical models are difficult to be established. However, for those systems, time series containing information about system evolution can be obtained through experiments or observations. This is also considered as an inverse problem. It requires little background information and knowledge on the system. Time series refers to the objective record of the historical behavior of the system under study. Therefore, it contains the structural characteristics of the system and its operational laws. Therefore, the structural characteristics of the studied system can be understood through the study of time series. The operating law of the research system can be revealed so as to predict and control its future behavior. It also guides to modify and redesign the system to operate in the desired structure.
56
X. Wang et al.
7.3 Method Time series analysis is mainly used for: (1) system description: according to the time series data observed by the system, the system is objectively described by the curve fitting method; (2) system analysis: when observations are taken from more than two variables, changes in one-time series can be used to account for changes in another time series to gain insights into the mechanism by which a given time series occurs; (3) prediction of the future: the ARMA model is generally used to fit the time series and predict the future value of the time series; (4) decision making and control: according to the time series model, the input variables can be adjusted so that the system development process is kept at the target value, that is, the process control is predicted when the process deviates from the target. Recursion and nonlinearity can incorporate accurate fitting elements, and therefore improve system description and analysis in traditional GRAPPA reconstruction. A nonlinear time series analysis model used for the proposed reconstruction technique is presented below. y(n) =
P
h(i)x(n − i) +
i=0
+
R
k( j)x(n − j) +
j=0 R R i=1 j=1
k(i, j)y(n − i)y(n − j) +
P P
h(i, j)x(n − i)x(n − j)
i=0 j=0 P R
z(i, j)x(n − i)y(n − j).
i=0 j=1
(7.2) In the Eq. (7.1) [15], x represents input and y represent output of the system. The h, k, and z denote first-order and second-order coefficients for input and output, respectively. Those coefficients are able to accurately describe dynamics of the system through modeling both input and output. AR (P) model and MA (R) model are actually special cases of ARMA (P, R) model. For GRAPPA reconstruction, both of AR (P) model and MA (R) model are corresponding to undersampled k-space data and preliminarily reconstructed k-space data. We first need to identify the best model and the criteria to evaluate optimal model. Generally speaking, Akaike information criterion (AIC) and Bayesian information criterion (BIC) are two good indicators of evaluation model for nonlinear time series analysis. Both AIC and BIC are defined as follows. AIC = 2t − 2ln Lˆ
(7.3)
BIC = ln(z)t − 2ln Lˆ
(7.4)
and
7 Improving Parallel Magnetic Resonance Imaging Reconstruction …
57
where t represents the number of estimated parameters in the model, Lˆ denotes the model’s maximum of the likelihood function, and z is the number of observed data. Since system is characterized by both AR and MA terms in traditional ARMA model, the added nonlinear terms are able to describe nonlinearities existing in parallel MRI reconstruction process. Those nonlinearities are often generated by some factors including noises that exist in data acquisition and reconstruction procedures, coils, and other hardware devices on MRI scanner. According to the nonlinear time series data obtained by calculating reconstruction procedure, reconstruction procedure is characterized by curve fitting method. Therefore, missing data can be predicted based on the nonlinear time series model.
7.4 Experimental Results Besides phantom and brain datasets used for evaluating the performance of the proposed method, reconstruction on cardiac dataset is also evaluated. For performance evaluation, fully sampled k-space data is used for reconstruction as a reference image. Furthermore, undersampled data is used for the traditional GRAPPA reconstruction [11], IIR GRAPPA (linear time series model) reconstruction [16], and the proposed nonlinear time series model-based reconstruction. The cardiac dataset was acquired by four cardiac coils with matrix size 150 × 256. The cardiac k-space data were undersampled with outer reduction factor (ORF) 4 and auto-calibration signal (ACS) was 48, respectively. The reference image is used for performance comparison, which is generated by fully sampled k-space data. Reconstruction results of cardiac MRI are presented in Fig. 7.1. The proposed method outperforms traditional GRAPPA with MA model in suppressing noise. There is a little bit of aliasing artifact existing in the reconstructed image by the proposed method. The aliasing artifact also exists in the traditional GRAPPA-based reconstruction. On the region of ventricular, the proposed method obviously suppresses noise compared to conventional GRAPPA method. So, as seen as Fig. 7.2, the extracted regions of interest (ROI) regions are also presented in Fig. 7.1. It is seen that the proposed nonlinear approach is close to the reference image, since two ROIs have similar noise level in compared to traditional GRAPPA method, which presents higher level of noise. The ROI extracted from GRAPPA reconstruction demonstrates obvious noise compared to reference image and the proposed method. System identification has been developed for dynamic system control design over half a century. From the content theory, system identification includes not only the establishment of system mathematical model, but also data modeling. System identification is essentially control-oriented from the perspective of the target. System identification and feedback control are combined to produce adaptive control. A large number of algorithms have been developed by using the noisy data to model and optimize the unknown parameters of the system. For this reason, we applied
58
X. Wang et al.
Fig. 7.1 Reconstruction performance evaluation among fully sampled data cardiac reconstruction (a), the traditional GRAPPA reconstruction (b), and the proposed nonlinear time series model-based reconstruction (c)
Fig. 7.2 The extracted regions of interest (ROI) regions are also presented in Fig. 7.1. They are among fully sampled data cardiac reconstruction (a), the traditional GRAPPA reconstruction (b), and the proposed nonlinear time series model-based reconstruction (c)
system identification for modeling parallel MRI reconstruction procedure. An optimal reconstruction with reduced noise and aliasing artifacts is achieved using the proposed nonlinear time series analysis model. System identification usually has the following steps. (1) System is fully understood and purpose is clearly identified, including the system’s input, output, signal
7 Improving Parallel Magnetic Resonance Imaging Reconstruction …
59
range, system operating conditions, and application of the resulting model. (2) Model is appropriately selected, such as variable system, steady system, stochastic system, deterministic system, linear system, nonlinear system, and so forth. (3) Experimental design and data acquisition are implemented. (4) Parameters or functions are estimated, including offline algorithms and online algorithms, which can be realtime processing of data and identification results updated. (5) Model is verified for comparing system output with model output, and reliability and validity of the identification results are evaluated. The proposed parallel MRI reconstruction method is generally following the above steps. However, reconstruction mode needs more evaluations on MRI data.
7.5 Conclusion In summary, a nonlinear time series analysis model is studied to improve reconstruction quality of a parallel MRI reconstruction method. Nonlinear terms are added to reconstruction procedure for suppressing noise and aliasing artifacts. Experiments on in vivo datasets are used for evaluation and comparison. The nonlinear time series analysis model outperforms the state-of-the-art methods. The future work will focus on selecting a compact representation of nonlinear time series analysis model for more parallel MRI reconstruction applications [17–19].
References 1. Liang, Z.P., Lauterbur, P.C.: Principles of magnetic resonance imaging: a signal processing perspective. Wiley-IEEE Press (1999) 2. Haacke, M., Thompson, M., Venkatesan, R., Brown, R., Cheng, Y.: Magnetic resonance imaging: physical principles and sequence design, 1st edn. Wiley-Liss (1999) 3. King, K., Bernstein, M., Zhou, X.: Handbook of MRI pulse sequences, 1st edn. Academic Press (2004) 4. Dey, N., Ashour, A.S., Beagum, S., Pistola, D.S., Gospodinov, M., Gospodinova, E.P., Tavares, J.M.R.S.: Parameter optimization for local polynomial approximation based intersection confidence interval filter using genetic algorithm: an application for brain MRI Image de-noising. J. Imaging 1(1), 60–84 (2015) 5. Hemalatha, S., Anouncia, S.M.: A computational model for texture analysis in images with fractional differential filter for texture detection. Int. J. Ambient Comput. Intell. (IJACI) 7(2), 93–114 (2016) 6. Moraru, L., Moldovanu, S., Dimitrievici, L.T., Shi, F., Ashour, A.S., Dey, N.: Quantitative diffusion tensor magnetic resonance imaging signal characteristics in the human brain: a hemispheres analysis. IEEE Sens. J. 17(15), 4886–4893 (2017) 7. Rajinikantha, V., Deyb, N., Satapathyc, S.C., Ashour, A.S.: An approach to examine magnetic resonance angiography based on Tsallis entropy and deformable snake model. Fut. Gener. Comput. Syst. 85, 160–172 (2018) 8. Glockner, J.F., Hu, H.H., Stanley, D.W., Angelos, L., King, K.: Parallel MR imaging: a user’s guide. RadioGraphics 25, 1279–1297 (2005)
60
X. Wang et al.
9. Larkman, D.J., Nunes, R.G.: Parallel magnetic resonance imaging. Phys. Med. Biol. 52, 15–55 (2007) 10. Pruessmann, K.P., Weiger, M., Scheidegger, M.B., Boesiger, P.: SENSE: sensitivity encoding for fast MRI. Magn. Reson. Med. 42, 952–962 (1999) 11. Griswold, M.A., Jakob, P.M., Heidemann, R.M., Nittka, M., Jellus, V., Wang, J., Kiefer, B., Haase, A.: Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn. Reson. Med. 47, 1202–1210 (2002) 12. Kwatra, S.C.: Applied time series analysis. Proc. IEEE 67(11), 1578 (1979) 13. Hannan, E.: Time series analysis. IEEE Trans. Autom. Control 19(6), 706–715 (1974) 14. Takeuchi, H., Ishikawa, M., Kodama, N.: Time-series data analysis of blood sugar and HbA1c levels of diabetic. In: 4th IET International Conference on Advances in Medical, Signal and Information Processing (MEDSIP) (2008) 15. Bradley E., Kantz H.: Nonlinear time-series analysis revisited. Chaos: Interdisc. J. Nonlinear Sci. 25(9), 097610 (2015) 16. Chen, Z., Zhang, J., Yang, R., Kellman, P., Johnston, L.A., Egan, G.F.: IIR GRAPPA for parallel MR image reconstruction. Magn. Reson. Med. 63(2), 502–509 (2010) 17. Chang, Y., Wang, H., Zheng, Y., Lin, H.: Instrument variables for reducing noise in parallel MRI reconstruction. Biomed. Res. Int. 9016826, 2017 (2017) 18. Chang, Y., Wang, H.: Kernel principal component analysis of coil compression in parallel imaging. Comput. Math. Methods Med. 4254189, 2018 (2018) 19. Wang, H., Zhou, Y., Su, S., Hu, Z., Liao, J., Chang, Y.: Adaptive volterra filter for parallel MRI reconstruction. EURASIP J. Adv. Signal Process. 34, 2019 (2019)
Chapter 8
Low Computational Complexity Third-Order Tensor Representation Through Inverse Spectrum Pyramid Roumen Kountchev and Roumiana Kountcheva
Abstract Tensor representation of video sequences or 3D images becomes very popular recently. Despite certain advantages, the main obstacle for the wide application of this approach is the high computational complexity. In the paper is presented new method for third-order tensor representation in the spectrum space of the 3DWalsh–Hadamard Transform (WHT). To lessen the computational complexity, here is used pyramidal decomposition based on the 3D WHT with Reduces Inverse Spectrum Pyramid (RISP). In result is achieved high concentration of the tensor energy in a minimum number of spectrum coefficients, most of which—in the first (lowest) decomposition level. After the processing, the tensor is transformed into a multi-level spectrum tensor of same size. The proposed representation has low computational complexity because its execution needs operations like “addition” only. The specific properties of the 3D-RISP permit it to be used in various application areas which require parallel processing and analysis of 3D data represented as third-order tensors: sequences of correlated images (video, multi-spectral, multi-view, various kinds of medical images), multichannel signals, huge massifs of data, etc.
8.1 Introduction Recently, the most popular approaches for representation of linear groups of data of various kinds are based on tensors [1]. The usual way for image representation is the 2D matrix, and video sequences or groups of multispectral images or other kinds of correlated images can be treated as 3D matrices. Large number of databases exist which contain videos or image sequences (computer tomography, etc.). Such databases are usually of extremely big size and comprise groups of highly correlated R. Kountchev (B) Technical University of Sofia, Sofia, Bulgaria e-mail: [email protected] R. Kountcheva TK Engineering, Mladost 3, B 12, Sofia, Bulgaria e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_8
61
62
R. Kountchev and R. Kountcheva
images. Unfortunately, the object search and analysis in such image databases need huge amounts of data to be processed. Recently tensors, and specially the thirdorder tensors, are most suitable to represent video sequences. The main obstacle for the wide tensor decomposition implementation in real-time applications is the high computational complexity. To solve the problem, various algorithms are already developed. Most frequently used are the Multilinear Singular Value Decomposition (MSVD) [2], the Multilinear Principal Component Analysis (MPCA) [3] the Tucker decomposition, the Canonical Polyadic (CP) decomposition, tensor networks [4], etc. In general, they use eigenvectors of matrices or vector sequences calculated using tensor transforms based on matricization, vectorization, and tensorization [5, 6]. In the result of the processing, the calculated tensor elements are highly decorrelated and their energy is concentrated in the first decomposition components only. The main disadvantage is that this approach has high computational complexity. One alternative is based on the Tensor-SVD (t-SVD) [7] executed in the Fourier domain. Tensor decomposition implementations in many cases are based on deterministic discrete transforms: the Discrete Wavelet Transform (DWT) or the Discrete Cosine Transform (DCT) followed by SVD in the frequency domain [8]. In some publications [9–12] are proposed algorithms for cubical decomposition based on the 3D separable discrete transforms: the Discrete Fourier Transform (DFT), the Discrete Hartley Transform (DHT), the Discrete Cosine Transform (DCT), etc. To lessen the computational complexity in many cases are used specially developed algorithms, known as “fast”. One of them is the 3D Fast Fourier Transform (FFT) which reduces the computational complexity of the 3D transforms for a tensor of size N × N × N from O(N 4 ), to O(N 3 logN). Compared to SVD/PCA-based algorithms, the tensor decomposition based on deterministic orthogonal transforms offers lower energy concentration in the first decomposition components but accelerates the computations. This is why the use of the tensor decomposition based on orthogonal transforms is reasonable in cases when real-time processing of various multi-dimensional data is needed. In this work, a new approach is offered for hierarchical third-order tensor decomposition based on the Three-Dimensional Inverse Spectrum Pyramid (3D-ISP). In result, tensors are transformed into the spectrum space of the 3D Walsh–Hadamard Transform (WHT). The 3D-ISP is a generalization of the Inverse Difference Pyramid (IDP) used for the hierarchical representation of 2D matrices and introduced in earlier publications of the authors [13, 14]. The new approach ensures lower computational complexity together with high energy concentration in the first decomposition components. The tree-like decomposition structure of the 3D Inverse Spectrum Pyramid [15, 16] shown in Fig. 8.1 permits to neglect a part of the low-energy “leaves” and “branches” and as a final result—to reduce the tensor dimensionality (i.e., the computational complexity). The basic concept of the new approach is to represent each third-order tensor X of size N × N × N through the 3D Reduced Inverse Spectrum Pyramid (3D-RISP). To facilitate the computations, the decomposition is based on the Walsh–Hadamard Transform (WHT). This pyramid is a modification of the 2DRISP [13] generalized for the three-dimensional case. 
The new idea is that the tensor X is initially divided into Q sub-tensors X q for q = 1, 2, …, Q, each of size M ×
8 Low Computational Complexity Third-Order Tensor Representation …
63
Leaves ISP level p=3 512 coefficients
Leaves ISP level p=2 64 coefficients
Spectral 3D space
Root ISP level p=1 8 coefficients
Fig. 8.1 Basic tree-like structure of the 3D Inverse Spectrum Pyramid in the spectrum domain
√ M × M where M = N /3 Q . The value of M is defined in accordance with the condition M = 2m . For the calculation of each sub-tensor X q of size N × N × N (N = 2n ) is built an individual n-level 3D-RISP. In result, the tensor X is transformed into the corresponding spectrum tensor S which comprises n-levels of coefficients. The coefficients in the initial level have the highest energy concentration, while the energy in the next levels decreases quickly. In correspondence with Parseval’s theorem, the total energy of the coefficients of the tensor S is equal to that of the elements of the tensor X but is redistributed. The new approach is an alternative of the famous statistical tensor decompositions such as the Multilinear SVD [2], the Canonical Polyadic Decomposition [4, 5], the H-Tucker and their modifications based on the calculation of eigenvectors and eigenvalues [9–12]. For them is achieved full decorrelation of the decomposition components but the computational complexity is very high. The presented
64
R. Kountchev and R. Kountcheva
approach does not ensure full decorrelation of the spectrum tensor S elements but is highly efficient in respect of the achieved decorrelation at minimum computational complexity. The paper is structured as follows: in Sect. 8.2 is given the method for thirdorder tensor representation through the 3D Reduced ISP; Sect. 8.3 outlines the main characteristics of the spectrum tensor, and Sect. 8.4 contains the conclusions.
8.2 Method for Third-Order Tensor Representation Through 3D Reduced ISP The method for representation of the third-order tensor X of size N × N × N (N = 2n ) is based on its decomposition through the 3D Reduced Inverse Spectrum Pyramid (3D RISP) for which is repeatedly applied the direct and inverse 3D Walsh–Hadamard Transform (3D-WHT) [11, 12]: s(u, v, l) =
N −1 N −1 N −1 i=0
x(i, j, k) wal(i, u, N )wal( j, v, N )wal(k, l, N ) (8.1)
j=0 k=0
for u, v, l = 0, 1, …, N − 1, and N −1 N −1 N −1 1 x(i, j, k) = 3 s(u, v, l)wal(i, u, N )wal( j, v, N )wal(k, l, N ), N u=0 v=0 l=0
(8.2) where the one-dimensional Nth-order Walsh–Hadamard (WH) functions wal(i, u, N), wal(j, v, N) and wal(k, l, N) for n = lg2 N are defined by the relations: n−1
wal(i, u, N ) = (−1)r =0
qr (i)u r
,
where q0 (i) = i n−1 , qr (i) = i n−r ⊕ i n−r −1 , r = 0, 1, . . . , n − 1, i=
n−1
ir 2r ,u =
r =0
n−1
u r 2r ;
r =0 n−1
qr ( j)vr
wal( j, v, N ) = (−1)r =0 , q0 ( j) = jn−1 , qr ( j) = jn−r ⊕ jn−r −1 for r = 0, 1, . . . , n − 1;
(8.3)
8 Low Computational Complexity Third-Order Tensor Representation …
j=
n−1
jr 2r , v =
r =0
n−1
65
vr 2r ;
(8.4)
r =0 n−1
wal(k, l, N ) = (−1)r =0
qr (k)lr
,
q0 (k) = jn−1 , qr (k) = kn−r ⊕ kn−r −1 for r = 0, 1, . . . , n − 1; k=
n−1
kr 2r , l =
r =0
n−1
lr 2r .
(8.5)
r =0
8.2.1 Representation of a Tensor of Size 4 × 4 × 4 The tensor of size 4 × 4 × 4 is used here as a basic building unit. It is the sub-tensor of the main tensor X, but to simplify the notations, the index q of the sub-tensor X q is not used and the general notation is X. On the other hand, the basic tensor X is represented as a sequence of matrices which correspond to its four consecutive sections. Let the matrices [X k ] for k = 1, 2, 3, 4 are the sections of the basic tensor X of size 4 × 4 × 4. Then, they are represented as ⎡
b1,1 ⎢ b1,3 [X 1 ] = ⎢ ⎣ b1,9 b1,11 ⎡ b3,1 ⎢ b3,3 [X 3 ] = ⎢ ⎣ b3,9 b3,11
⎤ ⎡ b1,6 b2,1 ⎢ b2,3 b1,8 ⎥ ⎥, [X 2 ] = ⎢ ⎣ b2,9 b1,14 ⎦ b1,16 b2,11 ⎤ ⎡ b3,6 b4,1 ⎢ b4,3 b3,8 ⎥ ⎥, [X 4 ] = ⎢ ⎣ b4,9 b3,14 ⎦
b1,2 b1,4 b1,10 b1,12
b1,5 b1,7 b1,13 b1,15
b3,2 b3,4 b3,10 b3,12
b3,5 b3,7 b3,13 b3,15 b3,16
b4,11
⎤ b2,6 b2,8 ⎥ ⎥, b2,14 ⎦ b2,16 ⎤ b4,6 b4,8 ⎥ ⎥, b4,14 ⎦
b2,2 b2,4 b2,10 b2,12
b2,5 b2,7 b2,13 b2,15
b4,2 b4,4 b4,10 b4,12
b4,5 b4,7 b4,13 b4,15 b4,16
(8.6)
where bk,i for k = 1, 2, 3, 4 and i = 1, 2 ,…, 16 are the elements of the tensor X. The decomposition of the tensor X through the 3D-RISP is based on the execution of the operations explained below. In the level p =1 of 3D-RISP, through direct 3DWHT are calculated the 8 low-frequency coefficients s(u, v, l), for u, v, l = 0, 1. The volumes of the cubes V 1 − V 8 , shown in Fig. 8.2, are calculated using the elements of the matrices [X k ]: V1 =
4 2 k=1 i=1
bk,i ; V2 =
8 2 k=1 i=5
bk,i ; V3 =
12 2 k=1 i=9
bk,i ; V4 =
16 2
bk,i ;
k=1 i=13
(8.7)
66
R. Kountchev and R. Kountcheva
Fig. 8.2 Division of tensor X into 8 sub-tensors
k
N-1 N/2
V5 N-1
N/2
0
V1
i
V2 V7
N/2
V3
j
V5 =
4 4 k=3 i=1
bk,i ; V6 =
8 4 k=3 i=5
V6
V8
V4
N-1
bk,i ; V7 =
12 4
bk,i ; V8 =
16 4
k=3 i=9
bk,i .
k=3 i=13
(8.8) The eight low-frequency coefficients s(u, v, l) are defined as follows: s(0, 0, 0) = α = V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 ;
(8.9)
s(1, 0, 0) = β = V1 − V2 + V3 − V4 + V5 − V6 + V7 − V8 ;
(8.10)
s(0, 1, 0) = γ = V1 + V2 − V3 − V4 + V5 + V6 − V7 − V8 ;
(8.11)
s(1, 1, 1) = δ = V1 − V2 − V3 + V4 − V5 + V6 + V7 − V8 ;
(8.12)
s(0, 0, 1) = η = V1 + V2 + V3 + V4 − V5 − V6 − V7 − V8 ;
(8.13)
s(1, 0, 1) = χ = V1 − V2 + V3 − V4 − V5 + V6 − V7 + V8 ;
(8.14)
s(0, 1, 1) = ρ = V1 + V2 − V3 − V4 − V5 − V6 + V7 + V8 ;
(8.15)
s(1, 1, 0) = ξ = V1 − V2 − V3 + V4 + V5 − V6 − V7 + V8 .
(8.16)
The first approximation X of the restored tensor X, calculated on the basis of the coefficients s(u, v, l), is calculated through inverse 3D-WHT, in correspondence with the relation below:
8 Low Computational Complexity Third-Order Tensor Representation …
X = (1/43 )
1 1 1
67
s(u, v, l)W u,v,l = (1/64)[s(0, 0, 0)W 0,0,0
u=0 v=0 l=0
+ s(1, 0, 0)W 1,0,0 + s(0, 1, 0)W 0,1,0 + s(1, 1, 1)W 1,1,1 + s(0, 0, 1)W 0,0,1 + s(1, 0, 1)W 1,0,1 + s(0, 1, 1)W 0,1,1 + s(1, 1, 0)W 1,1,0 ].
(8.17)
Here, W u,v,l are the tensors of size 4 × 4 × 4, shown on Fig. 8.3. They represent the eight basic 3D-WHT functions in the first 3D-RISP level (p = 1) which correspond to the low-frequency spectrum coefficients s(u, v, l) for u, v, l = 0, 1. The weight of the bright cubes is (+1), and for the dark cubes it is (−1). In correspondence with Eq. (8.17) the tensor X is represented through the sequence of matrices [ X˜ k ], for k = 1, 2, 3, 4:
l
0
u v S(0,0,1)
S(1,0,1) S(u,v,l) for u,v,l = 0,1
S(0,0,0)
S(1,0,0)
S(0,1,1)
S(0,1,0)
S(1,1,1)
Fig. 8.3 Basic 3D WHT functions of size 4 × 4 × 4
S(1,1,0)
68
R. Kountchev and R. Kountcheva
⎡
A1 ⎢ A1 [ Xκ ] = ⎢ ⎣ C1 C1
A1 A1 C1 C1
B1 B1 D1 D1
⎤ ⎡ B1 F1 ⎢ F1 B1 ⎥ ⎥ for k = 1, 2; [ Xk] = ⎢ ⎣ P1 D1 ⎦ D1 P1
F1 F1 P1 P1
G1 G1 R1 R1
⎤ G1 G1 ⎥ ⎥ for k = 3, 4. R1 ⎦ R1 (8.18)
The elements of the matrices are defined in accordance with Eq. (8.18) where the coefficients s(u, v, l) for u, v, l = 0, 1 are calculated by using V 1 up to V 8 following Eqs. (8.7) and (8.8): A1 = (1/64)(α + β + γ + δ + η + χ + ρ + ξ ) = (1/8)V1 ;
(8.19)
B1 = (1/64)(α − β + γ − δ + η − χ + ρ − ξ ) = (1/8)V2 ;
(8.20)
C1 = (1/64)(α + β − γ − δ + η + χ − ρ − ξ ) = (1/8)V3 ;
(8.21)
D1 = (1/64)(α − β − γ + δ + η − χ − ρ + ξ ) = (1/8)V4 ;
(8.22)
F1 = (1/64)(α + β + γ − δ − η − χ − ρ + ξ ) = (1/8)V5 ;
(8.23)
G 1 = (1/64)(α − β + γ + δ − η + χ − ρ − ξ ) = (1/8)V6 ;
(8.24)
P1 = (1/64)(α + β − γ + δ − η1 − χ + ρ − ξ ) = (1/8)V7 ;
(8.25)
R1 = (1/64)(α − β − γ − δ − η + χ + ρ + ξ ) = (1/8)V8 .
(8.26)
In accordance with the algorithm for the execution of the 3D Reduced ISP (3DRISP) is calculated the difference tensor, E 1 . In this case, it is the difference E 1 = X − X which is the error between the first approximation and the original. The tensor E 1 is represented by the sequence of matrices [E 1k ], when k = 1, 2, 3, 4: ⎡
b1,1 − A1 b1,2 − A1 ⎢ b1,3 − A1 b1,4 − A1 [E 11 ] = ⎢ ⎣ b1,9 − C1 b1,10 − C1 b1,11 − C1 b1,12 − C1 ⎡ b2,1 − A1 b2,2 − A1 ⎢ b2,3 − A1 b2,4 − A1 [E 12 ] = ⎢ ⎣ b2,9 − C1 b2,10 − C1 b2,11 − C1 b2,12 − C1
⎤ b1,6 − B1 b1,8 − B1 ⎥ ⎥, b1,14 − D1 ⎦ b1,16 − D1 ⎤ b2,5 − B1 b2,6 − B1 b2,7 − B1 b2,8 − B1 ⎥ ⎥, b2,13 − D1 b2,14 − D1 ⎦ b2,15 − D1 b2,16 − D1
b1,5 − B1 b1,7 − B1 b1,13 − D1 b1,15 − D1
(8.27)
(8.28)
8 Low Computational Complexity Third-Order Tensor Representation …
⎡
b3,1 − F1 ⎢ b3,3 − F1 [E 13 ] = ⎢ ⎣ b3,9 − P1 b3,11 − P1 ⎡ b4,1 − F1 ⎢ b4,3 − F1 [E 14 ] = ⎢ ⎣ b4,9 − P1 b4,11 − P1
b3,2 − F1 b3,4 − F1 b3,10 − P1 b3,12 − P1
b3,5 − G 1 b3,7 − G 1 b3,13 − R1 b3,15 − R1
b4,2 − F1 b4,4 − F1 b4,10 − P1 b4,12 − P1
b4,5 − G 1 b4,7 − G 1 b4,13 − R1 b4,15 − R1
⎤ b3,6 − G 1 b3,8 − G 1 ⎥ ⎥, b3,14 − R1 ⎦ b3,16 − R1 ⎤ b4,6 − G 1 b4,8 − G 1 ⎥ ⎥. b4,14 − R1 ⎦ b4,16 − R1
69
(8.29)
(8.30)
In the next level (p = 2) of the 3D-RISP, the tensor E 1 is divided into 8 sub-tensors E 1t of size 2 × 2 × 2 (for t = 1, 2, …, 8). For each sub-tensor are calculated the quantities V t1 ÷ V t8 for which are needed 8 low-frequency coefficients st (u, v, l) for u, v, l = 0, 1 obtained through the direct 3D Walsh–Hadamard transform (3D-WHT): For the sub-tensor E 1t is obtained: V11 = b1,1 − A1 ; V12 = b1,2 − A1 ; V13 = b1,3 − A1 ; V14 = b1,4 − A1 ;
(8.31)
V15 = b2,1 − A1 ; V16 = b2,2 − A1 ; V17 = b2,3 − A1 ; V18 = b2,4 − A1 .
(8.32)
The 8 coefficients for the sub-tensor E 1t are s1 (0, 0, 0) = α11 = V11 + V12 + V13 + V14 + V25 + V16 + V16 + V17 + V18 = V1 − 8A1 = 0;
(8.33)
s1 (1, 0, 0) = β11 = V11 − V12 + V13 − V14 + V15 − V16 + V17 − V18 ;
(8.34)
s1 (0, 1, 0) = γ11 = V11 + V12 − V13 − V14 + V15 + V16 − V17 − V18 ;
(8.35)
s1 (1, 1, 1) = δ11 = V11 − V12 − V13 + V14 − V15 + V16 + V17 − V18 ;
(8.36)
s1 (0, 0, 1) = η11 = V11 + V12 + V13 + V14 − V15 − V16 − V17 − V18 ;
(8.37)
s1 (1, 0, 1) = χ11 = V11 − V12 + V13 − V14 − V15 + V16 − V17 + V18 ;
(8.38)
s1 (0, 1, 1) = ρ11 = V11 + V12 − V13 − V14 − V15 − V16 + V17 + V18 ;
(8.39)
s1 (1, 1, 0) = ξ11 = V11 − V12 − V13 + V14 + V15 − V16 − V17 + V18 .
(8.40)
The 8 low-frequency coefficients st (u, v, l) for the sub-tensors E 1t of size 2 × 2 × 2 when t = 2, 3, …, 8 are calculated in similar way. From the relations used for
70
R. Kountchev and R. Kountcheva
the coefficients in the level p = 2, it follows that: st (0, 0, 0) = αt1 = 0 for t = 1, 2, . . . , 8.
(8.41)
The spectrum tensor S in the first level (p = 1) comprises 8 coefficients α, β, γ , δ, η, χ , ρ, ξ and in the next level (p = 2) it is of 7 coefficients βt1 , γt1 , δt1 , ηt1 , χt1 , ρt1 , ξt1 , for t = 1, 2, …, 8 (56 in total). The tensor S comprises 64 elements (equal to the number of the elements of the tensor X). In result, the 3D-RISP pyramid is not “overcomplete” and this modification is called “reduced”. The spectrum coefficients in both pyramid levels which represent the tensor S are used to restore the initial tensor X. For this, for the coefficients in each pyramid level p = 1, 2 is executed inverse 3D-WHT. The restored tensor X of size 4 × 4 × 4 is represented in accordance with the relation below: X= X + E1.
(8.42)
The first tensor X of size 4 × 4 × 4 for the level p = 1 (the first approximation of the tensor X) is calculated in accordance with Eq. (8.17). The difference tensor E 1 of size 4 × 4 × 4 for the level p = 2 comprises 8 sub-tensors E 1t of size 2 × 2 × 2 for t = 1, 2, …, 8, defined by the relation: E 1t = (1/23 )
1 1 1
st (u, v, l)W u,v,l = (1/8)[st (0, 0, 0)W 0,0,0
u=0 v=0 l=0
+ st (1, 0, 0)W 1,0,0 + st (0, 1, 0)W 0,1,0 + st (1, 1, 1)W 1,1,1 + st (0, 0, 1)W 0,0,1 + st (1, 0, 1)W 1,0,1 + st (0, 1, 1)W 0,1,1 + st (1, 1, 0)W 1,1,0 ].
(8.43)
In this case, W u,v,l are tensors of size 2 × 2 × 2 which represent the 8 basic 3D-WHT functions from level 2 and are similar to these, shown in Fig. 8.3. Fig. 8.4 shows the 3D-RISP transformation of the tensor X of size 4 × 4 × 4 into the spectrum tensor S of same size. The new tensor is of 2 levels of coefficients (shown as circles) tinted in different colors (orange and blue). The coefficients can be transformed into one-dimensional sequence which is a vector of 64 coefficients. Together with the increasing of their sequential number, the absolute values decrease quickly. For the calculations, the elements of the tensor S are passed sequentially from 1 to 64 (indicated by the red arrows on Fig. 8.4) for the coefficients in the first level (a cube of size 2 × 2 × 2), tinted in orange. Each of the next 8 cubes of size 2 × 2 × 2 and the corresponding elements (blue) are processed in similar way. To reduce the information redundancy, the components of the 64-dimensional vector are treated by using amplitude or regional cutting-off the low-energy components depending on the value of the acceptable error in the restored tensor after the inverse 3D-RISP.
8 Low Computational Complexity Third-Order Tensor Representation … Tensor X 4×4×4
71
11 Tensor S 10 4×4×4 9 8 3 2 1 3D-RISP 6 7 3D-RISP-1
4
5
- element of layer 1
- element of layer 2
Fig. 8.4 Transformation of the tensor X of size 4 × 4 × 4 into the corresponding spectrum tensor, S
8.2.2 Representation of the Tensor of Size N × N × N Through N-Level 3D-RISP The new method for third-order tensor decomposition in the spectrum space of 3DWHT presented above for the 2-level 3D-RISP could be generalized for the cases when n > 2. For example, to restore the tensor X of size 8 × 8 × 8 (N = 23 ) from the corresponding 3-level spectrum tensor S (n = 3), for the coefficients in each 3D-RISP level are executed inverse 3D-WHT transforms of size 8 × 8 × 8, 4 × 4 × 4 and 2 × 2 × 2, correspondingly. Then, by analogy with Eq. (8.42), the restored tensor is represented as a sum of three tensors, each of size 8 × 8 × 8: X= X+ E1 + E2.
(8.44)
Here, the tensor E comprises 8 sub-tensors: 1 1 t st (u, v, l)W u,v,l each of size 4 × 4 × 4, for t = E 1 = (1/26 ) 1u=0 1v=0 l=0 1, 2, …, 8; st (u, v, l) are 8 low-frequency coefficients calculated for each sub-tensor E 1t in X. the structure of the difference tensor E 1 = X − E2. By analogy with E1 is defined the tensor of the second difference: E 2 = E 1 − In the general case, the tensor X decomposition through the n-levels 3D-RISP is defined by the relation: X= X+
n−2 k=1
where:
E k + E n−1 ,
(8.45)
72
R. Kountchev and R. Kountcheva
X = (1/23n )
1 1 1
s(u, v, l)W u,v,l , E 1 = X − X,
u=0 v=0 l=0
E k−1 for k = 2, 3, . . . , n − 2, E k = E k−1 − t E k ⊇ { E k , t = 1, 2, . . . , 8k }, t E k = [1/23(n−k) ]
1 1 1
st (u, v, l)W u,v,l for k = 1, 2, . . . , n − 2,
u=0 v=0 l=0
E n−2 . E n−1 = E n−2 − The coefficients which belong to same levels of the local pyramids for all subtensors X q for q = 1, 2, …, Q are arranged into Q common groups in which the correlation between neighbor coefficients is relatively high. To reduce the information redundancy, the corresponding information could be saved using some algorithms for lossless data coding.
8.3 Characteristics of the Spectrum Tensor Representation For the evaluation of the spectrum tensor S properties, are analyzed the relations which define the coefficients in the 3D-RISP decomposition levels. For the two-level pyramid the coefficients are calculated in accordance with Eqs. (8.9)–(8.16) for the initial level (p = 1), and with Eqs. (8.33)–(8.41)—for the second level (p = 2). On the basis of these relations is calculated the power distribution of the coefficients in each decomposition level. The corresponding powers (P1 and P2 ) are calculated in accordance with the relations: P1 = (1/8)
1 1 1
s(u, v, l)2
u=0 v=0 l=0
= (1/8) α 2 + β 2 + γ 2 + δ 2 + η2 + χ 2 + ρ 2 + ξ 2 P2 = (1/64)
1 1 1 s
(8.46)
st (u, v, l)2
u=0 v=0 l=0 t=1 s
2 2 2 2 2 = (1/64) + ηt1 + χt1 + ρt1 + ξt12 . βt1 + γt12 + δt1
(8.47)
t=1
The relation between P1 and P2 is defined below:
8 α 2 + β 2 + γ 2 + δ 2 + η2 + χ 2 + ρ 2 + ξ 2 P1 ψ= = δ 2 . 2 2 2 2 2 2 P2 t=1 βt1 + γt1 + δt1 + ηt1 + χt1 + ρt1 + ξt1
(8.48)
8 Low Computational Complexity Third-Order Tensor Representation …
73
From Eqs. (8.9)–(8.16) it follows that in the first level the coefficient α is equal to the sum of all elements of the tensor X: α=
8 p=1
Vp =
16 4
bk,i .
(8.49)
k=1 i=1
Each of the remaining 7 coefficients β, γ , δ, η, χ , ρ, ξ corresponds to the difference between the two sums which comprise the 32 neighbor elements of the tensor X. From this, it follows that the power of the coefficient α is much higher than that of the remaining coefficients in the first level. From Eqs. (8.31)–(8.41), for t = 1 and from the similar relations for t = 2, 3, …, 8 is obtained, that in the second level each coefficient αt1 = 0 and the remaining 7 coefficients βt1 , γt1 , δt1 , ηt1 , χt1 , ρt1 , ξt1 in the group t are equal to the difference of the two sums which contain the 4 neighbor elements of the tensor X. Hence, the values of these coefficients have high probability to be equal, or close to zero, and their power is relatively small. The analysis of the components in Eq. (8.48) shows that the total power of the 8 coefficients in the first level is much higher than that of the remaining 56 (in the second level). For this reason, the coefficient ψ is much higher than 1, i.e.,—the main part of the tensor X energy is concentrated in a small number of coefficients in the first level of 3D-RISP. This conclusion could be also generalized for a pyramid of more than two levels. For the case when n = 2, the coefficients in the first level are only 12.5 % from all coefficients. In the general case, the number of spectrum coefficients NΣ needed for the representation of the tensor X of size N × N × N for N = 2n through n-levels 3D-RISP is defined as follows: NΣ = 8 + 7 × 8 + 7 × 8 + · · · + 7 × 8 2
n−1
=8+7
n−1
8k = 8n = N 3 . (8.50)
k=1
The equation NΣ = N 3 proves that the pyramidal representation of the tensor X through the spectrum tensor S of n-levels is not “overcomplete”. For the case, when the number of pyramid levels is limited up to p < n, the total number of the retained coefficients is N ( p) = 8 + 7 × 8 + 7 × 82 + · · · + 7 × 8 p−1 = 8 + 7
p−1
8k = 8 p .
(8.51)
k=1
The relative number of the retained coefficients in a pyramid of p levels, compared to NΣ is: τ ( p) = N ( p)/NΣ = 1/8n− p .
(8.52)
In Table 8.1 is shown the change of the relative number of retained coefficients τ (p) compared to their total number NΣ as a function of the used pyramid levels, p.
74
R. Kountchev and R. Kountcheva
Table 8.1 Number of retained coefficients τ (p) in a pyramid of p levels Number of levels, p
1
τ (p)
1/8n
2 −1
3
1/8n
−2
1/8n
−3
…
n −2
n −1
n
…
1/82
1/8
1
The pyramid is limited up to the level p < n; the total number of coefficients for p = n is NΣ. This relationship shows that as p increases from 1 to n, the information redundancy in the space of the tensor S is reduced from 8^(n−1) to 1 (i.e., no reduction in the last level). On the other hand, as p increases, the mean square approximation error ε(p) of the restored tensor X is reduced:

$$\varepsilon(p) = \begin{cases} \dfrac{1}{N^3}\displaystyle\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N} e_p(i,j,k)^2 = \dfrac{1}{N^3}\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=1}^{N}\left[x(i,j,k)-\tilde{x}_p(i,j,k)\right]^2 & \text{for } p = 1, 2, \dots, n-1 \\ 0 & \text{for } p = n. \end{cases} \quad (8.53)$$
For p = n, the condition for full restoration x̃_n(i,j,k) = x(i,j,k) is satisfied, and as a result ε(n) = 0. In practice, the full restoration of S into X is not possible because of small rounding errors. The computational complexity, evaluated on the basis of the number of additions (Σ) and multiplications (M) needed to calculate the 2-level 3D-RISP coefficients, is defined in accordance with Eqs. (8.7)–(8.16) and (8.33)–(8.41). The results are given in Table 8.2 and show that 624 additions are needed for the calculations. The multiplications by 1/8 needed for the calculation of the parameters A1–R1 are executed by a 3-position right shift of the multiplied binary numbers V_i (which is not a typical multiplication). To calculate the 8 coefficients s(u, v, l) through the 3D-WHT, each row of N elements of the tensor of size N × N × N (N = 2^n) is consecutively transformed by the 1D-WHT in 3 mutually orthogonal directions [11]. To accelerate the calculations in each direction, the "fast" 1D-WHT algorithm [12] is used. As a result, the number of needed additions is decreased from N² to N·log₂N.

Table 8.2 Computational complexity of the 2-level 3D-RISP
3D-RISP parameters for n = 2          Σ                  M
Level p = 1
  V1 − V8                             8 × 6 = 48         0
  α − ξ                               8 × 7 = 56         0
  A1 − R1                             0                  8
  [E1k] for k = 1, 2, 3, 4            16 × 4 = 64        0
Level p = 2
  Vt1 − Vt8, t = 1, 2, …, 8           8 × 8 = 64         0
  βt1 − ξt1, t = 1, 2, …, 8           8 × 7 × 7 = 392    0
Total                                 624                8
The algorithm is additionally accelerated because the number of processed coefficients is halved at each consecutive stage; at the end, the total number of calculations is reduced from nN to $\sum_{p=0}^{n-1} N \cdot 2^{-p} = 2(N-1)$. The execution of the additions is accelerated in accordance with the relation:
$$R(n) = \frac{N\log_2 N}{2(N-1)} = \frac{n/2}{1-2^{-n}}. \quad (8.54)$$
Hence, as n grows, the value of R(n) approaches n/2. In a similar way, when the "fast" 3D-WHT is used, the calculation of the 8 coefficients s(u, v, l) is also accelerated. For n = 4, R(4) ≈ 2, and as a result the number of calculations needed for the 3D-RISP is reduced twice.
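A one-line numerical check of Eq. (8.54) (our own sketch, not from the paper):

```python
# Speed-up factor of Eq. (8.54): R(n) = (n/2) / (1 - 2^(-n)).
def speedup(n):
    return (n / 2) / (1 - 2 ** (-n))

for n in (2, 4, 8, 16):
    print(n, round(speedup(n), 3))  # tends to n/2 as n grows; R(4) ~ 2.13
```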
8.4 Conclusions

This paper presents a new method for third-order tensor representation in the spectrum space of the 3D-WHT, based on the multi-level 3D-RISP. The main advantage of the method is its low computational complexity: the only mathematical operations needed are additions, and their number is relatively low. Furthermore, the main part of the tensor energy is concentrated in a small number of coefficients in the first pyramid level, which permits a significant reduction of the information redundancy. The properties of the 3D-RISP open new possibilities for use in various application areas related to the processing and analysis of 3D data represented as third-order tensors: sequences of correlated images (video, multi-spectral, multi-view, correlated medical images from various sources), multichannel signals, etc. Further investigation based on the presented method will be aimed at achieving higher efficiency through replacing the 3D-WHT by other kinds of deterministic or statistical transforms, partial hardware implementation when parallel processing is needed, etc.

Acknowledgements This work was supported by the National Science Fund of Bulgaria: Project No. KP-06-H27/16 "Development of efficient methods and algorithms for tensor-based processing and analysis of multidimensional images with application in interdisciplinary areas".
References
1. Hackbusch, W., Kühn, S.: A new scheme for the tensor representation. J. Fourier Anal. Appl. 15(5), 706–722 (2009)
2. De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21(4), 1253–1278 (2000)
3. Lu, H., Plataniotis, K., Venetsanopulos, A.: Multilinear principal component analysis of tensor objects. IEEE Trans. Neural Netw. 19(1), 18–39 (2008)
4. Domanov, I., De Lathauwer, L.: Canonical polyadic decomposition of third-order tensors: relaxed uniqueness conditions and algebraic algorithm. SIAM J. Matrix Anal. Appl. 35(2), 636–660 (2014)
5. Kolda, T., Bader, B.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009)
6. He, Z., Gao, S., Xiao, L., Liu, D., He, H., Barber, D.: Wider and deeper, cheaper and faster: tensorized LSTMs for sequence learning. In: 31st Conference on Neural Information Processing Systems (NIPS 2017), CA, USA, pp. 1–11 (2017)
7. Zhang, A., Xia, D.: Tensor SVD: statistical and computational limits. IEEE Trans. Inf. Theory 64(11), 7311–7338 (2018)
8. Grasedyck, L.: Hierarchical singular value decomposition of tensors. SIAM J. Matrix Anal. Appl. 31(4), 2029–2054 (2010)
9. Bergqvist, G., Larsson, E.: The higher-order singular value decomposition: theory and an application. IEEE Signal Process. Mag. 27(3), 151–154 (2010)
10. Cichocki, A., Mandic, D., Phan, A., Caiafa, C., Zhou, G., Zhao, Q., De Lathauwer, L.: Tensor decompositions for signal processing applications: from two-way to multi-way component analysis. IEEE Signal Process. Mag. 32(2), 145–163 (2015)
11. Sakai, T., Sedukhin, S.: 3D discrete transforms with cubical data decomposition on the IBM Blue Gene/Q. Technical Report 2013-001, Graduate School of Computer Science and Engineering, The University of Aizu, 31 p. (2013)
12. Agaian, S., Sarukhanyan, H., Egiazarian, K., Astola, J.: Hadamard Transforms. SPIE Press Book, Washington (2011)
13. Kountchev, R., Kountcheva, R.: Image representation with reduced spectrum pyramid. In: Tsihrintzis, G., Virvou, M., Howlett, R., Jain, L. (eds.) New Directions in Intelligent Interactive Multimedia, pp. 275–284. Springer (2008)
14. Kountchev, R., Rubin, S., Milanova, M., Kountcheva, R.: Comparison of image decompositions through inverse difference and Laplacian pyramids. J. Multimed. Data Eng. Manag. 6(1), 19–38 (2015)
15. Kountchev, R., Kountcheva, R.: Hierarchical decomposition of 2D/3D images, based on SVD 2 × 2. Int. J. Multimed. Image Process. (IJMIP) 5(3/4), 286–296 (2015)
16. Kountchev, R., Kountcheva, R.: Image sequence decomposition based on the truncated hierarchical SVD. Int. J. Multimed. Image Process. (IJMIP) 7(1), 352–361 (2017)
Chapter 9
Spectral Reconstruction Based on Bayesian Regulation Neural Network Dongdong Gong, Jie Feng, Wenjing Xiao, and Shan Sun
Abstract In this paper, two sets of standard X-Rite color cards are used to collect sample information with a multi-spectral imaging system in order to study spectral reconstruction accuracy. To solve the problems of traditional neural network spectral reconstruction algorithms, this paper proposes appropriate improvements. The polynomial regression method is used to extend the camera response, and Bayesian regularization is used to alleviate the over-fitting of the neural network; their combination improves the spectral reconstruction accuracy, and the spectral reconstruction results are discussed and evaluated. The experimental results show that the improved algorithm has significantly higher spectral and chromaticity accuracy than the traditional neural network spectral reconstruction algorithm. It has application value in fields demanding high spectral reconstruction accuracy and can meet the requirement for high-precision color reproduction in the art field.
9.1 Introduction

The spectral reflectance of the surface of an object is one of the basic properties of that surface: the reflectance of objects of different colors is different, and it does not change with the lighting conditions. Spectral reflectance accurately describes the color of an object, so a quantitative description of color is important. Researchers have proposed building multi-spectral imaging systems to obtain the spectral reflectance. In this paper, a multi-spectral imaging system is constructed by using a three-color camera loaded with a broadband filter. At present, the algorithms used for spectral reconstruction include the pseudo-inverse method [1], the Wiener method [2], principal component analysis [3, 4], and the BP neural network [5]. The BP algorithm is prone to local minima and excessive
D. Gong · J. Feng (B) · W. Xiao · S. Sun Yunnan Normal University, Kunming 650500, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_9
training. To this end, this paper uses a Bayesian regularized neural network to perform the spectral reconstruction and improve the reconstruction accuracy.
9.2 Principle of Spectral Reflectance Reconstruction

9.2.1 Polynomial Regression

Digital cameras have difficulty obtaining high-precision reflectance with only three channels, so researchers have proposed increasing the reconstruction accuracy either by increasing the number of filters or by directly extending the number of camera response items. This paper combines these two methods to improve the accuracy of the transformation matrix Q. Because the signal vector C is extensible, more terms are added to C to improve the accuracy of the characterization. In this paper, when the 6-channel digital camera signal is used to reconstruct the spectral reflectance, the signal vector is expanded with reference to the three channels.
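To illustrate the idea, a hedged sketch of such a polynomial expansion follows; the exact set of expansion terms used by the authors is not specified here, so this particular second-order term set is an assumption for illustration only.

```python
import numpy as np

# Hypothetical second-order polynomial expansion of a camera response vector.
# For a 3-channel response (R, G, B) this yields 10 terms; the same idea
# extends to the 6-channel case mentioned in the text.
def expand_response(c):
    r, g, b = c
    return np.array([r, g, b,
                     r * g, r * b, g * b,   # cross terms
                     r * r, g * g, b * b,   # squared terms
                     1.0])                  # constant term

print(expand_response(np.array([0.2, 0.5, 0.7])))
```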
9.2.2 Bayesian Regularization BP Neural Network Algorithm

PCA is currently the most commonly used linear feature extraction method and is applied here to reduce the feature dimension. Any spectrum in a spectral data set can be represented as a linear combination of several principal spectral basis vectors; the N spectral data can be expressed as R = (r₁, r₂, r₃, …, r_n). Since the degree of freedom of N is greater than that of R, a series of orthogonal base vectors can be linearly combined to represent R [6]:

$$R = BA \quad (9.1)$$
where b_i is an extracted base vector, the base vectors being collected in B = (b₁, b₂, b₃, …, b_p); the coefficient vector is represented by A = (a₁, a₂, a₃, …, a_p)ᵀ; and p is the number of feature vectors. The base vector matrix B is determined by PCA, and then the corresponding conversion coefficient matrix A in (9.1) is found as A = B⁻¹R, where R is the actual reflectance of the training samples. It is assumed that there is a linear relationship between the system response value matrix C and the principal component coefficient matrix A, that is, A = QC, with transformation matrix Q:

$$Q = AC^{T}(CC^{T})^{-1} \quad (9.2)$$

The spectral reflectance of the reconstruction can be expressed as [7]
$$r = BA = BQC_1 = BAC^{T}(CC^{T})^{-1}C_1 \quad (9.3)$$
where C₁ is the digital response value of the test sample and r is the estimated reflectance of the test sample. A BP neural network is used to establish the relationship between the response value and the principal component coefficients: the response value of the camera is used as the input of the neural network, and the output is the principal component coefficients. LW, IW, b₁, and b₂ are internal parameters of the neural network, and the input and output satisfy the following relationship:

$$A = LW \cdot \tanh(IW \cdot C + b_1) + b_2 \quad (9.4)$$
Combined with the extracted base vectors, the reconstructed spectral reflectance is obtained [8–10]:

$$r = BA = B(LW \cdot \tanh(IW \cdot C + b_1) + b_2). \quad (9.5)$$
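The sketch below (our illustration; the variable names and the use of an SVD in place of an explicitly mean-centered PCA are assumptions) strings Eqs. (9.1)–(9.5) together for a single test response:

```python
import numpy as np

# R_train: (31, N) reflectances at 400-700 nm / 10 nm; C_train: (M, N) expanded
# camera responses; C_test: (M,) response of one test patch.
def fit_linear_part(R_train, C_train, n_components):
    U, _, _ = np.linalg.svd(R_train, full_matrices=False)
    B = U[:, :n_components]                   # base vectors B, Eq. (9.1)
    A = np.linalg.pinv(B) @ R_train           # coefficients A = B^-1 R
    Q = A @ C_train.T @ np.linalg.inv(C_train @ C_train.T)  # Eq. (9.2)
    return B, Q

def reconstruct_linear(B, Q, C_test):
    return B @ (Q @ C_test)                   # Eq. (9.3)

def reconstruct_neural(B, IW, b1, LW, b2, C_test):
    A = LW @ np.tanh(IW @ C_test + b1) + b2   # Eq. (9.4), trained BP network
    return B @ A                              # Eq. (9.5)
```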
In the training process of a BP neural network, over-fitting can occur, so that fresh samples other than the training samples are not mapped to proper outputs, reducing the generalization ability. Regularization is a training method that enhances the generalization ability by adding a penalty term to the training performance function of the neural network, simplifying the weight connection structure of the network:

$$E_D = \frac{1}{n}\sum_{i=1}^{n}(t_i - x_i)^2 \quad (9.6)$$
where n is the number of samples, t_i is the desired output, and x_i is the actual output of the network. The Trainbr algorithm [11] introduces a correction of the performance function based on the conventional mean squared performance function of neural network training:

$$f(\omega) = \alpha E_W + \beta E_D \quad (9.7)$$

$$E_W = \frac{1}{m}\sum_{i=1}^{m}\omega_i^2. \quad (9.8)$$
In the formula, α and β are the regularization coefficients, and f(ω) is the modified performance function; m and ω_i represent the total number of network weights and the individual network weights, respectively. If α ≪ β, the training algorithm tends to reduce the error of the network on the learning set, and over-fitting is prone to occur. If α ≫ β, the training pays attention to the mean square value of the network weights, and the resulting neural network has a simple structure and a smoother output, without over-fitting problems [12].
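A minimal sketch of the regularized objective of Eqs. (9.6)–(9.8) follows; in the actual trainbr algorithm, α and β are re-estimated automatically within the Bayesian framework, which is omitted here.

```python
import numpy as np

def regularized_loss(t, x, w, alpha, beta):
    """f(w) = alpha*E_W + beta*E_D, per Eqs. (9.6)-(9.8)."""
    e_d = np.mean((t - x) ** 2)  # mean squared network error, Eq. (9.6)
    e_w = np.mean(w ** 2)        # mean squared weight, Eq. (9.8)
    return alpha * e_w + beta * e_d
```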
Fig. 9.1. Improved algorithm flow chart
Figure 9.1 shows the flow chart of the improved algorithm. The BP neural network fits the relationship between the camera response values and the principal component coefficients; the polynomial model is used to extend the response values of the imaging channels, which improves the accuracy of the Q matrix; and the Bayesian regularized neural network prevents over-fitting, improving the accuracy of the spectral reconstruction.
9.3 Experimental Results and Analysis

9.3.1 Experiment Process

The experimental system is composed of a three-color digital camera and a set of broadband filters. The samples are two sets of standard color cards, the Digital ColorChecker SG (SG) and the ColorChecker Rendition Chart (RC). An Image Quality Labs D65 reflection test light source and a 2° field of view are used as the experimental conditions. The spectral reflectance is obtained with an X-Rite SP64 spectrophotometer over the wavelength range 400–700 nm, sampled with a step of 10 nm in each band. The two light sources both illuminate the sample at 45° so that the illuminated surface is uniform. According to the method of Zou Jiping [13], the uniformity of illumination is good after inspection, so there is no need to correct the response signal. After many experiments, the best shooting distance and exposure parameters were found, and the images were captured after the light source had stabilized. The training samples used the 96 color blocks in the middle of the SG color card, which represent the true colors in nature; the RC color card is used as the test sample. Figure 9.2 shows a schematic diagram of the image information collection.
Fig. 9.2. Schematic diagram of image information collection (two light sources illuminate the test sample at 45°; a Canon 5D Mark camera captures the image through the filter)
In this paper, the root mean square error (RMSE), the CIEDE2000 (ΔE₀₀) color difference formula [14], and the fitness coefficient (GFC) are used to evaluate the quality of the spectral reconstruction. The GFC is defined through the cosine of the angle between the reconstructed spectral reflectance data r' and the original spectral reflectance data r, ranging from 0 to 100%. When GFC ≥ 99.5%, the reconstruction result is acceptable; when GFC ≥ 99.99%, the reconstruction can be considered perfect.

$$\mathrm{GFC} = \frac{\left|\sum_{\lambda} r(\lambda)\, r'(\lambda)\right|}{\left[\sum_{\lambda} r(\lambda)^2\right]^{1/2}\left[\sum_{\lambda} r'(\lambda)^2\right]^{1/2}}. \quad (9.9)$$
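Equation (9.9) translates directly into a few lines of numpy (our sketch):

```python
import numpy as np

def gfc(r, r_hat):
    """Fitness coefficient of Eq. (9.9) for two reflectance curves."""
    num = abs(np.sum(r * r_hat))
    den = np.sqrt(np.sum(r ** 2)) * np.sqrt(np.sum(r_hat ** 2))
    return num / den  # 1.0 means identical curve shapes

# e.g., gfc(measured, reconstructed) >= 0.995 is "acceptable" per the text
```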
CIEDE2000 is by far the latest color difference formula; it is much more complicated than CIE94, but it also greatly improves the accuracy:

$$\Delta E_{00} = \left[\left(\frac{\Delta L'}{K_L S_L}\right)^2 + \left(\frac{\Delta C'_{ab}}{K_C S_C}\right)^2 + \left(\frac{\Delta H'_{ab}}{K_H S_H}\right)^2 + R_T\left(\frac{\Delta C'_{ab}}{K_C S_C}\right)\left(\frac{\Delta H'_{ab}}{K_H S_H}\right)\right]^{1/2} \quad (9.10)$$

where S_L, S_C, and S_H are weighting functions, and ΔL', ΔC'_{ab}, and ΔH'_{ab} represent the difference in brightness, the difference in chroma, and the difference in hue, respectively.
9.3.2 Analysis and Comparison of Results

Table 9.1 shows the spectral reflectance reconstruction accuracy of the four reconstruction methods. It can be seen from Table 9.1 that the proposed algorithm is the best of the four spectral reconstruction algorithms. The average RMSE of the pseudo-inverse method is 1.937 times that of the improved nonlinear algorithm, which indicates that the accuracy of the nonlinear algorithm is significantly higher than that of the linear algorithm, because the photoelectric imaging system itself is a nonlinear system.
Table 9.1. Spectral reflectance reconstruction accuracy of four algorithms

Method           RMSE                      E00                      GFC/%
                 Mean    Min     Max       Mean    Min     Max      Mean    Min     Max
Pseudo-inverse   0.062   0.025   0.088     6.624   1.060   10.73    97.40   70.15   99.83
PCA              0.062   0.026   0.074     6.345   1.073   9.122    98.63   92.12   99.82
Neural network   0.038   0.011   0.164     3.500   0.559   8.135    99.05   94.50   99.97
Proposed         0.032   0.008   0.164     2.091   0.577   5.386    99.36   95.44   99.99
The average E00 between the reconstructed and the actual samples obtained by the improved algorithm is 2.09, which is 1.409 lower than the average color difference of the existing neural network algorithm, and the maximum color difference is reduced by 2.749. The color reproduction precision is therefore high, which provides accurate information for high-precision color reproduction. The mean fitness coefficient of the proposed algorithm reaches 99.36%, and the maximum value reaches 99.99%, which indicates that the angle between the reconstructed spectral reflectance and the actual value is small and the spectral reconstruction effect is better. Taken together, the algorithm used in this paper has the best reconstruction effect, the neural network algorithm is second, and the pseudo-inverse method is the worst, with a large reconstruction error. Figure 9.3 compares the spectral reflectance of some test samples. It can be seen from the figure that the reconstruction curve of the proposed algorithm is closer to the actual measured values than that of the neural network algorithm. Overall, the error of the improved algorithm is significantly smaller than that of the existing algorithm, and the spectral reflectance recovery of the test samples is very good, indicating that the proposed algorithm has better spectral reflectance reconstruction performance. In order to show the reconstruction accuracy of the improved algorithm and the existing neural network algorithm more intuitively, the RMSE and E00 of the 24 color blocks in the test sample are counted. The RMSE distributions of the two algorithms are shown in Fig. 9.4: for 75% of the color patches, the RMSE of the improved algorithm is below that of the algorithm before the improvement, so overall the accuracy of the improved algorithm is significantly higher. Figure 9.5 shows the color difference distributions under the two algorithms: 91.66% of the color blocks have a lower color difference than with the former algorithm, which proves that the improved algorithm is significantly better in chromaticity precision. Overall, the reflectance of the samples reconstructed with this method is closer to the actual measured values.
Fig. 9.3. Comparison of spectral reflectance of some test samples (six panels plotting reflectance against wavelength, 400–700 nm, for the actual measurements, the neural network, and the proposed method)
9.4 Conclusion

Aiming at the problems of neural network spectral reconstruction algorithms, this paper proposes an appropriate improvement and discusses the reconstruction effect of the proposed method. Using the polynomial regression method to extend the camera response effectively improves the reconstruction accuracy; at the same time, Bayesian regularization is used to avoid the defects of the BP neural network. The
Fig. 9.4. Two reconstruction methods: root mean square error of the test samples (RMSE vs. testing sample number, for the neural network and the proposed method)

Fig. 9.5. Two reconstruction methods: color difference of the test samples (CIEDE2000 vs. testing sample number, for the neural network and the proposed method)
experimental results show that, compared with the existing neural network algorithm, the average root mean square error of the improved algorithm is reduced by 0.006, the fitness coefficient is increased by 1.003%, and the average color difference is reduced by 1.409. Compared with the existing neural network algorithm, the proposed algorithm improves both the spectral accuracy and the chromaticity accuracy, which indicates that the method can be used in fields requiring high spectral reconstruction accuracy and has practical value.
References
1. Li, B.: Research on Multispectral Image Acquisition Method Based on Trichromatic Camera. Qufu Normal University (2012)
2. Jing, X., Yue, T.X., Zheng, L.: Color image spectral reconstruction algorithm based on Wiener estimation. J. Shenyang Jianzhu Univ.: Nat. Sci. Ed. 30(1), 175–180 (2014)
3. Reng, P.Y., Liao, N.F.: Spectral reflectance reconstruction based on multispectral imaging. Opt. Technol. 31(3), 427–433 (2015)
4. Yu, H.Q., Liu, Z., et al.: Effects of sample characteristics on spectral image reconstruction. Packag. Eng. 35(13), 144–149 (2014)
5. Fu, W.Y., Liu, D.: Reconstruction of spectral reflectance based on artificial neural network. 36(7), 103–107 (2015)
6. Zhu, Y., Li, B.: The RGB digital camera's multi-channel spectral reflectance estimated in various spaces using a trichromatic camera system. J. Imaging Sci. Technol. 44(4), 280–287 (2000)
7. Zhang, X.X.: Research on Spectral Reflectance Reconstruction Algorithm Based on Kernel Entropy Component Analysis. Yunnan Normal University (2017)
8. Dui, L.T.: Research on Spectral Reflectance Reconstruction Technology of Multi-spectral Images. East China Jiaotong University (2017)
9. Demuth, H.B., Beale, M.: Neural Network Toolbox User's Guide. The MathWorks Inc., Apple Hill Drive, 21(15), 1225–1233 (2003)
10. Yang, P., Liao, N.F., Song, H.: Research on spectral reflectance reconstruction method based on digital camera. Spectrosc. Spectr. Anal. 29(5), 1176–1180 (2009)
11. Li, Y., Shi, B.M.: Research on prediction of coal and gas outburst based on Bayesian regulation BP artificial neural network. Work. Cond. Autom. 2, 1–4 (2009)
12. Jiang, X.J., Tang, H.W.: Systematic analysis of generalization performance of feedforward networks. Syst. Eng. Theory Pract. 20(8), 36–40 (2000)
13. Zou, J.P.: Research on Spectral Reflectance Reconstruction Based on Multi-spectral Imaging System. Yunnan Normal University (2015)
14. Shi, C.C.: Research on Reflectance for Multispectral Imaging and Its Application. Tianjin University (2010)
Chapter 10
Loose Hand Gesture Recognition Using CNN Chen-Ming Chang and Din-Chang Tseng
Abstract A precise hand gesture recognition (HGR) system is an important facility for human-computer interaction (HCI). In this paper, we propose a multi-resolution convolutional neural network (CNN) to recognize loose hand gestures, where loose means that the gestures can vary in the bending degrees of the fingers, the direction of the palm, and the bending angles of the wrist. The proposed loose hand gesture recognition (LHGR) system learns the low-level features from both color and depth images and then concatenates the low-level features to learn the RGB-D (RGB color and depth) high-level features. The advantage is that this not only suppresses the problem of inaccurately aligned pixels between color images and depth images, but also reduces the parameters of the CNN model. In addition, we use multi-resolution features to classify the hand gestures; therefore, the proposed model is stronger on smaller, farther, and blurrier images. In the training stage, we trained the proposed CNN model using various loose hand gestures to make the CNN more robust. In the experiments, we compared several different architectures of the proposed CNN model; the mAP (mean average precision) reaches 0.9973. The proposed method is reliable under scaling and rotation of hand gestures.
10.1 Introduction

Human-computer interaction (HCI) has become one of the most popular automation research topics in recent years. Owing to its intuitive mode of operation and high degree of freedom, hand gesture recognition (HGR) has been a highly popular focus in HCI.
C.-M. Chang · D.-C. Tseng (B) Institute of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan e-mail: [email protected] C.-M. Chang e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_10
Fig. 10.1 A diagram of loose hand gestures. The two different hand shapes are recognized as the same gesture
Based on the sensors used, HGR systems can be categorized into two categories: vision-based [1–3] and glove-based [4, 5]. A vision-based HGR system, which uses optical sensors to capture 2D images, is very sensitive to light, so most studies have added various restrictions to achieve a better hand segmentation. In contrast, glove-based HGR systems capture much more robust information from human hands; nonetheless, users have to wear additional devices, which causes inconvenience, involves extra costs, and inhibits gesturing. Fortunately, lower-priced depth sensors provide a new opportunity for HGR systems. The common methods of depth sensing include stereo triangulation [6], structured light [7], and time-of-flight (ToF) [8]. Several studies [9, 10] reveal that there is no absolute winner among these methods, so a device that is to be widely accepted should promote user interaction. Past HGR studies can be divided into two types: one deals with simplifying the complexity of hand gestures for real-time processing [3, 11]; in the other, the amount of calculation overhead required to attain precise gestures is not a significant factor [12, 13]. However, most systems treat only standard gestures; that is, they do not accept gestures with more variation in the bending degrees of the fingers, the direction of the palm, and the bending angles of the wrist. An example of loose hand gestures is shown in Fig. 10.1: the hand gesture allows rotations in roll, yaw, and pitch; moreover, the fingers are allowed different degrees of bending. CNNs have successfully been applied to analyzing visual imagery; useful features are automatically derived through multiple layers of nonlinear transformations. In recent years, CNNs have been widely used in HGR. Molchanov et al. [14] proposed an algorithm for drivers' HGR from challenging depth and intensity data using 3D-CNNs on whole video sequences and introduced space-time video augmentation techniques to avoid overfitting. Ge et al. [15] proposed a novel 3D regression method using multi-view CNNs that can better exploit depth cues to recover the full 3D information of hand joints without model fitting. Han et al. [16] proposed a CNN method to reduce the difficulty of gesture recognition from a camera image. Barros et al. [17] proposed a CNN, called the multichannel convolutional neural network (MCNN), which allows the recognition of hand postures with implicit feature extraction. Chai et al. [18] tackled the continuous gesture recognition problem with a two-stream recurrent neural network (2S-RNN) for RGB-D data input.
Mueller et al. [19] used two subsequently applied CNNs to localize the hand and regress 3D joint locations, even in the presence of clutter and occlusions. We here propose an LHGR system utilizing both color and depth images with a novel application of convolutional neural networks (CNNs). In the hand detection stage, the method is the same as the one mentioned in our previous article, but omits the wrist cutting. Next, we train a CNN model for gesture classification, in which the used features are learned from the training sets. The proposed CNN focuses on an efficient architecture with two input paths, which makes good use of the RGB-D information to achieve a good recognition rate with a smaller number of parameters. We want multi-scale features to participate in the final gesture classification, so the concept of the single-shot multibox detector (SSD) [20] network is adopted to add extra convolutional feature layers to the classifier at the end of the network. As mentioned above, the proposed method is not only reliable under scaling and rotation of gestures, but also allows lower resolution (blurred) images as inputs. The remaining sections of the paper are organized as follows. The proposed CNN architecture for LHGR is described in Sect. 10.2. Several experiments and comparisons among different CNN structures are reported in Sect. 10.3. At last, the conclusions are given in Sect. 10.4.
10.2 The Proposed Method First we present the training data sets we use for the CNN. Then we present the proposed CNN model and explain its design concept. Finally, we describe the used loss function.
10.2.1 Training Data Preparation

The used color and depth images were captured by a Microsoft Kinect II. The hand regions were detected by our previous algorithm and then resized to 100 × 100 pixels. There are 16 types of hand gestures, which were captured from 5 persons (three males and two females). The hand images have different sizes, directions, and poses, as the examples in Fig. 10.2 show.
10.2.2 The Proposed Neural Network Architecture

The proposed CNN architecture is a dual-input model, which is inspired by the network in network (NIN) [21] and SSD [20] networks and whose layers refer to VGG16 [22]. The two paths receive color and depth images and learn the low-resolution
Fig. 10.2 The samples of the gesture training dataset, where the odd rows are the color images and the even rows are the depth images
features, respectively; the low-resolution features are then concatenated to learn the 4-channel high-resolution features. We also use multi-resolution features for the gesture classification. We take advantage of NIN and SSD to design a CNN architecture in which the two independent inputs are, respectively, the color images and the depth maps; it is described in Table 10.1, and the simplified diagram is shown in Fig. 10.3. Its 7 × 7 convolution layer consists of three 3 × 3 convolution layers, and the 5 × 5 convolution layer consists of two 3 × 3 convolution layers. In the proposed CNN model, we have two independent input paths: color and depth images. Because there are unavoidable misaligned pixels between color images and depth images with the Kinect, we do not hurry to merge the RGB and D information. Since the deeper feature maps become smaller and smaller, we concatenate the low-level feature maps of the RGB and D channels which have gone through the same number of convolution and pooling layers. Such a method can reduce the differences between a color image and a depth image and is useful for training. After the concatenation, we only need one path to learn these new 4-channel feature maps. The proposed model adopts the concepts of NIN: the module consists of an FCN in which all kernel sizes are at most 3 × 3, and many 1 × 1 kernels are used to add nonlinearity. Global average pooling is also used before the prediction layer so that the system can accept input images of different sizes. In addition, we adopt the multi-resolution feature concept of SSD. Here, we use four layers of feature maps, from pool3, pool4, pool5, and pool6, to predict the gestures. Compared with a traditional CNN, the proposed method has a stronger ability to identify poorer quality images, such as smaller, farther, and blurrier images.
Table 10.1 The architecture of the proposed CNN

Layer      Kernel size      Feature map       Layer       Kernel size      Feature map
Data_1     –                100 × 100 @1      Data_p      –                100 × 100 @1
Conv1_1    3 × 3            100 × 100 @64     Conv1_1p    3 × 3            100 × 100 @64
Conv1_2    3 × 3            100 × 100 @64     Conv1_2p    3 × 3            100 × 100 @64
Conv1_3    3 × 3            100 × 100 @64     Conv1_3p    3 × 3            100 × 100 @64
Pool1      3 × 3/stride 2   50 × 50 @64       Pool1p      3 × 3/stride 2   50 × 50 @64
Conv2_1    3 × 3            50 × 50 @128      Conv2_1p    3 × 3            50 × 50 @128
Conv2_2    3 × 3            50 × 50 @128      Conv2_2p    3 × 3            50 × 50 @128
Pool2      3 × 3/stride 2   25 × 25 @128      Pool2_p     3 × 3/stride 2   25 × 25 @128

Layer      Kernel size               Feature map size
Concat     Pool2 + Pool2_p           25 × 25 @256
Conv3_1    3 × 3                     25 × 25 @256
Conv3_2    3 × 3                     25 × 25 @256
Pool3      3 × 3/stride 2            12 × 12 @256
Conv4_1    3 × 3                     12 × 12 @512
Pool4      3 × 3/stride 2            6 × 6 @512
Conv5_1    3 × 3                     6 × 6 @512
Pool5      3 × 3/stride 2            3 × 3 @512
Conv6_1    1 × 1                     3 × 3 @512
Pool6      Global average pooling    1 × 1 @512
Concat.    Pool4 + Pool5 + Pool6     1 × 1 @552
Conv7      1 × 1                     1 × 1 @16
Fig. 10.3 The simplified diagram of the proposed CNN architecture (composite convolution blocks of sizes 7 × 7, 5 × 5, 3 × 3, and 1 × 1)
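To make the dual-path design concrete, the following PyTorch sketch reproduces the overall topology of Table 10.1 in simplified form. This is our illustration, not the authors' implementation: layer counts and channel widths are reduced, and only two of the multi-resolution taps are shown.

```python
import torch
import torch.nn as nn

def low_level_branch(in_ch):
    # One input path: two conv/pool stages, as in the Conv1_x/Conv2_x blocks.
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(3, stride=2, padding=1),
        nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(3, stride=2, padding=1),
    )

class DualPathNet(nn.Module):
    def __init__(self, n_classes=16):
        super().__init__()
        self.color = low_level_branch(1)  # Table 10.1 lists 1-channel inputs
        self.depth = low_level_branch(1)
        self.block3 = nn.Sequential(nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(3, stride=2, padding=1))
        self.block4 = nn.Sequential(nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(3, stride=2, padding=1))
        self.gap = nn.AdaptiveAvgPool2d(1)                    # global avg pooling
        self.classifier = nn.Conv2d(256 + 512, n_classes, 1)  # 1x1, NIN-style

    def forward(self, color, depth):
        x = torch.cat([self.color(color), self.depth(depth)], dim=1)  # merge paths
        f3 = self.block3(x)                                   # lower-resolution features
        f4 = self.block4(f3)                                  # deeper features
        multi = torch.cat([self.gap(f3), self.gap(f4)], dim=1)  # multi-resolution taps
        return self.classifier(multi).flatten(1)              # (batch, n_classes) logits

net = DualPathNet()
logits = net(torch.randn(2, 1, 100, 100), torch.randn(2, 1, 100, 100))
```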
10.2.3 Loss Function

The objective loss function used for the proposed network is the softmax-loss function, which calculates the probability σ(z)_j of belonging to each category after passing through the network, where K is the number of categories and z_j is the output value of the last layer for category j. For ease of use, we calculate the value −log(σ(z)_j) for the regression. Because the value of σ(z)_j is between 0 and 1, −log(σ(z)_j) is always
positive. If the classification is correct, the output of the loss function will be lower; otherwise, the loss function will be higher.

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1, \dots, K \quad (10.1)$$
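For reference, a minimal numpy version of Eq. (10.1) and the −log loss (our sketch; the numerical-stability shift is a standard addition, not from the paper):

```python
import numpy as np

def softmax_loss(z, label):
    """-log(sigma(z)_label) with sigma as in Eq. (10.1)."""
    z = z - np.max(z)                  # stability shift; does not change sigma
    p = np.exp(z) / np.sum(np.exp(z))  # sigma(z)_j
    return -np.log(p[label])

print(softmax_loss(np.array([2.0, 0.5, -1.0]), label=0))  # small loss: correct class
```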
10.3 Experiments

We randomly selected 3,376 pairs of hand images as training samples and 1,116 pairs of hand images as testing samples. With the proposed LHGR system, we obtained a mean average precision (mAP) of 0.997333 using the proposed CNN model. In order to demonstrate the improved results, we designed several different testing models to prove that the proposed system has an efficient architecture with input RGB-D images and works well on hand gestures of different sizes. Finally, we compare the proposed model with other popular CNN models, namely Alexnet, GoogLeNet, and NIN, to show the improvement.
10.3.1 CNN with Input RGB-D Images

In this section, we test five different-structured CNN models with input RGB-D images; their 7 × 7 convolution layer consists of three 3 × 3 convolution layers, and the 5 × 5 convolution layer consists of two 3 × 3 convolution layers. The first model, called DLHGR-0, is a traditional single-path-input CNN; we use the color and depth data separately to train two such CNN models, called DLHGR-0-color and DLHGR-0-depth. The second model, called DLHGR-1, concatenates the RGB and D information directly as the input of a single CNN architecture. The third model, called DLHGR-2, has two independent CNN architectures with input color images and depth maps, and concatenates their results only at the end. The fourth model, called DLHGR-3, is similar to the proposed model but without multi-resolution features. The last model, called DLHGR-4, not only has two independent CNN architectures to read color images and depth maps, but also concatenates their low-level feature maps as the input of a third CNN architecture. The testing results are shown in Table 10.2. Loss is the final softmax-loss value of the training model, which can be used to judge the convergence of the training; mean average precision (mAP) is the average of the maximum precision at different recall values. The models with two CNN architectures work better than those with a single CNN architecture. However, DLHGR-4, which has three CNN architectures, is less effective because of the mutual interference between them. DLHGR-2 and DLHGR-3 have the same mAP, but DLHGR-3 has fewer parameters, so we regard DLHGR-3 as the best solution for a CNN with input RGB-D images.
Table 10.2 The testing results of the five CNN models with input RGB-D images

Model           Loss        mAP        Iteration   Param (M)
DLHGR-0-color   0.0486200   0.984889   100000      5.08
DLHGR-0-depth   0.0350199   0.989333   100000      5.08
DLHGR-1         0.0365404   0.992889   100000      5.08
DLHGR-2         0.0158716   0.995555   100000      9.60
DLHGR-3         0.0094135   0.995555   100000      5.67
DLHGR-4         0.0324673   0.988444   100000      14.39
Table 10.3 The testing results of three CNN models with multi-resolution features

Model     Loss         mAP        Iteration   Param (M)
DLHGR-3   0.00941349   0.995555   100000      5.67
DLHGR-5   0.02644520   0.995555   100000      5.96
DLHGR-6   0.00913981   0.997333   100000      6.90
DLHGR-7   0.00604911   0.996444   100000      10.44
10.3.2 CNN with Multi-resolution Features

We here use different levels of multi-resolution features to retrain three alternative CNN models derived from DLHGR-3. DLHGR-5 uses the last two levels of multi-resolution features for classification; DLHGR-6 is the desired model, which uses the last three levels; and DLHGR-7 uses the last four levels. The testing results are shown in Table 10.3, where DLHGR-3 represents the result without multi-resolution features. Using more multi-resolution features is more helpful for the classification, although the improvement eventually converges. Comparing the mAP among DLHGR-3, DLHGR-5, DLHGR-6, and DLHGR-7, DLHGR-6 performs best, with an mAP of up to 0.997333, so it can adapt to the situations that may be encountered in a real environment. Based on the above advantages, we choose DLHGR-6 as the final proposed model in this study.
10.3.3 Compared with Other Popular CNN Models

Finally, we compare the proposed DLHGR-6 model with other popular CNN models: Alexnet, GoogLeNet, and NIN. The testing results are shown in Table 10.4. Because the traditional models have only a single CNN architecture, we train them, respectively, with color images and with depth maps. Training a single CNN architecture model using color images is slightly more effective than using depth
Table 10.4 Compare our DLHGR-6 model with other popular CNN models

Model               Loss        mAP        Iteration   Param
DLHGR-6             0.0091398   0.997333   100000      6.90 M
Alexnet (color)     0.0452420   0.987555   1000000     23.38 M
Alexnet (depth)     0.0383025   0.987555   1000000     23.38 M
GoogLeNet (color)   0.0361086   0.990222   2000000     4.72 M
GoogLeNet (depth)   0.0751908   0.988444   2000000     4.72 M
NIN (color)         0.6304840   0.808000   300000      968.14 K
NIN (depth)         0.7085650   0.797333   300000      968.14 K
images. Alexnet needs the largest number of parameters. GoogLeNet provides a nice mAP with few parameters, but more iterations are needed during training. NIN is the simplest network architecture, but it cannot be trained to a good convergence result on the gesture database. When the recognition rate is already close to 100%, it becomes more difficult to improve a CNN model further. The experimental results show that our model has better recognition ability than the other popular CNN models and uses fewer parameters. Therefore, the improved method proposed in this study can further improve the effect of the CNN.
10.4 Conclusions

We propose an LHGR system with a novel CNN model, which is a two-input-path model, and obtain an mAP of 0.997333. The two paths receive color images and depth images and learn the low-resolution features, respectively; the low-resolution features are then concatenated to learn the 4-channel high-resolution features. The proposed CNN model uses color images and depth images better and more efficiently, and needs fewer parameters. The experiments confirm that the two-path CNN architecture is better than the single architecture, but the number of architectures cannot increase endlessly. Additionally, since the proposed CNN model uses multi-resolution features in the classification, it handles small inputs better (meaning a smaller gesture on a color image or a farther gesture on a depth map). The proposed CNN model thus recognizes images of different sizes well and can adapt to the situations that may be encountered in a real environment. In practical applications, because there are two independent input paths, if one input path is missing, the system can still use the other input path to classify gestures. Compared with our LHGR method based on relational features, the CNN-based method gets a higher mAP (0.997333 > 0.887641). Compared with the current deep learning frameworks, we confirmed that the proposed model is better than some traditional popular CNN models (Alexnet, GoogLeNet, and NIN). Considering the multi-usability of the system, we still have many points to
improve. The currently trained CNN model can only recognize 16 kinds of gestures. In the future, we can add more classes, even including faces, pedestrians, cars, cats, dogs, etc., to let the CNN model identify more classes of objects. In addition to using CNN models for classification, we can also try to detect the hand gestures with region-based CNNs, such as Faster R-CNN and SSD; thus the system could remove its dependence on the human skeleton. The gesture grammar we have defined at present is limited to static gestures. However, dynamic gestures are more diverse and closer to the meaning of sign language than static gestures; we can learn dynamic gestures from video sequences using 3D-CNNs. Acknowledgements This work was supported in part by the Ministry of Science and Technology, Taiwan under the grant of the research project MOST 106-2221-E-008-097-MY2.
References
1. Wachs, J.P., Kölsch, M., Stern, H., Edan, Y.: Vision-based hand-gesture applications. ACM Commun. 54(2), 60–71 (2011)
2. Guo, J.-M., Liu, Y.-F., Chang, C.-H., Nguyen, H.-S.: Improved hand tracking system. IEEE Trans. Circuits Syst. Video Technol. 22(5), 693–701 (2012)
3. Malima, A., Ozgur, E., Cetin, M.: A fast algorithm for vision-based hand gesture recognition for robot control. In: Proceedings of the IEEE 14th Signal Processing and Communications Applications, Antalya, Turkey, pp. 1–4 (2006)
4. Pławiak, P., Sosnicki, T., Niedzwiecki, M., Tabor, Z., Rzecki, K.: Hand body language gesture recognition based on signals from specialized glove and machine learning algorithms. IEEE Trans. Ind. Inform. 12, 1104–1113 (2016)
5. Tubaiz, N., Shanableh, T., Assaleh, K.: Glove-based continuous Arabic sign language recognition in user-dependent mode. IEEE Trans. Hum.-Mach. Syst. 45, 526–533 (2015)
6. Saxena, A., Chung, S.H., Ng, A.Y.: 3-D depth reconstruction from a single still image. Int. J. Comput. Vis. 76, 53–69 (2008)
7. Zhang, Y., Xiong, Z., Yang, Z., Wu, F.: Real-time scalable depth sensing with hybrid structured light illumination. IEEE Trans. Image Process. 23(1), 97–109 (2014)
8. Foix, S., Alenya, G., Torras, C.: Lock-in Time-of-Flight (ToF) cameras: a survey. IEEE Sens. J. 11, 1917–1926 (2011)
9. Sarbolandi, H., Lefloch, D., Kolb, A.: Kinect range sensing: structured-light versus time-of-flight Kinect. Comput. Vis. Image Underst. 139, 1–20 (2015)
10. Mutto, C.D., Zanuttigh, P., Cortelazzo, G.M.: TOF cameras and stereo systems: comparison and data fusion. In: Remondino, F., Stoppa, D. (eds.) TOF Range-Imaging Cameras, pp. 177–202. Springer, Berlin, Heidelberg, Germany (2013)
11. Hong, J., Kim, E.S., Lee, H.-J.: Rotation-invariant hand posture classification with a convexity defect histogram. In: Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, Seoul, South Korea, pp. 774–777 (2012)
12. Erol, A., Bebis, G., Nicolescu, M., Boyle, R.D., Twombly, X.: A review on vision-based full DOF hand motion estimation. In: Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, pp. 75–82 (2005)
13. Oikonomidis, I., Kyriazis, N., Argyros, A.A.: Efficient model-based 3D tracking of hand articulations using Kinect. In: Proceedings of the 22nd British Machine Vision Conference, Dundee, UK, pp. 1–11 (2011)
14. Molchanov, P., Gupta, S., Kim, K., Kautz, J.: Hand gesture recognition with 3D convolutional neural networks. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, MA, pp. 1–7 (2015)
15. Ge, L., Liang, H., Yuan, J., Thalmann, D.: Robust 3D hand pose estimation in single depth images: from single-view CNN to multi-view CNNs. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, pp. 3593–3601 (2016)
16. Han, M., Chen, J., Li, L., Chang, Y.: Visual hand gesture recognition with convolution neural network. In: Proceedings of the 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, Shanghai, China, pp. 287–291 (2016)
17. Barros, P., Magg, S., Weber, C., Wermter, S.: A multichannel convolutional neural network for hand posture recognition. In: Proceedings of 24th International Conference on Artificial Neural Networks, Hamburg, Germany, pp. 403–410 (2014)
18. Chai, X., Liu, Z., Yin, F., Liu, Z., Chen, X.: Two streams recurrent neural networks for large-scale continuous gesture recognition. In: Proceedings of 23rd International Conference on Pattern Recognition, Cancun, Mexico, pp. 31–36 (2016)
19. Mueller, F., Mehta, D., Sotnychenko, O., Sridhar, S., Casas, D., Theobalt, C.: Real-time hand tracking under occlusion from an egocentric RGB-D sensor. In: Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, Venice, Italy, pp. 1284–1293 (2017)
20. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: single shot multibox detector. In: Proceedings of the 14th European Conference on Computer Vision, Amsterdam, Netherlands, pp. 21–37 (2016)
21. Lin, M., Chen, Q., Yan, S.: Network in network. In: Proceedings of the International Conference on Learning Representations, Banff, Canada, pp. 1–10 (2014)
22. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International Conference for Learning Representations, San Diego, CA, pp. 1–14 (2015)
Chapter 11
Multi-focus Image Fusion Method Based on NSST and BP Neural Network Xiaosheng Huang, Ruwei Zi, and Feilong Li
Abstract In this paper, a multi-focus image fusion method based on the non-subsampled shearlet transform (NSST) and a backpropagation (BP) neural network is proposed. Firstly, the NSST is performed on each source image and the NSST contrast of the image is calculated from the high-frequency and low-frequency coefficients. Then the NSST contrast of partial regions of the source images is selected as the training samples for a feedforward neural network, which is updated by the backpropagation method. Thirdly, the focused and defocused areas are classified by the trained neural network, and the focused areas are fused; a consistency check is then performed on the specific pixels determined by the neural network. The simulation results show that the proposed method performs better in both subjective and objective evaluation.
11.1 Introduction

Multi-focus image fusion refers to the fusion of two or more images of the same scene with different focus and sharpness, obtained by changing the focal length of the same imaging device. Burt proposed the Laplace pyramid decomposition algorithm in 1983 [1]. Later, improved fusion methods based on ratio low-pass pyramids and gradient pyramids were proposed [2]. In [3], an image fusion method based on an artificial neural network is proposed, whose main idea is to segment the source multi-focus images and use the neural network to select the focused areas for fusion. In recent years, many multi-scale transformation (MST)-based image fusion methods have been proposed. In [4], a multi-focus image fusion method based on a neural network and contrast is proposed: the source image is subjected to the wavelet transform, and the wavelet contrast is input to the neural network to select the focal region and obtain the fused image. As the wavelet transform can only capture limited orientation information, the NSST was proposed by Labate in 2007 [5]. The NSST is an optimal approximation compared with other

X. Huang · R. Zi (B) · F. Li School of Software, East China Jiaotong University, Nanchang, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_11
multi-scale transformation methods. In [6], a new method for visible light and infrared image fusion based on NSST-SF (spatial frequency) and a pulse coupled neural network (PCNN) is proposed, which upgrades the traditional methods and combines NSST and PCNN in the fusion algorithm to greatly improve the quality of the image. However, some methods need large storage space, long processing time, and a large amount of modeling calculation in the multi-scale domain, and some cannot accurately locate the focused area according to the characteristics of multi-focus images. In this paper, we propose a fusion method based on NSST and a BP neural network which can accurately locate the focused area of a multi-focus image. The fusion results indicate that this method preserves richer image information and improves the contrast and sharpness of the image.
11.2 Image Fusion Method

11.2.1 NSST Transformation

In this paper, an image is divided by the NSST into two parts, a low-frequency domain and a high-frequency domain, based on a Laplacian pyramid filter bank. The low-frequency part represents the region R; when the next layer is decomposed, the decomposition operation is continued only on the low-frequency component. The high-frequency domain is decomposed in multiple directions by the improved shearlet filter bank; that is, the cone regions C_h and C_v are divided more precisely in the frequency domain in multiple directions [7]. In this paper, the number of NSST decomposition layers is set to 4, and the pyramid filters selected for the decomposition are all set to "maxflat", giving 32, 32, 16, and 16 directional subgraphs for the respective layers. An image I can thus be decomposed into the low-frequency subgraph {L} and the high-frequency direction coefficient subgraphs {D^{k,r}}, where k represents the layer of the decomposition scale and r represents the r-th direction; the size of each subgraph is the same as the original image.
11.2.2 Feature Extraction and NSST Contrast

The images A and B are subjected to the NSST transformation and decomposed into a set of low-frequency subgraphs {L_A, L_B} and a series of high-frequency direction subgraphs {D_A^{k,r}, D_B^{k,r}}. The NSST contrast is defined as follows:

$$C_A^{k,r} = \frac{D_A^{k,r}}{L_A}, \qquad C_B^{k,r} = \frac{D_B^{k,r}}{L_B} \quad (11.1)$$
Fig. 11.1 Neural network structure
where D_A^{k,r} is the r-th high-frequency subgraph of the k-th layer of the source image A decomposed by the NSST, and L_A is its low-frequency subgraph. C_A^{k,r} is the NSST contrast of the source image A, and C_B^{k,r} is the contrast of image B; the contrast is used as the feature input of the neural network for training.
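Given the decomposed subgraphs, Eq. (11.1) is a simple element-wise ratio; a hedged numpy sketch follows (the zero-guard `eps` is our addition, not part of the paper):

```python
import numpy as np

def nsst_contrast(d_kr, low, eps=1e-8):
    """NSST contrast of one directional subgraph, Eq. (11.1)."""
    return d_kr / (low + eps)  # element-wise; eps avoids division by zero
```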
11.2.3 BP Neural Network Training

In this paper, we use a three-layer network structure to characterize the mapping between input and output (see Fig. 11.1). Each decomposition layer has a corresponding contrast feature; since the number of NSST decomposition layers is set to 4, the number of neurons in the input layer of the neural network is the same. The numbers of neurons in the output layer and the hidden layer are 1 and 9, respectively, which are used to classify the focused areas; with this setting, the network trains best. The hidden layer activation function is the tanh function (Eq. 11.2). Gradient descent with momentum and an adaptive learning rate is used to optimize the weights.

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \quad (11.2)$$
11.2.4 Algorithm Framework and Steps

The overall framework of the method is shown in Fig. 11.2 and includes the following main steps:

Step 1: Perform a four-layer non-subsampled shearlet decomposition of the source images A and B, and calculate the NSST contrast sequences in the different directions according to Eq. (11.1);

Step 2: Select several pairs of experimental areas at corresponding positions of the training images A and B, such that the focused region in one image is out of focus
Fig. 11.2 The fusion algorithm framework
in another image. Establish the three-layer neural network structure constructed in Fig. 11.1, which updates the correction weights with the momentum gradient and the adaptive learning rate. In the training phase, the direction contrast differences of the two training image experimental areas are used as the input feature vector T_in(i), and the output target vector T_out(i) is used to train the network: the output is 1 if it is judged to be a focused area and 0 otherwise. Here, A_i and B_i represent pixels in the experimental areas of A and B:

$$T_{in}(i) = \left(C_{Ai}^{1,r} - C_{Bi}^{1,r},\; C_{Ai}^{2,r} - C_{Bi}^{2,r},\; C_{Ai}^{3,r} - C_{Bi}^{3,r},\; C_{Ai}^{4,r} - C_{Bi}^{4,r}\right) \quad (11.3)$$

$$T_{out}(i) = \begin{cases} 1 & \text{if } A_i \text{ is clearer than } B_i \\ 0 & \text{otherwise} \end{cases} \quad (11.4)$$
Step 3: The images to be fused are subjected to Step 1 to obtain the contrast and are then classified by the trained network. The i-th pixel of the fused image is composed as follows:

$$\mathrm{imageF}(i) = \begin{cases} \mathrm{imageA}(i) & \text{if } T_{out}(i) = 1 \\ \mathrm{imageB}(i) & \text{otherwise} \end{cases} \quad (11.5)$$
Step 4: Perform a consistency check on the fusion result of Step 3. This operation is performed using a 3 × 3 neighborhood window filter.
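A hedged sketch of Steps 3 and 4 is given below; reading the 3 × 3 neighborhood window filter as a majority vote over the binary focus map is our interpretation, not a statement of the authors' exact filter.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_with_consistency(image_a, image_b, t_out):
    """Pixel selection of Eq. (11.5) followed by a 3x3 consistency check."""
    votes = uniform_filter(t_out.astype(float), size=3)  # 3x3 neighborhood mean
    decision = votes > 0.5                               # majority vote
    return np.where(decision, image_a, image_b)
```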
11.3 Experimental Results

The experiment is implemented on the MATLAB R2016a platform, and three matched clock gray-scale images (of size 512 × 512) (see Fig. 11.3) are used as the training set of the neural network. We compare three other fusion methods to verify the effectiveness and feasibility of the proposed method. Method 1 applies the NSST transform to the two images, averaging the low-frequency coefficients and taking the maximum of the high-frequency coefficients (mean_NSST). Method 2 [4] uses an image fusion technique based on a neural network and contrast, which obtains the fused image with a BP neural network and the inverse wavelet transform (WT_BPNN). Method 3 [8] is a multi-focus method based on NSST and PCNN; the number of layers in this algorithm is also 4, and the improved SF is used as the fusion rule for the input of the PCNN (NSST_SF_PCNN). The experimental results are shown in Fig. 11.4. Figure 11.4a, b are the images to be fused, and Fig. 11.4c–f shows the results. From the subjective visual analysis, the fused image of Method 1 in Fig. 11.4c has low contrast and is relatively blurry. Method 2 in Fig. 11.4d improves on Method 1 subjectively, but the image shows ripple interference. The fusion method based on NSST and adaptive PCNN has a remarkable fusion effect (see Fig. 11.4e); however, aliasing occurs in the right half of the fused image, and part of the image is blurred. The image fused by the proposed method (see Fig. 11.4f) is subjectively clear, with moderate gray levels, and contains richer image information. After the subjective evaluation of the fused images, we introduce the commonly used objective evaluation indicators, including information entropy (EN), mutual information (MI), standard deviation (SD), spatial frequency (SF), and average gradient (AVG), to analyze the experimental results of the different fusion methods; the reported values are averages over 20 experimental runs. As shown in Table 11.1, the average gradient of the images fused with the method of this paper is better than that of Methods 1 and 2; although it is slightly lower than the value of Method 3, the fusion effect of the proposed method is better than Method 3 by subjective visual judgment. Furthermore, the images fused in this paper are superior to the other three methods in terms of EN, SD,
Fig. 11.3 Training image: a Left focus image, b Right focus image, c Fused image
Fig. 11.4 Image fusion experiment results: a Left-focused, b Right-focused, c Method 1, d Method 2, e Method 3, f The proposed method
Table 11.1 Comparison of different method results

Method              | SD      | EN     | SF      | MI     | AVG
mean_NSST           | 43.7617 | 7.2521 | 24.2057 | 4.8393 | 11.0857
WT_BPNN             | 44.3598 | 6.3427 | 20.4523 | 3.6021 | 10.4524
NSST_SF_PCNN        | 45.2746 | 7.3427 | 23.1254 | 5.1583 | 11.9235
The proposed method | 45.2918 | 7.4233 | 24.9024 | 6.8186 | 11.8614
Among them, the MI of the proposed method is far better than that of the other three methods. That is to say, the image fused by the new method is rich in detail and retains more of the information in the source images, which gives the fused images higher quality.
11.4 Conclusions

In this paper, a method of image fusion based on NSST and a BP neural network is proposed. First, the NSST contrast of the source images is calculated and used as training samples for a feedforward neural network, whose weights are updated by the backpropagation
method. When the network reaches its optimum, the NSST contrast features of the images to be fused are input into the trained network to obtain the focus area of each image, and the fused image is then reconstructed from the selected pixels. As the experimental results show, the new method achieves a good fusion effect for multi-focus images, preserves the source image information, and offers better fusion performance. The method is intuitive and simple for image fusion in the multi-scale domain and achieves good results in both objective and subjective evaluation.
Acknowledgements This work was supported by the National Natural Science Foundation of China (61763011) and the Science and Technology Program of the Educational Department of Jiangxi Province (GJJ150526).
References
1. Burt, P., Adelson, E.: The Laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983)
2. Matsopoulos, G.K., Marshall, S., Brunt, J.N.: Multi-resolution morphological fusion of MR and CT images of the human brain. IEE Proc. Image Signal Process. 141(3), 137–142 (1994)
3. Li, S.T., Kwok, J.T., Wang, Y.N.: Multi-focus image fusion using artificial neural network. Pattern Recogn. Lett. 23(8), 985–997 (2002)
4. Wang, Z.F., Fan, G.L., Wang, N.C.: Multi-focus image fusion based on contrast and neural network. Computer Applications 7(7), 1590–1602 (2006)
5. Easley, G., Labate, D., Lim, W.Q.: Sparse directional image representations using the discrete shearlet transform. Appl. Comput. Harmon. Anal. 25(1), 25–46 (2008)
6. Wei, W., Kong, L., Zhang, Y.L.: Novel fusion method for visible light and infrared images based on NSST–SF–PCNN. Infrared Phys. Technol. 4(65), 103–112 (2016)
7. Kutyniok, G., Lim, W., Reisenhofer, R.: ShearLab 3D: faithful digital shearlet transforms based on compactly supported shearlets. ACM Trans. Math. Softw. 42(1), 1–42 (2014)
8. Yang, L.S., Wang, L., Guo, Q.: Multi-focus image fusion method based on NSST and adaptive PCNN. Computer Science 45(12), 217–222 (2018)
Chapter 12
Classification of 3D Remote Sensing Images Through Dimensionality Reduction and Semantic Segmentation Network Daoji Li, Chuan Zhao, Donghang Yu, Baoming Zhang, and Zhou Yuan Abstract Recently, methods based on semantic segmentation networks have achieved state-of-the-art performance in the classification of 2D remote sensing images. Three-dimensional remote sensing information is integrated remote sensing information that combines 3D location with remote sensing spectral data, allowing both spectral data and spatial information to be obtained in three dimensions. However, semantic segmentation networks cannot be applied directly to the classification of 3D remote sensing images because of the disordered nature of 3D remote sensing data. In this study, a simple classification method based on dimensionality reduction is proposed. The 3D remote sensing image is divided into a 2D image and DSM data. Then, the 2D remote sensing image is classified by the network, and the DSM data are used for post-processing. The principle was experimentally verified and found to be simple but effective. DSM data can be used to correct false ground points, and it was found that the accuracy could be improved by up to 0.2%.
12.1 Introduction

The remote sensing information collected by satellite platforms is growing drastically, and the multitype data have prompted the high dimensionality of remote sensing information [1]. Therefore, the application of three-dimensional (3D) remote sensing data came into being. At present, 3D remote sensing information has widespread applications in terrain analysis and reconstruction. While most traditional remote sensing technologies can only be used to obtain spectral information, 3D remote sensing information is integrated remote sensing information that combines 3D location with remote sensing spectral data, allowing both spectral data and spatial information to be obtained in three dimensions.
D. Li (B) · C. Zhao · D. Yu · B. Zhang · Z. Yuan Information Engineering University, Zhengzhou, Henan, China e-mail: [email protected]
Unlike traditional remote sensing, 3D remote sensing data are collected by imagers mounted on UAVs. The 3D lidar point cloud is the most common form of 3D remote sensing data and is usually used in building or vegetation reconstruction. Li et al. proposed a new framework for the automatic creation of compact building models from aerial LiDAR point clouds in which each point is known to belong to a building. This approach utilizes a Triangulated Irregular Network (TIN), and its performance is validated by extensively comparing the results to those obtained by state-of-the-art techniques [2]. Wang et al. proposed a method combining a gap fraction model with terrestrial LiDAR to retrieve leaf area. The method achieves satisfactory agreement with the real structure in both visual and quantitative evaluation [3]. Supervised classification procedures have always been popular in the remote sensing field because the operator can detect errors and correct them [4]. Deep learning is the fastest growing supervised classification method. In recent years, deep learning has developed rapidly in many fields owing to its strong learning ability. The emergence of the Convolutional Neural Network (CNN) has advanced image processing technology by leaps and bounds, especially in semantic segmentation [5–7] and object detection [8, 9]. CNNs perform particularly well in the field of two-dimensional image processing, and many excellent classification networks have been presented, such as EncNet [10], U-Net [11], RefineNet [12], DANet [13], and Deeplabv3+ [14]. Three-dimensional remote sensing image classification is one of the basic problems in scene understanding for 3D digital cities. It is not easy to classify 3D remote sensing images in complex urban environments, because the 3D data are usually noisy, sparse, and disorganized [15]. Point cloud classification can be realized by various methods, such as region growing [16, 17], energy minimization [18], and machine learning [19, 20]. Rabbani et al. proposed an effective method based on curvature and other higher level derivatives [16]. Niemeyer et al. proposed a novel hierarchical approach for the classification of airborne 3D lidar points, in which the spatial and semantic context is incorporated via a two-layer conditional random field (CRF) [21]. Compared with the pure point-based classification method, the framework improves the point-based classification by 2.3% and the segment-based classification by 3.0%. Zhang et al. utilized the Support Vector Machine (SVM) to classify segments of 3D point clouds [19]. Experiments suggest that this method can classify urban point clouds with high overall classification accuracy and Kappa coefficient. In order to take advantage of the excellent image classification networks above, the 3D remote sensing data are converted into 2D data and Digital Surface Model (DSM) data. The excellent classification networks can then be used to classify the 2D remote sensing image. After that, the DSM data are used in the post-processing of the labels predicted from the 2D remote sensing images. To verify the effectiveness of the proposed method, the datasets from the 2D Semantic Labeling Contest (Vaihingen) of ISPRS are used. The remainder of this paper is organized as follows. Section 12.2 elaborates the proposed method, including
the rasterization, normalization, network, and post-processing. The experiments are presented in Sect. 12.3, and we conclude this work in Sect. 12.4.

12.2 Proposed Method

12.2.1 The Whole Approach Framework

The overall classification framework of the proposed method is shown in Fig. 12.1. First, the 3D remote sensing data are converted into a 2D spectral image and DSM data. Then, the 2D spectral image is predicted by the semantic segmentation network. The DSM data are normalized to 0–255. In order to eliminate the impact of topographic relief, the top-hat transformation is utilized in the DSM data processing, producing the Normalized Digital Surface Model (NDSM) data. Finally, the NDSM data are used for post-processing of the classification results predicted by the network.
12.2 Proposed Method 12.2.1 The Whole Approach Framework The overall classification framework of the proposed method is shown in Fig. 12.1. First, 3D remote sensing data are converted into 2D spectral image and DSM data. Then, the 2D spectral image is predicted by the semantic segmentation network. The DSM data are normalized to 0–255. In order to eliminate the impact of topographic relief, the top hat transformation is utilized in DSM data processing. Then, the Normalized Digital Surface Model (NDSM) data are obtained. Finally, the NDSM data are used for the post-processing of the classification results, which are predicted by the network. 3D Remote Sensing Image
Rasterize 2D remote sensing
Classification Network
DSM Normalization Top-Hat transformation
Convolution Neutral Network
NDSM
Inference
Post-processing
Classification for 2D
Classification for 3D point cloud
Fig. 12.1 The whole framework of the proposed method
12.2.2 Rasterization

There are many ways to rasterize laser point clouds [22]. In this paper, the gridding method is applied to create the raster file. First, the value of each pixel is calculated from the spatial points located in the corresponding raster cell, as shown in Fig. 12.2. A grid is placed over the geospatial lidar data; each cell in the grid has the same spatial dimension and represents a specific area on the ground. For example, if the resolution of the raster derived from the lidar data is expected to be 1 m, a 1 m by 1 m grid should be set over the lidar data points. In this study, the size of the grid cell is set to 0.5 m by 0.5 m. In each cell, the lidar points within the cell are used to calculate the value assigned to the cell. The simplest methods take the mean, maximum, or minimum height value of all lidar points in the particular cell, corresponding to the three methods in Fig. 12.2. After class prediction, all the lidar points in a cell receive the same class as the selected point. However, there may be cells in the raster that do not contain any lidar points. These cells are given a "no data" value, and the nearest points are used instead. Doing so makes no difference to the results because there are no points in these cells to be assigned.
Fig. 12.2 The diagram of point cloud rasterization, three methods (mean, max, min) for rasterization
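The gridding procedure can be sketched as follows. This is an illustrative NumPy implementation under the stated 0.5 m cell size, not the authors' code; the choice of reducer (mean, max, or min) corresponds to the three methods of Fig. 12.2.

```python
import numpy as np

def rasterize_points(x, y, z, cell=0.5, reducer=np.mean):
    """Grid lidar points into cell x cell metre raster cells; each cell
    takes the mean/max/min height of the points falling inside it.
    Cells with no points stay NaN and are later filled from the nearest
    points, as described in the text."""
    col = ((x - x.min()) / cell).astype(int)
    row = ((y.max() - y) / cell).astype(int)
    raster = np.full((row.max() + 1, col.max() + 1), np.nan)
    buckets = {}
    for r, c, h in zip(row, col, z):
        buckets.setdefault((r, c), []).append(h)
    for (r, c), heights in buckets.items():
        raster[r, c] = reducer(heights)
    return raster
```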
12.2.3 Normalization and Top-Hat Transform

In order to eliminate the impact of data differences, the DSM data are first normalized. After that, the improved top-hat transformation is adopted [23]. The transformation is based on the principles of morphological filtering and multiscale filtering. Unlike ordinary morphological filtering, the top-hat transformation here uses many different structuring elements. The specific steps are as follows (a code sketch follows the steps):
(1) The size of the structuring element is set in the range (minSize, maxSize); in this paper, the range is set to (5, 50).
(2) N top-hat transformations are performed on the data, where N = maxSize − minSize.
(3) Count the number of times each pixel in the DSM is recorded as a ground point.
(4) Count the number $q_i$ of points that are marked as ground points exactly $i$ times.
(5) Calculate the frequency $p_i$ of the points marked as ground points $i$ times, where $p_i = q_i / P$ and $P$ is the number of pixels in the DSM.
(6) Calculate the threshold used to distinguish ground points from object points as the weighted average

$$\text{threshold} = \sum_{i=0}^{N} i \, p_i$$
(7) Surface simulation and interpolation are carried out on the ground points to obtain the ground level. The height of the objects above the ground can then be obtained by differencing the DSM data with this ground level.
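The multiscale top-hat counting of steps (1)-(6) can be sketched as below. This is an assumption-laden illustration rather than the exact algorithm of [23]: in particular, the per-pass rule for marking a pixel as ground (a top-hat residual below height_tol) is our own simplification.

```python
import numpy as np
from scipy.ndimage import white_tophat

def ground_threshold(dsm, min_size=5, max_size=50, height_tol=1.0):
    """Run N = max_size - min_size top-hat transforms with growing
    structuring elements, count how often each pixel is marked as ground,
    and return the weighted-average threshold of step (6)."""
    n = max_size - min_size
    ground_counts = np.zeros(dsm.shape, dtype=int)
    for size in range(min_size, max_size):
        residual = white_tophat(dsm, size=(size, size))
        ground_counts += (residual < height_tol)  # low residual -> ground
    p = dsm.size  # P: total number of DSM pixels
    # threshold = sum over i of i * p_i, with p_i = q_i / P
    return sum(i * np.sum(ground_counts == i) / p for i in range(n + 1))
```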
12.2.4 DSM Post-processing

Evaluation of the predicted results reveals two common failures, illustrated in Fig. 12.3: (1) false ground pixels, which should actually be object (top-hat) pixels, shown with the red circle; and (2) false top-hat pixels, which should actually be ground pixels, shown with the red rectangle. Therefore, we decided to use the NDSM data to correct the ground points. First, a threshold to distinguish between the ground and objects should be determined. Then, each pixel predicted as ground should be retested against this threshold; for pixels predicted as objects whose NDSM values are below the given threshold, the same procedure should be performed. The corresponding post-processing scheme is shown in Fig. 12.4.
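The scheme of Fig. 12.4 amounts to suppressing the ground channel of the network's categorical probability map wherever the NDSM contradicts a ground prediction. A minimal sketch follows; the assumption that the ground class occupies channel 0 is ours, and the default threshold of 28 anticipates the value found in Sect. 12.3.2.

```python
import numpy as np

def reclassify_ground(prob_map, ndsm, ground_idx=0, threshold=28):
    """prob_map: (H, W, C) categorical probabilities from the network.
    Pixels predicted as ground whose NDSM exceeds the threshold are
    reassigned to the most probable remaining class."""
    labels = prob_map.argmax(axis=-1)
    suspect = (labels == ground_idx) & (ndsm > threshold)
    probs = prob_map.copy()
    probs[..., ground_idx] = -np.inf  # exclude the ground channel
    labels[suspect] = probs.argmax(axis=-1)[suspect]
    return labels
```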
12.2.5 Network for Classification As aforementioned, there are many excellent deep learning networks in the domain of semantic segmentation. The selection of a classification network is the most important step in the whole scheme.
Fig. 12.3 Two common failures (panels: 2D image, predicted result, label map): false ground pixels (shown in the red circle) and false top-hat pixels (shown in the red rectangle)
Fig. 12.4 Scheme for post-processing
In this study, Deeplabv3+ was selected as the classification network. This network was proposed by Liang-Chieh Chen et al. at Google and is regarded as one of the best semantic segmentation networks. Deeplabv3+ utilizes an encoder-decoder structure with Atrous Spatial Pyramid Pooling (ASPP), which can capture multiscale contextual information. The model has achieved 89% and 82.1% accuracy on the PASCAL VOC2012 and Cityscapes datasets, respectively. Therefore, Deeplabv3+ is a strong choice for 3D remote sensing image classification. The structure of Deeplabv3+ is shown in Fig. 12.5.
12.3 Experiment

12.3.1 Training Deeplabv3+

There are a total of 16 labeled images in the ISPRS Vaihingen dataset. To train the Deeplabv3+ network, the dataset was divided into three parts: a training set, a validation set, and a test set. These are shown in Fig. 12.6.
Fig. 12.5 Structure of Deeplabv3+
Fig. 12.6 Training datasets, validation datasets, and test datasets for Deeplabv3+
We trained Deeplabv3+ on an RTX 2080ti GPU with 11 GB of onboard memory. To reduce model overfitting, data augmentation was applied; the augmentation operations included rotation, gamma correction, and salt-and-pepper noise. Finally, we obtained 10000 patches for training and 4000 patches for validation. The size of each patch was 128 by 128.
Table 12.1 The classification results for the test datasets

Class     | IoU    | Acc    | Precision | Recall | F1     | Kappa
Ground    | 0.7447 | 0.9067 | 0.8165    | 0.8944 | 0.8537 | 0.7778
Water     | 0.2288 | 0.9943 | 0.3630    | 0.3822 | 0.3723 |
High tree | 0.7096 | 0.9297 | 0.7881    | 0.8769 | 0.8301 |
Building  | 0.8111 | 0.9403 | 0.8961    | 0.8954 | 0.8957 |
Car       | 0.5983 | 0.9934 | 0.8344    | 0.6790 | 0.7487 |
Objects   | 0.5619 | 0.9043 | 0.8385    | 0.6301 | 0.7195 |
The Lovász-Softmax loss was selected as the loss function [24]. The loss was minimized using stochastic gradient descent with mini-batches of size 16, momentum set to 0.9, an initial learning rate of 0.001, and an L2 weight decay of 0.0005. The learning rate was dropped by a factor of 0.5 every 10 epochs. We initialized the weights using the Xavier algorithm. The proposed model was implemented using Keras. Table 12.1 shows the classification results for the test datasets. It can be seen that most classes are classified accurately by Deeplabv3+, although the water class remains difficult.
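The stated training configuration can be written down in a few lines of tf.keras. This is a sketch of the hyperparameters above, not the authors' script; the Lovász-Softmax loss is assumed to come from a third-party implementation (it is not a built-in Keras loss), and the per-layer L2 decay is applied via kernel regularizers in the model definition.

```python
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import LearningRateScheduler

# SGD with momentum 0.9 and initial learning rate 0.001, as stated above;
# the 5e-4 L2 weight decay is added per layer via kernel_regularizer.
optimizer = SGD(learning_rate=0.001, momentum=0.9)

def schedule(epoch, lr):
    # halve the learning rate every 10 epochs
    return lr * 0.5 if epoch > 0 and epoch % 10 == 0 else lr

# model.compile(optimizer=optimizer, loss=lovasz_softmax_loss)
# model.fit(train_patches, train_labels, batch_size=16,
#           callbacks=[LearningRateScheduler(schedule)])
```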
12.3.2 Post-processing by NDSM

After prediction on the 2D image data by Deeplabv3+, the results can be modified using the NDSM. Pixels predicted as ground whose NDSM values are higher than the given threshold are reclassified using the remaining category channels. The reclassification method is shown in Fig. 12.4. However, the threshold needs to be determined in advance. To find the optimal threshold for correction, a threshold-OA curve was drawn, shown in Fig. 12.7. From this curve, we can see that the classification results improve the most when the threshold is 28: the accuracy was improved by 0.2%.
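Finding the threshold behind the curve of Fig. 12.7 is a one-dimensional sweep. The sketch below reuses the hypothetical reclassify_ground helper from the example in Sect. 12.2.4; the candidate range is an assumption.

```python
import numpy as np

def find_best_threshold(prob_map, ndsm, labels_true, candidates=range(64)):
    """Sweep NDSM thresholds and keep the one maximising overall accuracy,
    reproducing the threshold-OA curve of Fig. 12.7."""
    best_t, best_oa = None, -1.0
    for t in candidates:
        pred = reclassify_ground(prob_map, ndsm, threshold=t)
        oa = np.mean(pred == labels_true)  # overall accuracy
        if oa > best_oa:
            best_t, best_oa = t, oa
    return best_t, best_oa
```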
12.3.3 Visualization of 3D Remote Sensing Images Figure 12.8 shows the post-processing and initial results. The false ground pixels were modified by the NDSM data, which can be seen from the red circle area. The classification results for the 3D remote sensing images are shown in Fig. 12.9. The classification result after post-processing is closer to the true label. The effectiveness of the proposed method was verified by this experiment. Thus, it is feasible to convert 3D remote sensing images to 2D images for classification, and NDSM data can be obtained from 3D remote sensing images and used for post-processing.
Fig. 12.7 The post-processing accuracies for different thresholds; the accuracy is highest when the threshold of NDSM is set as 28
Fig. 12.8 Comparison between the post-processing result and the initial result (panels: 2D image, label, initial result, post-processing result)
Fig. 12.9 Classification results for 3D remote sensing images (panels: initial classification, after post-processing, label)
12.4 Conclusion In this paper, a new method for 3D remote sensing image classification is proposed. For this, the 3D remote sensing images are divided into 2D remote sensing images and DSM data. The 2D images can be classified using excellent semantic segmentation networks, such as Deeplabv3+, and the DSM data can be used in post-processing. The accuracy of classification can be improved by about 0.2% after post-processing. However, in this study, we only corrected the false ground points in the proposed method. Therefore, the OA is not significantly improved. This is preliminary research. We believe that there will be a better way to utilize DSM data for 3D remote sensing image classification in the future.
References
1. Dey, N., Bhatt, C., Ashour, A.S.: Big Data for Remote Sensing: Visualization, Analysis and Interpretation. Springer, Cham (2018)
2. Li, M., Rottensteiner, F., Heipke, C.: Modelling of buildings from aerial LiDAR point clouds using TINs and label maps. ISPRS J. Photogramm. Remote Sens. 154, 127–138 (2019)
3. Xiangyu, W., Donghui, X., Guangjian, Y., Wuming, Z., Yan, W., Yiming, C.: 3D reconstruction of a single tree from terrestrial LiDAR data. In: IEEE Geoscience and Remote Sensing Symposium, pp. 796–799. IEEE, Quebec City, Canada (2014)
4. Nath, S.S., Mishra, G., Kar, J., Chakraborty, S., Dey, N.: A survey of image classification methods and techniques. In: IEEE International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), pp. 554–557 (2014)
5. Zhao, W., Du, S., Wang, Q., Emery, W.J.: Contextually guided very-high-resolution imagery classification with semantic segments. ISPRS J. Photogramm. Remote Sens. 132, 48–60 (2017)
6. Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818. ECCV, Munich, Germany (2018)
7. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890. IEEE, Honolulu, Hawaii (2017)
8. Li, R., Wang, Y., Liang, F., Qin, H., Yan, J., Fan, F.: Fully quantized network for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2810–2819. IEEE, Long Beach, California, USA (2019)
9. Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., Lin, D.: Libra R-CNN: towards balanced learning for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 821–830. IEEE, Long Beach, California, USA (2019)
10. Zhang, H., Dana, K., Shi, J., Zhang, Z., Wang, X., Tyagi, A., Agarwal, A.: Context encoding for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7151–7160. IEEE, Salt Lake City, USA (2018)
11. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer, Cham (2015)
12. Lin, G., Milan, A., Shen, C., Reid, I.: RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1925–1934. IEEE, Honolulu, Hawaii (2017)
13. Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., Lu, H.: Dual attention network for scene segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3146–3154. IEEE, California, USA (2019)
14. Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818. ECCV, Munich, Germany (2018)
15. Nguyen, A., Le, B.: 3D point cloud segmentation: a survey. In: Proceedings of the IEEE Conference on Robotics, Automation and Mechatronics (RAM), pp. 225–230. IEEE, Manila, Philippines (2013)
16. Rabbani, T., van den Heuvel, F.A., Vosselman, G.: Segmentation of point clouds using smoothness constraint. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 36, 248–253 (2006)
17. Awwad, T.M., Zhu, Q., Du, Z., Zhang, Y.: An improved segmentation approach for planar surfaces from unstructured 3D point clouds. Photogramm. Record 25, 5–23 (2010)
18. Yan, J., Shan, J., Jiang, W.: A global optimization approach to roof segmentation from airborne lidar point clouds. ISPRS J. Photogramm. Remote Sens. 94, 183–193 (2014)
19. Zhang, J., Lin, X., Ning, X.: SVM-based classification of segmented airborne LiDAR point clouds in urban areas. Remote Sens. 5, 3749–3775 (2013)
20. Ghamisi, P., Höfle, B.: LiDAR data classification using extinction profiles and a composite kernel support vector machine. IEEE Geosci. Remote Sens. Lett. 14, 659–663 (2017)
21. Niemeyer, J., Rottensteiner, F., Soergel, U., Heipke, C.: Hierarchical higher order CRF for the classification of airborne lidar point clouds in urban areas. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 41, 655–662 (2016)
22. Höfle, B., Mücke, W., Dutter, M., Rutzinger, M., Dorninger, P.: Detection of building regions using airborne LiDAR: a new combination of raster and point cloud based GIS methods. In: GI-Forum 2009, International Conference on Applied Geoinformatics. GI-Forum, Salzburg, Austria (2009)
23. He, M., Cheng, Y., Qiu, L., Zhao, Z.: Algorithm of building extraction in urban area based on improved Top-hat transformations and LBP elevation texture. Acta Geodaetica Cartogr. Sin. 46(9), 1116–1122 (2017)
24. Berman, M., Rannen Triki, A., Blaschko, M.B.: The Lovász-Softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4413–4421. IEEE, Salt Lake City, USA (2018)
Chapter 13
Research on Deep Learning Algorithm and Application Based on Convolutional Neural Network Mei Guo, Min Xiao, and Fang Yu
Abstract The field of artificial intelligence has developed rapidly in recent years, and new high-tech companies taking it as their main business have sprung up. After years of accumulation of theoretical knowledge and upgrades of computer hardware, deep learning has begun to show its strengths in artificial intelligence fields such as computer vision and voice processing. Deep learning is mainly based on multi-layer neural networks that imitate, learn from, and operate on images, texts, and voices. As one of the deep learning algorithms, convolutional neural networks are favored by researchers in related fields because of their local connections and the advantages of weight sharing. On the basis of expounding the deep learning algorithms of convolutional neural networks at home and abroad, this paper analyzes and summarizes the applications of convolutional neural networks and forecasts their development trend, hoping to provide a reference for researchers engaged in work related to convolutional neural networks.
13.1 Introduction

With the rapid development of artificial intelligence, deep learning has gradually entered everyone's line of sight. Deep learning mainly refers to the use of machine learning algorithms on multi-layer neural networks to solve various recognition and judgment problems involving images, voices, and texts. With the continuous improvement of deep learning theory, the continuous upgrading of computer equipment, and the efforts of countless researchers, deep learning, a complex machine learning approach, has shown great potential in fields such as image recognition, text recognition, speech recognition, and natural language processing. Machines based on deep learning can imitate human activities and thus fundamentally promote the rapid development of artificial intelligence.

M. Guo · M. Xiao (B) · F. Yu College of Software and Communication Engineering, Xiangnan University, Chenzhou 423000, Hunan, China e-mail: [email protected]
There are many neural networks related to deep learning, such as feedforward neural networks, radial basis function networks, and Boltzmann machines. The most widely known, the convolutional neural network, is a type of feedforward neural network. A convolutional neural network not only performs convolutional computation but also has a certain depth of structure; it is a widely used deep learning algorithm. For example, Zhang and Hao used an attention-based hybrid deep neural network architecture to extract high-level features for document modeling [1]. Huang et al. used convolutional neural networks to identify mesoscale eddies near the Xisha and Nansha Islands [2]. Yao et al. added high-level feature attention to Adaptive Instance Normalization (AdaIN) and solved the problem of texture migration confusion under multiple subject images [3]. Based on a summary of the convolutional neural network structure, this paper analyzes and summarizes its applications in various industries.
13.2 Convolutional Neural Network

The Convolutional Neural Network (CNN) is a feedforward neural network with a deep structure that performs convolutional computation, and it is one of the representative algorithms of deep learning [4]. The convolutional neural network structure was inspired by biologists' observation and analysis of the response patterns of the cat's visual cortex. According to their receptive fields, the cells of the visual cortex can be divided into complex cells and simple cells: simple cells respond most strongly to edge stimuli at precise positions within the receptive field, whereas the responses of complex cells are locally invariant to the exact position of the stimulus. The development of convolutional neural networks is mainly due to their superior local connections and weight sharing.
13.2.1 Local Connection of Convolutional Neural Networks

Convolutional neural networks differ from BP neural networks in that the neuron nodes between layers are not fully connected. The advantage is that the correlation within local regions between layers can be exploited as much as possible: each neuron is connected only to nodes in a local neighborhood of the previous layer, achieving local connections that learn local features. The typical convolutional neural network layer structure is shown in Fig. 13.1. The input layer of a convolutional neural network can process multidimensional data; by default the input is a four-dimensional tensor consisting of a two-dimensional pixel space, a channel dimension, and a batch dimension, and normalization needs to be performed before the learning data are input into the convolutional neural network. The convolutional layer generally contains multiple convolution kernels for extracting features from the input data. The pooling layer performs feature selection and
Fig. 13.1 Typical convolutional neural network layer structure
information filtering on the feature maps output by the convolutional layer through a predetermined pooling function. The output layer structure and working principle of a convolutional neural network are similar to those of a traditional feedforward neural network. For the image segmentation problem, the input image is classified pixel by pixel, and the classification result for each position in the pixel space is output in the form of semantic features [5].
13.2.2 Weight Sharing in Convolutional Neural Networks

Weight sharing is another feature of convolutional neural networks that further reduces the number of parameters on top of local connections. So-called weight sharing means using the same filter to scan every location of an input picture: the filter's parameters are the weights, and because the same filter is slid across all locations of the same picture, the weights are shared among those locations. The advantage of weight sharing is that, in the process of extracting image features, the absolute position of each local feature in the image need not be modeled separately, thereby greatly reducing the number of parameters of the model and facilitating its later optimization.
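The parameter reduction brought by local connection and weight sharing can be made concrete with a quick count. For a single layer on a 224 × 224 × 3 input producing 64 feature maps (these figures are chosen for illustration, not taken from the paper):

```python
h, w, c_in, c_out, k = 224, 224, 3, 64, 3

# Fully connected layer (no local connection, no sharing):
fc_params = (h * w * c_in) * (h * w * c_out)      # ~4.8e11 weights

# Locally connected 3x3 layer (local connection, no sharing):
local_params = (h * w) * (k * k * c_in) * c_out   # ~8.7e7 weights

# Convolutional 3x3 layer (local connection + weight sharing):
conv_params = k * k * c_in * c_out + c_out        # 1,792 weights
print(fc_params, local_params, conv_params)
```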
13.3 The Overall Structure of the Convolutional Neural Network

The convolutional neural network mainly comprises convolutional layers and pooling layers. The convolutional layer, as the main module, effectively extracts the features of the input image; the loss function is minimized by the gradient descent method, adjusting the weight parameters layer by layer until a certain accuracy is finally achieved. With the rapid development of convolutional neural networks, the network structure has been continuously optimized.
In 2012, Krizhevsky's AlexNet won the ImageNet Large Scale Visual Recognition Challenge [6]. After that, various convolutional neural networks kept refreshing the records, such as ZFNet in 2013, VGGNet and GoogLeNet [7] in 2014, and ResNet [8] in 2015.
13.4 The Application of Deep Learning Algorithms Based on Convolutional Neural Networks

The improvement of theoretical knowledge related to deep learning and the upgrading of the corresponding computer equipment have provided a good foundation for the development of machine learning. Deep learning has attracted the attention of relevant researchers in various countries and regions, and new high-tech companies based on deep learning have emerged around the world. At present, the main processing objects of machine learning can be divided into computer vision and speech and text, where the computer vision part can be subdivided into image processing and natural language processing.
13.4.1 Application in the Field of Natural Language Processing

Natural language processing is an important application area for deep learning. Human language is highly ambiguous; compare, for example, "I play basketball with classmates" and "I play basketball with a NIKE logo." This ambiguity increases the difficulty of natural language processing. The mainstream approach to natural language processing used to be statistical models, in which deep learning had not yet attracted attention. The NEC Research Institute of the United States was the first to use deep learning to process natural language: since 2008, it has adopted a multilevel convolutional structure to handle natural language from the perspectives of semantic role labeling and part-of-speech tagging, and has achieved more accurate results. After nearly 10 years of development, deep learning has also made great progress in natural language processing.
13.4.2 Application in the Field of Image Recognition

Image recognition was the earliest application field of deep learning algorithms, but due to the theoretical knowledge and equipment limitations of the time, it was not until 2012 that a breakthrough with significant influence was achieved. Hinton's research team won the championship in the ImageNet image classification
competition that year, with an accuracy about 10% higher than that of the second-place team, which used traditional algorithms; this caused a great stir in the field of computer vision and set off a deep learning boom in image recognition. Since then, Google and Baidu have applied deep learning models to release refreshed search engines that can be queried with input images. The convolutional neural network did not receive much attention when it first came out; it later benefited from algorithmic improvements, in particular the introduction of weight decay into neural network training to shrink the magnitudes of the weights, and gradually gained favor among researchers. Of course, the upgrading of computer hardware is also an important reason for this great progress. At present, deep learning can recognize ordinary computer images with high precision, which greatly improves processing efficiency. Although deep learning has developed well in the field of image recognition, it still faces an important challenge: face recognition.
13.4.3 Application in the Field of Speech Recognition

For a long time, the model adopted by speech recognition systems was the Gaussian mixture model, a shallow learning network that cannot fully describe the feature states and their spatial distribution, so the recognition results had a high error rate. However, after deep learning was introduced into the field of speech recognition in 2009, within a few years the error rate was reduced from 21.7 to 17.9%. As in the field of image processing, deep learning has attracted widespread attention in the field of speech recognition. Both Google and Baidu use deep neural network architectures to model speech recognition. Although Google was the first company to put deep learning into practical applications, in its released products the neural network architecture has only about 4–5 layers, whereas Baidu's neural network architecture has reached nine layers. This allows Baidu's online products to use more complex network models to better solve problems; conversely, it requires training the deep neural network models with a large amount of complex data.
13.5 Conclusion

As an important branch of machine learning, deep learning has shown great potential in the fields of image recognition, natural language processing, and speech processing. Although the application of deep learning algorithms in various fields is still at the stage of initial research and trial application, it is believed that, with the corresponding theoretical knowledge and the continuous upgrading of computer hardware, deep learning will make great progress and be further optimized in the future,
and promote the further development of related fields, bringing positive changes to people's lives.
Acknowledgements This paper is funded by: 1. The school-level scientific research project of Xiangnan University, Research on Network Security Situation Prediction Based on Data Fusion (No. 2017XJ16). 2. The Chenzhou Municipal Science and Technology Project, Research on Real-time Monitoring System of Intelligent Trash Can (No. [2018]102).
References
1. Zhang, H., Hao, W.: An attention-based hybrid neural network for document modeling. IEEE Trans. Inf. Syst. 100(6), 1372–1375 (2017)
2. Huang, D.M., Du, Y.L., He, Q.: DeepEddy: a simple deep architecture for mesoscale oceanic eddy detection in SAR images. In: 2017 IEEE 14th International Conference on Networking, Sensing and Control (ICNSC), pp. 673–678. IEEE, Calabria (2017)
3. Yao, Y., Ren, J., Xie, X.: Attention-aware multi-stroke style transfer. In: CVPR, pp. 101–105 (2019)
4. Liu, Q.F.: Application Research of Convolutional Neural Network in Agricultural Scene. Xinjiang University (2019)
5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst., 1097–1105 (2012)
6. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, pp. 818–833. Springer (2014)
7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
8. Pu, S.: Deep learning algorithm and application analysis based on convolutional neural network. Inf. Comput. 17, 36–37 (2018)
Chapter 14
Object Attribute Recognition Based on Hard Negative Mining and Convolutional Neural Network Ming Lei and Fang Liu
Abstract At present, most object attribute recognition methods operate directly on the original image, which makes them vulnerable to interference from complex background information and thus affects recognition accuracy. In order to improve the accuracy of image recognition and classification, this paper divides object attribute recognition into two coupled subproblems, object location and object classification, establishes a CS-CNN (Cascading Convolutional Neural Network) that combines Faster R-CNN (Region-CNN) and CNN, and proposes an improved model, CS-CNN-PF. In the localization part of CS-CNN-PF, in order to remedy the imbalance between positive and negative samples in Faster R-CNN, an improved Faster R-CNN based on a hard negative mining strategy is designed to locate the object. In the classification part of CS-CNN, PReLU and Focal Loss are introduced to improve the activation function and loss function. The experimental results show that, compared with the original CNN and CS-CNN, the object attribute recognition accuracy of CS-CNN-PF is improved by 29.73% and 6.82%, respectively.
14.1 Introduction

The main task of object attribute recognition [1] is to locate the object to be recognized in the input image and give the specific category name and confidence score of the object. Natural scenes are complex and changeable; high-speed movement of the object, shooting angle, lighting, occlusion by other objects, and many other factors pose great challenges for feature extraction. For the problem of multi-object attribute recognition, methods based on convolutional neural networks over candidate regions have become the focus of research. Previous common methods locate and classify objects directly on the original

M. Lei · F. Liu (B) Shenyang Ligong University, Shenyang, China e-mail: [email protected]
M. Lei e-mail: [email protected]
picture. For the location and classification of objects, a complex background is highly disturbing information, and the recognition of object attributes is affected to a certain extent. This paper focuses on a cascading convolutional neural network model that combines Faster R-CNN (Region-CNN) and CNN and applies it to multi-object attribute recognition. A method of object attribute recognition based on a cascading convolutional neural network is designed to reduce the existing technical defects.
14.2 Object Location Based on an Improved Faster R-CNN Model

14.2.1 Hard Negative Mining Strategy

In order to reduce the interference of complex backgrounds on the object attribute recognition model, an improved Faster R-CNN algorithm based on OHEM (Online Hard Example Mining) [2] is proposed in this paper. OHEM is used to mine hard negative samples repeatedly to improve the success rate of object location. Hard negative samples are defined as predicted boxes that do not contain an object when a trained positioning model is tested. Using the hard negative mining strategy makes the positioning model pay more attention to hard negative samples, which improves the discriminative ability and positioning performance of the model. The specific implementation process of the algorithm is as follows (a schematic sketch of this loop is given after the steps):
Step 1: Establish the initial data set D0, the data cache set C_t, and the data set D.
Step 2: Put the initial data set D0 into the data cache set C_t, and use C_t to train the convolutional neural network to get the model M_t.
Step 3: If the model M_t is used to locate objects on the larger data set D and no hard negative samples are found, the model M_t is returned.
Step 4: If M_t is not returned in Step 3, the data cache set C_t is further expanded. The data set D is used to test the model M_t in order to mine hard negative samples, which are added to the data cache set C_t, yielding the data cache set C_{t+1}.
Step 5: The data cache set C_{t+1} is used to train the convolutional neural network to get the model M_{t+1}.
Step 6: Repeat Step 2 to Step 5 until the model M with the best positioning performance is obtained.
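The loop of Steps 1-6 can be sketched generically as follows; train_fn and test_fn are hypothetical stand-ins for the Faster R-CNN training and testing routines, not functions from any library.

```python
def hard_negative_mining(train_fn, test_fn, d0, d):
    """Steps 1-6 above: grow the sample cache with the detector's own
    false-positive boxes until testing on D yields no hard negatives."""
    cache = list(d0)                   # Steps 1-2: initial cache C_t = D0
    while True:
        model = train_fn(cache)        # train M_t on C_t
        hard_negs = test_fn(model, d)  # Step 4: mine false-positive boxes
        if not hard_negs:
            return model               # Step 3: best model M is returned
        cache.extend(hard_negs)        # Step 4: C_{t+1} = C_t + negatives
```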
14.2.2 OHEM-Faster R-CNN Object Location Model

This section uses the OHEM-Faster R-CNN algorithm as the theoretical basis to design and implement the object location model. Implementing the positioning model involves two main processes, training and testing; the implementation of the model is shown in Fig. 14.1.
Location Model Training Process
A five-step alternating training method with convolutional-layer feature sharing is used for the location model: the RPN (Region Proposal Network) model and the Fast R-CNN model are combined in one convolutional neural network structure. The specific steps of the algorithm are as follows:
Step 1: Train the RPN. The convolutional layers shared by the RPN and Fast R-CNN are initialized using the model parameters pre-trained on the ImageNet data set, giving the initial weights W0. Starting from the weights W0, an RPN network is trained separately.
Step 2: Train Fast R-CNN. Similarly, Fast R-CNN is initialized by the Step 1 method with the initial weights W0. The candidate boxes generated in Step 1 are fed into the Fast R-CNN model, and a Fast R-CNN network is trained separately, giving the weights W1.
Step 3: Fine-tune the RPN.
Fig. 14.1 OHEM-Faster R-CNN object location model
The weight parameters W1 obtained in Step 2 are used to initialize the parameters of the RPN network again; at this point the convolutional-layer parameters are shared between the two networks.
Step 4: Fine-tune Fast R-CNN. Similarly, the learning rate of the convolutional layers shared by the two networks is set to 0, so that only the layer parameters unique to the Fast R-CNN network are updated. The whole location network thus achieves unified training.
Step 5: Apply the online hard negative mining strategy. The location model from the preliminary training is tested on a larger data set to mine hard negative samples. The hard negative samples are then added to the training samples, and the location network is trained again. Finally, the model with the best location effect is obtained.
Location Model Testing Process
The concrete steps of the algorithm are as follows:
Step 1: Input the test set pictures in turn into the trained location model.
Step 2: Perform forward propagation of the model. The input image is processed by alternating convolution and downsampling operations to obtain the output of the model.
Step 3: Evaluate the effect of the location model.
14.3 Object Attribute Recognition Method Based on the Improved Cascading Convolutional Neural Network Model CS-CNN

14.3.1 Improved Activation Function and Loss Function

In order to further improve the object attribute recognition ability of the CS-CNN (Cascading Convolutional Neural Network) model, the activation function and loss function of the algorithm are improved: the PReLU [3] activation function is used instead of the ReLU activation function, and the Focal Loss function is used to replace the Cross-Entropy Loss.
1. PReLU activation function
PReLU adds an adaptively learned parameter $a_i$. Compared with ReLU, PReLU converges faster. The definition of PReLU is shown in Eq. (14.1), where $x$ is the input of the activation function:

$$f(x) = \begin{cases} x & \text{if } x \ge 0 \\ a_i x & \text{if } x < 0 \end{cases} \quad (14.1)$$
2. Focal Loss function
When the training sample data are not balanced, the classification performance of the cross-entropy loss function is poor [4]. The definition of Focal Loss is shown in Eq. (14.2), where $p_t$ is the probability of correct classification and $\gamma$ is the focusing parameter:

$$\mathrm{FocalLoss}(p_t) = -(1 - p_t)^{\gamma} \log(p_t) \quad (14.2)$$
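Both functions are a few lines in NumPy. The following sketch illustrates Eqs. (14.1) and (14.2); the values a = 0.25 and γ = 2 are common illustrative choices, not values fixed by this paper.

```python
import numpy as np

def prelu(x, a=0.25):
    """PReLU of Eq. (14.1); in a network, `a` is learned per channel."""
    return np.where(x >= 0, x, a * x)

def focal_loss(p_t, gamma=2.0):
    """Focal loss of Eq. (14.2): the (1 - p_t)^gamma factor down-weights
    well-classified examples so training focuses on hard ones."""
    return -((1.0 - p_t) ** gamma) * np.log(p_t)

# Easy examples (p_t near 1) contribute almost nothing to the loss:
print(focal_loss(np.array([0.9, 0.5, 0.1])))
```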
14.3.2 Improved CS-CNN Model Object Attribute Recognition Method

Step 1: Prepare and annotate the data sets [5]. The vehicle data set of Shenyang Ligong University is divided into a training-verification set and a test set. Pictures randomly selected from the data set of Beijing Institute of Technology are used as the migration test set. The objects in the data set pictures are annotated.
Step 2: Preprocess the image data.
Step 3: Initialize the hyperparameters of the cascade network, including the maximum numbers of iterations Pmax and Qmax of the location network and classification network, and the initial learning rates ε_loc and ε_clc.
Step 4: Initialize the weights and threshold parameters of each layer of the cascade network. For the location sub-network, the VGG-D (Visual Geometry Group-D) model parameters are used to initialize the convolutional layers shared by the RPN and Fast R-CNN, while a Gaussian distribution with mean 0 and standard deviation 0.01 is used to initialize the parameters of the layers unique to the two models, and the biases of each layer are initialized to 0. For the classification sub-network, the first four convolutional layers and the last three fully connected layers of VGG-E (Visual Geometry Group-E) are initialized from VGG-A (Visual Geometry Group-A); the parameters of the other layers are initialized from a Gaussian distribution with mean 0 and standard deviation 0.01, and the biases of each layer are initialized to 0.
Step 5: Train the location network. The weights and thresholds of the location network are adjusted by iterative training. When the number of iterations of the location network reaches the set maximum Pmax, Step 6 is executed.
Step 6: Test the location network with the test set and migration test set pictures. The prediction boxes of all objects in each test picture are obtained, and the location coordinates of the prediction boxes are output.
Step 7: According to the coordinates output by the location network, the object region predicted by the location network is cropped from the original image and used as the input image data of the classification network in the next stage of the cascade network.
Step 8: Preprocess the image data, including image size normalization, color image format conversion, and pixel value normalization (a sketch of this preprocessing is given after the steps).
Step 9: Train the classification network. The weights and thresholds of the classification network are adjusted by iterative training. When the number of iterations of the classification network reaches the set maximum Qmax, Step 10 is executed.
Step 10: Test the classification network using the test set and migration test set pictures. The confidence score of each category is output, and the category with the highest confidence score is taken as the prediction.
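Steps 2 and 8 are standard image preprocessing. The following sketch uses OpenCV; the 224 × 224 VGG input size is assumed, since the paper does not state the patch size for the classification sub-network.

```python
import numpy as np
import cv2

def preprocess(img_bgr, size=(224, 224)):
    """Step 8: size normalization, color format conversion, and
    pixel value normalization to [0, 1]."""
    img = cv2.resize(img_bgr, size)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return img.astype(np.float32) / 255.0
```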
14.4 Experimental Simulations

14.4.1 Model Parameters

The basic feature extraction network used in the Faster R-CNN location model is VGG (Visual Geometry Group). VGG has six network forms, namely A, A-LRN, B, C, D, and E. In this paper, VGG-D is selected as the feature extraction network of the location model, and VGG-E is used as the object attribute recognition model. For the VGG-D model, the thirteen convolutional layers and the first four subsampling layers are used, while the fifth subsampling layer and the three fully connected layers are discarded. The specific network structure parameters of the VGG-D model are shown in Table 14.1.
14.4.2 Experimental Simulations of Object Location

As shown in Tables 14.2 and 14.3, the location effect of the OHEM-Faster R-CNN model is better than that of the Faster R-CNN model on both the test set and the migration test set: the recall rate of the OHEM-Faster R-CNN model is increased by 2.28% and 4.11%, respectively. The OHEM-Faster R-CNN model adopts the hard negative mining strategy, reduces the number of missed detections, and can better locate partially occluded vehicle objects.
14.4.3 Contrastive Experimental Simulations of Object Attribute Recognition

CNN (non-cascading model): Object location and attribute classification are carried out directly on the original picture. A CNN [6] is selected as the feature extraction network, and a single-task loss function is used for training.
Table 14.1 The parameters of the VGG-D model

Layer | Type          | Number of convolutional (pooling) kernels | Size of convolutional (pooling) kernels | Stride | Filling
1     | Convolutional | 64  | 3×3 | 1 | 2
2     | Convolutional | 64  | 3×3 | 1 | 2
3     | Pooling       | 64  | 2×2 | 2 | 0
4     | Convolutional | 128 | 3×3 | 1 | 2
5     | Convolutional | 128 | 3×3 | 1 | 2
6     | Pooling       | 128 | 2×2 | 2 | 0
7     | Convolutional | 256 | 3×3 | 1 | 2
8     | Convolutional | 256 | 3×3 | 1 | 2
9     | Convolutional | 256 | 3×3 | 1 | 2
10    | Pooling       | 256 | 2×2 | 2 | 0
11    | Convolutional | 512 | 3×3 | 1 | 2
12    | Convolutional | 512 | 3×3 | 1 | 2
13    | Convolutional | 512 | 3×3 | 1 | 2
14    | Pooling       | 512 | 2×2 | 2 | 0
15    | Convolutional | 512 | 3×3 | 1 | 2
16    | Convolutional | 512 | 3×3 | 1 | 2
17    | Convolutional | 512 | 3×3 | 1 | 2
Table 14.2 The location result of the SYIT-Vehicle data set

Object location method | Total number of objects | Detection number | Number of missed detections | Recall rate (%)
Faster R-CNN           | 1228 | 1179 | 49 | 96.01
OHEM-Faster R-CNN      | 1228 | 1207 | 21 | 98.29
Table 14.3 The location result of the BIT-Vehicle data set

Object location method | Total number of objects | Detection number | Number of missed detections | Recall rate (%)
Faster R-CNN           | 1265 | 1121 | 144 | 88.62
OHEM-Faster R-CNN      | 1265 | 1173 | 92  | 92.73
CS-CNN (cascading model): Object location is performed on the original picture, and then attribute recognition is performed on the localized object-area image. OHEM-Faster R-CNN is used as the location model, and CNN is used as the attribute recognition model.
Table 14.4 The attribute recognition result of the SYIT-Vehicle data set

Method    | Bus (%) | Minivan (%) | Microbus (%) | SUV (%) | Sedan (%) | Truck (%) | Average (%)
CNN       | 82.16 | 71.35 | 77.30 | 65.41 | 66.49 | 70.27 | 72.16
CS-CNN    | 91.89 | 83.24 | 83.78 | 79.46 | 84.32 | 81.08 | 83.96
CS-CNN-PF | 97.30 | 88.65 | 87.57 | 85.95 | 90.27 | 87.03 | 89.46
Table 14.5 The attribute recognition result of the BIT-Vehicle data set

Method    | Bus (%) | Minivan (%) | Microbus (%) | SUV (%) | Sedan (%) | Truck (%) | Average (%)
CNN       | 62.16 | 44.86 | 51.35 | 48.65 | 56.76 | 53.51 | 52.88
CS-CNN    | 83.24 | 71.35 | 76.76 | 70.81 | 70.27 | 75.68 | 75.79
CS-CNN-PF | 90.27 | 77.84 | 85.95 | 75.14 | 80.00 | 86.49 | 82.61
CS-CNN-PF (improved cascading model): The location model uses OHEM-Faster R-CNN, and the attribute recognition model uses CNN-PF; that is, the PReLU activation function is used instead of ReLU, and the Focal Loss function is used instead of the Cross-Entropy Loss function. As shown in Tables 14.4 and 14.5, the overall classification effect of the CS-CNN-PF model is better than that of the other two models: on the test set, the average accuracy increases by 17.30% and 5.50%, respectively, while on the migration test set it increases by 29.73% and 6.82%, respectively. Figure 14.2 shows the results of vehicle classification. Below each picture is the confidence level of the object for the six vehicle types, and the category with the maximum confidence is used as the predicted category of the object. It can be seen that CS-CNN-PF has better attribute recognition ability for the six vehicle types.
14.5 Conclusion

In this paper, the problem of object attribute recognition is decomposed into two subproblems: object location and object classification. For object location, the Faster R-CNN algorithm, which has excellent performance among object detection algorithms, is used as the location model, and the OHEM algorithm is used to improve Faster R-CNN. The simulation results show that the improved Faster R-CNN algorithm has superior positioning performance. For object classification, the proposed cascading model performs a second-stage classification of the located objects, and the activation function and loss function of the cascading model are optimized. CS-CNN-PF reduces the number of missed detections on the test set, reduces the interference of background information on the model, and improves the accuracy of
Fig. 14.2 The classified result of vehicle type
object attribute recognition of the convolutional neural network, which fully illustrates the feasibility and effectiveness of the improved cascading model in solving the problem of object attribute recognition.
Acknowledgements This work was supported in part by the Open Foundation of the Science and Technology on Electro-Optical Information Security Control Laboratory (Grant no. 61421070104), the Natural Science Foundation of Liaoning Province (Grant no. 20170540790), and the Science and Technology Project of the Educational Department of Liaoning Province (Grant no. LG201715).
References
1. Grauman, K., Leibe, B.: Visual object recognition. Synth. Lectures Artif. Intell. Mach. Learn. 5(2), 1–181 (2011)
2. Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 761–769. Las Vegas, NV, USA (2016)
3. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: IEEE International Conference on Computer Vision, pp. 1026–1034. Santiago, Chile (2015)
4. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: IEEE International Conference on Computer Vision. Venice, Italy (2017)
5. Dong, Z., Wu, Y., Pei, M., Jia, Y.: Vehicle type classification using a semisupervised convolutional neural network. IEEE Trans. Intell. Transp. Syst. 16(4), 2247–2256 (2015)
6. Li, Z., Dey, N., Ashour, A.S., Cao, L.: Convolutional neural network based clustering and manifold learning method for diabetic plantar pressure imaging dataset. J. Med. Imaging Health Inform. 7(3) (2017)
Chapter 15
Research on Intelligent Identification Method of Power Equipment Based on Deep Learning Zhimin He, Lin Peng, Min Xu, Gang Wang, Hai Yu, Xingchuan Bao, and Zhansheng Hou Abstract After investigation and analysis, this paper studies the use of deep learning to solve the automatic analysis and identification of massive unstructured media data in the power system. For feature extraction, based on the AlexNet model, two independent CNN models are proposed to extract the characteristics of power equipment. For the recognition algorithm, the advantages of traditional machine learning methods, represented by random forests, are combined with CNN to form an intelligent identification algorithm for power equipment.
15.1 Introduction 15.1.1 Background At present, newly built intelligent substations and some retrofitted unattended substations are gradually adopting intelligent monitoring technologies such as high-definition video surveillance and infrared thermal imaging, together with means such as helicopters, drones, and robots [1] to achieve efficient and rapid substation inspection, which have been rapidly promoted and applied. Traditional manual inspection also collects a large number of visible, infrared, ultraviolet, and other detection images. These large volumes of media data provide a data foundation for using deep learning for power device identification [2]. For this reason, and in view of the deficiencies of conventional identification methods, power devices are identified here by introducing deep learning.
Z. He (B) · L. Peng · M. Xu · G. Wang · H. Yu · X. Bao · Z. Hou Global Energy Interconnection Research Institute, Nanri Road 8, Nanjing 210003, China e-mail: [email protected] Z. He State Grid Key Laboratory of Information & Network Security, Beiqijia, Beijing 100000, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_15
15.1.2 Purpose and Significance At present, image processing and recognition technology has been widely used in fields such as biomedicine, robot vision, remote sensing, navigation, transportation, and military reconnaissance, and many mature algorithms exist. Applying image processing technology to the analysis and identification of key equipment in power systems, however, is a typical cross-disciplinary research topic: its research history is short, results are few, and related work mainly focuses on the extraction and identification of high-voltage transmission lines, towers, and insulators. Therefore, it is necessary to design a method that actively identifies on-site operation information, realizing automatic acquisition of operation and equipment information by identifying the operation target and analyzing the work environment. The various types of on-site information are organized mainly along the spatial dimension, supplemented by the time dimension and logical relationships, providing accurate and timely auxiliary guidance for operators.
15.1.3 Summary of Research Level at Home and Abroad The concept of deep learning originated from artificial neural networks (ANNs), which attempt to solve various machine learning problems by simulating the cognitive mechanism of the brain. Owing to their good self-learning ability, modeling ability, and strong robustness, ANNs received the attention of academia and industry in the 1990s. However, an ANN has many parameters and is prone to overfitting: although it can achieve high accuracy during training, its accuracy on a held-out test set is often much lower. In 2006, Hinton proposed the concept of "deep learning" and published influential related papers, including in Science; these articles aroused widespread attention and stimulated a research boom at home and abroad [3]. Compared with shallow learning, the advantages of deep learning are as follows: (1) In the ability of the network to express complex objective functions, a shallow ANN structure cannot represent some complex high-dimensional functions, whereas a deep structure can represent them well. (2) In terms of time and space complexity, if a function can be represented efficiently by a network of a certain depth, trying to represent it with fewer layers requires far more computing units, so both time and space complexity suffer; in addition, optimizing and adjusting the parameters requires a large amount of training data, and a small amount of training data cannot be used to train a large number of parameters without reducing the generalization ability of the network.
(3) From the perspective of bionics, deep learning simulates the layered structure of neurons in the brain: it first stratifies the input data and then processes the hierarchical data layer by layer, so the feature values of the network at different layers reflect different levels of abstraction of the original input. (4) In data sharing, model parameters learned from one data source can be migrated to another data source. For similar recognition tasks, the parameters of a deep learning model can therefore be reused multiple times, which is equivalent to providing unsupervised data to obtain more useful information.
15.2 Solutions and Key Technologies Based on the collected image information of power equipment, this paper uses different image recognition algorithms to compare and analyze the effectiveness of image feature extraction for different devices, and evaluates the visual recognition degree of different methods in order to establish classification features for visually distinguishable devices.
15.2.1 Power Equipment Image Processing and Feature Extraction The human visual system consists of the visual senses, the visual pathways, and multi-level visual centers. When observing an object, humans capture visual data through the visual senses (the eyes) and transmit visual information through the visual pathways (the optic nerves). During transmission, the visual data pass through the multi-level visual central system, where the information is cross-mixed; the mixed information is then propagated to the brain, which analyzes and processes it. In view of this, after analyzing and testing many current mainstream CNN models, this paper proposes a two-channel CNN model: two sets of device features are obtained through two independent CNN models, and the final device image feature is obtained by a mixing operation [4]. The two-channel convolutional neural network (TCNN) of this paper is designed and extended on the basis of the AlexNet network structure: the fully connected part of AlexNet is expanded and improved, and an 11-layer deep convolutional network is proposed and designed. The TCNN consists of two independent, parallel convolutional neural network models, CNNa and CNNb, which have different inputs but the same structure. The TCNN performs feature learning on images through CNNa and CNNb, respectively, and finally cross-mixes the learned features at the top of the model to obtain the final device image feature.
The input of CNNa is the original image normalized to a size of 256 × 256, and the input of CNNb is the V-channel component extracted after an HSV transform of the original image. CNNa and CNNb are 9-layer networks, each with 5 convolutional layers and 4 fully connected layers. Unlike AlexNet, the last fully connected layer of CNNa and of CNNb outputs 512 neural units. After obtaining the feature data of the last fully connected layers of CNNa and CNNb, the TCNN performs a second cross-mixing operation on the two sets of features: the outputs of the two fully connected layers are first cross-connected and fed into a fully connected layer with 512 output units; this layer is then split into two parts whose data are mixed and connected, and the resulting 256-unit feature vector is the final feature of the image.
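The following PyTorch sketch illustrates the dual-branch feature mixing described above. The convolutional stacks are simplified placeholders rather than the authors' exact 9-layer configuration, and the cross-mixing step is approximated by concatenation followed by fully connected layers of 512 and 256 units.

```python
import torch
import torch.nn as nn

class TCNN(nn.Module):
    """Two-channel CNN sketch: CNNa takes the normalized RGB image,
    CNNb takes the V channel of its HSV transform; their 512-d outputs
    are mixed into a 256-d device feature."""
    def __init__(self):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                nn.Linear(64 * 16, 512), nn.ReLU())
        self.cnn_a = branch(3)   # RGB input, 256 x 256
        self.cnn_b = branch(1)   # V channel of the HSV transform
        self.mix = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(),
                                 nn.Linear(512, 256))

    def forward(self, rgb, v):
        f = torch.cat([self.cnn_a(rgb), self.cnn_b(v)], dim=1)
        return self.mix(f)       # 256-d final image feature

feat = TCNN()(torch.randn(2, 3, 256, 256), torch.randn(2, 1, 256, 256))
```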
15.2.2 Power Equipment Classification There are many types of equipment in power systems; many have a single color and a similar appearance, and environments such as substations are complex, resulting in cluttered background images. The Logistic classifier is generally used for two-class problems, while the Softmax classifier can handle multi-class problems but tends to have a higher classification error rate on complex, easily confused targets. Therefore, neither the Logistic nor the Softmax classifier is suitable for key device identification in power systems [5]. In this paper, the random forest classifier is used: a decision "forest" is generated from multiple randomly selected sample subsets and decision trees built on feature subspaces, and the classification result is obtained by voting. The proposed classification method consists of two parts, as shown in Fig. 15.1: a training phase and a test phase. In the training phase, the proposed TCNN first randomly selects images from the device image database and extracts image features. Then, based on the adaptability of random forest classification, the learned features are analyzed and feature selection is performed on the basis of this analysis. Since the features extracted by the low-level convolutional layers do not contain rich semantic information, and using them would significantly increase the dimensionality of the image features, only the features of the TCNN fully connected layers and the mixing layer are analyzed and selected. Finally, the selected features are used to train the random forest. In the test phase, the deep features of the image are first computed with the TCNN, the feature subset selected in the training phase is taken as the final image feature, and the input image is classified by the trained random forest, as sketched after Fig. 15.1.
Fig. 15.1 The framework of the classifier
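A minimal scikit-learn sketch of this train/test pipeline is given below. The random arrays stand in for TCNN features and device labels, and the importance-based feature selector is one plausible reading of the feature selection step, not the authors' exact procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 256))      # stand-in for 256-d TCNN features
labels = rng.integers(0, 6, size=500)    # stand-in device classes

# Training phase: feature selection driven by forest importances, then a
# second forest whose trees vote to produce the classification result.
selector = SelectFromModel(RandomForestClassifier(n_estimators=100,
                                                  random_state=0))
X = selector.fit_transform(feats, labels)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)

# Test phase: select the same feature subset, then classify by voting.
print(clf.predict(selector.transform(feats[:5])))
```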
15.2.3 Establish a Standardized Image Analysis Library The purpose of normalized storage is to classify and index quickly and efficiently, comparing acquired images with images in the analysis library for typical-object and typical-fault/case judgments. On the basis of large-scale image acquisition and analysis, and starting from the feature quantities, a standardized image analysis library of typical objects and typical faults/cases is constructed through data mining, machine learning, intelligent retrieval, and other methods, improving analysis capability [6]. To find a desired image quickly in a large analysis library, the library must be pre-processed and indexed; the clustering tree generated by clustering is used to build the index of the image analysis library and solve the indexing problem of high-dimensional data. Data mining techniques based on cluster analysis are studied, including partitioning, hierarchical, grid-based, density-based, and model-based clustering methods. Partitioning method: a given large data set is divided into multiple groups or clusters by certain rules or division methods; each group contains at least one data item, groups differ significantly from each other, and each data item belongs to only one group. The data within each group are strongly similar, which is convenient for overall analysis. Hierarchical method: clustering is performed by dividing the data into several groups that form a tree structure; depending on how the tree is built, this can be split into top-down splitting algorithms and bottom-up agglomerative algorithms. Grid-based method: data objects are divided into a finite number of cells that form a grid structure, and clustering operations on the grid speed up processing. Density-based method: clusters grow continuously
as long as the density of the neighborhood exceeds a certain threshold, i.e., each given region must contain a certain number of data points; this filters out abnormal points and improves the efficiency of data analysis. Model-based method: each cluster is assumed to follow a model, and the method optimizes the fit between the given data and the model by finding the best fit; the data are generally assumed to be generated by an underlying probability distribution, and such algorithms typically use statistical principles or neural network methods. As the number of typical objects increases, on the one hand, new typical objects and typical faults/cases must be added to the analysis library and classified; on the other hand, existing cases and typical faults must be combined to analyze the on-site situation. Therefore, a multi-class support vector analysis system can be established by machine learning, integrating posterior probability outputs and lexical semantic relationships to reduce training time and sample size and to improve the automatic analysis and automatic classification ability of the library. The support vector machine is based on statistical learning theory and the principle of structural risk minimization. Its basic idea is to map the samples of the input space to a high-dimensional feature space through a nonlinear transformation and then find the optimal classification surface that separates the samples linearly in that feature space. Different kernel functions transform the samples into different feature spaces. Three kernel functions are commonly used, namely the polynomial, RBF (radial basis), and sigmoid (perceptron) kernels, which are suitable for most nonlinear classification problems; a comparison of the three is sketched below.
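The following sketch compares the three kernels with scikit-learn's SVC on stand-in feature vectors; the random data and the cross-validation setup are illustrative assumptions, not the authors' experiment.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))     # stand-in image feature vectors
y = rng.integers(0, 4, size=300)   # stand-in typical fault/case categories

# Each kernel implicitly maps the samples to a different
# high-dimensional feature space before the linear separation.
for kernel in ("poly", "rbf", "sigmoid"):
    score = cross_val_score(SVC(kernel=kernel), X, y, cv=3).mean()
    print(kernel, round(score, 3))
```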
15.3 Conclusion In this study, a deep-learning-based method for intelligent identification of power equipment is proposed: a two-channel CNN feature extraction model inspired by the human visual information transmission mechanism is introduced; on this convolutional basis, a device identification algorithm combining deep learning with a random forest classifier is studied; and a standardized image analysis library is established. Deep learning is closely related to traditional machine learning methods; adding a deep learning module to the traditional machine learning framework yields a higher recognition rate for key power equipment in power systems, as high as 95%. In recent years, our team has been engaged in research on intelligent identification methods for power equipment and has achieved many research results in the power industry. Acknowledgments This work was financially supported by the science and technology project of State Grid Corporation "Research on Intelligent Reconfiguration and Cognitive Technology of Complex Dynamic Operating Environment Based on Deep Vision." We would like to express our heartfelt gratitude to our colleagues and friends, who gave us much useful advice during this research and also helped enthusiastically with typesetting and writing. At the same time, we want to thank all the scholars
who are quoted in this paper. Due to our limited academic level, the paper has some shortcomings, and we welcome criticism and corrections from experts and scholars.
References
1. Yamamoto, K., Yamada, K.: Analysis of the infrared images to detect power lines. In: Proceedings/TENCON, pp. 343–346. IEEE Press, USA (1997)
2. Li, J.F.: Research on Electric Equipment Image Recognition and its Application based on Deep Learning. Guangdong University (2018)
3. Wang, D., Li, Z., Dey, N., Ashour, A.S., Moraru, L., Biswas, A., Shi, F.: Optical pressure sensors based plantar image segmenting using an improved fully convolutional network. Optik – Int. J. Light Electr. Opt., 155–162 (2019)
4. Sun, X.: Power Equipment Identification and Condition Monitoring in Complicated Background. North China Electric Power University (2018)
5. Benjun, D.: Research on 3D Point Cloud Data Recognition Technology of Substation Equipment Based on Laser Scanning. Zhengzhou University (2016)
6. Yongli, H., Xinglin, P., Yanfeng, S., Baocai, Y.: Multi-source heterogeneous data fusion method and its application in object positioning and tracking. Scientia Sinica (Informationis) 43, 1288–1306 (2013)
Chapter 16
Chinese Color Name Identification from Images Using a Convolutional Neural Network Lin Shi and Wei Zhao
Abstract Human color vision is a subjective perception, and individual observers may perceive colors differently. The widely used RGB values are highly correlated with display devices: the same RGB value can represent different colors on different displays. Furthermore, physically identical patches with identical chromaticity coordinates can produce different color perceptions in different spatial contexts. At the same time, individual differences in color vision do not prevent people from communicating about color through color vocabulary. Considering these problems and the characteristics of the color vocabulary used in daily communication, we built a color-vocabulary-labeled data set based on human subjective color perception through visual psychophysics experiments, and then used the data set to train deep neural networks. The network can automatically identify the 11 Chinese color words corresponding to images. The F1-measure of the model exceeded 75% on different test sets. Based on the subjective visual perception data set and the deep neural network, we tried to avoid the perceptual differences of other color extraction methods.
16.1 Introduction Color is an important visual information of objects. Unlike physical properties, human color vision is a subjective perception. Different individuals may have different color visions. However, the traditional color extraction algorithms are based on the RGB color model, which may cause the colors extracted on different objects to be inconsistent. Moreover, the computer’s color extraction method lacks the recognition of the situation, and is easily affected by the physical surface texture, illumination, and shadow, and cannot accurately identify the same color in different environments. L. Shi (B) · W. Zhao School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_16
16.2 Color Vocabulary Labeled Data Set Since the training data set greatly influences the accuracy of the classification results, and there is currently no solid-color-block database with color labels, the data set required for this experiment had to be built from experiments. According to the paper A Cross-Cultural Color-Naming Study [1], different cultures have different strategies when naming colors. This database uses common color labels based on Chinese naming rules, comprising the 11 labels black, white, blue, gray, green, red, pink, yellow, purple, brown, and orange [2]. Experiment The experimental process is briefly described as follows: (1) Participants The experiment selected 10 participants, 6 females and 4 males aged 23 to 27, all of the Han ethnic group (the choice and use of color vocabulary is also influenced by national culture). All passed a color-blindness screening test and a basic color perception test. (2) Experimental materials A total of 290 single-color images were selected, covering the 11 focal color [3] words based on Chinese color naming rules. The black and white categories are small, each containing 10 color-block images. The remaining blue, green, red, pink, yellow, purple, brown, and orange categories were each varied into 30 images according to different saturation and brightness levels. Single-Color Image Data Set For marking the single-color images, two sets of experiments were designed. The purpose of designing two sets of experiments is to prevent colors from interfering with each other, a phenomenon called simultaneous contrast in color science [4]. The first experiment obtained the perception of each color by the human eye without simultaneous contrast. But in nature and in daily life, each color is inevitably affected by one or more other colors, so the second experiment placed many different single-color images on the same experimental interface for the participants to mark; each color image was thus affected by neighboring color blocks, producing a color contrast effect. Multiple-Color Image Data Set Randomly selecting 2–4 single-color images with different color labels and splicing them yields a multiple-color image. There are 55 two-label combinations of the 11 color labels, and each combination contains 10 multi-color-block images. Complex Image Set The complex pictures in the experiment were captured from the Internet; they naturally contain environmental interference such as lighting changes, background clutter, and noise or compression from transmission. The sources of the complex pictures are as follows:
Fig. 16.1 Example images in the data set. Car images, clothing images, and comprehensive images are shown in the first, second, and third rows, respectively
1. Car pictures: searched with "color word + car" as keywords; 1000 pictures were collected. 2. Clothing pictures: searched with "color word + clothes/dress/pants/shirt…" as keywords; 2000 pictures were collected. 3. Comprehensive pictures: searched with multiple color words as keywords; 2000 pictures were collected. Figure 16.1 shows some example images in the data set.
16.3 The Model of Color Recognition 16.3.1 Design of Multi-label Convolutional Neural Networks This paper used two training data sets. The first included single-color images, multi-color images, and complex images with color labels; the second included complex images only. The architectures of the neural networks are shown in Figs. 16.2 and 16.3. The number of training epochs was determined according to the convergence of the network model in the various experiments. Since the two training sets differ in size, the training batch size for the first set is 256 (each batch processes 256 samples) with a test batch size of 25, while the training batch size for the second set is 100 with a test batch size of 25. 20% of the complex image data set is used as the test set; the 1000 test samples include 200 car samples, 400 clothing samples, and 400 comprehensive samples.
Fig. 16.2 Color_CNN_1. 3 × 3 is the kernel size and 32/64/128 the number of kernels; output dimensions are listed in the figure (input 96 × 96 × 3 → conv 32 → max pool → conv 64 → conv 64 → max pool → conv 128 → conv 128 → max pool → FC 256 → FC 11 → sigmoid, with dropout between stages)
Fig. 16.3 Color_CNN_2. 3 × 3 is the kernel size and 32/64 the number of kernels; output dimensions are listed in the figure (input 96 × 96 × 3 → conv 32 → conv 32 → max pool → conv 64 → conv 64 → max pool → FC 128 → FC 11 → sigmoid, with dropout between stages)
The model uses binary cross entropy [5] for multi-label classification, treating each label as an independent Bernoulli output. As shown in Fig. 16.4, the x-axis is the epoch and the y-axis is the loss or accuracy. The test-set accuracy is about 93% in pane a, about 90% in pane b, about 88% in pane c, and about 83% in pane d.
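The sketch below shows this loss setup in PyTorch: one sigmoid output per label trained with binary cross entropy, which is exactly an independent Bernoulli model per color word. The batch of random logits and labels is only a placeholder.

```python
import torch
import torch.nn as nn

# Multi-label setup: 11 sigmoid outputs, one independent Bernoulli
# decision per Chinese color word, trained with binary cross entropy.
logits = torch.randn(25, 11)                      # placeholder test batch
targets = torch.randint(0, 2, (25, 11)).float()   # multi-hot color labels
loss = nn.BCEWithLogitsLoss()(logits, targets)    # numerically stable BCE
pred = (torch.sigmoid(logits) > 0.5).int()        # thresholding, cf. 16.3.3
```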
16.3.2 Evaluation Criteria We selected precision, recall, and F1-measure as the evaluation indicators of the results. F1-measure [6] is the harmonic mean of precision and recall, giving both equal weight. The formula is as follows:
Fig. 16.4 Accuracy and training loss of two different training sets for the two multi-label classification models. a Color_CNN_1 with the first training set; b Color_CNN_2 with the first training set; c Color_CNN_1 with the second training set; and d Color_CNN_2 with the second training set
F1 = (2 × Precision × Recall) / (Precision + Recall)
From the evaluation indicators in Tables 16.1 and 16.2, COLOR_CNN_1, with its deeper network, performed better than COLOR_CNN_2 when the training data were the same. After comprehensive consideration and comparison, the COLOR_CNN_1 network structure was chosen.

Table 16.1 Comparison of two network structures (first type of data set)

CNN         | Data type     | Precision | Recall | F1-measure
COLOR_CNN_1 | Car           | 0.8326    | 0.6854 | 0.7519
            | Clothes       | 0.8465    | 0.6898 | 0.7602
            | Comprehensive | 0.8233    | 0.6754 | 0.7333
COLOR_CNN_2 | Car           | 0.8204    | 0.6732 | 0.7395
            | Clothes       | 0.8319    | 0.6806 | 0.7487
            | Comprehensive | 0.8072    | 0.6648 | 0.7291
Table 16.2 Comparison of two network structures (second type of data set)

CNN         | Data type     | Precision | Recall | F1-measure
COLOR_CNN_1 | Car           | 0.8053    | 0.6742 | 0.7339
            | Clothes       | 0.8182    | 0.6783 | 0.7417
            | Comprehensive | 0.7829    | 0.6597 | 0.7160
COLOR_CNN_2 | Car           | 0.7886    | 0.6569 | 0.7168
            | Clothes       | 0.8027    | 0.6654 | 0.7276
            | Comprehensive | 0.7613    | 0.6501 | 0.7013
At the same time, the tables show that for the same network, whether COLOR_CNN_1 or COLOR_CNN_2, the F1-measure on the first type of data is higher than on the second. This confirms that the data have a great impact on the accuracy of the neural network, so the color-labeled data sets produced in this paper are meaningful for identifying color vocabulary.
16.3.3 Choosing the Appropriate Threshold The threshold is usually 0.5: if it is too small, the number of false positives increases, yielding low precision and high recall; if it is too large, the number of false negatives increases, yielding high precision and low recall. Figure 16.5 shows the test results under different thresholds. From these curves, each type of data achieves its best F1-measure at a threshold of 0.7, as the sweep sketched after Fig. 16.5 illustrates.
Fig. 16.5 Three types of image data test results. a car image data sets, b clothing image data sets, and c comprehensive image data sets
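A threshold sweep such as the following reproduces the shape of these curves. The synthetic scores are placeholders, so the exact numbers differ from Fig. 16.5, but they show how precision rises and recall falls as the threshold grows.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(1000, 11))                  # multi-hot labels
scores = np.clip(0.6 * y_true + 0.5 * rng.random((1000, 11)), 0, 1)

for t in (0.3, 0.5, 0.7, 0.9):
    y_pred = (scores >= t).astype(int)
    p = precision_score(y_true, y_pred, average="micro", zero_division=0)
    r = recall_score(y_true, y_pred, average="micro", zero_division=0)
    f = f1_score(y_true, y_pred, average="micro", zero_division=0)
    print(f"threshold {t}: P={p:.3f} R={r:.3f} F1={f:.3f}")
```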
16.4 Conclusion We built a color vocabulary labeled data set based on a visual psychophysics experiment and then used it, together with collected images, to train convolutional neural networks. After training, the network could identify 11 Chinese color names from images with good accuracy and alleviated some problems of visual chromatic aberration. The model improves the applicability and robustness of image color recognition and also simplifies image pre-processing, for example, by removing the image segmentation step. To obtain better recognition accuracy, a larger and more reliable data set could be collected: adding more color images and color vocabulary labels would let the network learn more features and information, and the color names could be elaborated, e.g., splitting blue into dark blue and light blue according to shade. Since the experiment is based on visual psychophysics, more participants could be included to make the results more representative.
References
1. Lin, H., Luo, M.R., Macdonald, L.W., Tarrant, A.W.S.: A cross-cultural colour-naming study. Part I: Using an unconstrained method. Color Res. Appl. 26(1), 40–60 (2001)
2. He, G.: English and Chinese cultural connotation of color words in comparison. Asian Soc. Sci. 5(7), 160–163 (2009)
3. Heider, E.R.: Universals in color naming and memory. J. Exp. Psychol. 93(1), 10 (1972)
4. Arend, L., Reeves, A.: Simultaneous color constancy. JOSA A 3(10), 1743–1751 (1986)
5. De Boer, P.T., Kroese, D.P., Mannor, S., Rubinstein, R.Y.: A tutorial on the cross-entropy method. Ann. Oper. Res. 134(1), 19–67 (2005)
6. Sasaki, Y.: The truth of the F-measure. Teach Tutor Mater 1(5), 1–5 (2007)
Chapter 17
Research and Design of a Key Technology for Accelerating Convolution Computation for FPGA-Based CNN Liang Ying, Li Keli, Bi Fanghong, Zhang Kun, and Yang Jun
Abstract The Convolutional Neural Network (CNN) is an important deep learning model. It is widely used in handwriting recognition, natural language processing, and other areas, and is a hot topic in machine learning and computer vision research, so it has clear research significance and value. This paper first proposes a simple convolutional neural network model, the SpNet model, and analyzes the different types of parallelism in the training process of a CNN. In view of the extensive use of convolution computation, a scheme to accelerate it is designed from both the software and the hardware perspective.
17.1 Introduction to Relevant Theories 17.1.1 Introduction to Convolution Neural Network Figure 17.1 shows a classic Convolutional Neural Network architecture, where Input is the input layer, Conv is the convolution layer, ReLU is the activation function, Pool is the pooling layer, and Fully Connected is the fully connected layer [1].
Fig. 17.1 Convolutional Neural Network architecture (Input → Conv → ReLU → Conv → ReLU → Pool → ReLU → Conv → ReLU → Pool → Fully Connected)
L. Ying · L. Keli · B. Fanghong · Z. Kun · Y. Jun (B) School of Information Science and Engineering, Yunnan University, Yunnan Kunming, Kunming 650091, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_17
17.1.2 The Training Process Labeling images drives the training process of a Convolutional Neural Network: the computer adjusts its weights through a training procedure called back propagation, which can be divided into forward propagation, loss calculation, back propagation, and weight update [2]. The propagation loss is computed as shown in Eq. (17.1), where target is the label of the input image data, output is the network output, and E_total is the propagation loss:

E_total = (1/2) (target − output)²    (17.1)
This is simply an optimization problem in calculus, and formula (17.2) can be used to find which weights most directly cause the network loss [3]:

W = W_i − η (dL/dW)    (17.2)

Here dL/dW is the gradient of the loss with respect to the weight, W is the weight at a particular layer, and η is the learning rate. A back propagation pass through the network detects the weights contributing most to the loss and adjusts them to reduce it.
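The toy example below performs this update on a one-weight model with the quadratic loss of Eq. (17.1); the values of x, target, and η are arbitrary illustrations.

```python
# One-weight gradient descent: L(W) = 0.5 * (target - W*x)**2,
# updated with W = W - eta * dL/dW as in Eq. (17.2).
x, target, eta = 2.0, 10.0, 0.05
W = 1.0
for _ in range(100):
    output = W * x
    grad = -(target - output) * x   # dL/dW for the quadratic loss
    W = W - eta * grad              # weight update of Eq. (17.2)
print(W * x)                        # converges toward target = 10
```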
17.2 SpNet Model This section designs a Convolutional Neural Network architecture with a small network scale, few training weights, and few layers, called Simple Net (hereinafter SpNet). The SpNet structure consists of 3 convolution layers, 2 subsampling layers, and 1 output layer, 6 layers in total. The structure is shown in Fig. 17.2, and a code sketch follows the figure.
Fig. 17.2 Structural model diagram of the SpNet algorithm (input image → convolution → subsampling → convolution → subsampling → convolution → full connection)
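A plausible PyTorch reading of the 6-layer SpNet structure follows; the channel counts, kernel sizes, and 10-way output are illustrative assumptions, since the paper specifies only the layer types and their order.

```python
import torch.nn as nn

# SpNet sketch: 3 convolution layers, 2 subsampling layers, 1 output
# layer, following the order in Fig. 17.2.
spnet = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.ReLU(),     # convolution 1
    nn.AvgPool2d(2),                   # subsampling 1
    nn.Conv2d(6, 12, 5), nn.ReLU(),    # convolution 2
    nn.AvgPool2d(2),                   # subsampling 2
    nn.Conv2d(12, 24, 4), nn.ReLU(),   # convolution 3
    nn.Flatten(),
    nn.LazyLinear(10),                 # full-connection output layer
)
```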
17.3 Research on Correlation Algorithms 17.3.1 Im2col Algorithm Im2col is an algorithm for processing matrices [4], commonly used in convolution operations. The idea is to transform the convolution kernel and the image data into matrices A and B, respectively, and then multiply A and B to obtain the result matrix C, as sketched below.
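A minimal NumPy sketch of im2col is given below: it unrolls every k × k patch of the image into a row of A so that convolution reduces to the single matrix product C = A·B.

```python
import numpy as np

def im2col(img, k):
    """Unroll every k x k patch of a 2-D image into a row, so that
    convolution with a flattened kernel is one matrix product."""
    h, w = img.shape
    rows = [img[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1) for j in range(w - k + 1)]
    return np.array(rows)

img = np.arange(16.0).reshape(4, 4)
kernel = np.ones((3, 3))
out = im2col(img, 3) @ kernel.ravel()   # C = A @ B
print(out.reshape(2, 2))                # equals the valid 3x3 convolution
```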
17.3.2 Strassen Algorithm The Strassen algorithm optimizes matrix multiplication using divide and conquer [5]: it views the n-order matrix as a 2 × 2 block matrix and recursively divides each block into N/2 × N/2 sub-matrices.
17.3.3 Improved Strassen Algorithm The improvement works as follows for the multiplication of rectangular matrices of any order, as shown in Fig. 17.3. In matrix A, the largest l with l = 2^k is found first, where l is the side length of sub-block a, and A is divided into four sub-matrices a, b, c, and d. Similarly, matrix B is divided according to l into four sub-matrices e, f, g, and h. The product C is then composed of the four blocks ae + bg, af + bh, ce + dg, and cf + dh. A product such as ae can be computed with the original Strassen matrix multiplication, while a product such as bg is again a rectangular multiplication of arbitrary order, so the improved Strassen multiplication is invoked recursively [6]. Comparing the improved Strassen algorithm with the naive matrix multiplication algorithm gives the times in Table 17.1.
Fig. 17.3 Multiplication of matrices of any order: A (blocks a, b, c, d) × B (blocks e, f, g, h) = C (blocks ae+bg, af+bh, ce+dg, cf+dh)
Table 17.1 Time comparison between the naive matrix algorithm and the Strassen matrix algorithm

Matrix size         | Naive matrix algorithm (s) | Strassen algorithm (s)
4 × 25 and 25 × 12  | 0.0035                     | 0.0033
4 × 25 and 25 × 300 | 0.9012                     | 0.8951
In conclusion, the improved matrix multiplication algorithm has a measurable advantage, which verifies that the matrix multiplication proposed in this chapter can accelerate convolution calculation.
17.4 An Accelerated Convolution Calculation Scheme Based on FPGA 17.4.1 Design of the Training Architecture of a Convolutional Neural Network Based on FPGA The overall training architecture of the FPGA-based Convolutional Neural Network is shown in Fig. 17.4. The architecture contains a module controller, a data controller, and an operation controller, whose functions are as follows: the module controller configures different computing modules according to the different layers and reconfigures the modules onto the FPGA in a specific order; the data controller divides the training data into mini-batches and loads them into DRAM for training; the operation controller invokes the configured modules to complete the calculation.
Fig. 17.4 The overall training architecture of the Convolutional Neural Network based on FPGA
After the overall architecture design, the Convolutional Neural Network model is modularized. The overall framework of the parameterized module is shown in Fig. 17.5.
Fig. 17.5 Parameterized module's overall framework
The input controller consists of an address generator and a scheduler that generates the address and reads the input data in each clock cycle; it caches the input data and sends it to the computing unit. The output controller contains an address generator for storing data into DRAM and is responsible for transmitting updated weights back to the CPU through PCI-E.
17.4.2 Design and Implementation of the Convolution Computing Module The convolution calculation module multiplies the image data with the corresponding convolution kernel and accumulates the products, then outputs the convolution result after the activation function. This paper proposes a "parallel + serial" implementation of the convolution calculation; the top-level design of its computing unit is shown in Fig. 17.6.
17.4.3 Testing the Convolutional Neural Network 17.4.3.1
Functional Simulation
Fig. 17.6 The top-level design of the computing unit
As shown in Fig. 17.7, test1–test5 are five adjacent pixels in the image, with pixel values 1, 2, 3, 4, and 5; weight1–weight5 are the five corresponding adjacent weights of the convolution kernel, with values 2, 3, 4, 1, and 5; result is the output of the convolution operation. Both the input and output bit widths are 8 bits. The simulation diagram shows that the output signal changes with the clock signal: on each rising edge, the convolution kernel operates on the corresponding pixels of the image, which can be cross-checked in software as below.
Fig. 17.7 Function simulation of 10 parameters
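A quick software cross-check of this 10-parameter simulation confirms the expected multiply–accumulate result (here 49, which fits the 8-bit output width):

```python
# Software check of the 10-parameter simulation: five pixels times the
# five adjacent kernel weights, accumulated as in the FPGA unit.
pixels = [1, 2, 3, 4, 5]
weights = [2, 3, 4, 1, 5]
result = sum(p * w for p, w in zip(pixels, weights))
print(result & 0xFF)   # 49, within the 8-bit output width
```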
17.4.3.2
Performance Test
The neural network model described in this paper was used to process images on an FPGA chip and on an Intel Core i5 processor, respectively. The results are shown in Table 17.2: the time to process one image with the FPGA-based network designed in this paper is roughly 40% of the CPU time (16.2 vs. 40.1), and the power consumption is about 1/5 of the CPU's (8 vs. 39).
Table 17.2 Comparison results

                  | The proposed method | Common CPU method
Time              | 16.2                | 40.1
Power consumption | 8                   | 39
17.5 Summary Because the data of a Convolutional Neural Network are stored and processed separately by each neuron, the network can be parallelized so that training data and training parameters are updated at the same time. An FPGA contains abundant logic computing resources and can handle the intensive convolution calculations in CNN training. Therefore, this paper analyzes the different types of parallelism in the CNN training process, carries out a software and hardware design for its intensive convolution calculation, and finally realizes accelerated computation of a Convolutional Neural Network.
References
1. Han, S., Liu, X., Mao, H., et al.: EIE: efficient inference engine on compressed deep neural network. ACM Sigarch Comput. Architect. News 44(3), 243–254 (2016)
2. Sordoni, A., Galley, M., Auli, M., et al.: A neural network approach to context-sensitive generation of conversational responses. Trans. Royal Soc. Tropical Med. Hyg. 51(6), 502–504 (2015)
3. Namatēvs, I.: Deep convolutional neural networks: structure, feature extraction and training. Inf. Technol. Manag. Sci. 20(1), 40–47 (2017)
4. Fukushima, K.: Neocognitron for handwritten digit recognition. Neurocomputing 51, 161–180 (2003)
5. Lavin, A., Gray, S.: Fast algorithms for convolutional neural networks. Comput. Sci., 4013–4021 (2015)
6. Sahin, S., Cavuslu, M.A.: FPGA implementation of wavelet neural network training with PSO/iPSO. J. Circuits Syst. Comput. 27(6), 1850098 (2017)
Chapter 18
Research on Voice Mark Recognition Algorithms Based on Optimized BP Neural Network Kong Yanfeng, Jia Lianxing, and Zhang Xiyong
Abstract In order to improve the performance of voiceprint recognition systems, a new voiceprint recognition method based on wavelet analysis and a BP neural network optimized by a niche genetic algorithm (BP-GA) is proposed to overcome the shortcomings of commonly used pattern recognition algorithms (LPCC, MFCC, etc.). First, wavelet analysis is used to extract time-domain and frequency-domain feature variables of the speech signal. Then, the niche genetic algorithm is used to overcome the tendency of traditional multi-layer artificial neural networks to fall into local minima during training. Finally, the wavelet feature variables are used as training data for the optimized neural network to obtain the final voiceprint recognition algorithm. Experimental results show that voiceprint recognition based on wavelet analysis and the BP neural network–niche genetic algorithm offers fast recognition speed, a high recognition rate, a low error rate, automatic error correction, and robustness to different speakers.
18.1 Introduction As network technology develops rapidly, information security becomes more and more important. Traditional password-based identity authentication is no longer sufficient, while biometric technology is increasingly mature and shows its superiority. Among biometrics, voiceprint recognition is a new technology developed in recent years; compared with other recognition technologies it is simple, precise, economical, and contact-free [1]. K. Yanfeng · J. Lianxing College of Information and Communication, National University of Defense Technology, Wuhan, Hubei, China K. Yanfeng 92815 Army Branch of Weapon, Xiangshan, Zhejiang, China Z. Xiyong (B) Wuhan Donghu University, Wuhan, Hubei, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_18
This paper puts forward a new recognition algorithm based on wavelet analysis and the BP-GA optimization algorithm. Compared with traditional algorithms such as LPCC (Linear Prediction Cepstrum Coefficient) and MFCC (Mel Frequency Cepstrum Coefficient), it has the advantages of fast recognition speed, high recognition rate, low error rate, automatic error correction, and strong robustness to different speakers.
18.2 Wavelet Transform of the Voice Signal Since the object of a voiceprint algorithm is a digital signal, the voice signal must first be preprocessed. The LPCC and MFCC algorithms employ FFT-based frequency-domain analysis, which extracts only frequency-domain features and no time-domain features; even with a time window, the time-domain resolution is still inadequate, and frequency-domain features are easily lost. The wavelet transform is a newer time–frequency analysis method with adjustable resolution and strong anti-interference ability, and it can reflect the non-stationary and transient features of a signal [2]. In addition, it has low computational cost and is easy to implement; it reflects both the characteristics of the human ear and the dynamic features of the voice signal, improving the final voiceprint recognition rate. Suppose the signal with noise is [3]:
(18.1)
In Eq. (18.1), f(i) is the real voice signal, e(i) is Gaussian white noise or other noise, s(i) contains the noise and the useful low-frequency signal to be extracted, and i is the sample index. In engineering, voice signals are mainly stationary, while noise is mainly high-frequency. So a wavelet base is selected and the number of decomposition levels determined; a 3-level decomposition of the signal is shown in Fig. 18.1. Denote CD1, CD2, CD3, CA3 by d1, d2, d3, a3 and normalize them; denote the normalized parameters by Q1, Q2, Q3, Q4. The vector [Q1, Q2, Q3, Q4] is the voice feature vector used for BP-GA training. The wavelet decomposition result of a password speech signal is shown in Fig. 18.2, and a minimal decomposition sketch follows Fig. 18.2.
Fig. 18.1 Schematic diagram of wavelet signal decomposition (S → CA1/CD1 → CA2/CD2 → CA3/CD3)
Fig. 18.2 Wavelet decomposition results of password speech signal
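A minimal decomposition sketch with PyWavelets is shown below; the db4 basis and the norm-based normalization of [Q1, Q2, Q3, Q4] are illustrative assumptions, as the paper does not fix either choice.

```python
import numpy as np
import pywt

# 3-level wavelet decomposition of a speech frame.
signal = np.random.randn(1024)                  # stand-in voice samples
cA3, cD3, cD2, cD1 = pywt.wavedec(signal, "db4", level=3)

# Normalize the four coefficient sets into the feature vector [Q1..Q4].
Q = np.array([np.linalg.norm(c) for c in (cD1, cD2, cD3, cA3)])
Q = Q / Q.sum()                                 # feature vector for BP-GA
```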
18.3 Basic Concept of BP-GA BP-GA (back propagation–genetic algorithm) is a recently proposed optimization algorithm with high efficiency and strong global search capability [9]. GA does not depend on gradient information and places no requirement on the continuity of the objective function, or even on its having an explicit expression [4, 5]. BP-GA takes advantage of both the artificial neural network and the genetic algorithm: it overcomes the low efficiency and long convergence time of GA in structure optimization while improving the global solving capability of GA, so it is an effective and applicable solution [6, 7]. Its working process is shown in Fig. 18.3.
18.3.1 Construction of the BP Neural Network for Voiceprint Recognition A multi-layer feedforward neural network arranges neurons in ordered layers: a neuron in layer i receives signals only from neurons in layer (i−1), and the layers contain no feedback. When a vector x is input to such a forward network, a vector y is output, so the forward neural network can be seen as a mapping from x to y [8]. The network topology is shown in Fig. 18.4.
Fig. 18.3 Flow chart of BP-GA (encode and initialize the population; evaluate it and generate new populations; obtain instructor samples by numerical simulation and train the neural network; iterate until the mapping accuracy and the ending condition are satisfied)
Fig. 18.4 Topological structure of the BP neural network
A three-layer BP neural network is employed; the input vector of the first layer can be adjusted in practice. The first layer is a normalization layer with three nodes, whose inputs are software threat, hardware threat, and management vulnerability. An expert evaluation group evaluates each factor; values in (0, 1) are taken as the risk evaluation results, i.e., the better the result, the lower the risk. The second layer is the input layer of the BP network, with five nodes. The third layer is the output layer with one node; its output characteristic function is an S-shaped function, and the output value is the risk degree, a continuous number in (0, 1). Let ω1_ij (i = 1, 2, 3; j = 1, 2, 3) be the input-layer weights, o1_i (i = 1, 2, 3) the input-layer nodes, ω2_j (j = 1, 2, 3) the middle-layer weights, o2_j (j = 1, 2, 3) the middle-layer nodes, θ2_j (j = 1, 2, 3) the middle-layer thresholds, θ3 the output-layer threshold, and ω the output risk degree. The mathematical expressions of the nodes are:
o2_j = 2 / (1 + e^(−u2_j)) − 1,   u2_j = Σ_{i=1}^{3} ω1_ij · o1_i − θ2_j   (j = 1, 2, 3)    (18.2)

ω = 1 / (1 + e^(−u3)),   u3 = Σ_{j=1}^{3} ω2_j · o2_j − θ3    (18.3)
18.4 Design of the Genetic Algorithm A genetic algorithm contains six basic parts: coding, the initial population, the fitness function, genetic manipulation, parameter control, and ending rules [7]. Coding is the bridge between the problem and the algorithm; in order to search a large space and improve accuracy, float encoding is employed. Each individual of the initial population is a 40-bit binary string whose bits are set as follows: generate a random number in (0, 1); if it is bigger than 0.5, the bit is 1, otherwise 0. The most commonly used way to build the fitness function is the penalty-function idea. Since the problem treated in this paper is a least-error optimization problem, the fitness function can be expressed as

min f = (1/m) Σ_{i=1}^{m} e_i²,   e_i = ω_i − ω̄_i   (i = 1, 2, …, m)    (18.4)

where ω̄_i denotes the desired output.
Genetic manipulation contains five parts: selection, crossover, step setting, mutation, and population update. Selection: individuals are selected by the roulette-wheel method. Step setting: a scaling method is used. Crossover: uniform (even) crossover, with the mask generated at random. Mutation: a mutation bit is selected randomly and its value is flipped. Population update: a Pareto-solution reserve strategy is used, so the parent Pareto solutions are carried directly into the next generation. Two termination rules are adopted: first, when the difference between the maximum and minimum of the objective function is smaller than the given precision 1e−6, the algorithm is considered convergent and the program terminates; second, with the maximum number of generations set to 500, the program terminates when the iteration count reaches 500. A sketch of these operators follows, and the operation results are shown in Fig. 18.5.
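The following NumPy sketch illustrates the three main operators; the 40-bit individuals, population size of 500, and mutation rate 0.042 come from this chapter, while the placeholder fitness values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def roulette_select(pop, fitness):
    # Roulette-wheel selection: pick proportionally to fitness.
    p = fitness / fitness.sum()
    return pop[rng.choice(len(pop), p=p)]

def uniform_crossover(a, b):
    # Even (uniform) crossover with a randomly generated mask.
    mask = rng.integers(0, 2, size=a.shape).astype(bool)
    return np.where(mask, a, b)

def mutate(chrom, rate=0.042):
    # Flip randomly selected bits at the chapter's mutation rate.
    flip = rng.random(chrom.shape) < rate
    return np.where(flip, 1 - chrom, chrom)

pop = rng.integers(0, 2, size=(500, 40))   # 40-bit binary individuals
fitness = rng.random(500)                  # placeholder fitness values
child = mutate(uniform_crossover(roulette_select(pop, fitness),
                                 roulette_select(pop, fitness)))
```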
Fig. 18.5 Results of niche genetic algorithm
18.5 Simulation Analysis of an Example with the BP-GA Algorithm Two hundred samples from the speech database are taken as feature vectors in this example. The target output for the true speaker is 1 and for an impostor is 0, with 0.5 as the recognition criterion. A MATLAB simulation program was written with an initial population of 500, a crossover rate of 0.5, a mutation rate of 0.042, and a step scale of 0.8; the tansig function is used as the transfer function in the network's hidden layer and the satlin function in the output layer, which has a single output to improve training speed. The optimized training results of the BP neural network are shown in Fig. 18.6. From Fig. 18.6, the false recognition rate is about 1.6% with the wavelet–BP-GA algorithm. To test the validity of the algorithm, we compared it with the MFCC and LPCC algorithms; the detailed experimental data are shown in Table 18.1.
18.6 Conclusions Wavelet analysis fully extracts the signal's time-domain and frequency-domain features; the BP network improves the recognition system by learning from its errors; the GA improves the learning speed and convergence rate, decreases the network's error rate, and shortens the voice system's self-learning time. In short, compared with LPCC and MFCC, the wavelet–BP-GA approach offers fast recognition speed, a high recognition rate, a low error rate, automatic error correction, and strong robustness to different speakers.
Fig. 18.6 Optimized training results of BP neural network
Table 18.1 Comparison of wavelet-BP-GA with MFCC and LPCC

Recognition pattern | Training time/s | Recognition time/s | Recognition rate (%) | False rate (%)
MFCC                | –               | 11.51              | 89                   | 5.1
LPCC                | –               | 12.23              | 87                   | 4.4
Wavelet-BP-GA       | 325.86          | 9.16               | 95                   | 1.58
References
1. Cao, X.L., Li, J.H.: Wavelet packet based feature extraction for voice recognition. Comput. Simul. 27(11), 324–327 (2010)
2. Zhang, Q., Jiang, S.X., Li, X.S., et al.: Spark plug gap identification based on wavelet and neural network. Comput. Simul. 34(4), 176–178 (2017)
3. Hang, H.: Modern Speech Signal Processing, 2nd edn. Electronic Industry Press, Beijing, China (2014)
4. Koushan, K.: Automatic hull form optimization towards lower resistance and wash using artificial intelligence. In: Proceedings of FAST 2003, 7th International Conference on Fast Sea Transportation, Ischia, Italy, pp. 120–135 (2003)
5. Li, Z.H., Zhang, T.S.: Research of fitness sharing niche genetic algorithms based on extension clustering. J. Harbin Inst. Technol. 48(5), 178–180 (2016)
6. Kong, Y.F., Zhang, Z.S., Cheng, G.T.: Optimization design of elastic constrained mechanism in swashplate engine based on neural network and genetic algorithm. In: 2010 International Conference on Computer, Mechatronics, Control and Electronic Engineering. IEEE, Changchun (2010)
7. Kong, Y.F., Zhang, Z.S., Cheng, G.T.: Optimization design of elastic coupling in swashplate engine based on BP-GA. Appl. Mech. Mech. Eng. 32, 2380–2385 (2010)
8. Wang, S.H., Jing, X.X., Yang, H.Y.: Study of isolated speech recognition based on deep learning neural networks. Appl. Res. Comput. 32(8), 2289–2292 (2015)
9. Deb, K., Pratap, A., Agarwal, S., et al.: A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
Chapter 19
Single Image Super Resolution via Deep Convolutional Dual Upscaling Branches with Different Focus Zuzhong Liang, Hui Liu, Zhenhong Shang, and Runxin Li
Abstract Deep convolutional neural networks (CNNs) have demonstrated outstanding reconstruction accuracy and computational efficiency on single image super resolution. However, CNN models often entail a difficult training process owing to their depth (sometimes beyond 30 layers) and to computing in the high-resolution space. On this point, a super resolution model with dual branches based on a deep CNN is proposed. Specifically, one branch reconstructs image details using a recursive module and Network in Network (NIN) in the low-resolution space, while the other transmits low-frequency information by training an upscaling filter. By means of end-to-end training with a relatively shallow network structure, the proposed algorithm achieves state-of-the-art performance together with highly efficient computation.
19.1 Introduction In a narrow sense, single image super resolution (SR) is the process of restoring the detail lost in a downsampled low-resolution (LR) image by means of a specific reconstruction algorithm. SR has wide applications in multiple fields, from line fitting [1] to complicated satellite and aerial imaging [2]. In the computer vision community, SR has developed gradually: the fundamental interpolation-based methods have unimpressive performance, and although reconstruction-based methods [3, 4] bring an impressive improvement, their optimization process is unfortunately time-consuming. Over the past several years, SR methods based on convolutional neural networks (CNSR) have attracted much attention for their strong generalization over data; CNSR assumes that the missing pixels in the LR image can be interpolated and then modulated precisely via a supervised nonlinear mapping.
Z. Liang · H. Liu (B) · Z. Shang · R. Li Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_19
In 2015, Dong et al. [5] introduced a deep convolutional network in SR (SRCNN) for the first time. Its reconstruction performance is comparable with many traditional non-machine-learning algorithms even though it employs merely three layers. Since then, example-based SR works have multiplied: [6, 7] introduce global residual learning and skip connections, [8] brings in sparse coding, and so on. However, while these methods improve on SRCNN, their model structures could be further optimized for detail reconstruction. On the one hand, predefined upscaling filters, usually the cubic spline interpolator, are applied beforehand to increase the resolution in advance. Like conventional non-CNSR methods [9, 10], this operation fails to provide useful high-frequency information for SR and, worse, sometimes introduces distortions. On the other hand, since LR and high-resolution (HR) images can both be loosely divided into high- and low-frequency information, and the former matters more than the latter in SR [11], sharing the limited computing resources equally between them wastes available computational power. In other words, by scheduling the resources sensibly, continued improvement can be attained in this work.
19 Single Image Super Resolution via Deep Convolutional …
167
Afterward, the remainder in this work would be organized as follows: Sect. 19.2 briefly describes the piece of thoughtlet which is related to our proposed structure; Sect. 19.3 focuses on the network description, and then formulates those modules mathematically. Parameters settings adopted on the process of training and experimental results would be demonstrated after figuring out the inner architecture of DBSR in Sect. 19.4. At last, we make a summary and prospect in Sect. 19.5.
19.2 Related Works 19.2.1 Variant Shortcuts Stacking a cascade of convolutional block is an intuitively straight way for impressive performance in SR while larger receptive fields are explored and complex nonlinear mapping function could be fitted in this direction. However, slowly updated parameters, especially for those at the bottom layer (near the input), resulting from gradient vanishing made difficult training process come about. For this reason, enlightened by local residual learning put forward by He et al. [13], Kim et al. [6] combine the concept of global residual learning and gradient clipping in model structural discipline and break through the bottleneck. Benefitting from a deeper model, VDSR exploits a larger image contextual regions and thereby explores characteristic of high-level semantic features. References [11, 14, 15] could be classified into this category. Mao et al. [16] bring up another symmetry connection tentatively in RED30. They use n convolutional layers (CL) on the first half part for feature extraction and then n deconvolutional blocks (DB) whereafter for details reconstruction. The first CL connects to the last DB and the second CL to penultimate in DB, and so on, just like the upper portion of concentric rings. Shortcut connection like this conveys abstract feature information, under the same conditions, in a dramatically efficient manner because of those long span links. In DRRN [17], Tai et al. consider recursive supervision to overcome the drawbacks of purely increasing depth by stacking new layers, such as overfitting implemented in limited data and huge storing and retrieval burden. Based on this, they embedded recursive connections into an inference net, which concatenates all its former output of blocks, making all these recursions share the weights. We utilize a variant form of recursive connection in this work, details will be discussed in Sect. 19.3.2.
19.2.2 Skillfully Nesting Modules Conventional convolutional layer could be seen as a general linear module (GLM) and this module has its learning gaps in information abstraction [18]. To be specific,
168
Z. Liang et al.
many more filters are required for the representation of the same concept on the nonlinear manifold for covering all its shapes. For this reason, [12] proposes to replace GLM with a micro network (NIN), which puts an mlpconv layer among dual convolutional layers. As a generalized nonlinear function approximator, this additional network could be implemented with two fully connected layers. That is, 1 × 1 kernels are introduced in practice. Encouraged and inspired by [12], Szegedy et al. [18] design the Inception module for object recognition. Inception-based network tries to achieve filter level sparsity in dense matrixes computation as favorable properties on sparsity beneficially decrease difficulty in model training. After that, the nesting module (we indicate Inception here) is employed widely, such as [19, 20]. In [21], Li et al. aggregate multiscale local features acquired from the nesting module to construct high-frequency details information. A similar trick is adopted here in Sects. 19.3.2 and 19.3.4.
19.3 Proposed Method 19.3.1 The Entire Structure In a whole, the model proposed (named DBSR) transmits information on dual branches embedding recursive connection and NIN module inside, Fig. 19.1 demonstrates it in details. Specifically, DBSR is separated into two branches. We denote one of them as residual learning branch (RLB), which is used as high-frequency information extraction and reconstruction. What’s more, the planar component branch (PCB) conveys those general outlines to HR domain. Furthermore, RLB is composed of feature extraction module (FEM) and details construction module (DCM). FEM makes full use of short- and long-scale texture information by applying numerate kernels to input maps and combining all those output maps in ways of channel stacking. Subsequently, DCM transmits these concatenated maps to several parallel paths, when each of them is an independent convolutional structure to render much more complex nonlinear mapping. convoluion+bias+prelu Details Construction Module, DCM
deconvoluion+bias+prelu
Subpixel
Concatenate
3×3 5×5
3×3
1×1
1×1 1×1
3×3
3×3
Concatenate
3×3
Planar composition Branch, PCB
3×3
Feature Extraction Module, FEM
3×3
Residual Learning Brahch, RLB
Fig. 19.1 The proposed dual branches structure for super resolution (DBSR)
19 Single Image Super Resolution via Deep Convolutional …
169
At last, tensor is delivered to the subpixel layer for further processing and pixels along 4 channels (when scaling factor is 2) are sequentially chose as one of corner pixel in SR on specific position, its output is the desired abundant high-frequency component. Adding to the output of transposed convolutional layer from PCB, the estimated SR image attained definitively.
19.3.2 Residual Learning Branch This branch takes LR image (a gray or color object) as input, treated by module FEM and DCM described above and then generate the upscaling high-frequency residual composition, this branch is extraordinarily vital in our SR reconstruction. Module FEM has 15 deep convolutional blocks, each of which consists of one convolutional layer, a bias term, and one parametric rectified linear unit. He et al. [13] consider that the entire convergence rate relates significantly to proper parameter selection in the model establishment. Therefore, we try to decrease regularly the filters applied in FEM from 196 to 48, this novel attempt not only ensures normally transitional consistency of information in image, but importantly extracts the local feature such as edges and texture without loss of efficiency and accuracy. DCM takes the concatenating feature maps from FEM as input. Several (3, to be exact) independent paths are adopted inside and they all can be seen as a single sub-network which is commonly named network in network (NIN). Each path in NIN maps the bulky channels to a relatively small size with filters 1 × 1 primarily. And for the subsequent 3 × 3, 5 × 5 filters, they broaden the receptive fields of each pixel in the current layer. What’s more, concatenating those three nesting networks is a practical undertaking that validates the case study, that is, the sophisticated network topology trying to approximate a sparse structure could be achieved with filter level sparsity and at the same time utilize computations on dense matrices. After remapping by NIN, residual information can be attained. Subpixel layer handles this information by rearranging pixels and restoring multiple channels into one channel for the gray image.
19.3.3 Planar Component Branch Alike RLB, LR images are fed into PCB patchwise. Then the transposed convolutional layer is prone to convey the planar message from LR to HR domain through learning an upsampling filter adaptively (in a way dislike subpixel layer). Intention for PCB intervention is to ease the training burden of RLB and to distribute the computing resource reasonably. By adding the results of two branches (outputs from DCM and PCB, to be specific) elementally, the network could eventually output the expected HR image.
170
Z. Liang et al.
19.3.4 Mathematical Formulation The structure takes LR image x ∈ R H ×W ×C as input, and ultimately outputs the corresponding estimated HR yˆ = Rr H ×r W ×C , where H, W, C denote the height, weight, and channel(s) of the input, respectively, and r the expected scaling factor. The entire network can be divided into three parts (modules/sub-networks) and formulated separately. FEM, DCM, and PCB correspond to functions f FEM , f DCM , f PCB respectively. Specifically, FEM inputs LR images x and output tensor T 1 . We use φi that indicates the mapping process of ith convolutional building block in FEM and totally D1 blocks are employed here. Since blocks inside are related to each other front and back, for example, φi interrelates φ1 , . . . , φi−1 intrinsically, this relation can thereby be formulated as φ d (x) = (φ ⊕ φ ⊕ · · · ⊕ φ)
(19.1)
where φ d (x) is the interaction of d building blocks, symbol ⊕ connects blocks in chain. In endmost FEM, merging all its former maps into one-fold tensor through recursive connection, T1 =
Di
φi
(19.2)
i=1
Later on, details construction module DCM takes T 1 as input and through the paths inside, outputs the high-frequency residual composition yres . This module consists of one NIN module and an upsampling layer, where NIN can be denoted as T2 =
D NN
φkdk (T1 ) = yres
(19.3)
k=1
where k means the kth path, DNIN is the number of path, while d k the layer numbers in that path. After the integration of extracted feature maps, the subpixel layer is applied to obtain the residual SR images in HR domain. As for planar component branch PCB, input LR image x just as branches FEM. Since involving merely the transposed convolutional layer, we could express simply the mapping as f 3 (x) = f deconv (x) = yup
(19.4)
while f deconv denotes the transposed operation and yup as output. At last, total structure could then expressed as y = f DBSR (x) = yup + yres = f DCM ( f FEM (x)) + f PCB (x)
(19.5)
19 Single Image Super Resolution via Deep Convolutional …
171
19.4 Experimental Results 19.4.1 Datasets Following the training data used by [15], we use 91 flower images and the Berkeley segmentation dataset BSD200. Timofte et al. [22] confirmed that the fitting capacity of a trained model could be enhanced in limited data by implementing data augmentation. Therefore, we apply the flipping and rotation operation in the horizontal and vertical direction in 291 images during the data preparation period, producing a dataset approximately eight times larger than before. For model training period, Set5 is applied to validate the loss value on each individual epoch, and then evaluate the performance as well as observe the convergence status accordingly. Batches input into model contains only chrominance channel converted from color training data, for the sake of low RAM consumption. In the model testing stage, Set14, BSD100, and Urban100 are chosen and data augmentation is proceeded for every single image at the expense of a larger duration. To evaluate the performance of the designed structure over all datasets, peak signalto-noise ratio (PSNR) and structural similarity (SSIM) are employed as a metric.
19.4.2 Parameters Settings In the data preparation period, cutting 48 × 48 image patches with stride 16 for an image in training data, 1e5 patches are input for each training epoch and the batch size is fixed to 64. Referring to the practice work of He et al. [13], all bias terms and parametric activation functions are initialized to zeros. Carrying out the training procedure randomly throws away the neuro unit to avoid overfitting by setting up dropout rate p = 0.8. Mean square error is used to measure the differences or the distances between the generative estimated HR patches and those real ones, so as to calculate the loss values eventually. We employ a relatively little high learning rate 2e−3 as a starting point when considering the application of recursive connections and gradient clipping. Each 10 epochs are treated as one stage and the learning rate would be decreased in half for each stage until its value is less than 2e−5. Arriving at the last stage, 60 epochs have reached for each scaling factor and the mean time is 28 h under the platform GeForce GTX Titan Xp, 12G. In FEM, we use D1 convolutional blocks and decide the number of feature maps on each block based on Mi = M1 − M D1 (i/(D1 − 1))1/r , i = 0, . . . , D1 , where M1 = 196, M D1 = 148 points to the number of the first and the final layer, respectively, and r controls the rate of change between neighboring layers (the larger the r, the faster it declines into M D1 ). In this work, we empirically set r = 1.5. Table 19.1 shows the inner feature numbers in FEM when D1 = 7 (though we apply 15 in practice). For DCM in branch RLB, all three paths are using 1 × 1 filters at first to reduce the bulky dimension. Referencing to the second path, 64 filters with size 1 × 1 are
172
Z. Liang et al.
Table 19.1 Number of feature maps for each block in FEM when depth inside is 7
Depths in FEM
Feature maps
Depths in FEM
Feature maps
0
196
4
94
1
155
5
77
2
131
6
62
3
111
7
48
firstly employed, then the 32 filters whose kernel size is 3 × 3 and so forth. In this way, this network can map a stronger representation, even with a lighter parameters consumption.
19.4.3 Results and Comparisons Generally, we evaluate model performance from three aspects: the objective quantitative index, the subjective visual effect, and the average running time on Set5. For SRCNN [5], the kernel size is 9, 5, and 5 in all three layers, respectively, and for DRRN [17], one recursive block stacks 9 residual units. Table 19.2 lists calculating PSNR/SSIM values in natural scenes like Set5, Set14, BSD100 and in human-made building Urban100, at scaling factor ×2, ×3, ×4 separately. From the available statistics, we conclude that our DBSR method is superior to the state of the art in SISR for the most part, especially on Set5 and Set14. However, DBSR tends to lose its lead in a large magnification, like ×4 in dataset Urban100. Table 19.2 Performance comparison (PSNR/SSIM) between the proposed DBSR and the other state-of-the-art works on several standard testing datasets. Bold black indicates the best performance among Datasets
Scale
Bicubic
FSRCNN
VDSR
DRRN
Ours
Set5
×2
33.64/0.9299
37.05/0.9562
37.56/0.9587
37.66/0.9589
37.74/0.9592
Set14
×2
30.22/0.8688
32.66/0.9095
33.02/0.9124
33.19/0.9133
33.20/0.9136
BSD100
×2
29.55/0.8431
31.53/0.8920
31.89/0.8960
32.01/0.8969
32.02/0.8969
Urban100
×2
26.66/0.8403
29.88/0.9027
30.76/0.9140
31.02/0.9164
31.09/0.9171
Set5
×3
30.39/0.8682
33.18/0.9143
33.67/0.9213
33.93/0.9234
33.98/0.9237
Set14
×3
27.53/0.7742
29.37/0.8240
29.77/0.8314
29.94/0.8339
29.95/0.8342
BSD100
×3
27.20/0.7385
28.53/0.7917
28.82/0.7976
28.91/0.7992
28.86/0.7987
Urban100
×3
24.46/0.7349
26.43/0.8081
27.13/0.8279
27.38/0.8331
27.39/0.8333
Set5
×4
28.42/0.8104
30.72/0.8664
31.35/0.8838
31.58/0.8864
31.65/0.8868
Set14
×4
25.99/0.7027
27.61/0.7552
27.99/0.7674
28.18/0.7701
28.19/0.7703
BSD100
×4
25.96/0.6675
26.98/0.7153
27.28/0.7251
27.35/0.7262
27.34/0.7261
Urban100
×4
23.14/0.6577
24.62/0.7281
25.17/0.7524
25.35/0.7576
25.30/0.7572
19 Single Image Super Resolution via Deep Convolutional …
173
Figures 19.2, 19.3 and 19.4 compare “mainstream” algorithms with DBSR in visually and content-specifically. The second row in Fig. 19.2 cutting from the leopard print is specially used to validate the construction performance on intricate structures. References [5, 23, 24] happen to texture distortion seriously for their shallow depth and oversimplified structure. In VDSR and DRRN, construction result gets some improvement apparently, but obvious separation also exists inside the white line and the black blocks when comparing with DBSR though much more convolutional blocks are invested. Figures 19.3 and 19.4 focus on the comparisons in serried stripes and sharpened edge, respectively, and further validate the visual superiority of our method.
Original Bicubic SRCNN (PSNR/SSIM) (21.17/0.6030) (23.79/0.7087)
ESPCN (24.10/7224)
Ours(DBSR) FSRCNN VDSR DRRN (25.18/0.7762) (27.02/0.8451) (27.18/0.8485) (27.27/0.8493)
Fig. 19.2 Super resolution comparison (Image ‘134035’ from BSD100, scaling factor: ×3)
Ours(DBSR) Original Bicubic SRCNN ESPCN FSRCNN VDSR DRRN (PSNR/SSIM) (21.10/0.7046) (21.77/0.7540) (21.76/0.7542) (22.14/0.7738) (22.48/0.7919) (23.74/0.7999) (23.84/0.8120)
Fig. 19.3 Super resolution comparison (Image ‘059’ from Urban100, scaling factor: ×3)
Original Bicubic SRCNN (PSNR/SSIM) (30.33/0.7193) (31.08/0.7469)
ESPCN (32.32/7814)
Ours(DBSR) FSRCNN VDSR DRRN (31.56/0.7631) (32.64/0.7884) (32.75/0.7888) (32.76/0.7915)
Fig. 19.4 Super resolution comparison (Image ‘head’ from Set14, scaling factor: ×4)
174
Z. Liang et al.
19.4.4 Ablation Analysis Figure 19.5a measures how situations of different depth with full recursive connections and of numerate connections for a specific depth in FEM impose on PSNR value, when the paths in NIN is a constant (3, to be exact). As illustrated on line “Full Connection”, stacking continuously the basic blocks in FEM has a positive impact while depth is set to be in range [5, 20], but reversal arises once depth exceeds 20. Therefore, we consider that stacking the blocks with the help of recursive connection is somehow distinct from that by the use of skip connection in terms of avoiding vanishing gradient. Network consisting of global residual connection can handle that problem effectively, like VDSR, but the recursive block is preferred for its high efficiency in qualitative improvement. For the curve “Partial Connection”, uniformly only the last five feature maps are concatenated for different depths. We could observe that the gap between full and partial connection is widened from 10 to 25, which indicates that the coarser feature maps matter as with those elaborate ones. Arriving at the depth 25, PSNR value decreases severely, it may attribute to the vanishing gradient like the one in the plain network. As for Fig. 19.5b, we describe the influence of path in DCM on PSNR briefly. Strictly speaking, the more paths we utilize, the remarkable on performance promotion in range [1, 3]. However, performance becomes unstable when paths are in [3, 5], we hold that further block should be stacked behind NIN (not just only one 1 × 1 filter) as concatenation procedure synthesizes complicated multiscale information in time.
Fig. 19.5 Alternative parameter settings in RLB diversify the model’s performance in different ways. a Testing on dataset Set5, the curve “Full Connection” surveys the relation between various depths in FEM and PSNR values, when “Partial Connection” dedicates to observing situation of varying building blocks with identical recursive connection (we use 5 hereafter). b Relation between paths in DCM and PSNR values
19 Single Image Super Resolution via Deep Convolutional …
175
In a nutshell, depth in FEM and the recursive connection is a relatively important factor for performance improvement, when the NIN strategy takes the second place, Fig. 19.6 validates this trend visually and developmentally. Figure 19.7 draws a scatter diagram to measure the average running time (in platform 16 GB 3.5 GHz, Intel i7-7800K CPU) on datasets Set5 when the scaling factor is 3. As a reference, DBSR1 (marked with red circle) employed 15 layers in FEM, costs less time, and has prominent PSNR compared with other algorithms processing on LR domain, such as VDSR and DRRN. The number of building blocks is 20 for DBSR2 , DBSR3 uses identical 128 filters for each all 15 layers and for DBSR4 , all recursive connections are discarded. Without loss of generalization, the alleged proposed algorithm pointing to DBSR1 balances efficiency and performance reasonably.
Fig. 19.6 Different parameter settings cause various performance in DBSR. a Image ‘092’ from Urban100; b HR patches; c D1 = 10 and the last five recursive connection (RC) employed in FEM and DNIN = 3 in DCM; d D1 = 15 and the last five RC employed in FEM, DNIN = 3; e D1 = 15 and has a full RC, DNIN = 1; d the ultima DBSR (D1 = 15 with full RC, DNIN = 3)
Method
34.0
DBSR1
DBSR2
DRRN[12] DBSR3 VDSR[6]
DBSR4 PSNR(dB)
Fig. 19.7 Average running time on dataset Set5. DBSR1 (marked with red circle) consists of 15 convolutional blocks in feature extraction module (FEM), while DBSR2 has 20; DBSR3 does not have a feature maps’ reduction, while DBSR4 cut off all recursive connections compared with DBSR1
33.5 ESPCN[24]
33.0
TNRD[9]
FSRCNN[23] SCN[8]
32.5 0
1
2
SRCNN[5] 3
The Average Running Time (second)
176
Z. Liang et al.
19.5 Conclusions We propose a dual upscaling branches deep convolutional network for super resolution. A relatively impressive results have attained both in qualitative and quantitative evaluation index, and recursive connection and nesting network abbreviated to NIN in RLB hold the key in this method. To simplify and optimize the structure, two kinds of upsampling layers are employed, which results in shorter running time and highly efficient resource occupation. Thereafter, other types of connections, such as dense connection, could be embedded tentatively for extracting much more high-frequency information in a limited number of layers. And the network here could be extended into the problem of denoising as well as image deblocking. Acknowledgements This work is supported by Astronomical Big Data Joint Research Center, cofounded by National Astronomical Observations, Chinese Academy of Sciences and Alibaba Cloud.
References 1. Aghajan, H.K., Kailath, T.: Sensor array processing techniques for super-resolution multi-linefitting and straight edge detection. IEEE Trans. Image Process. 2, 454–465 (1993) 2. Zhang, H., Zhang, L., Shen, H.: A super-resolution reconstruction algorithm for hyper spectral images. Signal Process. 92, 2082–2096 (2012) 3. Gao, X., Zhang, K., Tao, D., Li, X.: Joint learning for single-image super-resolution via a coupled constraint. IEEE Trans. Image Process. 21, 469–480 (2012) 4. Chang, K., Ding, P.L.K., Li, B.: Single image super resolution using joint regularization. IEEE Signal Process. Lett. 25(4), 596–600 (2018) 5. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: ECCV, pp. 184–199 (2014) 6. Kim, J., Kwon Lee, J., Mu Lee, K.: Accurate image super-resolution using very deep convolutional networks. In: CVPR, pp. 1646–1654 (2016) 7. Kim, J, Kwon Lee, J., Mu Lee, K.: Deeply-recursive convolutional network for image superresolution. In: CVPR, pp. 1637–1645 (2016) 8. Wang, Z., Liu, D., Yang, J., Han, W., Huang, T.: Deep networks for image super-resolution with sparse prior. In: ICCV, pp. 370–378 (2015) 9. Chen, Y., Pock, T.: Trainable nonlinear reaction diffusion: a flexible framework for fast and effective image restoration. IEEE TPAMI. 39, 1256–1272 (2017) 10. Radu, T., Vincent, D.S., Luc, V.G.: A+: adjusted anchored neighborhood regression for fast super-resolution. In: ACCV, pp. 111–126 (2014) 11. Huang, H., He, R., Sun, Z., Tan, T.: Wavelet-SRNet: a wavelet-based CNN for multi-scale face super resolution. In: ICCV, pp. 1698–1706 (2017) 12. Lin, M., Chen, Q., Yan, S.: Network in network. In: International Conference on Learning Representations (2014) 13. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: ICCV, pp. 1026–1034 (2015) 14. Lim, B., Son, S., Kim, H., Nah, S., Lee, K.M.: Enhanced deep residual networks for single image super-resolution. In: CVPR Workshops, pp. 1132–1140 (2017) 15. Lai, W., Huang, J., Ahuja, N., Yang, M.: Deep Laplacian pyramid networks for fast and accurate super-resolution. In: CVPR, pp. 5835–5843 (2017)
19 Single Image Super Resolution via Deep Convolutional …
177
16. Mao, X., Shen, C., Yang, Y.: Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections. In: International Conference on Neural Information Processing Systems, pp. 2810–2818 (2016) 17. Tai, Y., Yang, J., Liu, X.: Image super-resolution via deep recursive residual network. In: CVPR, pp. 2790–2798 (2017) 18. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR, pp. 1–9 (2015) 19. Tang, P., Wang, H., Kwong, S.: G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition. Neurocomputing. 225, 188–197 (2016) 20. Huang, G., Liu, Z., Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: CVPR, pp. 2261–2269 (2017) 21. Li, S., Fan, R., Lei, G., Yue, G., Hou, C.: A two-channel convolutional neural network for image super-resolution. Neurocomputing 275, 267–277 (2018) 22. Timofte, R., Rothe, R., Gool, L.V.: Seven ways to improve example-based single image super resolution. In: CVPR, pp. 1865–1873 (2016) 23. Dong, C., Chen, C., Tang, X.: Accelerating the super-resolution convolutional neural network. In: ICCV, pp. 391–407 (2016) 24. Shi, W., Caballero, J., Huszar, F.: Real-time single image and video super-resolution using an efficient subpixel convolutional neural network. In: CVPR, pp. 1874–1883 (2016)
Chapter 20
Human Heart Model Rendering Based on BRDF Algorithm Wenjing Xiao, Zhibao Qin, Zhaoxiang Guo, Haoyuan Shi, and Yonghang Tai
Abstract With the development of science and technology, the medical level is becoming more and more advanced. Virtual medical technology is increasingly used in doctor training and surgical simulation. The core of rendering in virtual medical is that the rendering effect of the virtual target used for simulation should match the lighting effect of the object of the real environment. The traditional virtual medical training uses the organ or organ obtained by using the empirical model or the video image. The two rendering methods do not conform to the physical laws and the effect is greatly deviated from the real situation. The Brookfield model of Cook–Torrance is improved by combining the diffuse reflection model of Disney’s Bidirectional Reflectance Distribution Function(BRDF) algorithm, and the parameters of the algorithm model are adjusted. Continuously adjust the performance parameters on Unity 3D’s custom shader for rendering tests, resulting in a more realistic rendering result.
20.1 Introduction With the combination of computer graphics and virtual reality technology, virtual surgical systems are valued by more and more scientific research institutions. Due to limitations of operating space, range of viewing angles, and illumination conditions, traditional medical procedures are prone to major trauma to the organs, nerves, aorta, and bones where the surgical site is located, affecting the patient’s post-operative recovery. Virtual medical surgical systems are increasingly used in pre-operative simulations, intraoperative navigation, and post-operative review. The virtual medical surgical system can reduce the training cost of doctors, shorten the training time, improve the correct rate of disease diagnosis, and reduce the wounds of surgery in patients. W. Xiao · Z. Qin · Z. Guo · Y. Tai Yunnan Normal University, Kunming 650500, China H. Shi (B) Yunnan Nengtou Information Industry Development Co., Ltd., Kunming 650500, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_20
179
180
W. Xiao et al.
With the rapid development of today’s social game industry, traditional coloring models for game rendering are gradually being replaced by physics-based rendering, and the range of applications extends for game to animation and film. However, there are very few rendering applications for physics-based human tissue and organ in the virtual medical field. The content studied in this paper are based on physical human heart rendering. Most of the models used in traditional surgical systems are based on empirical models or methods based on captured video images [1]. The empirical model tends to use a specific probability formula to match a set of surface types, so the empirical models are mostly concise, the effects are biased towards idealization, do not satisfy the laws of physics, can only approximate the propagation of light, and need to measure reflections. Material parameters such as rate, normal map, and specular map, but there are huge individual differences in the medical field, and it is impossible to actually measure the real data onto each experimental body. The method of using video images is greatly limited by the quality of video images [1–5]. The above two rendering methods are used to indicate that the illumination effect of the internal organs of the human body under real illumination has great limitations.
20.2 Theoretical Research and Experimental Methods 20.2.1 Theoretical Research PBR refers to the use of realistic coloring models to accurately represent the realworld material, which can truly reflect the reflection characteristics of the object and the physical properties of the surface of the material, and establish a lighting model that conforms to the optical properties of the surface of the object in accordance with the real physical world. The proposed BRDF function is in good agreement with the requirements of expressing the real physical world. This function can be used to express the physical properties of the object’s reflection characteristics, surface reflectivity, and surface roughness. The BRDF function, the bidirectional reflection distribution function, was first proposed by Fred Nicodemus in 1956 to define the way light is reflected on opaque surfaces and is used in computer graphics to describe the reflection equation [6]. f (θi , φi , θr , φr ) =
d L(θr , φr ) d E(θi , φi )
(20.1)
As shown in the Formula (20.1), BRDF can also be defined as the ratio of the exit radiance to the incident irradiance, where θi and θr are the zenith angles of the incident light and the outgoing light, respectively, and ∅i and ∅r are the incident light and the output, respectively. The azimuth of the light is shown in Fig. 20.1.
20 Human Heart Model Rendering Based on BRDF Algorithm
181
Fig. 20.1 Schematic representation of the BRDF definition
From Eq. (20.1), we can see that the illumination effect of an object is determined by the irradiance of the surface of the light that is directed at the object.
20.2.2 Experimental Method Just mentioning the physics-based BRDF rendering will inevitably involve the Cook– Torrance model; the BRDF algorithm rendering studied in this paper is a modified Cook–Torrance model [7–9] combined with the Disney BRDF model. Disney has developed a model for diffuse reflection edge light processing, by modifying the grazing angle retro reflection, to determine the specific value achieved by roughness, thereby completing the smooth surface diffuse reflection of the Fresnel shadow and the rough surface. Diffuse reflection follows Fresnel’s law of refraction, and light to undergo two refractions as it enters the surface of the object and leaves the surface of the object. fd =
C 1 + (FD90 − 1)(1 − (n · l))5 1 + (FD90 − 1)(1 − (n · v))5 π
(20.2)
FD90 = 0.5 + 2(v · h)2
(20.3)
f d is diffuse reflection [10, 11], where C is the color value, → is the surface n normal vector, → is the vector pointing to the camera (observation point), → is the v
l
vector pointing to the light source, roughness is the surface roughness, and FD90 is the Fresnel reflectivity when the incident angle is 90°. For specular reflection f s , its Cook–Torrance form can be described as
182
W. Xiao et al.
fs =
DFG 4(n · v)(n · l)
(20.4)
where D is the Normal Distribution Function (NDF). The NDF defines the specular reflection range, the spot size, and other parameters that determine the appearance of the surface of the object. F is the Fresnel reflection coefficient, where Schlick approximation is used, and G is the geometric attenuation term [10], which describes the occlusion relationship of the surface of the object. In summary, the Cook–Torrance model can be expressed as two parts: diffuse f d and highlight reflection f s : f = kd f d + ks f s
(20.5)
The first term represents the diffuse reflection portion, the second term represents the highlight reflection portion, and kd and ks are the coefficients of f d and f s , respectively.
20.3 Experimental Results 20.3.1 Experimental Conditions The software used in this experiment is Unity 3D 2018. The hardware uses Intel(R) Core(TM) i7-6700 K CPU 4.00 Hz GPU NVIDIA GEFORCE 1070. We use Unity’s own shader, remove the native rendering model, enter our custom algorithm model, then render all the models.
20.3.2 Rendering Results and Analysis As shown in Fig. 20.2, the results of the two rendering models are significantly different. The effect of the standard rendering model rendering in the shader that comes with Unity 3D on the left is used in the custom shader of Unity 3D. The right side of Fig. 20.2 is the effect of rendering with the BRDF algorithm through the Unity 3D custom shader. The white arrow marks on the picture are the obvious differences, and the better ones are rendered on the right. We can see where the upper four arrows point; the right heart model is more hydrate than the left heart model, more in line with the visual effects of real heart organs under natural light; the middle and lower half of the arrow points to the rendering of the heart model in the shaded part, you can see that the right heart model has a clear advantage over the left model.
20 Human Heart Model Rendering Based on BRDF Algorithm
183
Fig. 20.2 Comparison of rendering results
20.4 Conclusions and Discussion From the rendered renderings, we can clearly see that the physics-based BRDF algorithm is closer to the visual effects of tissue and organs under real lighting conditions in the rendering of human tissues and organs, and has more advantages than other classical models that are not based on physics. It is more conducive to the simulated surgical training of the virtual surgical system, providing a similar real sense of surgical immersion, so as to get a better training effect. And by using open-source software such as Unity 3D, all source programs for custom shaders can be easily migrated to other platforms, and can be widely used in classes such as embedded platform robotics [12, 13]. However, this rendering effect is still insufficient, and cannot fully simulate the visual effect of internal organs in the real operating environment; because the amount of data rendered is relatively large, the performance requirements of the computer are relatively high; this rendering is only about visual simulation. There is no simulation involving touch. It is foreseeable that the next research direction is based on a combination of visual and tactile sensitivities that are closer to real tissue organs.
References 1. Jacques, S.L.: Optical properties of biological tissues: a review. Phys. Med. Biol. 58(11), R37–R61 (2013) 2. Lim, Y.J., Jin, W., De, S.: On some recent advances in multimodal surgery simulation: a hybrid approach to surgical cutting and the use of video images for enhanced realism. PRESENCE: Teleoperators Virtual Environ. 16(6), 563–583(2007) 3. ElHelw, M.A., Lo, B.P., Darzi, A., Yang, G.Z.: Real-time photo-realistic rendering for surgical simulations with graphics hardware. In: MIAR. LNCS, vol. 3150, pp. 346–352 (2004) 4. SP Imageworks: Physically-based shading models in film and game production (2010) 5. McAuley, S., Hill, S., Hoffman, N.: Practical physically-based shading in film and game production. ACM, 1–7(2012)
184
W. Xiao et al.
6. Nicodemus, F.E.: Directional reflectance and emissivity of an opaque surface. Appl. Opt. 4(7), 767–775 (1965) 7. Cook, R.L., Torrance, K.E.: A reflectance model for computer graphics. In: Computer Graphics (SIGGRAPH ’81 Proceedings), vol. 15, no. 3, July 1981, pp. 301–316 (1982) 8. He, X.D., Torrance, K.E., Sillion, F.X., Greenberg, D.: A comprehensive physical model for light reflection. In: Computer Graphics. Annual Conference Series, vol. 25, 175–186 (1991) 9. Marschner, S.R., Westin, S.H., Lafortune, E.P.F., Torrance, K.E., Greenberg, D.P.: Image-based BRDF measurement including human skin (1999) 10. Oren, M., Nayar, S.K.: Generalization of Lambert’s reflectance model. In: SIGGRAPH, July, pp. 239–246 (1994) 11. Torrance, K.E., Sparrow, E.M.: Theory for off-specular reflection from roughened surfaces. J. Opt. Soc. America 57, 1105–1114(1967) 12. Dey, N., Mukherjee, A.: Embedded Systems and Robotics with Open Source Tools. CRC Press (2017) 13. Mukherjee, A., Dey, N.: Smart Computing with Open Source Platforms. CRC Press (2019)
Chapter 21
Level Set Segmentation Algorithm for Cyst Images with Intensity Inhomogeneity Jinli Fang, Yibin Lu, Yingzi Wang, and Dean Wu
Abstract A new improved Chan-Vese model is given. We introduce local grayscale information to ensure better localization effect and extract target edge. Compared with the typical active contour models, the improved model can better segment cyst image with intensity inhomogeneity, the Dice Similarity Coefficient (DSC) values are higher than 0.9.
21.1 Introduction Level set methods based on curve evolution theory are widely used in segmenting medical images. Chan et al. proposed C-V model in [1]. It’s robust to noise, but can’t get good segmentation for heterogeneous images. In order to improve the accuracy of C-V model segmentation, Li et al. proposed a local binary fitting model in [2] (Local Binary Fitting, LBF), construct a local fit energy using a Gaussian window function and the difference between the pixel value and the mean. Later, Li et al. [3] further studied the choice of kernel function and the size of the local area and proposed the RSF model. Wang et al. [4] constructed local energy and added this local energy to the energy functional of the C-V model, and proposed a local C-V (Local C-V, LCV) model. In this paper, we give an improved model to segment heterogeneous cyst images. Experiments indicate that the algorithm can accurately and efficiently segment the images with intensity inhomogeneity.
J. Fang · Y. Lu (B) · Y. Wang Kunming University of Science and Technology, Kunming 650500, Yunnan, China e-mail: [email protected] D. Wu University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_21
185
186
J. Fang et al.
21.2 C-V Model In this model, the energy functional was defined:
E C V (c1 , c2 , C) = λ1
|I (x) − c1 |2 d x + λ2 inside(C)
|I (x) − c2 |2 d x + μ · Length(C) outside(C)
(21.1) where I is the original image, C represents the evolution curve, c1 , c2 are the intensities of target and background. λ1 > 0, λ2 > 0, μ ≥ 0 are energy weights.
21.3 Improved C-V Level Set Model 21.3.1 Local Information The global variable is an arithmetic mean to represent the grayscale average of the target and the background. The image with intensity inhomogeneity maintains gray uniformity in local region, the introduction of local statistical information is advantageous. The local fitting function f in [2] can approximate the original image, reduce noise, and enhance edge features. The fitting gray value of each pixel depends on local region pixels by using the ergodicity and controllability of the Gaussian kernel function. The local term | f (x) − d1 |2 dx + | f (x) − d2 |2 dx (21.2) EL = inside(C)
outside(C)
where d1 , d2 represent the average intensities of the target and the background: d1 = d2 =
Ω f (x)H (φ(x))d x Ω
H (φ(x))d x
Ω f (x)(1
− H (φ(x)))d x (1 − H (φ(x)))d x Ω
where H is the Heaviside function.
(21.3) (21.4)
21 Level Set Segmentation Algorithm for Cyst Images …
187
21.3.2 Speed Function Construct a speed function V by the characteristics of the image f (x). When image gray changes gently, |∇ f | is smaller or even 0, V larger; when the curve is close to the edge, the gray changes greatly, |∇ f | is larger and V smaller. Defined as: V =
1 1 = |∇ f |2 + 1 |∇[Hε (φ) f 1 + (1 − Hε (φ)) f 2 ]|2 + 1
(21.5)
where ∇ is gradient operator.
21.3.3 Overall Energy Functional The overall energy equation can be defined ⎡
⎢ E = λ1 V ⎣
⎤
⎥ |I (x) − c2 |2 d x ⎦
|I (x) − c1 |2 d x +
inside(C)
⎡
⎢ + λ2 V ⎣
outside(C)
| f (x) − d1 |2 d x +
inside(C)
⎤ ⎥ | f (x) − d2 |2 d x ⎦ + μL(C) + ν R(C)
(21.6)
outside(C)
where the energy penalty term R(C) in [5] can avoid the process of reinitialization; and λ1 > 0, λ2 > 0, μ ≥ 0, ν > 0 are energy weights. Then the energy functional of Eq. (21.6) is ⎡ E(c1 , c2 , d1 , d2 , φ) = +λ1 V ⎣
|I (x) − c1 |2 Hε (φ(x))d x +
Ω
⎤ |I (x) − c2 |2 (1 − Hε (φ(x)))d x ⎦
Ω
⎡ ⎤ + λ2 V ⎣ | f (x) − d1 |2 Hε (φ(x))d x + | f (x) − d2 |2 (1 − Hε (φ(x)))d x ⎦
Ω
δε (φ(x))|∇φ(x)|d x + ν
+ μ· Ω
Ω
Ω
1 (|∇φ| − 1)2 d x 2
where δ is the Dirac delta function and also the derivative of H . Using the gradient descent, the curve evolution equation can be obtained:
(21.7)
188
J. Fang et al.
∂φ = λ1 V δε (φ) −(I (x) − c1 )2 + (I (x) − c2 )2 +λ2 V δε (φ) −( f (x) − d1 )2 + ( f (x) − d2 )2 ∂t
∇φ ∇φ + ν φ − div (21.8) + δε (φ) μdiv |∇φ| |∇φ|
where div is divergence operator, ∇ is gradient operator, is Laplacian operator.
21.4 Experimental Results and Analysis The experiments were made by Matlab R2018a on a computer with Lenovo-PC Intel(R) Core(TM) i3-4000M CPU 2, 40 GHz, 8G RAM, Windows 10 operating system. Firstly, segment cyst images by the improved model, C-V model, and the three models were constructed with local information, including LBF model, RSF model, and LCV model. The iteration number of all experiments was 30, and the results are shown in Fig. 21.1. For all experimental images by the improved model, the parameters are t= 0.1, ε= 1, ν= 1, μ= 0.01 × 2552 , λ1 = 0.1, λ2 = 1. The segmentation results are quantitatively compared with the expert manual segmentation results, using two indicators in [6, 7] area-based DSC (Dice Similarity Coefficient) and edge-based MSSD (Mean Sum of Square Distance): DSC = MSSD =
2 · (R A ∩ R B ) RA + RB
N 1 2 D (C A , C B (xn )) N n=1 2
(21.9)
(21.10)
where R A , R B are the reference mask region and the result mask region, respectively; C A , C B are the reference contour and result contour, respectively; N is the size of the result contour. The experiments show that improved model is closer to the target boundary. From Table 21.1, it’s superior to other models in segmentation accuracy and segmentation efficiency. The DSC values of this method are higher than 0.9, and the MSSD values are much smaller than other models. The method can be derived from Table 21.2 that the average segmentation efficiency is 60.1 % higher than the traditional C-V model. Then the proposed model is compared with several representative level set models, including the Caselles et al. [8], Lankton and Tannenbaum [9], Shi and Karl [10]; results are shown in Fig. 21.2. The iteration number of all experiments was 50. Figure 21.2 shows that the improved model outperforms the other models. From Table 21.3, it can be seen that the segmentation accuracy of the improved method is better.
21 Level Set Segmentation Algorithm for Cyst Images …
a. Initial contour
b. C-V model
c. LBF model
d. RSF model
189
e.LCV model
f. Improved model
Fig. 21.1 Comparison of results. The first to third rows are renal cysts. The fourth to sixth rows are uterine cysts
21.5 Conclusion A synergetic level set algorithm based on global and local terms is proposed. Global term is from traditional C-V algorithm. And local fitting term is introduced to allow the local gray-scale information to embed into driving force, which can reduce the phenomenon that the expansion force and the contraction force are mutually restricted at the non-boundary especially for images with intensity inhomogeneity. A speed function is presented to adjust the evolution speed and make the evolution curve smoother. Experiments demonstrate the improved algorithm is better for heterogeneous cyst images. In the following work, we will involve denoising heterogeneous medical images; research segmentation methods about other theories [11–13]; large-scale datasets to enhance robustness.
190
J. Fang et al.
Table 21.1 DSC and MSSD values for each segmentation result of Fig. 21.1 Images
C-V
Renal cyst1
DSC
Renal cyst2
DSC
MSSD MSSD Renal cyst3
DSC MSSD
Uterine cyst 1
DSC
Uterine cyst 2
DSC
MSSD MSSD Uterine cyst 3
DSC MSSD
LBF
RSF
LCV
Our model
0.732
0.845
0.472
0.870
0.920
319.209
35.449
1039.970
23.896
15.135
0.906
0.604
0.262
0.855
0.931
69.563
159.966
1685.598
22.308
9.026
0.887
0.901
0.353
0.856
0.926
28.160
20.404
1427.840
34.173
13.652
0.898
0.769
0.099
0.866
0.943
98.070
365.899
2007.754
185.528
54.949
0.787
0.869
0.364
0.876
0.931
373.440
47.183
1863.133
30.490
7.692
0.703
0.821
0.344
0.797
0.907
466.432
209.302
3023.079
268.667
53.968
Table 21.2 Comparison of running time (unit: s) Images
C-V
LBF
RSF
LCV
Renal cyst 1
14.73
15.39
11.12
14.85
Our model 5.63
Renal cyst 2
15.80
14.97
16.03
15.63
5.91
Renal cyst 3
15.24
13.67
15.28
13.90
5.46
Uterine cyst 1
16.71
17.01
17.11
14.93
10.98
Uterine cyst 2
14.86
10.16
16.23
14.10
5.70
Uterine cyst 3
28.18
22.45
24.62
23.13
6.76
a. Initial contour
b. Caselles
c. C-V model
d. Lankton
e. Shi
f. Improved model
Fig. 21.2 Comparison of results. The first to second rows are uterine cysts. The third row is hepatic cyst
21 Level Set Segmentation Algorithm for Cyst Images …
191
Table 21.3 DSC values for each segmentation result of Fig. 21.2 Images
Caselles
C-V
Lankton
Shi
Our model
Uterine cyst 4
DSC
0.897
0.929
0.720
0.915
0.922
Uterine cyst 5
DSC
0.741
0.821
0.352
0.908
0.937
Hepatic cyst
DSC
0.857
0.904
0.522
0.895
0.961
Acknowledgements Project 11461037 supported by National Natural Science Foundation of China.
References 1. Chan, T.F., Vese, L.A.: Active contours without edges. IEEE Trans. Image Process. 10(2), 266–277 (2001) 2. Li, C., Kao, C.Y., Gore, J.C., Ding, Z.: Implicit active contours driven by local binary fitting energy. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–7 (2007) 3. Li, C., Kao, C.Y., Gore, J.C., Ding, Z.: Minimization of region-scalable fitting energy for image segmentation. In: IEEE Transactions on Image Processing, vol. 17, no. 10, pp. 1940–1949 (2008) 4. Wang, X.F., Huang, D.S., Xu, H.: An efficient local Chan-Vese model for image segmentation. Pattern Recogn. 43(3), 603–618 (2010) 5. Li, C., Xu, C., Gui, C., Fox, M.D.: Level set evolution without re-initialization: a new variational formulation. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2005) 6. Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945) 7. Dietenbeck, T, Alessandrini, M., Friboulet, D., Bernard, O.: CREASEG: a free software for the evaluation of image segmentation algorithms based on level set. In: Proceedings of the IEEE 17th International Conference on Image Processing, Hong Kong, pp. 665–668 (2010) 8. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. Int. J. Comput. Vision 22, 61–79 (1997) 9. Lankton, S., Tannenbaum, A.R.: Localizing region-based active contours. IEEE Trans. Image Process. 17(11), 2029–2039 (2008) 10. Shi, Y., Karl, W.C.: A real-time algorithm for the approximation of level-set based curve evolution. IEEE Trans. Image Process. 17(5), 645–656 (2008) 11. Hore, S., Chakraborty, S., Chatterjee, S., Dey, N., Le, D.N.: An integrated interactive technique for image segmentation using stack based seeded region growing and thresholding. Int. J. Electr. Comput. Eng. 6(6), 2773–2780 (2016) 12. Sharma, K., Virmani, J.: A decision support system for classification of normal and medical renal disease using ultrasound images: a decision support system for medical renal diseases. Int. J. Ambient Comput. Intell. 8(2), 52–69 (2017) 13. Nilanjan, D., Venkatesan, R., Amira, A., Manuel, T.J.: Social group optimization supported segmentation and evaluation of skin melanoma images. Symmetry 10(2) (2018)
Chapter 22
A Robust and Fast Template Matching Algorithm Guoxi Chen, Changyou Li, and Wensu Xu
Abstract Template matching is an important method in the target recognition process. In order to improve the matching speed, recognition accuracy and algorithm robustness, a robust and fast template matching algorithm is proposed. The algorithm uses the composite morphological filters to filter out noise in gray scale images, and uses a novel low pass filter to further filter out noise in the image. The improved weighted Hamming Distance is used to detect the similarity of the image to reduce the complexity of the algorithm; the threshold is used to speed up the algorithm. Compared with other fast template matching algorithms, the experimental results show that the algorithm has good recognition accuracy in the target recognition process, which not only speeds up the running speed of the algorithm, but also has good robustness to noise, thus proving the effectiveness of the algorithm in the target recognition.
22.1 Introduction Template matching is an important way to target recognition, image understanding, and motion tracking [1]. Template matching is usually to detect the similarity between the template image and the target image [2]. Template matching is particularly sensitive to noise, so filtering out noise interference is an important step [3]. Due to traditional ways has certain defects for computational efficiency and robustness. In recent years, many scholars have conducted research. Li et al. [4] proposed an image correlation method related to Kalman prediction, and established a target motion model for predicting trajectories. Salih et al. [5] proposed a template matching using fast FFT transform, combined with region growing segmentation to perform image target recognition. Ye et al. [6] proposed to use the local features of the pixels for matching, and then use the three-dimensional fast Fourier transform to measure the G. Chen (B) · C. Li · W. Xu School of Mechanical and Power Engineering, Henan Polytechnic University, Jiaozuo 454000, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_22
193
194
G. Chen et al.
similarity in the frequency domain. Shin et al. [7] proposed a fast robust template matching based on gray index table, which can find a specific region under a given template query image in a noisy environment. Kim et al. [8] proposed the improved Hausdorff distance as a similarity measure function, which reduces the complexity of the algorithm, and improves the speed of algorithm matching. Sahani et al. [9] proposed a two-stage search method for quickly identify images, and effectively determined the accuracy and robustness of related points. In the above methods, the template matching effect is often not ideal in a strong noise environment. The matching accuracy will decrease, due to the similarity detection function of the above algorithm has certain complexity, which reduces the running speed of the algorithm. In order to solve the problem of robustness, running speed and matching accuracy, a robust and fast template matching algorithm is proposed. The algorithm uses the composite morphological filters to filter out noise in gray scale image, and a novel low pass filter filters out noise in image again to reduce the influence of noise. The improved weighted Hamming Distance is used to detect the similarity of the image to reduce the complexity of the algorithm. The threshold is used to speed up the movement of the template image in the target image, thus speeding up the algorithm.
22.2 Proposed Method 22.2.1 Composite Morphological Filter Design Composite morphological filter includes erosion, dilation, opening and closing, and operations on the original image to filter out the salt and pepper noise in the gray scale image. Suppose the target image function is S(x, y) and the structural element is W (x , y ). The mathematical models are as follows: Erosion of S by W is defined as: G(x, y) = SW = min{S(x + x , y + y ) − W (x , y )|(x , y ) ∈ Dw}
(22.1)
Dilation of S by W is defined as: G(x, y) = S ⊕ W = max{S(x − x , y − y ) + W (x , y )|(x , y ) ∈ Dw} (22.2) Opening of S by W is denoted as: G(x, y) = S · W = (SW ) ⊕ W
(22.3)
Closing of S by W is denoted as: G(x, y) = S · W = (S ⊕ W )W
(22.4)
22 A Robust and Fast Template Matching Algorithm
195
In this paper, a composite morphology filter is used to filter out the salt and pepper noise in gray scale image. The mathematical model is: G(x, y) = {((((S ⊕ W ) · W )W ) · W )}
(22.5)
where G(x, y) represents the image after the operation. W represents a disk template with a radius of 1 in this paper.
22.2.2 The Principle of a Novel Lowpass Filters In order to further reduce the influence of noise; and increase the robustness of the algorithm, this paper proposes a novel lowpass filter to filter out noise in the image. The principal of the novel lowpass filter using the n × n same element template A and the original image f (x, y) under the template coverage to obtain the sub-image f 1 (i, j), and the central pixel is E, and then the sub-image f 1 (i, j) and the column vector of the identity matrix I can be operated to obtain a new image f 2 (i, j), f 2 (i, j) and the column vector of the identity matrix I are recalculated to obtain the image f 3 (i, j), the pixel average value of the image f 3 (i, j) is compared with the central pixel E, and selecting an appropriate value as the center pixel, thus achieving the purpose of smoothing and denoise. The mathematical model is: ⎤ x11 · · · x1n · · · x1 j ⎢· · · · · · · · · · · · · · ·⎥ ⎥ ⎢ ⎥ ⎢ f 1 (i, j) = f (i, j) ∗ A = ⎢ xn1 · · · xnn · · · xn j ⎥ ⎥ ⎢ ⎣· · · · · · · · · · · · · · ·⎦ xi1 · · · xin · · · xi j ⎡ ⎤ ⎡ a11 a12 · · · a1n a11 x11 ⎢ a21 a22 · · · a2n ⎥ ⎢ a21 x11 ⎥ ⎢ ∗⎢ ⎣· · · · · · · · · · · ·⎦ = ⎣ · · · an1 an2 · · · ann an1 xn1 ⎡
β1 = f 1 (i, j)i 1 ⎡ a11 x11 a12 x12 ⎢ a21 x11 a22 x11 =⎢ ⎣ ··· ··· an1 xn1 an2 xn2
a12 x12 a22 x11 ··· an2 xn2
⎤ · · · a1n x1n · · · a2n x2n ⎥ ⎥ ··· ··· ⎦ · · · ann xnn (22.6)
⎤⎡ ⎤ · · · a1n x1n 1 ⎢ ⎥ · · · a2n x2n ⎥ ⎥⎢ 0 ⎥ = a11 x11 a21 x11 · · · an1 xn1 T (22.7) · · · · · · ⎦⎣ · · · ⎦ · · · ann xnn 0
β n+1 = ( f 1 (i, j)i n+1 )T I (i 1 , i 2 , . . . , i n+1 , i n+1 , . . . , in ) 2 2 2 −1 2 +1
196
G. Chen et al.
⎧⎡ ⎤⎡ ⎤⎫T a11 x11 a12 x12 · · · a1n x1n 0 ⎪ ⎪ ⎪ ⎪ ⎨⎢ ⎢ ⎥⎬ a21 x21 a22 x22 · · · a2n x2n ⎥ ⎥⎢ · · · ⎥ [i 1 , . . . i n+1 −1 , i n+1 +1 . . . , i n ] = ⎢ ⎣ ··· 2 2 ⎪ · · · · · · · · · ⎦⎣ 1 ⎦⎪ ⎪ ⎪ ⎩ ⎭ an1 xn1 an2 xn2 · · · ann xnn ··· n+1 x n+1 , . . . , a( n+1 = a( n+1 x n+1 n+1 , 2 )1 ( 2 )1 2 )( 2 −1) ( 2 )( 2 −1) n+1 n+1 n+1 n+1 n+1 (22.8) x a( n+1 x , . . . , a +1 +1 n n ( )( ) ( )( ) ( ) ) 2 2 2 2 2 2 βn = f 1 (i, j)i n ⎧⎡ a11 x11 ⎪ ⎪ ⎨⎢ a 21 x 21 = ⎢ ⎣ ··· ⎪ ⎪ ⎩ an1 xn1
a12 x12 a22 x22 ··· an2 xn2
⎤⎡ ⎤⎫ 0 ⎪ · · · a1n x1n ⎪ ⎢ ⎥⎬ · · · a2n x2n ⎥ ⎥⎢ · · · ⎥ = [a1n x1n , a2n x2n , . . . , ann xnn ]T · · · · · · ⎦⎣ · · · ⎦⎪ ⎪ ⎭ · · · ann xnn 1 (22.9)
The remaining formulas can be written by Eq. (22.7). The image f 2 (i, j) is: f 2 (i, j) = [β1T , β2T , . . . , βnT ]
(22.10)
Then the partial image f 3 (i, j) is: f 3 (i, j) = {sort[ f 2 (i, j)]}I (i 3 , . . . , i n×n−2 ) = [ f 2max (i, j), f 2 (i + 1, j) . . . , f 2min (i + (n × n), j)][i 3 , . . . , i n×n−2 ] = [ f 2 (i + 2, j), . . . , f 2 (i + (n × n) − 2, j)] E1 =
E=
(22.11)
f 3 (i, j)/(n × n − 4)
(22.12)
E E > E1 E1 E < E1
(22.13)
where (i 1 , i 2 , . . . , i n ) represents the column vector of the identity matrix I, sort represents the ranking function, E is the center pixel value of the image. The same element matrix A uses the 3 × 3 matrix in this paper.
22.2.3 Improved Weighted Hamming Distance Traditional template matching is too complicated for the similarity detection function, which affects the speed of matching. Such as the classic matching algorithm NCC [10] and SSDA [11], they spend much time in the template matching process. Aiming
22 A Robust and Fast Template Matching Algorithm
197
at this kind of defect, this paper proposes to use the weighted Hamming Distance as the similarity measure function. Hamming Distance [12] is used to calculate the number of positions with different corresponding digits in two equal-length binary sequences. Suppose g and v are two equal-length binary sequences. The Hamming distance Hd formula is defined as: Hd =
(g ∧ v)
(22.14)
where the symbol “∧” represents the two point sets are XOR operated. In order to better reduce the impact of noise, this paper uses the weighted Hamming distance to measure the similarity of the binarized edge image. The formula is defined as: (sort(G template ) ∧ sor t (Vsub-image )) (22.15) Hd = round 4(m × n) where round represents the rounding function. G template represents the template point sets, m × n represents the template size. Vsub-image represents the point set of the sub-image under the template coverage.
22.2.4 Determination of Threshold Tk

Template matching moves the template image pixel by pixel over the entire target image. During this movement, the pixels of non-target areas also take part in the computation, which reduces the running speed of the algorithm. To improve the efficiency of matching, the template is divided into two blocks whose partial distances are compared with a set threshold Tk during matching. Suppose the template image size is m × m: the template is divided into two blocks and a set of R = round(m × m / 2) pixels is taken out; the similarity detection function computes the distance value of the set R, which is compared with the threshold Tk, so that non-matching positions are rejected early. This reduces the running time of the algorithm and improves operational efficiency. Many experiments have shown that the threshold Tk should be selected as:

$$
T_k = \begin{cases} 0.25(m \times m), & (m \times m)\ \text{is even} \\ 0.25(m \times m + 2m - 11), & (m \times m)\ \text{is odd} \end{cases}
\quad (22.16)
$$

The algorithm flow of this paper is shown in Fig. 22.1.
Fig. 22.1 Algorithm flow chart
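The early-termination idea can be sketched as follows; the split into top/bottom half-blocks and the use of the raw XOR count as the block distance are illustrative assumptions, not details fixed by the chapter:

```python
import numpy as np

def threshold_tk(m):
    """Threshold Tk of Eq. (22.16) for an m x m template."""
    if (m * m) % 2 == 0:
        return 0.25 * m * m
    return 0.25 * (m * m + 2 * m - 11)

def xor_count(a, b):
    """Number of differing positions between two 0/1 arrays."""
    return int(np.sum(a.astype(np.uint8) ^ b.astype(np.uint8)))

def match_with_early_exit(target_bin, template_bin):
    """Slide the template over the binarized target image.

    The template is split into two blocks (Sect. 22.2.4); a position
    is rejected without scoring the second block when the distance of
    the first block already exceeds Tk.  Returns the best (row, col).
    """
    m = template_bin.shape[0]
    tk = threshold_tk(m)
    half = m // 2                              # first block: top half rows
    best, best_pos = np.inf, None
    H, W = target_bin.shape
    for i in range(H - m + 1):
        for j in range(W - m + 1):
            sub = target_bin[i:i + m, j:j + m]
            d1 = xor_count(template_bin[:half], sub[:half])
            if d1 > tk:                        # early rejection by Tk
                continue
            d = d1 + xor_count(template_bin[half:], sub[half:])
            if d < best:
                best, best_pos = d, (i, j)
    return best_pos
```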
22.3 Experiment and Result Analysis

To verify the effectiveness of the algorithm, experiments were run on the Matlab R2016a software platform with an Intel Core i5-2450M CPU @ 2.5 GHz, using salt-and-pepper noise with density σ1 = 0.08 and Gaussian noise with mean m = 0.02 and variance v = 0.004. Robustness and computational efficiency were measured on different images. In the other template matching processes, the traditional median filter and the mean filter are used to filter out noise. The improved Hausdorff method uses the Canny operator to obtain the edge information of the image; its high threshold is set to 0.4, and σ = 0.9 is used for the Gaussian filter inside the Canny operator. The proposed algorithm instead uses Otsu's method [13] to find the high and low thresholds of the Canny operator. Efficiency and robustness are compared with the other fast template matching algorithms; the comparison of experimental results is shown in Fig. 22.2, from which it can easily be seen that the proposed algorithm has good robustness. To further illustrate the effectiveness and robustness of the proposed algorithm, Table 22.1 reports the operation time and recognition accuracy of each algorithm. According to the image quality evaluation method in [14], Table 22.2 evaluates the ability of the novel filter to remove noise and the similarity between the denoised image and the target image, thus demonstrating the robustness of the proposed algorithm to noise. The first column of Fig. 22.2 shows the target image and the template image; the red mark is the template image. The second column of Fig. 22.2 shows the result of the conventional SSDA [11] algorithm.
Fig. 22.2 Comparison of experimental results: (a) Target image, (b) SSDA [11], (c) FNNPROD [4], (d) Improved Hausdorff [8], (e) Proposed method

Table 22.1 Algorithm operation efficiency and precision comparison

| Image | Template size | | SSDA [11] | FNNPROD [4] | Improved Hausdorff [8] | Proposed method |
|---|---|---|---|---|---|---|
| HD-999 256 × 79 | 19 × 20 | IP | (92,36) | (92,36) | (92,36) | (92,36) |
| | | RP | (92,35) | (91,36) | (11,149) | (92,36) |
| | | T(s) | 0.2926 | 0.5114 | 0.3862 | 0.1736 |
| PCB 365 × 550 | 67 × 33 | IP | (181,239) | (181,239) | (181,239) | (181,239) |
| | | RP | (180,239) | (181,240) | (40,184) | (181,239) |
| | | T(s) | 6.8701 | 3.6465 | 4.0364 | 3.4362 |
| Component 306 × 545 | 41 × 42 | IP | (231,136) | (231,136) | (231,136) | (231,136) |
| | | RP | (232,137) | (239,368) | (84,139) | (231,136) |
| | | T(s) | 4.6320 | 2.3971 | 3.1215 | 2.3326 |
| Nut 533 × 469 | 57 × 37 | IP | (63,244) | (63,244) | (63,244) | (63,244) |
| | | RP | (63,244) | (11,359) | (211,258) | (63,244) |
| | | T(s) | 8.5346 | 3.7119 | 4.9785 | 3.8336 |
| Chip 394 × 601 | 39 × 39 | IP | (309,265) | (309,265) | (309,265) | (309,265) |
| | | RP | (309,265) | (309,265) | (320,149) | (309,265) |
| | | T(s) | 6.2608 | 3.4597 | 4.5821 | 3.4556 |

Note: IP represents the initial position of the upper left of the template image, RP the recognized position of the upper left of the template image, and T the running time of the program.
Table 22.2 Algorithm robustness comparison

| Image | Filter | SNR | PSNR | SSIM |
|---|---|---|---|---|
| HD-999 | TF | 10.59 | 79.81 | 0.075 |
| HD-999 | HF | 12.83 | 84.96 | 0.087 |
| PCB | TF | 8.801 | 75.74 | 0.076 |
| PCB | HF | 11.29 | 81.47 | 0.078 |
| Component | TF | 13.42 | 86.46 | 0.088 |
| Component | HF | 18.83 | 98.92 | 0.114 |
| Nut | TF | 12.83 | 82.28 | 0.168 |
| Nut | HF | 15.66 | 91.47 | 0.216 |
| Chip | TF | 7.604 | 73.01 | 0.064 |
| Chip | HF | 10.59 | 79.87 | 0.068 |

Note: TF represents the conventional median filter; HF represents the hybrid (proposed) filter.
Figure 22.2 shows that the SSDA algorithm is robust to different noise images and can identify the target area well; however, Table 22.1 shows that it runs slowly and its recognition accuracy of the target area is relatively poor. The third column in Fig. 22.2 shows the result of the FNNPROD algorithm [4]: its robustness to noise is poor and there are some mismatches, although Table 22.1 shows that it runs fast. The fourth column of Fig. 22.2 shows the result of the improved Hausdorff algorithm [8]: its robustness is relatively poor and it produces mismatches, although Table 22.1 shows that it also runs fairly fast. The fifth column in Fig. 22.2 shows the result of the proposed algorithm. Figure 22.2 shows that the proposed algorithm has good robustness to noise, high position recognition accuracy, and fast calculation speed; Table 22.1 further illustrates its superiority. To further demonstrate the robustness of the proposed algorithm to noise, the signal-to-noise ratio (SNR), peak signal-to-noise ratio (PSNR), and Structural Similarity Index Measure (SSIM) are computed according to the image quality evaluation method in [14]. Table 22.2 shows that the SNR of the filter designed in this paper is larger than that of the traditional filter, which indicates that the image quality after denoising is better and fewer noise points remain in the image; the PSNR of the proposed filter is also larger, indicating a very strong denoising ability and a clear image after filtering; and the SSIM similarity index is relatively large, indicating that the image distortion after denoising is relatively small.
22.4 Conclusion

The template matching algorithm proposed in this paper can reliably identify the target position in images with different kinds of noise. The traditional SSDA [11], the FNNPROD algorithm [4], and the improved Hausdorff algorithm [8] perform well in low-noise environments, but when the noise in the image is strong, these traditional template matching algorithms suffer from low recognition accuracy, slow operation speed, and even mismatches. The robust and fast template matching algorithm proposed in this paper is robust to different noise effects, achieves relatively high recognition accuracy of the target position, and runs faster, thus greatly improving operating efficiency. It makes up for the shortcomings of the traditional SSDA [11], the FNNPROD algorithm [4], and the improved Hausdorff algorithm [8], improves robustness and the recognition accuracy of the target position, and provides a reference for target recognition, defect detection, and other industrial applications.
References

1. Rana, A., Valenzise, G., Dufaux, F.: Learning-based tone mapping operator for image matching. IEEE Trans. Multimed. 21(1), 256–268 (2018)
2. Satish, B., Jayakrishnan, P.: Hardware implementation of template matching algorithm and its performance evaluation. In: International Conference on Microelectronic Devices, pp. 1–7. IEEE, Vellore, India (2017)
3. Xiu, C., Pan, X.: Tracking algorithm based on the improved template matching. In: 2017 29th Chinese Control and Decision Conference, pp. 483–486. IEEE, Chongqing (2017)
4. Li, C., Yu, F., Lin, Z., et al.: A novel fast target tracking based on video image. In: Control Conference, pp. 10264–10268. IEEE, Chengdu, China (2016)
5. Salih, N.D., Saleh, M.D., Eswaran, C., et al.: Fast optic disc segmentation using FFT-based template-matching and region-growing techniques. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. 6(1), 101–112 (2017)
6. Ye, Y., Bruzzone, L., Shan, J., et al.: Fast and robust structure-based multimodal geospatial image matching. In: IEEE International Geoscience & Remote Sensing Symposium, pp. 5141–5144. IEEE, Fort Worth, TX, USA (2017)
7. Shin, B.G., Park, S.Y., Lee, J.J.: Fast and robust template matching algorithm in noisy image. In: International Conference on Control, pp. 6–9. IEEE, Seoul, South Korea (2007)
8. Kim, J.Y., Choi, H.R., Kwon, J.H., et al.: Recognition of face orientation by divided Hausdorff distance. In: International Conference on Telecommunications & Signal Processing, pp. 564–567. IEEE, Prague, Czech Republic (2015)
9. Sahani, S.K., Adhikari, G., Das, B.K.: Fast template matching based on multilevel successive elimination algorithm. In: 2012 International Conference on Signal Processing and Communications (SPCOM), pp. 1–5. IEEE, Bangalore, India (2012)
10. Li, Y., Zhang, P.: Fast location and segmentation of character in annular region based on normalized cross-correlation. In: International Conference on Systems & Informatics, pp. 455–459. IEEE, Shanghai, China (2017)
11. Wang, Z., Wang, B., Zhou, Z., et al.: A novel SSDA-based block matching algorithm for image stabilization. In: International Conference on Intelligent Human-Machine Systems & Cybernetics, pp. 286–290. IEEE Computer Society, Hangzhou, China (2015)
12. Blum, H.: An associative machine for dealing with the visual field and some of its biological implications. In: Biological Prototypes and Synthetic Systems. Springer, US (1962)
13. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
14. Horé, A., Ziou, D.: Image quality metrics: PSNR vs. SSIM. In: 2010 International Conference on Pattern Recognition, pp. 2366–2369. IEEE, Istanbul, Turkey (2010)
Chapter 23
Infrared and Visible Image Fusion Algorithm Based on Threshold Segmentation

Wang Yumei, Li Xiaoming, Li Congyong, Wang Dong, Cheng Long, and Zheng Chen

Abstract To improve the quality of infrared and visible image fusion, a fusion algorithm based on threshold segmentation is presented. The visible image is converted to HSV color space, and the V component of the visible image and the infrared image are decomposed at multiple resolutions by the Laplacian pyramid transform. Using the least-squares principle, the low-frequency (LF) coefficients of the infrared image are segmented by Otsu threshold segmentation to enhance the infrared thermal targets in the fused image; the binary image generated in this way from the LF coefficients of the infrared image serves as an important fusion rule. The multi-scale coefficients of the Laplacian pyramid transform are fused by rules of regional mutual information, matching degree, and regional energy. Finally, the fused image is reconstructed by the inverse Laplacian pyramid transform. The experimental results show that the presented fusion algorithm adequately retains the clear scenes of the visible image while highlighting the thermal targets of the infrared image, which enriches the information and object characteristics of the fused image.
W. Yumei (B) · L. Xiaoming · W. Dong · C. Long · Z. Chen
PLA Army Academy of Artillery and Air Defense, Hefei 230031, China
e-mail: [email protected]
L. Congyong
China Telecom Corporation Limited Anhui Branch, Hefei 230001, China

23.1 Introduction

Infrared images have strong thermal target recognition ability, are not limited by illumination conditions, and can penetrate smoke to recognize hidden targets, but their signal-to-noise ratio is low and their background information is insufficient [1, 2]. Visible images conform to human visual characteristics, have high resolution and clarity, and can reflect the details of the target scenes, but under occlusion or limited illumination the target features are not obvious. The infrared and visible images may be fused and their information complementarity
may be utilized so as to make the fused image have better target characteristics and scene information at the same time, providing detailed and accurate image data for recognition, location, and other applications [1, 3].

Fusion algorithms can be roughly divided into two categories: space-region-based and transform-region-based. Space-region-based fusion operates directly on image pixels; common methods include the maximum/minimum method, the weighted mean method, and PCA (principal component analysis), and the contrast of the fused images is relatively low. Transform-region-based fusion transforms the source images first, then fuses the post-transform coefficients according to certain criteria, and finally obtains the fused image by inverse transformation [4–6]. At present, most research on transform-region-based image fusion is built on multi-scale transformation; this approach is more in line with the characteristics of the human visual system and achieves better fusion effects [7, 8].

In this paper, a threshold-segmentation-based fusion algorithm is proposed. A between-class variance threshold segmentation method is used to extract infrared image targets under the Laplace transform; combined with the regional energy, matching degree, and mutual information of the coefficients at different scales, different fusion rules are set; the fusion coefficients are then reconstructed and the color/space transform is inverted to obtain the fused image. The experimental results show that the fused images fully enhance the target information while retaining clear visible scenes.
23.2 Laplacian Pyramid

The Laplacian pyramid is built on the Gaussian pyramid. Given an image I, its Gaussian pyramid is a set of images {G_k} called levels, representing progressively lower-resolution versions of the image, in which high-frequency details progressively disappear. In the Gaussian pyramid, the bottom-most level is the original image, G_0 = I. The levels are computed recursively as follows [9]:

$$
G_k(i,j) = \sum_{m=-2}^{2} \sum_{n=-2}^{2} w(m,n)\, G_{k-1}(2i+m,\, 2j+n), \quad (1 \le k \le N,\ 0 \le i \le R_k,\ 0 \le j \le C_k)
\quad (23.1)
$$

where k indexes the level of the pyramid and w is the Gaussian generating kernel.

$$ LP_{k-1} = G_{k-1} - G_k^{*} \quad (23.2) $$

$$
G_k^{*}(i,j) = 4 \sum_{m=-2}^{2} \sum_{n=-2}^{2} w(m,n)\, G_k\!\left(\frac{i-m}{2}, \frac{j-n}{2}\right), \quad (1 \le k \le N,\ 0 \le i \le R_{k-1},\ 0 \le j \le C_{k-1})
\quad (23.3)
$$

$$
G_k\!\left(\frac{i-m}{2}, \frac{j-n}{2}\right) =
\begin{cases}
G_k\!\left(\frac{i-m}{2}, \frac{j-n}{2}\right), & \text{if } \frac{i-m}{2},\ \frac{j-n}{2} \text{ are integers} \\
0, & \text{otherwise}
\end{cases}
\quad (23.4)
$$
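Equations (23.1)–(23.3) correspond to the standard REDUCE/EXPAND pyramid operations; a compact sketch using OpenCV's built-in pyrDown/pyrUp (which apply the same 5 × 5 Gaussian generating kernel w) is shown below. Names are illustrative, and image dimensions are assumed to be compatible with repeated halving.

```python
import cv2
import numpy as np

def laplacian_pyramid(img, levels):
    """Build Gaussian levels G_k (Eq. 23.1) and Laplacian levels
    LP_k = G_k - EXPAND(G_{k+1}) (Eqs. 23.2-23.3)."""
    g = [img.astype(np.float32)]
    for _ in range(levels):
        g.append(cv2.pyrDown(g[-1]))            # REDUCE, Eq. (23.1)
    lp = []
    for k in range(levels):
        h, w = g[k].shape[:2]
        up = cv2.pyrUp(g[k + 1], dstsize=(w, h))  # EXPAND, G*_k
        lp.append(g[k] - up)                      # Eq. (23.2)
    lp.append(g[-1])        # keep coarsest (low-frequency) level
    return lp

def reconstruct(lp):
    """Inverse transform: collapse the pyramid back to the image."""
    img = lp[-1]
    for lap in reversed(lp[:-1]):
        h, w = lap.shape[:2]
        img = cv2.pyrUp(img, dstsize=(w, h)) + lap
    return img
```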
23.3 Fusion Algorithm

23.3.1 Fusion Ideology

After the Laplacian decomposition, the LF coefficients and the HF coefficients at all levels are obtained. The final fusion quality is determined by selecting appropriate fusion rules. The HF sub-band coefficients represent the spatial information and the targets' edge features, and their fusion rules fall into two categories: pixel-based and region-based. Pixel-based fusion rules take only a single pixel as the object and do not take the correlation of adjacent pixels into account [10]; although their generality is strong, the fusion effect is not ideal. The target features of an image should be taken into account, and since they are usually represented by multiple pixels, region-based rules can achieve better vision, more details, and a more obvious fusion effect; the HF fusion rules in this paper are therefore based on regional features [4, 11, 12]. The LF sub-band represents the background information of the source images, and the weighted averaging method is generally adopted as the fusion tactic [13–15]; although the information of infrared and visible light is taken into account, this reduces the contrast of the image, so that the infrared target is not prominent and the background is not clear. In order to preserve the target features in the low-frequency images, the LF fusion rules are formulated under the guidance of the infrared target images extracted by between-class threshold segmentation and are then combined with region fusion to complete the fusion of the low-frequency coefficients. Figure 23.1 shows the block diagram of the fusion algorithm in this paper. Firstly, the visible images are transformed into HSV color space [16], and the V components and the infrared images are decomposed by the Laplace transform into LF and HF coefficients at all levels. The high-frequency information is fused according to the regional mutual information, matching degree, and energy, and the low-frequency coefficients are fused according to the binary image obtained by between-class threshold segmentation together with the low-frequency regional mutual information. The inverse Laplacian transform is applied to the fused HF and LF coefficients, and the result is taken as the V component of the visible image. Finally, the image is converted back to RGB color space to get the final result.
Fig. 23.1 Diagram of fusion algorithm (flow: HSV color space transform of the visible image; Laplacian pyramid decomposition of the V component and the infrared image; target extraction from the infrared LF coefficients; fusion of each level by regional-mutual-information rules; Laplacian pyramid reconstruction; RGB color space transform)
23.3.2 Infrared Image Segmentation

The background of an infrared image is usually simple, and the contrast between target and background is large. When a source image is decomposed by the Laplace transform, the HF coefficients contain the edges in the image, while the LF coefficients contain the background information but still retain a large amount of target
information. If the fusion is carried out by a simple mean method, the important target information cannot be highlighted in the final fused image. Therefore, the LF coefficients of the infrared image are segmented by the between-class (Otsu) threshold method to generate a binary target image, and this binary image is an important rule of the LF fusion algorithm.
23.3.3 High-Frequency Coefficient Fusion Rules

• Regional mutual information

According to the concept of mutual information in information theory, the regional mutual information is defined in Eq. (23.5):

$$
\operatorname{RMI}(LP_{l,VI}(R), LP_{l,IR}(R)) = I(LP_{l,VI}(R)) + I(LP_{l,IR}(R)) - I(LP_{l,VI}(R), LP_{l,IR}(R))
\quad (23.5)
$$

In Eq. (23.5), RMI(LP_{l,VI}(R), LP_{l,IR}(R)) is the regional mutual information between the V component and the infrared image on layer l of the Laplacian pyramid over the region R. I(LP_{l,VI}(R)) is the information entropy of the visible image, as shown in Eq. (23.6); I(LP_{l,IR}(R)) is the information entropy of the infrared image, as shown in Eq. (23.7); and I(LP_{l,VI}(R), LP_{l,IR}(R)) is the joint information entropy of the visible and infrared images in region R, as shown in Eq. (23.8). The selected R is 5 × 5.

$$ I(LP_{l,VI}(R)) = -\sum_{(x,y)\in R} p(LP_{l,VI}(x,y)) \log p(LP_{l,VI}(x,y)) \quad (23.6) $$

$$ I(LP_{l,IR}(R)) = -\sum_{(x,y)\in R} p(LP_{l,IR}(x,y)) \log p(LP_{l,IR}(x,y)) \quad (23.7) $$

$$
I(LP_{l,VI}(R), LP_{l,IR}(R)) = -\sum_{(x,y)\in R} p(LP_{l,VI}(x,y), LP_{l,IR}(x,y)) \log p(LP_{l,VI}(x,y), LP_{l,IR}(x,y))
\quad (23.8)
$$
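A sketch of Eqs. (23.5)–(23.8) on a single region, estimating the probabilities from a joint histogram of quantized coefficients; the quantization into `bins` levels is an implementation assumption, and with a 5 × 5 region the estimate is necessarily coarse:

```python
import numpy as np

def regional_mutual_information(patch_v, patch_i, bins=16):
    """RMI of Eq. (23.5) for one region R (e.g. 5x5 patches of one
    Laplacian level): I(V) + I(IR) - I(V, IR), with probabilities
    estimated from a joint histogram of the quantized coefficients."""
    joint, _, _ = np.histogram2d(patch_v.ravel(), patch_i.ravel(), bins=bins)
    p_joint = joint / joint.sum()
    p_v = p_joint.sum(axis=1)          # marginal of the V component
    p_i = p_joint.sum(axis=0)          # marginal of the infrared image

    def H(p):                          # Shannon entropy, Eqs. (23.6)-(23.8)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    return H(p_v) + H(p_i) - H(p_joint.ravel())
```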
The regional mutual information represents the degree of information correlation in the region R: the larger the value, the greater the correlation between the images in the region.

• Regional energy

The weighted sum of the squared gray values in the region R is used to represent the energy of the region. A large energy value indicates that the region has large gray values and contains a large amount of information. The energy is calculated as in Eq. (23.9):

$$ E_k(i,j) = \sum_{(x,y)\in R} w(x,y)\,[LP_k(i+x,\, j+y)]^2 \quad (23.9) $$

wherein E_k(i, j) denotes the regional energy centered at (i, j) on layer k of the Laplacian pyramid; LP_k denotes the image of layer k of the Laplacian pyramid; w(x, y) denotes the weight coefficients corresponding to LP_k; and R defines the size of the local region (5 × 5 hereunder).

• Region matching degree

The matching degree M_k of the corresponding regions is calculated as in Eq. (23.10):

$$
M_k(i,j) = \frac{2 \sum_{(x,y)\in R} w(x,y)\, LP_{k,V}(i+x,\, j+y)\, LP_{k,I}(i+x,\, j+y)}{E_{k,V}(i,j) + E_{k,I}(i,j)}
\quad (23.10)
$$

wherein M_k(i, j) represents the matching degree on layer k of the Laplacian pyramid, and E_{k,V}(i, j) and E_{k,I}(i, j) are calculated according to Eq. (23.9). When the matching degree is less than the set threshold, the coefficients of the image with the larger energy value are taken as the fusion coefficients; otherwise, the weighted means are taken as the fusion coefficients.
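A sketch of this HF fusion rule built from Eqs. (23.9) and (23.10), assuming a 5 × 5 region, a uniform weight w, and an illustrative matching threshold; the adaptive weight formula in the "else" branch is a common choice and is not specified in the chapter:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_high_freq(lp_v, lp_i, threshold=0.7):
    """Fuse one HF level of the visible (lp_v) and infrared (lp_i)
    pyramids with the region energy / matching degree rule.

    The 5x5 uniform mean stands in for the weighted sum of Eq. (23.9);
    the constant factor cancels in the matching degree M_k (Eq. 23.10).
    Where M_k < threshold the higher-energy coefficient wins; otherwise
    an adaptive weighted mean is used (a standard choice, assumed here).
    """
    e_v = uniform_filter(lp_v ** 2, size=5)      # region energy, visible
    e_i = uniform_filter(lp_i ** 2, size=5)      # region energy, infrared
    cross = uniform_filter(lp_v * lp_i, size=5)
    match = 2.0 * cross / (e_v + e_i + 1e-12)    # matching degree M_k

    w_max = np.where(e_v >= e_i, lp_v, lp_i)     # coefficient with larger energy
    w_min = np.where(e_v >= e_i, lp_i, lp_v)
    a = 0.5 + 0.5 * (1 - match) / (1 - threshold)  # weight toward w_max
    weighted = a * w_max + (1 - a) * w_min
    return np.where(match < threshold, w_max, weighted)
```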
23.3.4 LF Coefficient Rules

The LF coefficients obtained from the V component after the Laplacian transform mainly contain background features and target information. When the mutual information of the region R is less than the set threshold, the correlation between the images in the region is small. If, in this case, the gray value of the binary image produced by between-class threshold segmentation is 255, the pixel is likely to belong to an infrared target; in order to take both the background and the target information into account, the weighted mean method is then used to obtain the LF fusion coefficients. Otherwise, the LF coefficients of the visible V component are chosen.
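The LF rule above can be sketched as follows, assuming `mask` is the Otsu binary image (255 marking a probable infrared target), `rmi` is the regional mutual information of Eq. (23.5) evaluated around each pixel, and `threshold` and `alpha` are set values; all names are illustrative:

```python
import numpy as np

def fuse_low_freq(lf_v, lf_i, mask, rmi, threshold, alpha=0.5):
    """LF fusion of Sect. 23.3.4: where regional mutual information is
    low AND the Otsu binary mask marks a probable infrared target, mix
    both LF coefficients by a weighted mean; elsewhere keep the visible
    V-component LF coefficients."""
    target = (rmi < threshold) & (mask == 255)
    mixed = alpha * lf_v + (1 - alpha) * lf_i   # weighted mean
    return np.where(target, mixed, lf_v)
```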
23.4 Results and Analysis

To assess the fused image quality of the proposed algorithm (LP_MI_SEGT) based on between-class threshold segmentation, a large number of experiments were carried out, and two groups of them are selected for illustration. Three comparison algorithms are matched against the one proposed in this paper; all of them use HSV color space and the Laplace transform. The objective evaluation indexes used are the information entropy E, mean gradient G, spatial frequency SF, standard deviation SD, and correlation coefficient CC:
$$ E = -\sum_{i=1}^{n} p_i \log_2 p_i $$

$$ G = \frac{1}{(M-1)(N-1)} \sum_{i=1}^{M-1} \sum_{j=1}^{N-1} \sqrt{\frac{(F(i+1,j)-F(i,j))^2 + (F(i,j+1)-F(i,j))^2}{2}} $$

$$ SF = \sqrt{RF^2 + CF^2} $$

$$ SD = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \big(F(i,j) - \bar{F}\big)^2} $$

$$ CC = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} \big(R(i,j)-\bar{R}\big)\big(F(i,j)-\bar{F}\big)}{\sqrt{\sum_{i=1}^{M} \sum_{j=1}^{N} \big(R(i,j)-\bar{R}\big)^2 \cdot \sum_{i=1}^{M} \sum_{j=1}^{N} \big(F(i,j)-\bar{F}\big)^2}} $$
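These indexes map directly onto a few NumPy one-liners; a sketch assuming 8-bit images for the entropy histogram (the bin choice is an implementation assumption):

```python
import numpy as np

def entropy(F):
    """Information entropy E over a 256-bin gray histogram."""
    hist, _ = np.histogram(F, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mean_gradient(F):
    """Mean gradient G with forward differences."""
    F = F.astype(np.float64)
    dx = np.diff(F, axis=0)[:, :-1]   # F(i+1,j) - F(i,j)
    dy = np.diff(F, axis=1)[:-1, :]   # F(i,j+1) - F(i,j)
    return np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2))

def spatial_frequency(F):
    """SF = sqrt(RF^2 + CF^2)."""
    F = F.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(F, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(F, axis=0) ** 2))  # column frequency
    return np.sqrt(rf ** 2 + cf ** 2)

def standard_deviation(F):
    return np.sqrt(np.mean((F - F.mean()) ** 2))

def correlation_coefficient(R, F):
    """CC between the reference/source image R and the fused image F."""
    r, f = R - R.mean(), F - F.mean()
    return np.sum(r * f) / np.sqrt(np.sum(r ** 2) * np.sum(f ** 2))
```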
The source and fused images of the first group are shown in Fig. 23.2, and the objective evaluation indexes are shown in Table 23.1. According to the fused images and the objective evaluation indexes, the LP_MAX_AVG algorithm loses the spectral information of visible light seriously, with blurred edges and indistinct infrared targets. Compared with LP_MAX_AVG, LP_MI_AVG and LP_MI_MAX have smaller spectral information distortion and richer edge details, but the infrared targets are still not prominent. LP_MI_SEGT contains the largest amount of information, smaller background spectral distortion, more edge details, and more prominent infrared targets. The source and fused images of the second group are shown in Fig. 23.3, and the evaluation indexes are shown in Table 23.2.
Fig. 23.2 The first group experiments: (a) visible image, (b) infrared image, (c) LP_MI_SEGT fused image, (d) LP_MI_MAX fused image, (e) LP_MI_AVG fused image, (f) LP_MAX_AVG fused image

Fig. 23.3 The second group experiments: (a) original visible image, (b) original infrared image, (c) LP_MI_SEGT fused image, (d) LP_MI_MAX fused image, (e) LP_MI_AVG fused image, (f) LP_MAX_AVG fused image
Table 23.1 Objective evaluation indexes for the first group fusion experiments

| | LP_MI_SEGT | LP_MI_MAX | LP_MI_AVG | LP_MAX_AVG |
|---|---|---|---|---|
| Entropy E | 7.1839 | 7.1282 | 6.4756 | 6.4983 |
| Mean gradient G | 11.5298 | 11.4816 | 11.2683 | 10.9875 |
| Spatial frequency SF | 20.0828 | 20.0160 | 18.0249 | 16.8658 |
| Standard deviation SD | 52.4587 | 49.0988 | 26.1478 | 26.1582 |
| Correlation coefficient CC | 0.9281 | 0.9217 | 0.5152 | 0.5487 |
| Universal image quality UIQI | 0.8897 | 0.8655 | 0.3395 | 0.3622 |
Based on the above two fusion experiments, LP_MI_SEGT not only preserves the visible background and detail information but also highlights the infrared thermal targets, making the fused images contain both clear visible scene information and better infrared target characteristics, which is conducive to the detection of potential targets.
23.5 Conclusion

To enhance infrared thermal targets, an infrared and visible image fusion algorithm based on threshold segmentation is proposed in this paper. The algorithm first transforms the visible image into HSV color space and decomposes the V component at multiple resolutions with the Laplace transform; using the least-squares principle, the LF coefficients of the infrared image are segmented by the between-class (Otsu) threshold method to generate a binary image that guides the fusion of the LF coefficients; and the HF coefficients of the visible V component and the infrared image are fused based on RMI. The experimental results show that the proposed fusion algorithm not only preserves the clear scene information of visible light to the full extent, but also highlights the infrared target information as much as possible, effectively enhancing the target features.

Table 23.2 Objective evaluation indexes for the second group experiments

| | LP_MI_SEGT | LP_MI_MAX | LP_MI_AVG | LP_MAX_AVG |
|---|---|---|---|---|
| Entropy E | 6.6973 | 6.4976 | 6.3507 | 6.3597 |
| Mean gradient G | 10.4771 | 7.8002 | 7.7393 | 6.4785 |
| Spatial frequency SF | 25.8123 | 14.6609 | 17.5968 | 16.3742 |
| Standard deviation SD | 36.5151 | 31.5658 | 25.3255 | 26.1065 |
| Correlation coefficient CC | 0.4924 | 0.3174 | 0.1977 | 0.2445 |
| Universal image quality UIQI | 0.4609 | 0.1681 | 0.1333 | 0.1605 |
References

1. Yang, G., Tong, T., Lu, Y., et al.: Fusion of infrared and visible images based on multi-features. Opt. Precis. Eng. 22(2), 489–496 (2014)
2. Zhou, Y., Geng, A., Wang, Y.: Contrast enhanced fusion of infrared and visible images. Chin. J. Lasers 41(9), 0909001 (2014)
3. Chen, T., Wang, J., Zhang, X., et al.: Fusion of infrared and visible images based on feature extraction. Laser & Infrared 46(3), 357–362 (2016)
4. Bilodeau, G.A., Torabi, A., Morin, F.: Visible and infrared image registration using trajectories and composite foreground images. Image Vis. Comput. 29(1), 41–50 (2011)
5. Song, Q., Xu, M., Yu, S.: HIS image fusion algorithm based on target extraction and wavelet transform. J. Henan Univ. (Natural Science) 44(2), 232–235 (2014)
6. Xu, Q., Chen, Y., Liu, J.: A fast fusion approach for remote sensing images with enhanced edge feature. Electron. Opt. Control 20(9), 43–47 (2013)
7. Xu, L., Wei, R.: An optimal algorithm of image edge detection based on Canny. Bull. Sci. Technol. 29(7), 127–131 (2013)
8. Lv, Z., Wang, F., Chang, Y.: An improved Canny algorithm for edge detection. J. Northeast. Univ. (Natural Science) 28(12), 1681–1684 (2007)
9. Burt, P.J., Adelson, E.H.: The Laplacian pyramid as a compact image code. IEEE Trans. Commun. COM-31(4) (1983)
10. Yu, H., Lin, C., Tan, G., et al.: A computational model of visual attention mechanism and edge extraction algorithm of Canny in contour detection. J. Guangxi Univ. Sci. Technol. 27(2), 88–92 (2016)
11. Hu, Y., Zhang, X.: An improved HIS fusion for high resolution remote sensing images. SPIE 7546, 754635-1–754635-6 (2010)
12. Li, C., Ju, Y., Bovic, C.: No-training no-reference image quality index using perceptual features. Opt. Eng. 52(5), 057003 (2013)
13. Kim, Y.S., Lee, J.H., Ra, J.B.: Multi-sensor image registration based on intensity and edge orientation information. Pattern Recogn. 41(11), 3356–3365 (2008)
14. Ye, Y., Nie, J.: Fusion of IR/PMMW image based on Laplacian pyramid transform. J. Proj. Rocket. Missiles Guid. 34(2), 165–171 (2014)
15. Xiao, J., Rao, T., Jia, Q.: An image fusion algorithm of Laplacian pyramid based on graph cutting. J. Optoelectron. Laser 25(7), 1416–1423 (2014)
16. Xu, G., Jiang, D.: Adaptive multi-scale Canny edge detection. J. Shandong Jianzhu Univ. 21(4), 360–363 (2006)
Chapter 24
Improved K-Means Algorithm for Optimizing Initial Centers Jianming Liu, Lili Xu, Zhenna Zhang, and Xuemei Zhen
Abstract K-means is a widely used cluster analysis algorithm. In practical clustering applications, its results often fluctuate greatly, and this fluctuation affects the stability and even the efficiency of clustering. This paper presents an improved K-means clustering algorithm that retains the advantages of the traditional algorithm while improving the method of selecting the initial clustering centers, which effectively eliminates the shortcomings of the traditional algorithm. The improved algorithm absorbs the advantages of grid-based clustering and incorporates the concept of grid density. Through the selection of initial clustering centers, the number of iterations decreases significantly, which reduces the complexity of clustering, improves its efficiency, accuracy, and convergence, and achieves better clustering results.
24.1 Introduction

K-means is a widely used cluster analysis algorithm. In practical clustering applications, clustering results often fluctuate greatly [1], and this fluctuation affects the stability and even the efficiency of clustering; the accuracy of the clustering results is therefore not high, which can eventually lead to wrong clustering results. The main reason is that the algorithm adopts a relatively concentrated, random selection of center points, which can pick edge points or noise data as clustering centers, making it difficult to guarantee the correctness of the clustering results. To solve these problems, an improved algorithm is proposed in this paper.

Lili Xu, Zhenna Zhang, Xuemei Zhen—Co-First Author.
J. Liu (B) · Z. Zhang · X. Zhen
Computer Department, Weifang Medical College, Weifang 261053, China
e-mail: [email protected]
L. Xu
Weifang Vocational College, Weifang 261042, China
24.2 Clustering Process of the Algorithm

K-means is a prototype-based clustering algorithm that requires the number of categories K of the data to be determined in advance [2]. The data set is then clustered and iterated step by step in the direction that reduces the objective function [3], so that the final clustering result attains the optimal value of the objective function [4]. K-means clustering has good scalability and clustering speed and can effectively reduce the response time of clustering. It is described as follows:

(i) Randomly select K clustering centers μ1, μ2, …, μK ∈ R^n.
(ii) Calculate the distance between each sample and every cluster center, and assign each sample to the nearest cluster center according to the results.
(iii) Compute the mean of the samples in each cluster and take the result as the new cluster center.
(iv) Repeat steps (ii) and (iii) until the results converge.
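The four steps above map directly onto a few lines of code; a minimal sketch with illustrative names (no tie-breaking or empty-cluster handling):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Basic K-means: random centers, assign, recompute, repeat."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # step (i)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                            # step (ii)
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centers):                        # step (iv)
            break
        centers = new                                        # step (iii)
    return centers, labels
```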
24.2.1 Traditional Algorithm

The traditional K-means clustering algorithm has gradually become an influential clustering algorithm in data mining technology [5]. In each iteration, every data object in the sample set must be assigned to a class; after all data objects have been assigned, the cluster centers are adjusted. When a subsequent iteration no longer changes any cluster center, the assignment of every data object is stable and the clustering iteration is complete; otherwise, the assignment and center adjustment are repeated until every data object is classified correctly. The algorithm process is shown in Fig. 24.1.

Fig. 24.1 K-means algorithmic process
24.3 Analysis of the Algorithm

The K-means clustering algorithm can describe text features according to the text vocabulary in a dictionary, and each clustering object can be represented by a word-frequency distribution vector according to the distribution of keywords in the text. The advantages of the K-means clustering algorithm include:

(i) It can process image and text features with high stability and scalability, and it handles data sets in numerical form with good clustering effect.
(ii) Compared with other data analysis methods, the results are more intuitive, easier to understand, and have a clearer geometric meaning. When K-means is used for numerical object processing and for image and text clustering, the clustering effect is good and easy to interpret.
(iii) Before applying the algorithm, one only needs to judge whether the shape of the clustered data set is convex.
(iv) As long as the target data objects are independent of each other, there is no need to restrict the range of the data sets.

In addition to the above advantages, the algorithm has several shortcomings compared with other algorithms in application, including the following aspects:

(i) The algorithm has limitations in practical application: the value of K must be set in advance and directly affects the result of the cluster analysis.
(ii) The K initial centers are selected randomly, so the clustering results and accuracy are closely tied to the selection of the initial clustering centers. When solving practical problems, the clustering result obtained is usually not the global optimal solution but a local optimal solution for the fixed initial centers, so the clustering results fluctuate within a certain range; because of this randomness, results can vary greatly between runs.
(iii) Each iteration averages the values of all data points in a class, which is easily disturbed by local noise and isolated data points. If there are outliers and interference noise points, there will be large deviations between the clustering results and the theoretical clustering values, which then affect the clustering results.
(iv) With the Euclidean distance, the K-means algorithm can only discover spherical clusters; data distributed in other shapes is difficult to find.
24.4 Improvement of K-Means Algorithm

24.4.1 Improvement on the Selection of Initial Cluster Centers

To improve the stability of the clustering results of the K-means algorithm, the repeated random searches over large amounts of data are replaced by a selection guided by the data distribution, by analogy with shooting at a target: the initial clustering centers correspond to the target center, and the farther the arrows deviate from the target center, the more shots are needed to determine the target. When the search efficiency is low, samples are selected according to the characteristics of the data distribution, so that in the process of determining the initial clustering centers the features of the original data are still reflected and the data are not distorted by sampling.
24.4.2 Rule of the Initial Clustering Center

The improved method characterizes each sample object by the mean distance from it to its nearest J data objects, and uses this value to choose the initial clustering centers: the mean distance is negatively correlated with the density of data objects around the sample, since the shorter the distances, the more densely the J neighbors are packed around the object. The original K-means clustering algorithm is run uniformly over the sample data, and J sub-samples S1, S2, …, SJ are selected from the original data. The function K-means-first(Si, K) in the improved algorithm is consistent with that of the traditional algorithm and generates K clustering centers for the clusters formed from Si; the function distortion(Mi, F) determines an optimal solution by combining the error-squares criterion function. Once a cluster becomes empty, the initial clustering centers are selected again, so the state of the clusters must be judged after clustering. The improved K-means algorithm is as follows:
24 Improved K-Means Algorithm for Optimizing Initial Centers
217
(i)
According to Euclidean range, the range D(X, Z) between samples is calculated, and the initial clustering centers (S, K, J) are set. (ii) Select K objects in high density region from density set C as initial clustering centers. (iii) Calculate the average deviation range after all data objects distances are calculated, as follows: D X i, Z k (I ) = min D X i , Z j (I ) , i = 1, 2, . . . , n
(24.1)
If satisfied D X i, Z j (I) = min D X i , Z j (I) , i = 1, 2, . . . , n
(24.2)
That X i ∈ ωk . (iv) The set C of density values of all data objects in the data set is obtained. For the selected K initial clustering centers, the traditional K-means clustering algorithm is used to cluster the data objects, as shown below. Z j (I + 1) =
n 1 ( j) x , j = 1, 2, . . . , k n i=1 i
(24.3)
(v) Improve initial value judgment: Z j (I + 1) = Z j (I ), j = 1, 2, . . . , k that, I = I + 1 Return (24.2), otherwise, the algorithm ends.
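A sketch of the density-guided initialization of steps (i)–(ii), under the assumption (not fully specified in the text) that an object's density is the inverse of its mean distance to its J nearest neighbors; the greedy separation heuristic that keeps chosen centers apart is likewise an assumption:

```python
import numpy as np

def density_based_centers(X, k, J=10):
    """Pick K initial centers from high-density regions (Sect. 24.4.2).

    Density of a point = inverse of its mean distance to its J nearest
    neighbors (assumption); centers are chosen greedily in decreasing
    density, skipping points closer to an already chosen center than
    the mean pairwise distance (a separation heuristic, not from the
    text), with a fallback to the densest remaining points.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn_mean = np.sort(d, axis=1)[:, 1:J + 1].mean(axis=1)
    density = 1.0 / (knn_mean + 1e-12)          # density set C
    sep = d.mean()                              # separation radius
    order = np.argsort(-density)                # high density first
    centers = []
    for idx in order:
        if all(np.linalg.norm(X[idx] - c) >= sep for c in centers):
            centers.append(X[idx])
        if len(centers) == k:
            break
    for idx in order:                           # fallback if too strict
        if len(centers) == k:
            break
        if not any(np.array_equal(X[idx], c) for c in centers):
            centers.append(X[idx])
    return np.array(centers)
```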
24.4.3 Experiments and Analysis of Improved Algorithms

The K-means algorithm is susceptible to the influence of the initial centers, and the final clustering accuracy varies with different initial centers. Due to space constraints, only the details of the K-means algorithm and the improved algorithm on the data sets are given. As shown in Table 24.1, the algorithms were each run repeatedly; for each run, the selected initial center points (expressed by their numbers), the iteration times, the accuracy, and the running time (in ms) are given. Table 24.1 shows that the average accuracy of the K-means algorithm is 67.87%, while the average accuracy of the improved algorithm is 20% higher, reaching 87.87%. Over the five runs reported, the accuracy of the K-means algorithm is 52.67% at worst and 88.67% at best, and its clustering results are very unstable; the results of the five runs of the improved algorithm are relatively stable, with a highest accuracy of 89.33%. Therefore, the improved algorithm has higher clustering accuracy and better stability. In terms of
Table 24.1 Results of the two algorithms on the data sets test (IC: initial center; IT: iteration times; ACC: accuracy rate (%); T: time (ms))

| Serial number | IC (K-means) | IT | ACC | T | IC (improved K-means) | IT | ACC | T |
|---|---|---|---|---|---|---|---|---|
| 1 | (53,75,109) | 10 | 57.33 | 406 | (4,56,102) | 7 | 89.33 | 321 |
| 2 | (10,14,127) | 5 | 52.67 | 209 | (54,63,103) | 3 | 86.49 | 196 |
| 3 | (48,49,53) | 7 | 83.33 | 284 | (8,47,64) | 5 | 87.28 | 215 |
| 4 | (1,19,120) | 13 | 57.33 | 503 | (31,114,124) | 7 | 86.12 | 368 |
| 5 | (36,85,143) | 9 | 88.67 | 372 | (47,51,122) | 5 | 88.36 | 278 |
| Average | - | 8.8 | 67.87 | 354.8 | - | 5.4 | 87.87 | 275.6 |
running time, the improved algorithm spends slightly more time on the optimization of the initial center selection, but the average number of iterations of the K-means algorithm is 8.8 while that of the improved algorithm is 5.4, so the improved algorithm needs notably fewer iterations, and both algorithms finish in well under one second on average. Experiments were also carried out on data sets such as Glass, Ecoli, Iris, and Rice. They show that the improved algorithm is more effective in selecting the initial clustering centers: compared with the traditional K-means algorithm and the semi-supervised K-means algorithm, the proposed algorithm performs better in centering, and its efficiency is greatly improved by introducing and processing the labeled data.
24.4.4 Improved Algorithm Analysis

Most of the calculation process and methods of the improved K-means clustering algorithm are based on the traditional algorithm [6]. The study shows that the improved algorithm, which uses probability sampling to optimize the selection of the clustering centers, greatly reduces the computation and the number of iterations while offering good clustering reliability and running stability, which confirms the correctness of the improvement. The improved clustering algorithm is highly sensitive to the differences between data objects and assigns a weight to each clustering object. When calculating the distance between different data objects, the object attributes and the weight values are equally important, and together they determine the difference between data objects; the traditional algorithm is thus improved by combining the attributes of the data objects with their weights. Therefore, the improved algorithm can find clusters with large differences in size, which overcomes a shortcoming of the traditional algorithm.
24.5 Conclusion

The improved K-means algorithm combines the advantages of the grid-based clustering method. The efficiency of the clustering algorithm is improved by optimizing the initial clustering centers, and the risk of deviating from the expected values of the traditional algorithm is eliminated. Although the efficiency of the improved algorithm has increased, the cluster number K must still be determined before running the algorithm in order to complete the initial cluster center selection or outlier detection, and several initial parameters still need to be pre-set. In future work, it is therefore necessary to further improve the clustering effect and enhance the practicability of the algorithm.

Acknowledgements The research work was supported by School Visitor Program of NO. 2017–19.
References

1. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn, pp. 288–289. China Machine Press (2012)
2. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
3. Caselitz, P., Giebhardt, J.: Rotor condition monitoring for improved operational safety of offshore wind energy converters. J. Solar Energy Eng. 127, 253 (2005)
4. Tsai, C.S.: Enhancement of damage detection of wind turbine blades via CWT-based approaches. IEEE Trans. Energy Convers. 21(3), 776–781 (2006)
5. Jing, Y.S., Baluja, S.: VisualRank: applying PageRank to large-scale image search. IEEE Trans. Pattern Anal. Mach. Intell. 30(11), 1877–1890 (2008)
6. Anil, K.J.: Data clustering: 50 years beyond K-means. Pattern Recognit. Lett. 31(8), 651–666 (2010)
Chapter 25
An Improved AdaBoost Algorithm for Handwriting Digits Recognition Chang Liu, Rize Jin, Liang Wu, Ziyang Liu, and Mingming Guo
Abstract This paper focuses on improving the accuracy and efficiency of the traditional AdaBoost algorithm for digit recognition, which updates the weights of both the observations and the weak classifiers on each iteration. In this study, the authors use an improved version of the AdaBoost algorithm with softmax, SVM, CNN, NN, and Random Forest as its base classifiers to recognize handwritten digits with higher accuracy in a more efficient way. Compared with the traditional AdaBoost algorithm, the improved algorithm reaches the same accuracy with fewer runs. A closer analysis of the algorithm reveals that this kind of mixed structure has the virtue of being flexible and open-ended, without over-fitting problems on out-of-sample observations.
C. Liu · M. Guo
School of Futures and Securities, Southwestern University of Finance and Economics, Chengdu, China
R. Jin
School of Computer Science and Technology, Tiangong University, Tianjin 300160, China
L. Wu
Sichuan University, Chengdu, China
Z. Liu (B)
Department of Global Business, Kyonggi University, Suwon, South Korea
e-mail: [email protected]
25.1 Introduction

Handwritten digit recognition is one of the most important fields of image processing and pattern recognition, and it plays a significant role in almost every area of daily life. It nevertheless faces many challenges, because it is not an isolated application technology: it includes basic problems common to other branches of the pattern recognition field. Why do we choose the Arabic numerals as our object of study? Because the Arabic numerals are the only symbols commonly used by all countries; research on handwritten digit recognition is thus essentially independent of cultural background, offering a big stage for researchers in all regions of the world and contributing to the development of new theory. Many areas, including postal encoding, statistical reports, financial statements, and bank tickets, have a huge demand for handwritten digit recognition technology. Handwriting recognition generally has two types: online and offline. Since the drawing sequence can be detected, online recognition achieves relatively higher accuracy; offline characters appear more random, owing to personal factors of the writer such as strokes, fonts, gradient, and brightness, all of which lead to lower recognition accuracy. Since high accuracy is required in handwritten digit recognition, we chose the online characters. Classification is always used for pattern recognition; common methods are as follows [1, 2]: distance-based classifiers, classifiers based on neural networks [3–5], classifiers based on binary neural networks [6], and classifiers based on support vector machines (SVM) [7, 8]. The steps of handwritten digit recognition using the traditional method are as follows [1, 9]: (1) image preprocessing; (2) image feature extraction; (3) training; and (4) identification. The training process always involves a large quantity of calculation when a complex algorithm is used. The AdaBoost algorithm has many advantages, providing an effective classifier of high accuracy: when simple base classifiers are used, the calculated results are easy to understand, the construction of the weak classifier is extremely simple, and there is no over-fitting problem. Nevertheless, the traditional AdaBoost algorithm also has disadvantages: if weak classifiers are used as the starting point of learning, the efficiency and the time it takes can fall short of expectations, and although AdaBoost improves on the weak learning classifiers, the ultimate classifier is not as "strong" as we expect. Therefore, in this study, we improve the classifier
in the AdaBoost algorithm, which makes the algorithm more efficient and lets it converge in a shorter time. In other words, the traditional AdaBoost algorithm "improves from weak to strong," while the improved AdaBoost algorithm "improves from strong to stronger." Based on this idea, with efficient traditional learning algorithms such as softmax, NN, CNN, SVM, and Random Forests, we develop a creative AdaBoost algorithm different from the traditional method.
25.2 AdaBoost Algorithm

25.2.1 Boosting Method

The boosting method [10] is a common, widely used, and effective statistical learning method. In classification problems, it improves classification performance by changing the weights of the training samples, learning multiple classifiers, and combining them. As an old saying goes, two heads are better than one: the boosting method is based on the idea that for a complex task, the combined judgment of several experts is better than the judgment of any one of them. Kearns and Valiant [11] first proposed the concepts of "strongly learnable" and "weakly learnable." They pointed out that in the PAC (probably approximately correct) framework, a concept is strongly learnable if there is a polynomial-time learning algorithm that achieves very high accuracy, and weakly learnable if there is a polynomial-time learning algorithm that performs only slightly better than random guessing. It is interesting to note that Schapire later proved that strong learnability and weak learnability are equivalent: in the PAC framework, a weakly learnable algorithm can be boosted into a strongly learnable one. Therefore, how to concretely "boost" became a problem of great significance and drew the attention of many researchers. AdaBoost is one of the most representative boosting methods. For a two-class data set T = {(x1, y1), (x2, y2), …, (xN, yN)}, each sample point consists of an instance x_i and a tag y_i, with x_i ∈ χ ⊆ R^n and y_i ∈ γ = {−1, +1} (χ is the instance space and γ the tag set). Based on the following AdaBoost algorithm, weak or basic classifiers are learned from the training data and combined into a strong classifier.
25.2.2 Algorithm: AdaBoost

Input: training data set T = {(x1, y1), …, (xN, yN)} and a weak learning algorithm.
Output: the final classifier G(x).

(1) Initialize the weight distribution of the training data:

$$ D_1 = (w_{11}, \ldots, w_{1i}, \ldots, w_{1N}), \quad w_{1i} = \frac{1}{N}, \quad i = 1, 2, \ldots, N \quad (25.1) $$

(2) For m = 1, 2, …, M:
(a) Learn a basic classifier G_m(x): χ → {−1, +1} using the training data with weight distribution D_m.
(b) Compute the classification error rate of G_m(x) on the training data:

$$ e_m = \sum_{i=1}^{N} w_{mi}\, I(G_m(x_i) \ne y_i) \quad (25.2) $$

(c) Compute the coefficient of G_m(x):

$$ \alpha_m = \frac{1}{2} \ln \frac{1 - e_m}{e_m} \quad (25.3) $$

(d) Update the weight distribution:

$$ w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp(-\alpha_m y_i G_m(x_i)), \quad i = 1, \ldots, N \quad (25.4) $$

where Z_m is the normalization factor

$$ Z_m = \sum_{i=1}^{N} w_{mi} \exp(-\alpha_m y_i G_m(x_i)) \quad (25.5) $$

(3) Construct the linear combination of the basic classifiers:

$$ f(x) = \sum_{m=1}^{M} \alpha_m G_m(x) \quad (25.6) $$

and obtain the final classifier:

$$ G(x) = \operatorname{sign}(f(x)) = \operatorname{sign}\!\left(\sum_{m=1}^{M} \alpha_m G_m(x)\right) \quad (25.7) $$
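The update rules (25.1)–(25.7) translate directly into code. Below is a minimal binary AdaBoost sketch; the base-estimator interface (fit with sample_weight, predict) is an assumption in the style of scikit-learn, which, e.g., sklearn.tree.DecisionTreeClassifier satisfies:

```python
import numpy as np

def adaboost_fit(X, y, make_estimator, M=50):
    """AdaBoost for labels y in {-1, +1} following Eqs. (25.1)-(25.5).

    make_estimator() must return a classifier supporting
    fit(X, y, sample_weight=...) and predict(X).
    """
    N = len(y)
    w = np.full(N, 1.0 / N)                    # Eq. (25.1)
    estimators, alphas = [], []
    for _ in range(M):
        g = make_estimator()
        g.fit(X, y, sample_weight=w)
        pred = g.predict(X)
        e = np.sum(w * (pred != y))            # Eq. (25.2)
        if e >= 0.5 or e == 0:                 # no better than chance / perfect
            break
        alpha = 0.5 * np.log((1 - e) / e)      # Eq. (25.3)
        w = w * np.exp(-alpha * y * pred)      # Eq. (25.4)
        w /= w.sum()                           # normalize by Z_m, Eq. (25.5)
        estimators.append(g)
        alphas.append(alpha)
    return estimators, alphas

def adaboost_predict(X, estimators, alphas):
    """Final classifier G(x) = sign(sum alpha_m G_m(x)), Eqs. (25.6)-(25.7)."""
    f = sum(a * g.predict(X) for a, g in zip(alphas, estimators))
    return np.sign(f)
```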
The basic property of AdaBoost is that it continuously reduces the training error (the classification error rate on the training data set) during learning.

Theorem 1 (Training error bound of AdaBoost) In the AdaBoost method, the training error of the final classifier is bounded by

$$ \frac{1}{N} \sum_{i=1}^{N} I(G(x_i) \ne y_i) \le \frac{1}{N} \sum_{i} \exp(-y_i f(x_i)) = \prod_{m} Z_m \quad (25.8) $$

Theorem 2 (Training error bound for two-class classification)

$$ \prod_{m=1}^{M} Z_m = \prod_{m=1}^{M} \left[ 2\sqrt{e_m(1-e_m)} \right] = \prod_{m=1}^{M} \sqrt{1 - 4\gamma_m^2} \le \exp\!\left(-2 \sum_{m=1}^{M} \gamma_m^2\right), \quad \gamma_m = \frac{1}{2} - e_m \quad (25.9) $$

If there exists γ > 0 such that γ_m ≥ γ for all m, then

$$ \frac{1}{N} \sum_{i=1}^{N} I(G(x_i) \ne y_i) \le \exp(-2M\gamma^2) \quad (25.10) $$
This indicates that, under this condition, the training error of AdaBoost decreases at an exponential rate, which is a very attractive property. Meanwhile, note that the AdaBoost algorithm does not need to know the lower bound γ in advance; that was what Freund and Schapire considered when designing AdaBoost. Unlike some earlier boosting methods, AdaBoost is adaptive: it adapts to the respective training error rates of the weak classifiers, which is the origin of the "Ada" in its name, short for "adaptive." There is another explanation of the AdaBoost algorithm, for which we first need the forward stagewise algorithm.
25.2.3 Forward Stagewise Algorithm

Given a data set of a two-class classification T = {(x1, y1), (x2, y2), …, (xN, yN)}, where each sample point consists of an instance x_i and a tag y_i, x_i ∈ χ ⊆ R^n, y_i ∈ γ = {−1, +1} (χ is the instance space and γ the tag set), a loss function L(y, f(x)), and a set of basis functions {b(x; γ)}, the additive model f(x) is learned by the forward stagewise algorithm as follows:

(1) Initialize f_0(x) = 0.
(2) For m = 1, 2, …, M:
(a) Minimize the loss to obtain the parameters β_m, γ_m:

$$ (\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L\big(y_i,\ f_{m-1}(x_i) + \beta\, b(x_i; \gamma)\big) $$

(b) Update f_m(x) = f_{m-1}(x) + β_m b(x; γ_m).
(3) Obtain the additive model

$$ f(x) = f_M(x) = \sum_{m=1}^{M} \beta_m b(x; \gamma_m) $$
In this way, the forward stagewise algorithm simplifies the optimization problem over all parameters into a sequence of optimization problems over each pair β_m, γ_m. The AdaBoost algorithm is a special case of the forward stagewise algorithm in which the model is an additive model composed of basic classifiers and the loss function is the exponential function.

25.2.4 The Features of AdaBoost

We can now review the learning process of the AdaBoost algorithm. By iteratively learning basic classifiers, each iteration increases the weights of the data that were misclassified by the previous classifier and reduces the weights of the correctly classified data. Finally, AdaBoost combines the basic classifiers linearly into a strong classifier, giving large weights to basic classifiers with small error rates and small weights to basic classifiers with large error rates. The training error analysis shows that each iteration of AdaBoost reduces the classification error rate on the training data set, which demonstrates its effectiveness as a boosting algorithm.
25.3 Experiment and Result

In our experiment, our AdaBoost differs from the traditional one in that we chose relatively strong classifiers as the basic classifiers, rather than
weak classifiers. Softmax, SVM, CNN, NN, and Random Forest are among the basic classifiers we chose. The purpose of our experiment is to compare the efficiency of our AdaBoost, the traditional AdaBoost, and other algorithms. There are 70,000 samples in MNIST. We take 24,000 of them as the final test data and 30,000 of them for the first-layer training, and the rest serve as the test data for the first layer. After the first-layer training, we use 15,000 of the 16,000 tested samples as the training data. In traditional training, we input the data, train the model, and the model outputs a prediction; for example, on MNIST we input the images and, after training, recognize the digits written by people. In our experiment, however, the output becomes a combination of algorithms. Considering that there is a great number of possible combinations of strong classifiers and we do not know which one is optimal for our AdaBoost, we test the efficiency of all combinations. Usually, one algorithm in machine learning can only output one result; to solve this problem, we use a method of decimal conversion. For example, suppose we have chosen algorithms A, B, C, D, E, and F: we can use the integers 0–63 to stand for the combinations. If we want to choose algorithms A, B, and D, their weights are 1, 2, and 8, so the digit for this combination is 11 (1 + 2 + 8). The algorithms used in our experiment are softmax, SVM, CNN, NN, and Random Forests. In Fig. 25.1, we chose the algorithm with the highest score as the optimal algorithm and used the resulting combinations to optimize the prediction of the main algorithm. We then experimented with the traditional AdaBoost algorithm and other single algorithms for comparison, as shown in Fig. 25.2. From the comparison results, we find that the accuracy rate of our AdaBoost algorithm is 98.6%: compared with the traditional AdaBoost algorithm and the other single algorithms, our AdaBoost algorithm is better. The accuracy rates obtained in our experiments are shown in Table 25.1.
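The "decimal conversion" encoding is a simple bitmask: each base learner gets a power-of-two weight, and the integer names the subset (e.g., A, B, D → 1 + 2 + 8 = 11). A small sketch with illustrative names:

```python
def encode(subset, algorithms):
    """Map a subset of algorithms to its integer code (sum of 2^index)."""
    return sum(1 << algorithms.index(name) for name in subset)

def decode(code, algorithms):
    """Recover the subset of algorithms named by an integer code."""
    return [a for i, a in enumerate(algorithms) if code & (1 << i)]

algos = ["A", "B", "C", "D", "E", "F"]          # codes 0-63 for 6 learners
assert encode(["A", "B", "D"], algos) == 11     # 1 + 2 + 8
assert decode(11, algos) == ["A", "B", "D"]
```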
Fig. 25.1 The comparison of AdaBoost, CNN, and SVM with regard to accuracy rate
Fig. 25.2 The comparison of New AdaBoost and the traditional AdaBoost with respect to accuracy rate
Table 25.1 The accuracy rate of different algorithms

| Algorithm | Accuracy rate (%) |
|---|---|
| Our AdaBoost | 98.60 |
| CNN | 98.10 |
| SVM | 97.70 |
| Random Forests | 96.30 |
| Traditional AdaBoost | 94.00 |
| Softmax | 91.00 |
| NN | 90.70 |
25.4 Conclusion

In this study, we created an improved AdaBoost algorithm with stronger base classifiers, including softmax, SVM, CNN, NN, and Random Forests. The results indicate that our AdaBoost algorithm with relatively strong classifiers achieves a higher accuracy rate than the traditional AdaBoost algorithm and the other single algorithms. However, this AdaBoost algorithm requires very large samples: even 70,000 was not enough for training. Moreover, another limitation of our study is that we could not determine the optimal combination of base classifiers, since each algorithm has its own features. We hope future research can help find the optimal combination of strong classifiers for each specific circumstance.
Acknowledgements Financial support from the China National Natural Science Foundation (Grant Nos. 71473204, 61806142), the Natural Science Foundation of Tianjin (16JCYBJC42300, 18JCYBJC44000), the National Social Science Foundation of China (16CJL042), and the Fundamental Research Funds for the Central Universities (JBK1802013) is gratefully acknowledged.
References
1. Bottou, L., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Jackel, L.D., Lecun, Y., Muller, U.A., Sackinger, E., Simard, P., Vapnik, V.: Comparison of classifier methods: a case study in handwritten digit recognition. In: International Conference on Pattern Recognition, pp. 77-82. IEEE Computer Society Press, Los Alamitos (1994)
2. Babaioff, M., Feldman, M., Nisan, N.: Mixed strategies in combinatorial agency. J. Artif. Intell. Res. 38, 339-369 (2010)
3. Ben Hessine, M., Ben Saber, S.: Accurate fault classifier and locator for EHV transmission lines based on artificial neural networks. Math. Probl. Eng. (2014)
4. Raval, P.D.: ANN based classification and location of faults in EHV transmission line. In: Lecture Notes in Engineering and Computer Science, pp. 60-63. International Association of Engineers, Hong Kong (2008)
5. Mateescu, R., Dechter, R., Marinescu, R.: AND/OR multi-valued decision diagrams (AOMDDs) for graphical models. J. Artif. Intell. Res. 33, 465-519 (2008)
6. Kim, J.H., Ham, B., Lursinsap, C., Park, S.K.: Handwritten digit recognition using binary neural networks, pp. 245-248. Lawrence Erlbaum Associates, Mahwah (1993)
7. Collins, M., Schapire, R.E., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. Mach. Learn. 48, 253-285 (2002)
8. Kash, I.A., Friedman, E.J., Halpern, J.Y.: Multiagent learning in large anonymous games. J. Artif. Intell. Res. 40, 571-598 (2011)
9. Liu, C.L., Nakashima, K., Sako, H., Fujisawa, H.: Handwritten digit recognition: benchmarking of state-of-the-art techniques. Pattern Recogn. 36, 2271-2285 (2003)
10. Mammen, D.L., Hogg, T.: A new look at the easy-hard-easy pattern of combinatorial search difficulty. J. Artif. Intell. Res. 7, 47-66 (1997)
11. Kearns, M., Valiant, L.: Cryptographic limitations on learning Boolean formulae and finite automata. In: 21st Symposium on Theory of Computing, pp. 433-444. ACM (1989)
Chapter 26
FCM Based on Improved Artificial Bee Colony Algorithm An-Xin Ye and Yong-Xian Jin
Abstract To solve the problems of premature convergence and sensitivity to the initial clustering centers in the original fuzzy C-means clustering (FCM) algorithm, a new fuzzy C-means clustering based on an Improved Artificial Bee Colony algorithm is proposed. Under the guidance of the global best solution, a new mutation operation scheme is proposed, and the exploitation and exploration ability of the Improved Artificial Bee Colony method is improved. Experiments show that this method is more effective for complex clustering problems than traditional clustering methods.

A.-X. Ye (B) Xingzhi College Zhejiang Normal University, Jinhua 321003, Zhejiang, China e-mail: [email protected] Y.-X. Jin College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua 321003, Zhejiang, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_26

26.1 Introduction

Cluster analysis is one of the main directions of current algorithm research. Fuzzy C-means clustering is an important branch of cluster analysis, first proposed by Bezdek and Ehrlich in 1984 [1, 2]. In FCM, according to the minimization criterion function, the similarity between data elements and clustering centers is examined, and the membership degree of the data elements with respect to the clustering centers is calculated. However, the fuzzy clustering algorithm has some shortcomings, such as premature convergence and sensitivity to the initial clustering centers, and many extensions of the fuzzy clustering algorithm address them [3-5]. For example, Chen, Shen, and Long improved the fuzzy clustering algorithm based on a soft partition scheme that integrates many FCM clustering results [6]. In this method, FCM clustering is run on the data for each clustering number, a cumulative adjacency matrix is established using the membership degree information, and, finally, the graph cutting method is used to iteratively solve the cumulative adjacency
matrix, and the clustering results are obtained. Liu, Zhang, and Wang proposed a new unsupervised FCM image segmentation method [7]. This method introduces local information of the image region into the process of fuzzy clustering and adaptively controls the range and intensity of pixel interaction. Experiments on synthetic noise images, natural color images, and synthetic aperture radar images show that the method can segment images accurately. Wen and Zhang proposed a kernel fuzzy clustering method, which adjusts the distance function by Gaussian mapping and a local search strategy [8].
26.2 Fuzzy Clustering

Fuzzy clustering groups data objects according to their membership degrees. The membership values lie between 0 and 1, with high membership values indicating high similarity between an object and a group. The most typical of these methods is fuzzy C-means clustering [9, 10].
26.2.1 Fuzzy C-Means Clustering

The data set X = {x_1, x_2, ..., x_n} is divided into c groups (2 ≤ c < n), and V = {v_1, v_2, ..., v_c} are the cluster centers. In a fuzzy partition, each data point cannot be strictly assigned to a single group, but belongs to each group with a certain degree of membership. u_{ik} denotes the membership of the kth data point to group i, with u_{ik} ∈ [0, 1] and \sum_{i=1}^{c} u_{ik} = 1. The objective function of FCM is then

J(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m (d_{ik})^2    (26.1)
U = [u_{ik}] is the membership (similarity) matrix. (d_{ik})^2 = ||x_k - v_i||^2 is the squared distance between the kth data point and the ith center point, and m is the fuzzy index. The purpose of the FCM algorithm is to obtain a partition matrix U = [u_{ik}] and a set of central points V = {v_1, v_2, ..., v_c} that minimize the function J(U, V). In each iteration, u_{ik} and v_i are updated with the following equations:

u_{ik} = 1 / \sum_{j=1}^{c} (d_{ik}/d_{jk})^{2/(m-1)}    (26.2)

v_i = \sum_{k=1}^{n} (u_{ik})^m x_k / \sum_{k=1}^{n} (u_{ik})^m    (26.3)
26.2.2 Fitness Function

Each data point corresponds to a center. According to the FCM algorithm, the data set can be divided into c groups. The fitness function for clustering is defined as

fit = 1 / (1 + J(U, V))    (26.4)
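To make the update cycle concrete, the following NumPy sketch performs one FCM iteration using Eqs. (26.1)-(26.4); it is a minimal illustration under our own naming, not the authors' implementation:

    import numpy as np

    # One FCM iteration. X: (n, p) data matrix, V: (c, p) centers, m > 1.
    def fcm_step(X, V, m=2.0, eps=1e-12):
        # d[i, k]: distance between center v_i and data point x_k
        d = np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2) + eps
        # Eq. (26.2): u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        U = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0)), axis=1)
        W = U ** m
        # Eq. (26.3): v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        J = np.sum(W * d ** 2)            # Eq. (26.1), objective J(U, V)
        return U, V_new, 1.0 / (1.0 + J)  # fitness, Eq. (26.4)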
26.3 The ABC Algorithm

The ABC algorithm is based on swarm intelligence theory [11]. It is a popular bionic evolutionary iterative algorithm in which the employed bees account for half of the population and the onlookers account for the other half. Each employed bee corresponds to a food source. The higher the quality of a food source, the more likely the onlookers are to choose it [12, 13].
26.3.1 Initialization Stage At the beginning of the algorithm, the ABC algorithm first produces multiple randomly distributed food source locations. According to Eq. (26.5), the number of positions is SN, where SN represents the number of employed bees. Each solution is X i , i = 1, 2,…, SN. The maximum iterations are Maxcycle, and the swarm terminates the cycle times limit. + rand (0, 1) ∗ (x max − x min ) xi, j = x max j j j
(26.5)
n n n represents the mapping position of food source i at the , xi,2 , . . . xi,D xin = xi,1 min max and x are the upper and lower boundaries of values. nth iteration, x j
j
26.3.2 Employed Bees' Stage

During the employed bees' stage, each employed bee searches for a new food source v_i near its current source x_i. New food sources are derived from the following expression:

v_{i,j} = x_{i,j} + φ_{i,j} (x_{i,j} - x_{k,j})    (26.6)
In the expression, k ∈ {1, 2, ..., SN}, j ∈ {1, 2, ..., D}, and k ≠ i; φ_{i,j} is a random number with values between -1 and 1. The employed bee compares the new solution with the existing one and keeps the better of the two.
26.3.3 Onlooker Bees' Stage

Based on the food source information shared by the employed bees, onlooker bees assess the quality of each source through its fitness value and compute the probability of choosing each source:

P_i = fit_i / \sum_{j=1}^{SN} fit_j    (26.7)

where fit_i is the fitness value of the ith source and P_i is the probability that the ith source is chosen; selection follows the roulette-wheel principle. The higher the value of P_i, the greater the probability of being selected by an onlooker bee.
26.3.4 Scout Bees' Stage

If a food source cannot be improved within limit cycles, the corresponding employed bee is turned into a scout bee, which finds a new random food source by means of Formula (26.5).
26.4 Fuzzy C-Means Clustering Algorithm Based on Improved Artificial Bee Colony (IABC) Algorithm

There are two shortcomings in the ABC algorithm: (1) it uses a single-dimensional search mode, so the convergence speed is slow; (2) it suffers from premature convergence. Therefore, some scholars have extended and improved the ABC algorithm and made some progress [14, 15]. Inspired by the Gbest-guided Artificial Bee Colony (GABC) algorithm, this paper improves the ABC search method and uses the current optimal solution to guide the search direction. Through the change of parameters, the search process is dynamically adjusted.
26.4.1 Some Improvements of Search Strategy

In the GABC algorithm, employing the current optimal solution, the employed bee iterates with Formula (26.8):

v_{i,j} = x_{i,j} + φ_{i,j} (x_{i,j} - x_{k,j}) + ϕ_{i,j} (x_{best,j} - x_{i,j})    (26.8)
φ_{i,j} is a random factor in [-1, 1], ϕ_{i,j} is a random factor in [0, 1.5], and x_{best,j} is the jth dimension of the current global best solution. i is the current iteration number of the ABC algorithm, and Maxcycle is its maximum number of iterations. At the beginning of the improved artificial bee colony algorithm, because i is small, the current global optimum strongly influences the employed bees' search. As the number of searches increases, the influence of the current global optimal nectar source on the search direction of the whole hive is reduced in the later period, and the step length of the random search is increased to enhance the later algorithm's ability to jump out of local optima. In this paper, the search equation (26.8) is modified and a new mutation operation built around the global optimal solution is proposed. Inspired by the mutation and crossover operations of differential evolution (DE), this paper updates individuals from four sources of information: the global optimal individual, two randomly chosen individuals, and the present individual, which yields the crossover model of Formula (26.9):

v_{i,j} = rand(-1, 1)(x_{i,j} - x_{k1,j}) + rand(0, 1.5)(x_{best,j} - x_{k2,j})    (26.9)

where rand(-1, 1) and rand(0, 1.5) are the adjustment coefficients: random numbers from [-1, 1] and [0, 1.5] determine the direction of the search, and their size determines the step size. x_{k1} and x_{k2} are two food sources randomly selected from the food source group, with i, k1, and k2 mutually different; x_{best} is the best solution found so far. This mutation and crossover operation makes full use of the global optimal information, the self-confidence information (the current individual), and two randomly selected individuals. It balances the exploitation and exploration functions of the algorithm and avoids premature convergence. The complete algorithm steps are described as follows.

Description of IABC algorithm
1: While the termination condition is not satisfied: /* Initialization stage */
2: Randomly generate SN food sources using Eq. (26.5); /* Employed bee stage */
3: Using Eq. (26.8), each employed bee searches around its corresponding food source to find a new food source v_i;
4: The greedy principle is used to choose between the new and old food sources, preserving the better one;
5: If a new food source is kept, its trial counter is set to 0; otherwise the counter is incremented by 1; /* Onlooker stage */
6: The onlookers receive the nectar source information shared by the employed bees and evaluate source quality by the fitness function; Formula (26.7) gives each source's selection probability, and a source is chosen by the roulette-wheel principle; /* Crossover and mutation stage */
7: Using Eq. (26.9), all food sources are crossed and mutated to find new food sources v_i;
8: The greedy principle is used to choose between the new and old food sources, preserving the better one;
9: If a new food source is kept, its trial counter is set to 0; otherwise the counter is incremented by 1; /* Scout bee stage */
10: When a food source's counter reaches the pre-set threshold limit, a new food source is generated by Formula (26.5);
11: End
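For illustration, a minimal NumPy sketch of the crossover/mutation step of Eq. (26.9) is given below; the names are ours, and the greedy selection between old and new sources (steps 8-9) would follow this call:

    import numpy as np

    # food: (SN, D) array of food sources, best: current global best source.
    def iabc_mutate(food, best, rng=np.random.default_rng()):
        SN, D = food.shape
        new = np.empty_like(food)
        for i in range(SN):
            # k1, k2 randomly chosen, mutually different and different from i
            k1, k2 = rng.choice([k for k in range(SN) if k != i], 2, replace=False)
            new[i] = (rng.uniform(-1.0, 1.0, D) * (food[i] - food[k1])
                      + rng.uniform(0.0, 1.5, D) * (best - food[k2]))
        return new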
26.4.2 Improved Search Method for Scouts

An additional judgment is added to the basic scout-bee search model: before giving up a food source, check whether it is the current optimal solution. If it is, the food source is kept; otherwise, the food source is abandoned and the employed bee is turned into a scout bee.
26.4.3 Basic Steps of Fuzzy C-Means Clustering Based on IABC Algorithm

Step 1: An initial swarm with SN nectar sources is generated randomly; the maximum iteration number is Maxcycle, the number of clusters is C, and the minimum allowable error is ε.
Step 2: A fuzzy clustering is carried out for each honey source. The fitness of each honey source is calculated according to Eq. (26.4); the first 50% become employed bees and the latter 50% onlookers. The iteration stage is entered and repeated until Maxcycle iterations are reached.
Step 3: The employed bees perform a neighborhood search based on Eq. (26.8) to generate new honey sources, calculate and evaluate their fitness values, and choose the new honey source by the greedy principle.
Step 4: The onlookers choose employed bees according to the roulette-wheel principle and search within their neighborhoods according to Eq. (26.8). The honey sources are used as the clustering centers, and a fuzzy clustering then updates the bee colony with the new clustering centers formed after partitioning.
Step 5: If a honey source x_i reaches the limit iteration without the current solution being improved, the current solution is discarded, the corresponding employed bee is transformed into a scout bee, the mutation phase follows Eq. (26.9), and a new solution is produced in the search space to replace the current one according to Eq. (26.5).
Step 6: Record the best nectar source so far and determine whether the loop termination condition is met: if so, output the final result; otherwise, jump to Step 2.
26.5 Experimental Simulations and Analysis

There are two main steps to verify the algorithm: (1) verify the validity of IABC; (2) evaluate the FCM clustering performance based on IABC.
26.5.1 The Performance of the IABC Algorithm

In order to test the three different algorithms, six typical functions [5] are selected as test data to compare the performance of the ABC, GABC [15], and IABC algorithms, and the performance of the algorithms is examined synthetically. The names, expressions, and ranges of the functions are shown in Table 26.1.

Table 26.1 Six benchmark functions

Function     Expression                                                                                      x_i range
Sphere       f_1(x) = \sum_{i=1}^{n} x_i^2                                                                   [-50, 50]
Quartic      f_2(x) = \sum_{i=1}^{n} i x_i^4 + rand()                                                        [-1.28, 1.28]
Griewank     f_3(x) = \sum_{i=1}^{n} x_i^2/4000 - \prod_{i=1}^{n} cos(x_i/\sqrt{i}) + 1                      [-600, 600]
Rastrigin    f_4(x) = \sum_{i=1}^{n} (x_i^2 - 10 cos(2π x_i) + 10)                                           [-5.12, 5.12]
Ackley       f_5(x) = 20 + e - 20 exp(-0.2 \sqrt{(1/n)\sum_{i=1}^{n} x_i^2}) - exp((1/n)\sum_{i=1}^{n} cos(2π x_i))    [-32, 32]
Schaffer     f_6(x) = 0.5 + (sin^2(\sqrt{\sum_{i=1}^{n} x_i^2}) - 0.5) / (1 + 0.001 \sum_{i=1}^{n} x_i^2)^2  [-100, 100]

Among them, F1-F2 are
regional single-mode functions, which are mainly used to test the accuracy of the algorithm and show its optimization efficiency. F3-F6 are complex nonlinear multimodal functions with multiple local extremum points; they have many different extremum points and can be used to test the global optimization ability of the algorithm.
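For reference, the six benchmark functions of Table 26.1 can be written compactly in Python as follows (a minimal sketch; x is a 1-D array, and the rand() term of Quartic is taken as a uniform random number on [0, 1)):

    import numpy as np

    def sphere(x):
        return np.sum(x ** 2)

    def quartic(x):
        return np.sum(np.arange(1, x.size + 1) * x ** 4) + np.random.rand()

    def griewank(x):
        i = np.arange(1, x.size + 1)
        return np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))) + 1.0

    def rastrigin(x):
        return np.sum(x ** 2 - 10.0 * np.cos(2 * np.pi * x) + 10.0)

    def ackley(x):
        n = x.size
        return (20.0 + np.e - 20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / n))
                - np.exp(np.sum(np.cos(2 * np.pi * x)) / n))

    def schaffer(x):
        s = np.sum(x ** 2)
        return 0.5 + (np.sin(np.sqrt(s)) ** 2 - 0.5) / (1.0 + 0.001 * s) ** 2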
26.5.1.1 Parameters' Settings

The experiment is carried out in the MATLAB 7.1 environment on a computer running Windows 7 with a Pentium 4 CPU (2.8 GHz). The runtime parameters are set as follows: population size SN = 100, Limit = 200, maximum iteration Maxcycle = 1,000. For each experiment, we performed 50 independent trial runs of each algorithm starting from random initial values.
26.5.1.2 Analysis of Experimental Results

The results of the test suite are shown in Figs. 26.1, 26.2, 26.3, 26.4, 26.5 and 26.6.

1. As shown in Fig. 26.1, for the complex single-mode Sphere function, the convergence speed of the ABC algorithm is slower. The GABC and IABC algorithms converge faster than the ABC algorithm, almost linearly converging to the optimal solution. As can be seen from Fig. 26.2, the convergence speed of these three
Fig. 26.1 Convergence curve of average fitness value of the Sphere function running 50 times
Fig. 26.2 Convergence curve of average fitness value of the Quartic function running 50 times
Fig. 26.3 Convergence curve of average fitness value of the Griewank function running 50 times
algorithms is very fast in the initial stage of evolution but slows down obviously in the later stage; IABC nevertheless remains better than ABC and GABC. Therefore, IABC has better stability than the other two algorithms. 2. The multimodal functions Griewank, Rastrigin, Ackley, and Schaffer are all complex nonlinear global optimization problems that are mainly used to test the global
Fig. 26.4 Convergence curve of average fitness value of the Rastrigin function running 50 times
Fig. 26.5 Convergence curve of average fitness value of the Ackley function running 50 times
search performance of the algorithm. From Figs. 26.4 to 26.6, we can observe that the convergence speed and accuracy of GABC and IABC are higher than those of ABC. The GABC and IABC algorithms dynamically adjust the parameters so that they jump out of local optima and improve the convergence
Fig. 26.6 Convergence curve of average fitness value of the Schaffer function running 50 times
accuracy of the algorithm. Because the IABC algorithm uses dynamic parameter tuning, its convergence speed is faster than that of the GABC algorithm.
26.5.2 Analysis of FCM Clustering Performance Based on IABC Algorithm

In order to compare the performance of the three algorithms, we select four typical data sets from the UCI database as test data (http://archive.ics.uci.edu/ml/index.php). The specific information is shown in Table 26.2.

Table 26.2 Dataset information

Dataset    Data volume    Attributes    Cluster number
Iris       150            4             3
Glass      214            9             6
Vowel      990            10            11
Wine       178            13            3
26.5.2.1 Parameters' Settings

The experiment is carried out in the MATLAB 7.1 environment on a computer running Windows 7 with a Pentium 4 CPU (2.8 GHz). The runtime parameters are set as follows: population size SN = 100, Limit = 200, maximum iteration Maxcycle = 500. For each experiment, we performed 50 independent trial runs of each algorithm starting from random initial values.
26.5.2.2 Experimental Testing and Results Analysis

In order to compare the performance of FCM, ABC-FCM, and the algorithm proposed in this paper, the four typical UCI data sets are clustered by fuzzy clustering, with each algorithm run 100 times. The clustering accuracy and running time on the four databases are compared in Table 26.3. As Table 26.3 shows, all three clustering algorithms have good clustering effects on simple data sets and achieve satisfactory clustering accuracy, while on high-dimensional data FCM is vulnerable to the choice of initial clustering centers and its clustering effect is poor. The accuracy of the ABC-FCM and IABC-FCM algorithms is improved. The reason for the improvement is that the intelligent search behavior of artificial bees is introduced, which avoids premature convergence, so the clustering accuracy is higher than that of the other two algorithms. For the Wine dataset, IABC-FCM is slightly inferior to the ABC-FCM algorithm, but for the other high-dimensional datasets the clustering effect is improved to varying degrees. We can observe that IABC-FCM greatly improves the clustering accuracy and robustness. Table 26.3 Cluster statistical results of the various algorithms
Algorithm    Dataset    Accuracy (%)    Run time (ms)
IABC-FCM     Iris       98.25           356
             Glass      98.87           528
             Vowel      95.54           1269
             Wine       90.38           4345
ABC-FCM      Iris       94.18           321
             Glass      92.55           345
             Vowel      85.38           1432
             Wine       94.17           2412
FCM          Iris       90.56           634
             Glass      81.57           367
             Vowel      71.24           1563
             Wine       81.23           1896
26.6 Conclusion A clustering algorithm based on the IABC algorithm is proposed. Combined with the current global optimal solution, a new crossover-mutation operation is added to the artificial bee colony algorithm, which improves the exploitation and exploration ability of the IABC method and the efficiency of the algorithm. Experiments show that the IABC-FCM algorithm runs longer than other similar algorithms, but its clustering accuracy has obvious advantages, especially on multidimensional and complex data. The running time grows because the generation of the new population uses the local optimal solution and the evolution process adopts crossover and mutation factors, which increases the complexity of the algorithm. These problems are to be solved step by step in follow-up work.
References
1. Shao, P., Shi, W., He, P.: Novel approach to unsupervised change detection based on a robust semi-supervised FCM clustering algorithm. Remote Sens. 8(3), 264-288 (2016)
2. Hore, S., Chakraborty, S., Chatterjee, S.: An integrated interactive technique for image segmentation using stack based seeded region growing and thresholding. Int. J. Electr. Comput. Eng. 6(6), 2773-2780 (2016)
3. Raja Kishor, D., Venkateswarlu, N.B.: Hybridization of expectation-maximization and k-means algorithms for better clustering performance. Cybern. Inf. Technol. 16(2), 1-17 (2016)
4. Nilanjan, D., Pamela, M.C., Amira, A.S.: Convolutional neural network based clustering and manifold learning method for diabetic plantar pressure imaging dataset. J. Med. Imaging Health Inform. 7(3), 1-23 (2017)
5. Naik, A., Satapathy, S.C., Ashour, A.S., Dey, N.: Social group optimization for global optimization of multimodal functions and data clustering problems. Neural Comput. Appl. 30, 271-287 (2018)
6. Chen, H.P., Shen, X.J., Long, J.W.: Fuzzy clustering algorithm for automatic identification of clusters. ACTA Electronica Sinica 45(3), 687-694 (2017)
7. Liu, G., Zhang, Y., Wang, A.: Incorporating adaptive local information into fuzzy clustering for image segmentation. IEEE Trans. Image Process. 24(11), 3990-4000 (2015)
8. Wen, C., Zhang, Y.: Gauss-induced kernel fuzzy C-means clustering algorithm. Comput. Appl. Softw. 34(8), 257-264 (2017)
9. Karaboga, D., Ozturk, C.: A novel clustering approach: artificial bee colony (ABC) algorithm. Appl. Soft Comput. 11(1), 652-657 (2009)
10. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3), 264-323 (1999)
11. Li, G., Cui, L., Fu, X., et al.: Artificial bee colony algorithm with gene recombination for numerical function optimization. Appl. Soft Comput. 5(3), 146-159 (2017)
12. Cui, L., Zhang, K., Li, G.: Modified Gbest-guided artificial bee colony algorithm with new probability model. Soft Comput. Simul. 21(1), 1-27 (2017)
13. Pan, Q.K., Wang, L., Li, Q.K.: A novel discrete artificial bee colony algorithm for the hybrid flow shop scheduling problem with makespan minimization. Omega 45(6), 2-56 (2014)
14. Shi, Y., Pan, C.M., Hu, H.: An improved artificial bee colony algorithm and its application. Knowl.-Based Syst. 107(9), 14-31 (2016)
15. Gao, W., Zhao, B., Zhou, G.T., Wang, Q.Y.: Improved artificial bee colony algorithm based gravity matching navigation method. Sensors 14, 12968-12989 (2014)
Chapter 27
Airplane Trajectory Planning Using Artificial Immune Algorithm Lifeng Liu
Abstract This paper presents a path planning algorithm for aircraft charged with autonomous Terrain-Following, Terrain-Avoidance, and Threat-Avoidance (TF/TA2). In the TF/TA2 problem, each aircraft is required to search for a collision-free path that respects its own flyability and operational constraints before any of the aircraft reaches the target. The problem formulation is based on Isaacs' Target Guarding problem, but extended to the case of multiple evaders. The proposed path planning method is based on the idea of dynamic programming and is capable of producing trajectories for a given task within minutes. Simulations are carried out to demonstrate that the resulting trajectories approach the optimal solution produced by the nonlinear-programming-based Artificial Immune Algorithm (AIA). Experiments are also conducted on aircraft to show the feasibility of implementing the proposed path planning algorithm in physical applications.

L. Liu (B) University of Shandong Technology, Zibo 255049, Shandong, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_27

27.1 Introduction

The importance of Terrain-Following, Terrain-Avoidance, and Threat-Avoidance (TF/TA2) guidance for cruise missiles and aircraft is a well-known aspect of military operations [1]. Increasing demand has brought into focus several technologies associated with TF/TA2, such as path planning. Currently, to successfully conduct a complex mission, different aircraft need to cooperate with each other to improve the performance of the group as a whole. On the battlefield, aircraft operate in a highly dynamic and uncertain environment and must quickly bypass threats from any direction. Aircraft have to operate in environments filled with threats (i.e., hills, buildings, and radar), which seriously influence their ability to maneuver safely. By cooperating with other aircraft, their ability to complete a mission is improved greatly. Furthermore,
the planned path for navigation should be smooth to improve the aircraft's flyability and operations. Path planning methods have been studied in previous work. A Voronoi graph method can quickly plan paths in 2D conditions, but its prerequisites include known threat information [2-9]. The A* search algorithm is a heuristic method and a fast way to calculate shortest paths, but it generally only finds good solutions rather than optimal ones. As a result, new methods have been applied [10-14]. In this paper, we provide a thorough presentation of the AIA-based algorithms developed for this problem. The previous work on the aircraft path planning problem is first expanded into different scenarios to develop other artificial intelligence methods, such as GA and ANN, as mentioned above. With this method in place, online path planning algorithms are developed to produce one or more paths that are shown to be the best (or near-optimal) through a designated airspace. To demonstrate the method's feasibility and robustness compared to GA, experiments were conducted in several complicated environments, where flying and maneuvering constraints were added to the trajectory.
27.2 Artificial Immune Algorithm (AIA) and Its Application to TF/TA2 Tracking

27.2.1 An Introduction to AIA

The Artificial Immune Algorithm (AIA) is a type of computational model for engineering applications that takes the concepts and theory of a biological organism's immune system as its reference. It treats the problem to be solved as an antigen, and a solution to the problem as an antibody (immune body). The AIA is capable of high-level parallel processing; it not only has learning and memory capabilities but also has repair, distribution, and self-organization abilities. Therefore, the AIA provides a new way to achieve intelligent control in this particular research field. The major steps of the Artificial Immune Algorithm for TF/TA2 path planning are:

(1) Parameter hypothesis input: the aircraft's constraint conditions are input, such as maximum turning angle, airspeed, maximum voyage, and so on. The population scale is expressed by pop-size, the crossover probability by Pe, and the variation mode by Mv.
(2) Antibody initialization: performed to recognize the antigen and withdraw the lower limit of the optimized variable's empirical value from the anamnestic database. The immune body group is initialized by superimposing a random variable.
(3) Affinity function: this defines the approximate degree of immune bodies to an antigen.
The antibody’s affinity function depends on both the similarity to the antigen and the similarity to all the other antibodies in solution space. The affinity function is defined as the antibody’s comprehensive path length, which is equivalent to the cost over the whole path. Various types of threats that are involved are transformed into terrain threats so that the affinity function can comprehensively express the scope of their influence and the size of all the threats: affinity(li ) =
n−1 (xi+1 − xi )2 + (yi+1 − yi )2
(27.1)
i=1
where affinity(l_i) is the affinity value of l_i, and (x_i, y_i), (x_{i+1}, y_{i+1}) are the coordinates of points p(i) and p(i + 1) on a given flight path. Since all of the threats discussed in this paper are transformed into terrain threats, the affinity function easily handles static threats (i.e., terrain threats), dynamic threats (i.e., pop-up threats), and the presence of no-fly-zones. n indicates the number of flight points.

(4) Terminal condition judgment: used to judge whether the terminal condition has been satisfied. If yes, the best immune body is added to the memory pool before terminating the program. Otherwise, the program continues.
(5) Selecting operation: according to the results calculated by the affinity function, the individual antibodies chosen by the roulette-wheel method are placed into the next generation.
(6) Crossover and mutation operation: on the basis of the selecting operation and the immune bodies selected, the conventional crossover and mutation operations are performed according to the established crossover probability Pe and variation mode Mv.
(7) Community renewal: after the above operations, return to step (3).

Although AIA can find a minimal-threat trajectory for an aircraft, this trajectory is a sectional (unsmooth) line and is not a feasible one for the aircraft, due to its flyability constraints. The smoothing process for the three-dimensional track is performed as follows. An aircraft can gather only limited information during a flight, and when it meets a mountain slope exceeding its maximum climb angle, flight becomes very hazardous. As part of the smoothing processing, when a detected flight angle is larger than the maximum turn angle, the aircraft must adjust its coordinates in accordance with the slope and curvature restrictions. In conclusion, the maximum turn angle constraints should be addressed by the horizontal track, and the aircraft pitch angle should be limited by the largest climb angle and load factor by the vertical track. Finally, we use a third-order B-spline curve to smooth the 3D flight path. When a flight point's turn angle surpasses the maximum turn angle, a turn-angle adjustment is applied: the coordinates are re-calculated to satisfy the restriction of [-60°, 60°]. The path should also be within the pitch angle range of [-20°, 20°].
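As a small illustration, the affinity of Eq. (27.1) reduces to the total length of a candidate path over its flight points once the threats have been folded into the terrain; a minimal Python sketch (names ours) is:

    import numpy as np

    # path: (n, 2) array of flight-point coordinates (x_i, y_i)
    def affinity(path):
        return np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1))

Since a lower affinity (a shorter comprehensive path) is better, the roulette-wheel selection of step (5) can, for example, select antibodies with probability proportional to 1/affinity; this selection rule is our illustrative choice, not specified in the text.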
27.3 Experiments and Results

In this paper, a 60 × 60 km airspace is chosen as the simulated threat environment, including mountains blocking the flight path. For a single aircraft, the starting point is fixed at (1, 1 km) and the ending point at (60, 60 km). For two aircraft, the initial coordinates are (1, 8 km) and (10, 1 km), and the destination coordinates are (60, 60 km). The flight conditions are assumed to be: pitching angle: -20° to +20°; maximum turning angle: -60° to +60°; minimum flight segment length: 2 km; velocity range: 61-77 m/s. The trials were run on a Pentium(R) 2 with a CPU speed of 2.2 GHz. Below, we examine the trial results for a single aircraft and for multiple aircraft, respectively. The scenario environment was static during the single-aircraft trials. To verify the AIA method's efficiency and robustness, two simulated scenarios were created within a virtual battle-space.

(1) Test Scenario #1

This scenario contains four threats situated in close proximity to each other along the aircraft's reference path, with one threat approach required to reach the destination, as observed in Fig. 27.1. Additionally, the new threats are set as no-fly-zones. The threats are represented as colored cylinders, whose threat zones are represented by a set of white rings (for a single airplane) in the 2D plane. The experimental results verify the feasibility and effectiveness of the proposed AIA for solving the aircraft trajectory planning problem in a complex environment with two airplanes, as observed in Fig. 27.2, while GA cannot create a satisfactory path; this means that in a complex environment the AIA is more feasible than GA.

(2) Test Scenario #2

This scenario involves two aircraft from different origins flying to a single destination, as shown in Fig. 27.3. The conclusions are similar to those from Test Scenario #1. GA can only provide a path for a single aircraft, whereas the AIA can provide
Fig. 27.1 The 2D (a) and 3D (b) flight path planned by AIA in complicated circumstances
Fig. 27.2 2D (a) and 3D (b) path by GA for two aircraft in complicated circumstances
Fig. 27.3 2D (a), 3D (b) flight path and replaced sub-optimal paths (c) for two aircraft by AIA in complicated circumstances
feasible flight paths for two aircraft. This includes not only minimizing the flight voyage but also maximizing the margin of safety associated with avoiding ground obstacles. This experiment also shows that the AIA is more efficient than GA in a complicated environment. The difference between the two flight path lengths is 0.374 km: the individual flight path length is 79.340 km for aircraft 1 (denoted in red) and 79.714 km for aircraft 2 (denoted in black). Based on this, it is very easy to arrive at the target at the same time by slightly adjusting the velocities.

(3) The influence of AIA parameters: iteration and antibody number

These experiments are designed to analyze the influence of changes in the iteration and antibody numbers on flight cost. The average value of every five experiments is taken as the final result, with the iteration number ranging from 10 to 150. The results show that there exists an iteration number whose comprehensive cost is the least; experiments found this number to be 100. Considering the influence of each factor on flight routes, a new comprehensive evaluation index, Total Cost (TC for short), is proposed to comprehensively assess the advantage of planned flight routes under the combined factors; it contains the three relative parameters (time, cost, and Pd): TC = time * cost * Pd. Two experiments were carried out to check the comparative influence of the iteration number and the chromosome number on the flight path. The total value first decreases as the iteration number increases, but later rises again. This implies that larger iteration numbers do not necessarily lead to a better total cost value, and there exists one iteration setting that obtains the best flight path.
27.4 Discussions and Conclusions This research has successfully demonstrated the design of flight paths for single and multiple aircraft in simple and complicated circumstances. Additionally, a comparison was performed between the GA method and the AIA method. The results show that the AIA performs better than the GA in complex circumstances. At the same time, the iteration and antibody numbers have a significant effect on the total cost value, and there exists an optimal setting rather than a monotonic improvement as their numbers grow. The first step in any future work should be to increase the number of tasks and aircraft explored. Second, we will improve the dynamic abilities of the threat environment and add communication and information exchange between the aircraft. Third, information conveying details of the threats should also be considered. Fourth, we must increase confidence in the safety of the path planner by testing our method with a realistic tracker algorithm within a more sophisticated simulation environment. There is also room to optimize the current method design to save computing time, because our present design was oriented solely towards demonstrating the algorithm's feasibility.
References
1. Tang, Q., Zhang, X.G., Liu, X.C.: TF/TA2 trajectory tracking using nonlinear predictive control approach. J. Syst. Eng. Electron. 17(2), 396-401 (2006)
2. Hammouri, O.M., Matalgah, M.M.: Voronoi path planning technique for recovering communication in UAVs. In: Proceedings of the AICCSA 2008 IEEE/ACS International Conference on Computer Systems and Applications, Doha, Qatar, pp. 403-406 (2008)
3. Flores, F.G., Kecskeméthy, A.: Time-optimal path planning for the general waiter motion problem. Adv. Mech. Rob. Des. Educ. Res. MMS 14, 189-203 (2013)
4. Abu-Dakka, F.J., Rubio, F., Valero, F., Mata, V.: Evolutionary indirect approach to solving trajectory planning problem for industrial robots operating in workspaces with obstacles. Eur. J. Mech. A. Solids 42, 210-218 (2013)
5. Moon, S., Oh, E., Shim, D.H.: An integral framework of task assignment and path planning for multiple unmanned aerial vehicles in dynamic environments. J. Intell. Rob. Syst. 70(1-4), 303-313 (2012)
6. Muñoz, P., Barrero, D.F., R-Moreno, M.D.: Run-time analysis of classical path-planning algorithms. Res. Dev. Intell. Syst. XXIX 12, 137-148 (2012)
7. Chrpa, L., Osborne, H.: Towards a trajectory planning concept: augmenting path planning methods by considering speed limit constraints. J. Intell. Rob. Syst. 12, 1-28 (2013)
8. Lau, G., Liu, H.T.: Real-time path planning algorithm for autonomous border patrol: design, simulation, and experimentation. J. Intell. Rob. Syst. 7, 1-23 (2013)
9. Wang, G., Guo, L.H., Duan, H., Liu, L., Wang, H.Q.: A modified firefly algorithm for UCAV path planning. Int. J. Hybrid Inf. Technol. 5(3), 123-143 (2012)
10. Mukherjee, A., Kausar, N., Dey, N., Ashour, A.S.: A disaster management specific mobility model for flying ad-hoc network. Int. J. Rough Sets Data Anal. 82-100 (2016)
11. Samanta, S., Mukherjee, A., Ashour, A.S., et al.: Log transform based optimal image enhancement using firefly algorithm for autonomous mini unmanned aerial vehicle: an application of aerial photography. Int. J. Image Graph. 18(4), 1850019 (2018)
12. Tyagi, S., Rana, Q.P., Som, S.: Trust based dynamic multicast group routing ensuring reliability for ubiquitous environment in MANETs. Int. J. Ambient Comput. Intell. 8(1), 70-97 (2017)
13. Zhang, W., Qi, Q., Deng, J., et al.: Building intelligent transportation cloud data center based on SOA. Int. J. Ambient Comput. Intell. 1-27 (2017)
14. Benadda, M., Bouamrane, K., Belalem, G.: How to manage persons taken malaise at the steering wheel using HAaaS in a vehicular cloud computing environment, pp. 70-87. IGI Global (2017)
Chapter 28
A New Method for Removing Image Noise Suhua Wang, Zhiqiang Ma, and Rujuan Wang
Abstract A new denoising method is proposed to address the varied and randomly distributed noise in three-dimensional target images collected by photoelectric theodolites. Firstly, based on the universal law of gravitation, the cross-correlation degree is calculated by treating the image pixels as particles. Secondly, the Otsu algorithm is used to calculate the threshold that determines whether a pixel is noise. Finally, the results of the denoising algorithm are compared with those of the Wiener filter, mean filter, and median filter, and the four algorithms are evaluated both subjectively and on objective data [1-3]. The results show that when the salt-and-pepper noise density is less than 20% [4], the visual effect of the algorithm is better than that of the other three algorithms, and the peak signal-to-noise ratio (PSNR) is between 30 and 45 dB. The PSNR is increased by more than 10 dB by using this algorithm instead of the best of the other three algorithms. The visual effect is consistent with the data. The pixel correlation filtering algorithm is feasible and meets the needs of practical engineering.
28.1 Introduction When a photoelectric theodolite tracks a target in the sky, the captured image can be corrupted by noise, or the target can even be lost, due to the influence of weather conditions such as light and air and errors introduced during circuit transmission of the image signal. Therefore, finding an effective algorithm to remove noise and obtain the useful signal is one of the important ways to improve the performance of photoelectric theodolites.
S. Wang · Z. Ma (B) · R. Wang (B) College of Humanities, Northeast Normal University, Changchun 130117, China e-mail: [email protected] R. Wang e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_28
According to the probability density function, image noise can be divided into Gaussian noise, Rayleigh noise, impulse noise, and so on. In this paper, the impulse noise in the image is removed. There are many methods for image filtering [5-9]. The Otsu algorithm performs well in the field of image processing [10]. However, due to the diversity and randomness of noise in three-dimensional images, this method has its shortcomings. Although Wiener filtering has the ability to process three-dimensional images, it has a large computational complexity, which is not conducive to practical engineering. The law of gravitation is one of Newton's great achievements [11, 12]. Its main idea is that the force between two bodies is proportional to the product of their masses and inversely proportional to the square of the distance between them, and it is a law applicable in three-dimensional space. Inspired by this law, a new image denoising method is proposed in this paper. The feature of this method is that each pixel in the image is regarded as a particle, and the correlation between pixels is calculated by the gravitational equation. A threshold is then obtained with the Otsu maximum between-class variance algorithm, and the correlation of each pixel is compared with this threshold: if the correlation degree is greater than the threshold, the pixel is determined to be noise and is filtered out.
28.2 Image Denoising Method Based on Pixel Relationship 28.2.1 Newton's Law of Universal Gravitation The law of universal gravitation states that any two point masses attract each other along the line connecting them, with a force proportional to the product of their masses and inversely proportional to the square of the distance between them; the force has nothing to do with the chemical nature or physical state of the two objects or with the intervening medium. This is shown in Fig. 28.1.
28.2.2 The Algorithm Principle An image is composed of a number of pixels with different gray values, and each pixel has its corresponding neighborhood pixels. Assume that each pixel is a particle with the gray value of the pixel as its mass; then there is a relationship between pixels similar to gravitational interaction. The magnitude and direction of the correlation vector between a pixel and its neighborhood pixels reflect important features of the pixel and can be used to measure whether the pixel is image noise. When no noise exists in the image, the values of a pixel and its neighboring pixels do not change significantly; when noise is present, a step change is generated near the noise, and the gray value of the noisy pixel differs abruptly from its neighborhood. The circles and squares in Fig. 28.2 represent pixels in an image. For any pixel I(i, j) in the image, suppose its 3 × 3 neighborhood set is Φ. We have:
(28.1)
From (28.1) we have the relationship between I(i, j) and any other pixel P(m, n) in : → 2 → → ˆ (28.2) F I,P = G · m I m P r P, I / r P, I 2
→ where, m I , m P : The gray level of pixel I (i, j) and P (m, n), respectively. r P, I : → 2 The coordinate distance between pixel I (i, j) and P (m, n), and r P, I = 2 (m − i)2 + (n − j)2 · rˆI, P : A unit vector between pixel I(i, j) and P(m, n) and → → → → rˆI, P = ( r P − r I )/ r P − r I . Assuming that the upper left corner of the image pixel is in rectangular coordinate system origin O (0, 0). The coordinates of the horizontal X-axis is to the right and the Y-axis vertically downward. As shown in Fig. 28.3.
Fig. 28.2 The relationship between image pixels: (a) a gray pixel with no step change in its neighborhood; (b) another gray pixel with no step change in its neighborhood; (c) a gray pixel with a step change in its neighborhood; (d) another gray pixel with a step change in its neighborhood
Fig. 28.3 The image coordinate system: origin O(0, 0) at the upper-left corner, x-axis to the right and y-axis downward, with pixels I(i, j) and P(m, n)
Assuming θ is the angle between the line connecting I(i, j) and P(m, n) and the X-axis, from Eq. (28.2) the component of \vec{F}_{I,P} in the horizontal direction is

F^x_{I,P} = F_{I,P} · cos θ = G m_I m_P (m - i) / |\vec{r}|^3    (28.3)
In the vertical direction it is

F^y_{I,P} = F_{I,P} · sin θ = G m_I m_P (n - j) / |\vec{r}|^3    (28.4)
From Eqs. (28.3) and (28.4), we have the total correlation between I(i, j) and the pixels in : F=
2
x
F
I, P
+
2
y
F
I, P
(28.5)
The correlation is calculated from the 3 × 3 neighborhood of pixel I(i, j). Let ψ be a threshold; we have:

F > ψ: I(i, j) is noise;  F ≤ ψ: I(i, j) is not noise    (28.6)
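A minimal Python sketch of Eqs. (28.2)-(28.6) is given below; it computes the correlation F of a pixel with its 3 × 3 neighborhood (the function name and the choice G = 1 are ours, for illustration):

    import numpy as np

    def pixel_correlation(img, i, j, G=1.0):
        fx = fy = 0.0
        for m in range(i - 1, i + 2):
            for n in range(j - 1, j + 2):
                if (m, n) == (i, j):
                    continue
                r2 = float((m - i) ** 2 + (n - j) ** 2)   # |r|^2
                f = G * float(img[i, j]) * float(img[m, n]) / r2
                r = np.sqrt(r2)
                fx += f * (m - i) / r                     # Eq. (28.3)
                fy += f * (n - j) / r                     # Eq. (28.4)
        return np.hypot(fx, fy)                           # Eq. (28.5)
    # Eq. (28.6): the pixel is flagged as noise when the returned F exceeds
    # the threshold psi obtained in Sect. 28.2.3.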
28.2.3 The Threshold Determination According to the principle of the Otsu algorithm, the image is divided into the target class ω1 and the noise class ω2, using the correlation between each pixel and its neighborhood to distinguish the two classes. Let [Ψ1, Ψ2] denote the range of F, let N_i be the number of pixels whose correlation value is i, and let P_i = N_i/N be the probability of value i appearing, where the total number of pixels in the image is N = \sum_{i=Ψ1}^{Ψ2} N_i. Let ψ denote the threshold; then the correlation degree of class ω1 ranges over [Ψ1, ψ] and the correlation degree of class ω2 ranges over (ψ, Ψ2].
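A minimal sketch of this Otsu-style threshold selection over the correlation values, assuming they are quantized into a histogram (the bin count is our choice), is:

    import numpy as np

    def otsu_threshold(F_values, bins=256):
        hist, edges = np.histogram(F_values, bins=bins)
        p = hist / hist.sum()                         # P_i = N_i / N
        centers = 0.5 * (edges[:-1] + edges[1:])
        best_t, best_var = edges[0], -1.0
        for t in range(1, bins):
            w1, w2 = p[:t].sum(), p[t:].sum()
            if w1 == 0 or w2 == 0:
                continue
            mu1 = np.sum(p[:t] * centers[:t]) / w1
            mu2 = np.sum(p[t:] * centers[t:]) / w2
            var_between = w1 * w2 * (mu1 - mu2) ** 2  # between-class variance
            if var_between > best_var:
                best_var, best_t = var_between, edges[t]
        return best_t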
28.3 Experimental Results In order to verify the effectiveness of the algorithm, we compare the results of the pixel correlation degree denoising algorithm with those of several classical denoising algorithms. The experimental image is a fighter image of size 256 × 256, to which 20% salt-and-pepper noise is added. From Fig. 28.4, the following conclusions can be drawn. Obvious noise remains after Wiener filtering and mean filtering. Most of the noise can be removed by the median filtering algorithm, but a small amount still remains. The pixel correlation degree algorithm has the best result, better than all the other ones.
Fig. 28.4 Comparison of different algorithms for the filtering results: (a) original fighter image; (b) 20% noise image; (c) Wiener filter image; (d) mean filter image; (e) median filter image; (f) pixel correlation degree filter image
The peak signal-to-noise ratio (PSNR) is used to evaluate the quality of image reconstruction, and its values judge the pros and cons of an algorithm. The equation for PSNR is shown as (28.7):

PSNR = 10 × lg( 255^2 / { [1/(N × N)] \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} (x_{mn} - \hat{x}_{mn})^2 } )    (28.7)
where x_{mn} and \hat{x}_{mn} are the pixel values of the original image and the corresponding filtered one, respectively. Noise of density 0.05-1 is added to the original image; the discrete diagram is shown in Fig. 28.5. The following conclusions can be drawn. When the density of salt-and-pepper noise is less than 20%, the correlation degree denoising effect is very good and better than that of the other algorithms, with PSNR values between 30 and 45 dB, which indicates good image quality. This algorithm improves PSNR by more than 10 dB over median filtering, consistent with the subjective impression. In order to further verify the experimental results, we use different scenarios to compare with the BM4D algorithm, as shown in Fig. 28.6. The results show that our algorithm has a good effect.
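Equation (28.7) can be computed directly; a minimal Python sketch for an N × N 8-bit image (names ours) is:

    import numpy as np

    def psnr(original, filtered):
        mse = np.mean((original.astype(float) - filtered.astype(float)) ** 2)
        return np.inf if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)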
Fig. 28.5 Comparison of PSNR of different algorithms
Fig. 28.6 Comparison of different algorithms for the filtering results: (a) original image; (b) noise image; (c) BM4D denoising; (d) our denoising
28.4 Conclusion Through analysis of the target images captured by photoelectric theodolites, which are prone to salt-and-pepper noise, a pixel correlation degree denoising algorithm is put forward. Extensive experiments show that the pixel correlation degree denoising algorithm has a better effect in filtering the noise, and the experimental data are consistent with subjective impressions. The experimental results show that when the density of salt-and-pepper noise is less than 20%, the proposed algorithm gives the best result, with PSNR between 30 and 45 dB. Using this algorithm, the peak signal-to-noise ratio of the image is increased by more than 10 dB, which is better than the median filter. The correlation degree algorithm is feasible and meets the actual needs of the project. Acknowledgements This work is funded by the Science and Technology Development Plan Project of Jilin Provincial Science and Technology Department (No. 20190302028GX) and Jilin IT Education and Research Base.
References
1. Guang, W., Yunguo, G., Yakun, M., Xiangyao, X., Wenbao, Z., Shuai, S.: High accurate docking technology of laser and theodolite. Infrared Laser Eng. 47(7), 601-607 (2018)
2. Dey, N., Ashour, A.S., Beagum, S., Pistola, D.S., Gospodinov, M., Gospodinova, E.P., Tavares, J.M.R.S.: Parameter optimization for local polynomial approximation based intersection confidence interval filter using genetic algorithm: an application for brain MRI image denoising. J. Imaging 1(1), 60-84 (2015)
3. Suhua, W., Xiang-heng, S., Lu, Y.: Calibration of contrast for adjustable contrast optical target equipment. Opt. Precis. Eng. 20(5), 950-956 (2012)
4. Hemalatha, S., Anouncia, S.M.: A computational model for texture analysis in images with fractional differential filter for texture detection. Int. J. Ambient Comput. Intell. 7(2), 93-113 (2016)
5. Wen, P., Wang, X., Wei, H.: Modified level set method with Canny operator for image noise removal. Chin. Opt. Lett. 8(12), 1127-1130 (2010)
6. Pad, P., Uhlmann, V., Unser, M.: Maximally localized radial profiles for tight steerable wavelet frames. IEEE Trans. Image Process. 25(5), 2275-2287 (2016)
7. Stefano, A., White, P., Collis, W.: Training methods for image noise level estimation on wavelet components. EURASIP J. Appl. Signal Process. 16(12), 2400-2407 (2004)
8. Moraru, L., Moldovanu, S., Dimitrievici, L.T., Shi, F., Ashour, A.S., Dey, N.: Quantitative diffusion tensor magnetic resonance imaging signal characteristics in the human brain: a hemispheres analysis. IEEE Sens. J. 17(15), 4886-4893 (2017)
9. Hu, H., Wang, M., Yang, J.: Adaptive filter of fuzzy weighting mean value. Syst. Eng. Electron. Tech. 24(2), 15-18 (2002)
10. Rajinikanth, V., Dey, N., Satapathy, S.C., Ashour, A.S.: An approach to examine magnetic resonance angiography based on Tsallis entropy and deformable snake model. Future Gener. Comput. Syst. 85(3), 160-172 (2018)
11. Shyu, Y.: An ultimate renewable energy: universal gravitation. In: Proceedings of the Annual International Conference on Green Information Technology, Singapore, pp. 65-68 (2010)
12. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62-66 (1979)
Chapter 29
Sparse Impulsive Noise Corrupted Compressed Signal Recovery Using Laplace Noise Density Hongjie Wan and Haiyun Zhang
Abstract In typical compressed signal recovery methods, the noise involved in the measurement process is assumed to be normally distributed. However, impulsive noise may exist in real-world environments, so the assumption of a normal distribution can lead to inaccurate recovery results. This paper proposes the Laplace density to model the measurement noise of compressive sensing, and a hierarchical Bayesian model is built for compressive sensing. The signal is assumed to be sparse under some transformation, and each coefficient of the transformation is supposed to be normally distributed, with each precision modeled by a gamma distribution. The zero coefficients can be automatically switched off via the proposed model setting. To estimate the parameters of the model, the Variational Bayesian method is applied to the proposed model. The proposed method is applied to a synthetic signal and an image signal; experimental results demonstrate the validity of the method.
29.1 Introduction The compressive sensing theory has attracted great attention in the past ten years and is regarded as a major advance in signal processing beyond Nyquist sampling theory. The method has been used in speech signal processing [1] and image compression [2], and is also used in 3D image processing, such as compressive spectral imaging, where 2D projections of the data are acquired [3], and 3D laser imaging [4], where the 3D problem is converted into an intensity and depth map estimation problem of the same form as that in the following content. H. Wan (B) Department of Information Engineering, Beijing University of Chemical Technology, Beijing 100029, China e-mail: [email protected] H. Zhang College of Information Science and Engineering, Fujian University of Technology, Fuzhou 350118, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_29
Most signals in the real world are sparse under some transformation, and this is the basis of compressive sensing. Assume the signal vector is x with size N × 1, which can be transformed under a matrix A with coefficient vector θ; then the relation can be described as x = Aθ. In compressive sensing, a measurement matrix B with size M × N is used to acquire the signal, where M << N; hence, acquisition and compression are obtained simultaneously, which is different from the traditional methodology of compression after acquisition. Substituting the transformation relation into the measurement procedure, the acquisition procedure can be formulated as [5]

y = Φθ + e    (29.1)
29.2 Model Building of the Compressive Sensing

This section presents the Bayesian modeling of the compressive sensing measurement equation, Eq. (29.1). Following the previous section, the size of the measurement vector y is M × 1, the size of Φ is M × N, the size of θ is N × 1, and the size of e is M × 1. The unknown quantities are θ and e, which need to be estimated.
Each element of θ is assumed to be normally distributed with precision δ_n, as follows:

p(θ_n | δ_n⁻¹) = (2π)^(−1/2) δ_n^(1/2) exp(−δ_n θ_n² / 2)   (29.2)
The precision is the inverse of the variance. A gamma distribution with shape a_δ and rate b_δ is placed on each precision:

p(δ_n | a_δ, b_δ) = Γ(a_δ)⁻¹ b_δ^(a_δ) δ_n^(a_δ − 1) e^(−b_δ δ_n)   (29.3)
A Laplace distribution is placed on each element of the noise e, which has the following form:

p(e_m) = (√λ / 2) exp(−√λ |e_m|)   (29.4)
This can be factorized into two distributions:

p(e_m | γ_m) = N(0, γ_m)   (29.5)

p(γ_m | λ) = (λ / 2) exp(−λ γ_m / 2)   (29.6)
The factorization is based on the fact that integrating the product of Eqs. (29.5) and (29.6) over γ_m yields a Laplace distribution:

p(e_m | λ) = ∫ p(e_m | γ_m) p(γ_m | λ) dγ_m = (√λ / 2) exp(−√λ |e_m|)   (29.7)
In the following, this factorization will be used in place of Eq. (29.4), which makes the solution more tractable. The prior of the parameter λ is also a gamma distribution, with shape a_λ and rate b_λ, of the same form as Eq. (29.3).
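The factorization can also be checked numerically. The sketch below (an illustration, with an arbitrary value of λ) draws γ_m from the exponential density of Eq. (29.6) and then e_m from the conditional normal of Eq. (29.5), and compares the sample histogram with the Laplace density of Eq. (29.4):

    import numpy as np

    rng = np.random.default_rng(1)
    lam = 4.0                                   # illustrative value of lambda

    # gamma_m ~ Exp(rate = lam/2), i.e. scale 2/lam; e_m | gamma_m ~ N(0, gamma_m)
    gamma = rng.exponential(scale=2.0 / lam, size=200_000)
    e = rng.normal(0.0, np.sqrt(gamma))

    # Empirical density vs. the Laplace density (sqrt(lam)/2) * exp(-sqrt(lam)|e|)
    hist, edges = np.histogram(e, bins=100, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    laplace = 0.5 * np.sqrt(lam) * np.exp(-np.sqrt(lam) * np.abs(centers))
    print(np.max(np.abs(hist - laplace)))       # small for large sample sizes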
Let δ = [δ_1, …, δ_n, …, δ_N]ᵀ and γ = [γ_1, …, γ_m, …, γ_M]ᵀ; then all the parameters of the model are θ, δ, γ, and λ, and the hierarchical Bayesian model of the compressive sensing problem follows from the settings above. The quantities a_δ, b_δ, a_λ, and b_λ are hyperparameters of the model which do not need to be estimated; all are set to 1e−6 to give uninformative priors. The full probability of the model is

p(y, θ, δ, γ, λ) = p(y | θ, γ) p(θ | δ) p(γ | λ) p(λ) p(δ)   (29.8)
29.3 Parameter Estimation of the Model

To estimate the parameters of the model, the Variational Bayesian (VB) method is applied. Each parameter is approximated by an auxiliary distribution, so four distributions, q(θ), q(δ), q(γ), and q(λ), are used. In the VB framework, the auxiliary distribution is obtained by minimizing the Kullback–Leibler (KL) distance between the auxiliary distribution and the conditional distribution [13]:

KL = −∫ q(Θ) log [p(y, Θ) / q(Θ)] dΘ   (29.9)
where Θ = {θ, δ, γ, λ}. After minimizing the KL distance, the parameters of the auxiliary distributions can be obtained. Take θ as the first example; its auxiliary distribution has the following form [14]:

q(θ) ∝ exp⟨log p(y | θ, γ) p(θ | δ)⟩_q(γ)q(δ)   (29.10)
In the above equation, ⟨·⟩_q(·) denotes the expectation with respect to the distribution q. In the following, the subscript of the expectation is omitted for brevity. After substituting the necessary equations into Eq. (29.10), q(θ) is found to be a normal distribution with covariance and mean

Σ = (Φᵀ ⟨diag(γ)⁻¹⟩ Φ + ⟨diag(δ)⟩)⁻¹   (29.11)

μ = Σ Φᵀ ⟨diag(γ)⁻¹⟩ y   (29.12)
The operator diag(x) creates a diagonal matrix whose diagonal elements equal the vector x. By the same procedure, the distribution of δ_n is a gamma distribution with shape and rate parameters

α_δ = a_δ + 1/2   (29.13)

β_δn = b_δ + ⟨θ_n²⟩ / 2,  n = 1, …, N   (29.14)
As for γ and λ, it is easy to find that q(λ) is a gamma distribution with shape and rate parameters

α_λ = a_λ + M   (29.15)

β_λ = b_λ + (1/2) Σ_{m=1}^{M} ⟨γ_m⟩   (29.16)
However, it is very difficult to find a tractable solution for q(γ). Fortunately, the expectation of each element of γ can be obtained. From Eqs. (29.11) and (29.12), it can be seen that the expectation of the inverse of γ is also required. The expectations of γ_m and of its inverse are

⟨γ_m⟩ = √⟨(y_m − Φ_m θ)ᵀ(y_m − Φ_m θ)⟩ / √⟨λ⟩ + 1/⟨λ⟩   (29.17)

⟨1/γ_m⟩ = √⟨λ⟩ / √⟨(y_m − Φ_m θ)ᵀ(y_m − Φ_m θ)⟩   (29.18)

In the above equations, Φ_m is the m-th row of Φ. The expectation of λ is

⟨λ⟩ = α_λ / β_λ   (29.19)
The expectation of δ_n has the same form. The expectation appearing in Eqs. (29.17) and (29.18) expands to

⟨(y_m − Φ_m θ)ᵀ(y_m − Φ_m θ)⟩ = y_m² − 2 y_m Φ_m ⟨θ⟩ + Φ_m ⟨θθᵀ⟩ Φ_mᵀ   (29.20)
All M values of Eq. (29.20) can be computed at once in simulation software such as MATLAB; the matrix form can be derived as

(y_m − Φ_m θ)ᵀ(y_m − Φ_m θ) ⇒ y.*(y − 2Φ⟨θ⟩) + diag(Φ⟨θθᵀ⟩Φᵀ)   (29.21)
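The update equations above can be iterated until convergence. The following sketch is a minimal Python implementation of Eqs. (29.11)–(29.19); the function name, the initialization, and the fixed iteration count are illustrative choices, not taken from the paper.

    import numpy as np

    def vb_laplace_recovery(y, Phi, n_iter=100, a0=1e-6, b0=1e-6):
        """Variational Bayesian recovery under the Laplace noise model (sketch)."""
        M, N = Phi.shape
        inv_gamma = np.ones(M)    # <1/gamma_m>
        delta = np.ones(N)        # <delta_n>
        lam = 1.0                 # <lambda>
        for _ in range(n_iter):
            # q(theta): covariance and mean, Eqs. (29.11)-(29.12)
            Sigma = np.linalg.inv(Phi.T @ (inv_gamma[:, None] * Phi) + np.diag(delta))
            mu = Sigma @ (Phi.T @ (inv_gamma * y))
            # q(delta_n): gamma mean from Eqs. (29.13)-(29.14); <theta_n^2> = mu_n^2 + Sigma_nn
            delta = (a0 + 0.5) / (b0 + 0.5 * (mu**2 + np.diag(Sigma)))
            # Residual second moment, Eqs. (29.20)-(29.21)
            E2 = Sigma + np.outer(mu, mu)
            r2 = y**2 - 2 * y * (Phi @ mu) + np.einsum('ij,jk,ik->i', Phi, E2, Phi)
            r2 = np.maximum(r2, 1e-12)
            # <gamma_m> and <1/gamma_m>, Eqs. (29.17)-(29.18)
            gamma = np.sqrt(r2 / lam) + 1.0 / lam
            inv_gamma = np.sqrt(lam / r2)
            # q(lambda): Eqs. (29.15)-(29.16) and its mean, Eq. (29.19)
            lam = (a0 + M) / (b0 + 0.5 * gamma.sum())
        return mu

Running theta_hat = vb_laplace_recovery(y, Phi) on the (Phi, y) pair built in Sect. 29.1 returns the posterior mean of θ.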
29.4 Experiment

To demonstrate the performance of the algorithm, experiments are carried out on a synthetic sparse signal and an image. The OMP algorithm is used for comparison. In the first experiment, the sparse signal is generated by randomly selecting 20 positions out of 512 points; the signal at these positions is randomly set to +1 or −1, and the remaining elements are zero. For the matrix Φ, the measurement number is set to 100. Each element of Φ is drawn from a standard normal distribution, and each row of the matrix is then normalized to unit norm. To simulate impulsive noise, 100 values are first drawn from a normal distribution with mean zero and standard deviation 0.05; then 15 of the 100 positions are randomly selected, and at each of these positions an impulse with amplitude equal to 7 times the standard deviation of the normal distribution is added. The original signal and the signals recovered by OMP and by the proposed method are displayed in Fig. 29.1a–c, respectively.
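For the OMP baseline, an off-the-shelf implementation can be used. The sketch below relies on scikit-learn, a substitution for whatever implementation the authors actually used (the paper does not specify one), and assumes the (Phi, y) pair from the earlier sketch together with a known sparsity of 20:

    from sklearn.linear_model import OrthogonalMatchingPursuit

    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=20, fit_intercept=False)
    omp.fit(Phi, y)                 # Phi: measurement matrix, y: noisy measurements
    theta_omp = omp.coef_           # OMP estimate of the sparse vector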
Fig. 29.1 The original signal, reconstruction results of OMP, and the proposed method
The results show that the proposed method performs better than OMP, whose results contain many inaccurate estimates. The recovery errors for 5, 10, and 15 impulsive points are listed in Table 29.1. In the second experiment, an image of size 32 × 32 is used for compressive sensing. The image is first decomposed with the db1 wavelet at decomposition level 3. The measurement matrix is built in the same way as in the first experiment, and the measurements are obtained by multiplying the measurement matrix and the coefficient vector. Noise is then added to the measurements: the standard deviation of the normal noise is 0.5, 50 points are randomly selected as impulsive positions, and the amplitude of each impulse is randomly selected within 6 to 9 times the standard deviation of the normal noise. The recovery results are displayed in Fig. 29.2. The norm of the recovery error of the proposed Laplace method is 374.2, while that of OMP is 430.1.
Table 29.1 Recovery error under different impulsive points

Impulsive number    OMP error    Laplace error
5                   1.34         0.03
10                  1.16         0.04
15                  1.14         0.04
Fig. 29.2. The original image, reconstruction images of the proposed method with norm of the error = 374.2, and OMP with norm of the error = 430.1
29.5 Conclusion

In this paper, a compressive sensing model with a heavy-tailed noise distribution is proposed. A hierarchical Bayesian model is built, and the Variational Bayesian method is used to estimate its parameters. The proposed method is compared with the standard OMP method in experiments on a synthetic sparse signal and an image, which demonstrate the performance of the proposed method. Although the method is applied here only to 1D and 2D signals, it can also be used in 3D applications, as previous work has already extended 1D and 2D methods to the 3D field.
References 1. Giacobello, D., Christensen, M.G., Murthi, M.N., et al.: Sparse linear prediction and its applications to speech processing. IEEE Trans. Audio Speech Lang. Process. 20(5), 1644–1657 (2012) 2. He, L., Carin, L.: Exploiting structure in wavelet-based Bayesian compressive sensing. IEEE Trans. Signal Process. 57(9), 3488–3497 (2009) 3. Hinojosa, C., Bacca, J., Arguello, H.: Coded aperture design for compressive spectral subspace clustering. IEEE J. Sel. Top. Signal Process. 12(6), 1589–1600 (2018) 4. An, Y., Zhang, Y., Guo, H., Wang, J.: Compressive sensing-based three-dimensional laser imaging with dual illumination. IEEE Access 7, 25708–25717 (2019) 5. Donoho, D.L.: Compressed sensing. IEEE Trans. Inf. Theory 52, 1289–1306 (2006) 6. Mallat, S.: A Wavelet Tour of Signal Processing, 2nd edn. Academic, New York (1998) 7. Sakhaee, E., Entezari, A.: Joint inverse problems for signal reconstruction via dictionary splitting. IEEE Signal Process. Lett. 24(8), 1203–1207 (2017) 8. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996) 9. Tropp, J.A., Gilbert, A.C.: Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 53, 4655–4666 (2007) 10. Needell, D., Tropp, J.A.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26(3), 301–321 (2009) 11. Needell, D., Vershynin, R.: Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit. IEEE J. Sel. Top. Signal Process. 4(2), 310–316 (2010)
12. Tzikas, D.G., Likas, A.C., Galatsanos, N.P.: The variational approximation for Bayesian inference. IEEE Signal Process. Mag. 25(6), 131–146 (2008) 13. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006) 14. Wan, H., Ma, X., Li, X.: Variational Bayesian learning for removal of sparse impulsive noise from speech signal. Digit. Signal Proc. 73, 106–116 (2018)
Chapter 30
An Improved Criminisi’s Image Inpainting Algorithm for Priority and Matching Criteria Yao Li, Junsheng Shi, Feiyan Cheng, Rui Xiao, and Dan Tao
Abstract There are many applications of image inpainting. The Criminisi algorithm is a classical sample-based method in image inpainting technology. Aiming at the distortion caused by unreasonable priority calculation and similarity matching in the Criminisi algorithm, this paper presents an improved Criminisi algorithm with a new priority and new similarity matching criteria. Experiments show that the improved algorithm has better performance.
30.1 Introduction

Image inpainting is an important area of image processing, with significant applications in the protection of cultural relics [1], the inpainting of damaged old photographs, and the removal of redundant objects from images [2]. The authors in [3] present an overview of image inpainting. Inpainting algorithms based on texture synthesis and on PDEs (Partial Differential Equations) are the two basic families of methods. The Criminisi algorithm [4] is a typical texture-synthesis-based inpainting algorithm; its main idea is to find the block in the known area of the image most similar to the target block and fill it into the damaged area. Although the Criminisi algorithm gives a good inpainting effect for large damaged areas, it also has limitations: for example, the priority may become zero [5], and the matching criteria use inpainting blocks of fixed size. The authors in [6] present an algorithm that improves the priority along with the speed and accuracy. Zhu [7] carried out inpainting experiments in the L*a*b* color space, with one luminance component L* and two chroma components a* and b*. Because the a* and b* components carry chromaticity information and little structural information, the structural part is not inpainted well. Shu [8] carried out experiments in the HSI color space; that algorithm uses the gradient feature to redefine the similarity matching criterion, and it is not particularly good at inpainting images with diverse colors.

Y. Li · J. Shi (B) · F. Cheng · R. Xiao · D. Tao, Yunnan Normal University, Kunming 650500, China, e-mail: [email protected]
Yunnan Key Laboratory of Opto-Electronic Information Technology, Kunming, China
This paper improves the priority in the HSV color space, because the V component better reflects the brightness characteristics of images and the priority will not become zero; if the priority becomes zero, the Criminisi algorithm stops. In addition, the H, S, and V components are used to improve the similarity matching criteria between sample blocks. This not only reduces color mismatch, but also gives good control over the structure and details of the image texture, alleviating the color mismatching problem of the Criminisi algorithm.
30.2 Improved Criminisi Algorithm

30.2.1 Criminisi Algorithm

The schematic diagram of the Criminisi algorithm is shown in Fig. 30.1. I represents an image with a damaged area; Ω is the damaged area to be repaired; δΩ is the boundary of the damaged area; Φ is the known source region; P is a pixel on the boundary; ψ_p is the block to be repaired, centered at P; n_p is the unit vector perpendicular to the boundary of the area to be repaired at point P, i.e., the normal direction of the damaged boundary; and ∇I_p^⊥ is the direction of continuation of the brightness change at point P. The Criminisi algorithm consists of three steps: computing patch priorities, propagating texture and structure information, and updating confidence values. Priority is defined as

P(p) = C(p) · D(p)   (30.1)
where C(p) is the confidence term and D(p) is the data term. After the patch ψ_p with the highest priority is found, a global search under the SSD (sum of squared differences) criterion locates the pixel block ψ_q most similar to ψ_p, and the information of ψ_q is copied into the position of ψ_p.

Fig. 30.1 The principle of the Criminisi algorithm [4]
30 An Improved Criminisi’s Image Inpainting Algorithm …
271
SSD = arg min_{ψ_q ∈ Φ} d(ψ_p, ψ_q)   (30.2)
where d(ψ_p, ψ_q) is the Euclidean distance between the two pixel blocks, i.e., the sum of squared differences of the two blocks. After the patch ψ_p̂ has been filled with new pixel values, the confidence C(p) is updated in the area delimited by ψ_p̂ as follows:

C(p) = C(p̂)  ∀p ∈ ψ_p̂ ∩ Ω   (30.3)

where C(p) is the confidence of the repaired point p and C(p̂) is the confidence of the center point p̂ of the newly repaired block.
30.2.2 Limitations of the Original Algorithm

From the above algorithm, the limitation of the priority can be seen. The priority is decided by the confidence term and the data term, where the confidence term represents the known information in the block to be repaired. As inpainting proceeds, the boundary of the area to be repaired changes constantly, and the confidence may drop to zero, so the priority also becomes zero. Likewise, when the image consists of a single color, or the isophote line is perpendicular to the unit normal vector, the data term becomes zero. Therefore, both the confidence and the data term may vanish as the experiment proceeds, forcing the priority to zero. Moreover, the SSD criterion matches two pixel blocks only by the difference of their three primary color channels; it does not consider other color features, which easily causes mismatching, affects the accuracy of the experiment, and results in inpainting errors.
30.2.3 Improved Priority

Because of the above problems of the Criminisi algorithm [4], Qi [9] proposed the improved priority formula

P(p) = λ₁ e^(C(p) − 1) + λ₂ ((1 − ω) · D(p) + ω)   (30.4)

where C(p) and D(p) are the same as in formula (30.1). When the image has large pixel gradients, there is abundant texture information near the point; in order to make the edge structure of the image smoother, the points with more complex structure are repaired first. λ₁ and λ₂ are nonnegative constants representing the proportions of the confidence C(p) and the data D(p) in the priority
formula, with λ₁ + λ₂ = 1. After many experiments, it is found that λ₁ = 0.3 and λ₂ = 0.7 best highlight the structure and texture information of the image and achieve the better inpainting effect. At the same time, a regularization function is introduced to smooth the data term D(p), which enhances the robustness of image edge repair; ω is the parameter of the regularization function, set here to 0.7. Changing the expression of P(p) from multiplication to addition also avoids a zero priority caused directly by a zero confidence C(p) or a zero data term D(p). The improved priority formula proposed in this paper is

P(p) = λ₁ e^(C(p)) + λ₂ ((1 − ω) e^(D(p)) + ω)   (30.5)
The improved priority formula (30.5) introduces exponential terms into the improved priority formula of Qi [9], which makes the data term smoother. The exponential form also makes the decline of the confidence gentler, and expressing P(p) as a sum rather than a product again prevents the priority from becoming zero.
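A minimal sketch of the improved priority of Eq. (30.5) is given below; the per-pixel confidence and data maps C and D are assumed to have been computed as in Sect. 30.2.1, and the weights follow the values reported above:

    import numpy as np

    def improved_priority(C, D, lam1=0.3, lam2=0.7, omega=0.7):
        """Improved priority of Eq. (30.5); C and D are per-pixel maps."""
        return lam1 * np.exp(C) + lam2 * ((1.0 - omega) * np.exp(D) + omega)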
30.2.4 Improved Similarity Matching Criteria

The Criminisi algorithm [4] matches two pixel blocks according to the SSD criterion. When the image texture is relatively simple and the color information relatively uniform, this gives a good inpainting effect; however, for complex images mismatching may occur and the repair effect is not ideal. The Criminisi algorithm [4] performs inpainting in the RGB color space; because the channels of the RGB model are repaired independently, colors become disordered and mismatching is easy. The brightness and saturation of the HSV model are closer to human perception and vision, so when the image texture is more complex and the color information more diverse, the HSV model yields a better inpainting effect. The improved distance is calculated as

d(ψ_p, ψ_q) = f_SSD(ψ_p, ψ_q) + f_avg(ψ_p, ψ_q) + f_std(ψ_p, ψ_q)   (30.6)

f_avg(ψ_p, ψ_q) = |E(h(p)) − E(h(q))|³ + |E(s(p)) − E(s(q))|³ + |E(v(p)) − E(v(q))|³   (30.7)

f_std(ψ_p, ψ_q) = |D(h(p)) − D(h(q))|³ + |D(s(p)) − D(s(q))|³ + |D(v(p)) − D(v(q))|³   (30.8)

where f_SSD(·) is the Euclidean distance between the two pixel blocks, i.e., the sum of squared differences in the RGB color space; f_avg(·) is the sum of cubed differences of the mean values of the two blocks; f_std(·) is the sum of cubed differences of the variance values of the two blocks; and h(·), s(·), v(·) denote the h, s, and v components of the block to be
30 An Improved Criminisi’s Image Inpainting Algorithm …
273
repaired, respectively, while E(·) and D(·) denote the mean and variance. Formula (30.6) shows that the smaller the differences between the means and variances of the h, s, v components of the repaired block and the target block, the higher the similarity of the corresponding matched block. In this paper, the improved similarity matching criteria (30.7) and (30.8) are calculated in the HSV color space, because the brightness and saturation of the HSV model are closer to human perception and vision. The difference of the component means between image blocks reflects the difference in hue, saturation, and lightness in formula (30.7), while the difference of the component variances reflects the difference in texture, smoothness, edge regions, and other structural features in formula (30.8).
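The following sketch evaluates the distance of Eqs. (30.6)–(30.8) for two image patches. It assumes 8-bit RGB patches of equal size and uses OpenCV's RGB-to-HSV conversion; the helper name is chosen for illustration:

    import cv2
    import numpy as np

    def match_distance(patch_p, patch_q):
        """d(psi_p, psi_q) of Eq. (30.6) for two equally sized 8-bit RGB patches."""
        f_ssd = float(np.sum((patch_p.astype(float) - patch_q.astype(float)) ** 2))
        hsv_p = cv2.cvtColor(patch_p, cv2.COLOR_RGB2HSV).astype(float)
        hsv_q = cv2.cvtColor(patch_q, cv2.COLOR_RGB2HSV).astype(float)
        f_avg = np.abs(hsv_p.mean(axis=(0, 1)) - hsv_q.mean(axis=(0, 1))) ** 3   # Eq. (30.7)
        f_std = np.abs(hsv_p.var(axis=(0, 1)) - hsv_q.var(axis=(0, 1))) ** 3     # Eq. (30.8)
        return f_ssd + f_avg.sum() + f_std.sum()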
30.3 Experimental Results and Analysis

This experiment tests the Criminisi algorithm [4] and the improved algorithm for image inpainting, using MATLAB 2016a. In the experiment, each picture was coded artificially: a color block (255, 0, 255) was attached using MATLAB code, so the coded area appears pink. Inpainting algorithms based on texture synthesis and on PDEs are the two basic families of methods; the TV model [10] and the CDD model [11] are typical PDE-based inpainting algorithms. PDE-based algorithms are mainly used to inpaint small damaged areas, because they cannot reconstruct texture details, whereas the Criminisi algorithm [4] is mainly used for large damaged areas. This paper improves the priority and the similarity matching criteria of the Criminisi algorithm, so it can be used to inpaint large areas of complex images. A comparison of the inpainting effects of the different algorithms is shown in Fig. 30.2: (a) shows the original images; (b) the damaged images; (c) mismatches on images with various colors, where the structure of the images is not well controlled; (d) and (e) show that the structure of the images is seriously missing; and (f) shows that the improved algorithm makes the inpainted images very close to the originals. From the experimental results, the improved algorithm inpaints damaged images very well: it recovers the missing colors better, images with detailed texture are very close to the originals, and the structural integrity of the images is restored.
Fig. 30.2 Effects comparison of inpainted images: a Original images; b damaged images; c Criminisi Algorithm [4]; d TV model [10]; e CDD model [11]; f The improved algorithm in this paper
30.4 Conclusion

Because the priority of the Criminisi algorithm [4] may become zero during inpainting, a new priority formula is proposed in this paper. At the same time, to address the mismatching phenomenon, color components are added to the matching criterion. The experimental results show that the algorithm has been optimized to a large extent. However, for images with more details, richer textures, and more complex structures, the algorithm still has shortcomings; addressing these problems will be the next step. Acknowledgements The work was supported by the National Natural Science Foundation of China under Grant 61875171, the Education Department Program of Yunnan Province under Grant 01000205020516032, and the Key Project of the Applied Basic Research Program of Yunnan Province under Grant 2018FA033. The work was also supported by the Yunnan Key Lab of Opto-Electronic Information Technology, Kunming, China.
30 An Improved Criminisi’s Image Inpainting Algorithm …
275
References 1. Hu, G., Xiong, L.: Criminisi-based sparse representation for image inpainting. In: 2017 IEEE Third International Conference on Multimedia Big Data (BigMM). IEEE, pp. 389–393 (2017) 2. Zhang, D., Liang, Z., Yang, G., et al.: A robust forgery detection algorithm for object removal by exemplar-based image inpainting. Multimed. Tools Appl. 77(10), 11823–11842 (2018) 3. Thanki, B.B.: Overview of an image inpainting techniques. Int. J. Technol. Res. Eng. 2(5), 388–391 (2015) 4. Criminisi, A., Pérez, P., Toyama, K.: Region filling and object removal by exemplar-based image inpainting. IEEE Trans. Image Process. 13(9), 1200–1212 (2004) 5. Xi, X., Wang, F., Liu, Y.: Improved Criminisi algorithm based on a new Priority Function with the gray entropy. In: 2013 Ninth International Conference on Computational Intelligence and Security. IEEE, pp. 214–218 (2013) 6. Goyal, P., Diwakar, S.: Fast and enhanced algorithm for exemplar based image inpainting. In: 2010 Fourth Pacific-Rim Symposium on Image and Video Technology. IEEE, pp. 325–330 (2010) 7. Zhu, H., Wei, Z.: Research on Digital Image Repair Technology Based on Texture Synthesis. Hunan, National University of Defense Science and Technology (2010) 8. Shu, B., He, H., Chen, F., Jiang, J.: Image restoration algorithm based on HSI gradient statistical characteristics. Optoelectron. Laser 29(10), 1128–1135 (2018) 9. Qi, Z., Su, H.: Research on image restoration sequence based on Criminisi algorithm. Wirel. Interconnect. Technol. 03, 120–122 + 143 (2016) 10. Chan, T., Shen, J.: Mathematical models for local non-texture inpainting. SIAM J. Appl. Math 62, 1019–1043 (2002) 11. Chan, T., Shen, J.: Non-texture inpainting by curvature-driven diffusions (CDD). J. Vis. Commun. Image Represent. 12(4), 436–449 (2001)
Chapter 31
An Improved Image Restoration Algorithm Based on the Criminisi Algorithm and Least Squares Method Jia Xia, Feiyan Cheng, and Chao Li
Abstract Image restoration is a hot topic in the field of digital image processing. The Criminisi algorithm is an image restoration algorithm based on texture synthesis. In this paper, the basic principle of the Criminisi algorithm is reviewed, the limitations of its priority definition are analyzed, and a solution combining least squares fitting is proposed. Experiments on artificially coded images show that the improved algorithm has a better restoration effect.
31.1 Introduction

Digital image restoration technology is a hot topic in the field of digital image processing [1]. It is mainly used in medical imaging, public security, film and television production, aerial imaging [2], protection of ancient cultural relics [3], etc. Image restoration methods can be divided into two main categories: algorithms based on Partial Differential Equations (PDEs) and algorithms based on texture synthesis. The Criminisi algorithm [4] is a typical texture-synthesis-based restoration algorithm. It has a good repair effect, but several limitations were found during our experiments: the priority may become zero, the criterion for finding the best matching block uses only the difference of color features, and the repair blocks have a fixed size. All of these strongly affect the order and results of the restoration and may even lead to repair errors. Many researchers have proposed solutions to these problems. Huang [5] adds a boundary term to the priority formula, which can repair images with

J. Xia · F. Cheng (B) · C. Li, Yunnan Normal University, Kunming, Yunnan, China
e-mail: [email protected]; [email protected]; [email protected]
different features by choosing the sizes of the parameters, and enlarges the application scope of the algorithm. Zhang [6] introduced the Laplacian operator into the priority formula, so that parts of the image with strong edge information are repaired first. Peng [7] uses the fractal dimension of the pixel blocks to restrict the search range, which dramatically reduces the restoration time. Wang [8] improved the block matching criteria through matrix similarity and information entropy similarity. This paper improves the priority formula with least squares fitting and solves the problem that the algorithm stops when the priority is zero. It addresses both the texture extension overflow of the Criminisi algorithm and the repair errors that occur when the image texture information is strong.
31.2 Criminisi Algorithm

31.2.1 Basic Criminisi Algorithm

First, choose the pixel P with the highest priority, and let the patch ψ_p centered at P be the block to be repaired. Second, search for the best matching block ψ_q and copy the pixel information of ψ_q into ψ_p. Third, update the boundary δΩ, and repeat these steps until Ω is empty. As shown in Fig. 31.1, I is the image to be repaired, Ω is the damaged area, δΩ is the boundary region, and Φ is the known source area of the image, Φ = I − Ω. P is a pixel on the boundary and ψ_p is the target block centered at P; n_p is the unit normal vector of the damaged boundary, and ∇I_p^⊥ is the tangential direction of the isophote (line of equal illuminance).

Fig. 31.1 The principle of the Criminisi algorithm
31.2.2 Criminisi Algorithm Steps

The Criminisi algorithm can be divided into three steps: calculating the priority, finding the best matching block, and updating the confidence.

(1) Computation of Priority

Priority is defined as

P(p) = C(p) · D(p)   (31.1)
In formula (31.1), C(p) is the confidence term and D(p) is the data term. They are defined as

C(p) = ( Σ_{q ∈ ψ_p ∩ Φ} C(q) ) / |ψ_p|   (31.2)

D(p) = |∇I_p^⊥ · n_p| / α   (31.3)
A large C(p) corresponds to a higher priority, reflecting that a region containing more of the original information should be filled first. C(p) is the confidence value of a pixel and must satisfy the initial conditions C(p) = 0 for ∀p ∈ Ω and C(p) = 1 for ∀p ∈ Φ. The smaller the angle between the isophote and the normal vector direction, the larger the value of D(p); the larger D(p), the greater the priority, which helps the structure and lines of the image to be restored first. After the priority of each point on the boundary δΩ is calculated, the point P with the highest priority is automatically selected for processing.

(2) Find the Best Matching Block

After determining the target block ψ_p to be repaired, the Criminisi algorithm uses a global search to find the best matching block ψ_q in the sample region; the best matching block must satisfy the sum of squared differences (SSD) criterion:

SSD = arg min_{ψ_q ∈ Φ} d(ψ_p, ψ_q)   (31.4)

We calculate the color sum of squared differences as

d(ψ_p, ψ_q) = Σ [ (r(p) − r(q))² + (g(p) − g(q))² + (b(p) − b(q))² ]   (31.5)

where r, g, b are the three primary colors.
(3) Update Confidence

The algorithm copies the best matching block into the area to be repaired; the boundary of the area then changes, and the newly filled block ψ_q affects the confidence of the image:

C(p) = C(p̂),  ∀p ∈ ψ_p̂ ∩ Ω   (31.6)

These three steps are repeated until the repair is complete.
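A compact sketch of the priority computation of Eqs. (31.1)–(31.3) is given below. It assumes a grayscale image and a boolean mask that is True on the damaged region Ω; the function and parameter names are chosen for illustration, not taken from the paper.

    import numpy as np
    from scipy import ndimage

    def priority_map(gray, mask, patch=9, alpha=255.0):
        """P(p) = C(p) * D(p) on the fill-front pixels; mask is True on Omega."""
        conf = (~mask).astype(float)                      # C = 1 on Phi, 0 on Omega
        C = ndimage.uniform_filter(conf, size=patch)      # patch mean, Eq. (31.2)
        gy, gx = np.gradient(np.where(mask, 0.0, gray))   # image gradient outside Omega
        iso_y, iso_x = gx, -gy                            # isophote: gradient rotated 90 deg
        ny, nx = np.gradient(mask.astype(float))          # boundary normal direction
        norm = np.hypot(nx, ny) + 1e-8
        D = np.abs(iso_x * nx / norm + iso_y * ny / norm) / alpha   # Eq. (31.3)
        P = C * D                                         # Eq. (31.1)
        front = mask & ndimage.binary_dilation(~mask)     # Omega pixels on delta-Omega
        return np.where(front, P, -np.inf)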
31.2.3 The Improved Algorithm

(1) Limitations of the Original Algorithm

The definition of the priority formula has limitations. Equation (31.2) shows that the confidence mainly reflects the amount of known information in the block to be repaired. As the restoration progresses, the confidence may suddenly approach zero on the edge of the area to be repaired, resulting in a zero priority. A zero data term D(p) also forces the priority to zero, for example when the isophote is perpendicular to the unit normal vector or when part of the image has a single color. Therefore, both the confidence and the data term may vanish, which affects the accuracy of the experiment and causes repair errors.

(2) The Improved Priority

Due to the above problems, the priority formula can be changed to

P(p) = α · C(p) + β · D(p)   (31.7)

where α and β are weighting coefficients. However, previous research only requires α + β = 1 and gives no theoretical basis for choosing the values of α and β. In this paper, the least squares method is used to solve this coefficient-selection problem. The least squares method is a mathematical optimization technique: it finds the best-fitting function for the data by minimizing the sum of the squared errors. Two real coefficients can be obtained quickly by least squares; they minimize the sum of squared errors between the original image and the artificially coded image and play a central role in the restoration algorithm of this paper. We convert the original image and the artificially coded image into one-dimensional arrays, as shown in Fig. 31.2, and record the data set as Z = {(x₁, y₁), (x₂, y₂), (x₃, y₃), …, (xₙ, yₙ)}. We then look for a function y = ax + b that fits Z as well as possible:
Fig. 31.2 Experimental image: a Coded image; b Binary image; c Original image
Q = (ax₁ + b − y₁)² + (ax₂ + b − y₂)² + ⋯ + (axₙ + b − yₙ)²   (31.8)

Q = Σ_{i=1}^{n} (ŷᵢ − yᵢ)² = Σ_{i=1}^{n} (axᵢ + b − yᵢ)²   (31.9)

The one-dimensional arrays (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) are known, so the problem is transformed into minimizing Q:

∂f(a, b)/∂a = ∂[ Σ_{i=1}^{n} (axᵢ + b − yᵢ)² ]/∂a = 0   (31.10)

∂f(a, b)/∂b = ∂[ Σ_{i=1}^{n} (axᵢ + b − yᵢ)² ]/∂b = 0   (31.11)

a = Σ_{i=1}^{n} (xᵢ − x̄)(yᵢ − ȳ) / Σ_{i=1}^{n} (xᵢ − x̄)²   (31.12)

b = ȳ − a x̄   (31.13)

x̄ = (1/n) Σ_{i=1}^{n} xᵢ,  ȳ = (1/n) Σ_{i=1}^{n} yᵢ   (31.14)
Here x̄ and ȳ are arithmetic means. This paper assumes a linear relationship between the original image and the coded image. We encourage the synthesis of linear structures with higher priority so that the information can be safely propagated into the target area, and ŷᵢ = axᵢ + b reflects exactly this linear relationship between the target and the reference. In the priority, P = α · C(p) + β · D(p) corresponds to the above linear relationship: C(p) represents the proportion of known pixels, corresponding to the known variable in the linear relation. So the obtained value of a is assigned to α, and the value of b is assigned to β.
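The two coefficients of Eqs. (31.12) and (31.13) can be computed directly. The following sketch (with an illustrative function name) flattens both images and returns (a, b) for use as (α, β):

    import numpy as np

    def least_squares_ab(original, coded):
        """Fit coded ~ a*original + b in the least-squares sense, Eqs. (31.12)-(31.13)."""
        x = np.asarray(original, dtype=float).ravel()   # original image as 1D array
        y = np.asarray(coded, dtype=float).ravel()      # artificially coded image as 1D array
        xm, ym = x.mean(), y.mean()
        a = ((x - xm) * (y - ym)).sum() / ((x - xm) ** 2).sum()
        b = ym - a * xm
        return a, b   # assigned to alpha and beta in the priority formula (31.7)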
31.3 Experimental Results

In our experiments, we use MATLAB 2012a on a Windows PC with an Intel Core 3 Duo processor (3.3 GHz). First, the image is manually coded with a yellow block, as shown in Fig. 31.2a. Through color detection, the area to be repaired is set to white and the rest to black, yielding a binary image that distinguishes the area to be repaired from the rest. Applying the least squares method to the original image and the artificially coded image gives the two parameters α and β, which are substituted into the priority formula; the improved priority formula then drives the restoration of the image. The set of experiments in Fig. 31.3 shows the restoration results. First, the improved algorithm resolves the case where the priority is zero. Second, its processing of structural information is superior to the Criminisi algorithm, which guarantees the structural integrity of the image. Third, its handling of details is finer than that of the Criminisi algorithm. Furthermore, the algorithm works effectively for both regular and irregular coding.
31.3.1 Analysis of Experimental Results

The experimental results show that the improved algorithm recovers damaged images well. The image data before and after the experiment are compared through the indicators of Table 31.1. The algorithm recovers losses in single-color images well, reproduces detailed textures, and in some cases restores values close to the ground truth. Table 31.2 lists the values of α and β obtained by the least squares method in the experiments. The value of α is stable between 0.5 and 1.0, while the value of β is determined more by the structure and brightness of the image. When β is small, the image has little structural information; in this case the two values differ little, neither term's weight grows excessively when substituted into the priority formula, the confidence and data terms stay balanced, and the priority never becomes zero during recovery. When β is larger, more of the image's structural information is missing; in this case the weight of the data term increases, and the image structure can be restored better.
Fig. 31.3 The original algorithm and the algorithm restore comparison chart: a Original image; b Coded image; c Results of Criminisi algorithm; d Results of our algorithm
31.4 Conclusion Remarks

To resolve the phenomenon that the priority of the Criminisi algorithm may become zero during restoration, this paper proposes an improved algorithm that combines the priority with the least squares method. Experiments show that the method is effective and achieves better results. The algorithm also has a limitation, because the least squares method needs both the original image and the artificially coded image. If we do not have the original
Table 31.1 Comparison of image data before and after the experiment

                        Indicator 1          Indicator 2       Indicator 3
                        (Average gradient)   (Edge strength)   (Contrast)
Experimental group 1    7.4                  72.9              380.1
Experimental group 2    7.4                  73.0              380.1
Experimental group 3    9.9                  96.2              422.6
Experimental group 4    1.5                  14.1              10.6
Experimental group 5    8.8                  77.2              390.5
Experimental group 6    7.4                  73.0              380.1

Table 31.2 Real values obtained by the least squares method

                                  Value α    Value β
The first set of experiments      0.9933     0.9643
The second set of experiments     0.9909     0.7294
The third set of experiments      0.9823     1.6176
The fourth set of experiments     0.5898     97.5293
The fifth set of experiments      0.9773     3.2834
The sixth set of experiments      0.9798     1.1297
image as a reference, the priority formula cannot be obtained. These remaining issues still need to be addressed; they will be improved in future experiments, striving to overcome the limitations of the algorithm.
References 1. Zhang, T.: Matlab Image Processing Programming and Application. Machinery Industry Press, Beijing, pp. 1–6 (2014) 2. Zhang, S.Y.: Aerial image restoration based on improved Criminisi algorithm to thick cloud. Prog. Laser Electrophotonics 55(12), 121012 (2018) 3. Liu, X.N.: Research on Repair Algorithm of Digital Image of Rock Painting. Ningxia University, Ningxia (2014) 4. Criminisi, A., Perez, P., Toyama, K.: Region filling and object removal by exemplar-based image inpainting. IEEE Trans. Image Process. 13(9), 1200–1212 (2004) 5. Huang, S.B.: Research on Digital Image Restoration Technology Based on Texture Synthesis. Hefei University of Technology, Hefei (2011) 6. Zhang, X.: Research and Application of Digital Image Restoration Technology. Shandong University, Shandong (2014)
7. Peng, C.H.: Criminisi Image Restoration Algorithm Based on Texture Synthesis. Hubei University for Nationalities, Hubei (2018) 8. Wang, L.: Improved Image Restoration of Criminisi Algorithm. Anhui University of Technology, Anhui (2018)
Chapter 32
Research on Stereo Image Synthesis Based on Oblique Photography of UAV Zheng Jinji and Wang Yuanqin
Abstract The acquisition of stereoscopic video sources will become an attractive subject as stereoscopic display achieves dramatic breakthroughs. A stereo image synthesis method based on oblique photography from an unmanned aerial vehicle (UAV) is proposed to solve the problems of demanding data acquisition, high cost, and low modeling efficiency in traditional three-dimensional video synthesis. First, orthographic and oblique images are captured by the UAV camera following a planned path. Next, a three-dimensional model is constructed from the three-dimensional point cloud generated by aerial triangulation, and wall textures extracted from the multi-view images are mapped onto the corresponding model. Then, taking the Yangzhou Building of Nanjing University as an example, the 3D model of the building is constructed and stereo image pairs are synthesized. Experimental results show that this method not only meets the precision requirements of the model, but also greatly improves the efficiency of aerial-image stereo synthesis and reduces production cost.
32.1 Introduction

Three-dimensional video synthesis technology began in the second half of the 1990s and refers to the technology of producing three-dimensional images from two-dimensional images [1]. With continuous breakthroughs in stereoscopic display technology [2], 5G networks, and Internet transmission, the position of stereoscopic video sources in the stereoscopic display industry will be equivalent to the role of traffic services in the mobile communication industry. At present, stereo video synthesis technology is gradually developing toward automatically converting conventional planar images into natural stereo images in real time [3]. According to the mechanism of stereo synthesis, the current mainstream stereo image synthesis technologies are based either on depth maps or on structure reconstruction [4]. We can obtain the relative position relationship between objects in plane images by relying on the

Z. Jinji · W. Yuanqin (B), Nanjing University, Nanjing 210046, People's Republic of China, e-mail: [email protected]
visual experience formed in the process of long-term observation. These experience cues may include motion parallax, atmospheric scattering, focus and defocus, occlusion, shadow, size, and so on. This kind of stereoscopic vision is also called psychological stereoscopic vision [5]. Exploiting these characteristics of the human eyes, the depth information in a conventional image can be extracted and combined with the original left view to synthesize the right view. This approach is called depth-based stereo video synthesis [6]. Three-dimensional synthesis based on structural reconstruction is the most widely used technology in computer vision at present, applied in historic site reconstruction, film and television production, and urban modeling [7]. According to the three-dimensional acquisition approach, it can be divided into manual modeling, multi-camera three-dimensional photography, and laser point cloud reconstruction [8]. These three methods offer clear details and textures and fast modeling, but they all have high cost, and some require a large workload and manual texture fitting [9]. Three-dimensional video synthesis based on UAV oblique photography has incomparable advantages for wide-range scenes and in 3D reconstruction accuracy. Remote-sensing images are orthographic images of limited timeliness and resolution and cannot play their full role in many small-range application scenarios. Ultra-low-altitude UAV orthophotos therefore play an important role, characterized by ultra-high resolution, timeliness, and simple acquisition [10]. An orthographic image shows rich roof details but little texture information of building sides. Oblique photography not only adds side-view texture information, but also simplifies the model, reduces reconstruction time, improves modeling efficiency, and realizes the transformation from plane to solid at a lower cost. Oblique photography started early abroad. Pix4D originated at the Computer Vision Laboratory of the Swiss Federal Institute of Technology in Lausanne. Smart3D is the main product of Acute3D, a French photogrammetry software developer; Acute3D was later acquired by Bentley and the product renamed ContextCapture, which embodies 25 years of research by two top French research institutions and whose technical level is the industry benchmark [11]. PhotoScan, developed by the Russian software company Agisoft, automatically generates 3D models from images. In China, Datumate was launched by DJI in cooperation with the Israeli company Datumate in 2016 and is suitable for basic surveying, construction, infrastructure and engineering inspection, and other fields [12]. Altizure, from Shenzhen Zhuke Innovation Technology, an incubator company of the Hong Kong University of Science and Technology, can turn UAV aerial photos into three-dimensional realistic models [13]. Besides, DJI Ground Station (GS) Pro, Litchi, Autopilot, and other flight control software can also support data collection. Stereoscopic images obtained by oblique photography are also involved in the latest VR/AR applications. Although there are not many use cases combining the two disciplines yet, this is expected to change in the future. Vertical industries are still trying to find beneficial and profitable applications for VR/AR. However, photogrammetry
and VR/AR are a good combination, because VR/AR environments require precision when measuring the distance between objects in projected 3D space.
32.2 System and Principle of Stereoscopic Image Synthesis by Oblique Photography

Oblique photogrammetry is a high-tech method developed in recent years in the international surveying, mapping, and remote sensing field, offering comprehensive perception of complex scenes over a wide range with high precision and high definition. The data generated by efficient acquisition equipment and a professional processing pipeline directly reflect the appearance, position, height, and other attributes of ground objects, ensuring photo-realistic results at mapping-level accuracy [14]. 3D modeling is being applied ever more widely and deeply in the surveying and mapping industry, urban planning, tourism, and even e-commerce, which pushes UAV oblique-data modeling to a critical stage [15]. Based on the self-calibration techniques of machine vision, oblique photography acquires more complete and accurate information of ground objects by carrying multiple sensors on the same flight platform (currently a five-lens camera is commonly used) and collecting images from vertical and oblique angles. The group of images taken at a vertical angle to the ground is called the orthographic group; the four groups obtained by pointing the lenses east, south, west, and north at a fixed angle to the ground are called the oblique groups [16]. The coverage is shown in Fig. 32.1. In building the surface model of a building, oblique images have significant advantages over vertical images, because they provide a better perspective for observing the sides of the
Fig. 32.1 The principle and method of oblique photography modeling of UAV
building and meet the needs of building-surface texture generation. Vertical images taken over the same area can be used to generate or optimize the 3D model.
32.2.1 Technical Process of Stereoscopic Image Synthesis by Oblique Photography

Mainstream products in the industry include Bentley's ContextCapture (Smart3D), Agisoft's PhotoScan from Russia, and Pix4Dmapper from Switzerland. These modeling packages have their own advantages and disadvantages. PhotoScan is relatively lightweight, but the resulting model texture is not ideal. The 3D models generated by Smart3D have an ideal effect and require little manual repair, but the software is complex to use and expensive. Pix4Dmapper is automatic, fast, professional, and precise UAV data and aerial image processing software: thousands of images can be quickly turned into professional, accurate 2D maps and 3D models without specialist knowledge or manual intervention. The software uses the principles of photogrammetry and multi-view reconstruction to quickly obtain point cloud data from the aerial photographs and carries out the subsequent processing. As shown in Fig. 32.2, the stereoscopic image synthesis workflow of oblique photography through Pix4Dmapper mainly includes these steps: aerial image acquisition, aerial triangulation, multi-view dense image matching, digital surface model generation, texture fitting and three-dimensional real-scene modeling, and virtual perspective shooting.
Fig. 32.2 Flowchart of stereoscopic image synthesis by oblique photography
Fig. 32.3 Aerial image acquisition platform and navigation point setting. a Mavic2 UAV. b The track of image acquisition
32.3 Experiments and Results

32.3.1 Field Data Collection

As shown in Fig. 32.3a, a Mavic 2 UAV with a 1-inch sensor and mechanical shutter is used as the experimental platform to capture 160 five-view samples from different points around the Yangzhou Building of Nanjing University. Flight routes are planned through DJI GS Pro; the acquisition track is shown in Fig. 32.3b. After the flight plan is made and the flight performed, the collected sample data are imported into Pix4Dmapper through a card reader for processing. Data collection and processing took nearly two hours, whereas manual extraction and reconstruction of the target 3D model could take several months. Comparing the time and cost of this method with manual modeling shows that the method is effective and low cost.
32.3.2 Automatic and Fast 3D Model Construction

Aerial triangulation is the most basic step in 3D reconstruction. Pix4Dmapper reads the data sources captured from different angles around the static subject, together with auxiliary data such as sensor properties (focal length, sensor size, principal point, lens distortion coefficients), photo position parameters (e.g., GPS longitude and latitude), and photo attitude parameters (e.g., from the inertial navigation system). The sparse point cloud generated for the study area is shown in Fig. 32.4. The aerial images were matched to generate sparse point clouds. In the initialization step, the image scale for feature-point extraction is set to high precision; in the point cloud and texture processing step, the scale is half the image size, the point cloud density is optimal, and the digital surface model (DSM), orthomosaic, and index settings
Fig. 32.4 Result of aerotriangulation
Fig. 32.5 Original and point cloud images. a Original picture. b Sparse point cloud map
are the default configuration. Figure 32.5a shows the Yangzhou Building of Nanjing University, and Fig. 32.5b the sparse point cloud map. The initial matching result is a sparse point cloud, which cannot be used to construct high-precision stereo image pairs; the point cloud must be extended, filtered, and optimized into a dense point cloud. The dense point cloud is then reduced and optimized according to scene complexity, and triangular meshes at different scales are constructed for texture fitting. The optimized full 3D texture model is shown in Fig. 32.6. An enlarged view is shown in Fig. 32.7a, and the window sash can be seen in the enlargement of Fig. 32.7b. The renderings show that the three-dimensional model of the test area is realistic and that detailed structures of the building (pillars, windowsills, etc.) are reconstructed well.
Fig. 32.6 3D texture model
Fig. 32.7 The enlarged image. a Enlarged image. b Partial enlargement
Fig. 32.8 Stereoscopic image pair. a Left format. b Right format
32.3.3 Stereo Image Pair Synthesis

A high-fidelity 3D texture model is generated after texture mapping. According to the principle of stereoscopic display, the 3D model, rotated by a small angle around the reference axis of the coordinate system, is photographed virtually from two viewpoints to synthesize a stereo image pair. Taking the left-and-right format as an example, the stereo pair with parallax is shown in Fig. 32.8. When 5G technology is fully deployed in the future, more and more data will be processed in the cloud: raw aerial material can be transmitted to the cloud in real time and analyzed synchronously, and users can access the data sources of interest through a Web browser or desktop client. This will enable a new subscription model beneficial to people with limited resources, such as students.
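The virtual shooting step can be expressed compactly. The sketch below is an illustration only, not the Pix4Dmapper pipeline: the function name, toe-in angle, and pinhole focal length are assumptions. It rotates a set of model points by ±θ/2 about the vertical axis and projects each view with a pinhole model, which yields the horizontal parallax of the left-and-right pair.

    import numpy as np

    def virtual_stereo_views(points, f=1000.0, angle_deg=2.0):
        """Project Nx3 model points (positive Z in front of the camera)
        from two virtual viewpoints rotated about the Y axis."""
        views = []
        for sign in (-1.0, +1.0):
            a = np.deg2rad(sign * angle_deg / 2.0)
            R = np.array([[np.cos(a), 0.0, np.sin(a)],
                          [0.0, 1.0, 0.0],
                          [-np.sin(a), 0.0, np.cos(a)]])
            p = points @ R.T                        # rotate model into the virtual camera
            views.append(f * p[:, :2] / p[:, 2:3])  # pinhole projection onto image plane
        return views                                # [left, right] image coordinates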
32.4 Conclusion

In this paper, a stereoscopic image synthesis method based on oblique photography, offering automation, high precision, real scenes, and realistic texture details, is proposed to overcome the complexity and the high cost of professional photographic equipment in the stereo image synthesis process. It is an economical, efficient, and reliable method. With the emergence of new application fields such as 3D GIS, 3D city models, and AR/VR, photogrammetry has opened up new markets. Stereo images synthesized with UAV oblique photography can be applied in many fields, such as aerial mapping, disaster emergency response, security enforcement, forestry, water conservancy, flood control and monitoring, power line patrol, the marine environment, university scientific research, and the military.
Acknowledgements This research is sponsored by National Key R&D plan (2016YFB0401503), R&D plan of Jiangsu Science and technology department (BE2016173). The authors would like to acknowledge the contributions of Nanjing University stereo imaging technology (SIT) laboratory 3D imaging team.
References 1. 2D to 3D conversion technology. https://baike.so.com/doc/5270794-5504689.html. Accessed 23 July 2019 2. Zhuang, H.Y., Chen, N.: 2D to 3D video conversion technology overview and innovation. Journal 10, 167–169 (2012) 3. Yuan, L.: The development direction of stereoscopic television. Journal 7, 2–9 (1998) 4. Bian, L.Y.: Research on 2D to 3D video algorithm based on depth graph. Diss, University of Electronic Science and Technology of China 5. Chen J.: Research on three-dimensional plane image technology and key technologies in stereoscopic photography. Diss (2005) 6. Zhu, Q.S., Liu, R., Xu, X.Y.: Research on three-dimensional plane image. Journal 29(12), 2814–2818 (2007) 7. Yang, G.D., Wang, M.S.: Application and prospect of oblique photogrammetry. Journal 39(1), 14–15 (2016) 8. Zhang, C.S., Zhang, W.L., Guo, B.X.: Fast reconstruction of 3D texture of oblique image. Journal 44(7), 782–790 (2015) 9. Wang, Y.M., Hu, C.M.: A robust registration method for ground lidar point cloud and texture image. Journal 41(2), 266–272 (2012) 10. Yu, Z.G., Li, H., Ba, F.: Three-dimensional urban modeling based on consumer UAV. Journal 30(117), 70–75 (2018) 11. Barbasiewicz, A., Widerski, T., Daliga, K.: The analysis of the accuracy of spatial models using photogrammetric software: Agisoft Photoscan and Pix4D. Journal 26 (2018) 12. Xiong, Q., Wang, S.T., Wang, X.Y.: 3Dmodeling of oblique photogrammetry simulation system by Smart3D. Journal 27(7), 55–59 (2018) 13. Jiang, P.: The application of three-dimensional reconstruction technology in urban spatial data collection—take Altizure as an example. Journal (32), 99 (2017) 14. Yang, G.D., Wang, M.S.: Application and prospect of oblique photogrammetry. Journal 39(1), 13–15 (2016) 15. Mukherjee, A., Chakraborty, S., Azar, A.T.: Unmanned aerial system for post disaster identification. CONFERENCE 2014, I4C, vol. 1, pp. 247–252. IEEE, Bangalore (2014) 16. Principle and key technology of oblique photogrammetry, http://www.52vr.com/article-689-1. html. Accessed 24 July 2019
Chapter 33
Research on Production of Stereoscopic UAV Aerial Photography Based on Optical Flow Image Migration Technology Xv Chen, Xi-Cai Li, Bang-Peng Xiao, and Yuan-Qing Wang

Abstract In this paper, a method to convert aerial images from an unmanned aerial vehicle (UAV) into stereo images based on optical flow image migration technology is proposed. First, a disparity map is generated by an optical flow algorithm during image matching. Then the UAV images are mapped to virtual view images based on the disparity map; the virtual view images can be optimized by an image segmentation algorithm. Finally, the virtual view images and the original images are spliced into side-by-side 3D images. The experimental results show that the system can convert 2D UAV images into 3D stereo pairs.
33.1 Introduction

The conversion of 2D images into immersive 3D stereo images is of great significance in the field of UAV photography, with applications in film production [1], military reconnaissance, aerial mapping, emergency rescue, VR/AR, historical reconstruction [2], urban modeling, etc. [3]. With the gradual maturing of UAV photography and the steady marketization of 3D display technology in recent years, the demand for 3D videos derived from 2D UAV videos is increasing. However, shooting 3D images, where at least two cameras must work together, usually costs considerable manpower, resources, money, and time [4]. Moreover, the relatively large camera baseline required for long-distance aerial photography directly limits the size of the shooting device and challenges the payload and endurance of UAVs. Although some commercial software can convert flat images into 3D images at present, such as Storm Video and iQiyi, the principle of such 2D-to-3D tools is mostly to move every pixel of the image by a predetermined parallax distance to achieve a stereoscopic display effect. The result is that all scenes after conversion have

X. Chen · X.-C. Li · B.-P. Xiao · Y.-Q. Wang (B), Nanjing University, Nanjing 210046, Jiangsu, China, e-mail: [email protected]
the same parallax, so technically it is not a true 3D transformation. Meanwhile, some other conversion software requires the UAV to fly along a specific trajectory, capturing multiple parallel shots of the same scene, and renders stereoscopic pairs with parallax through rigorous mathematical modeling and calculation. This is currently the most widely used method in UAV 3D image synthesis. But its disadvantage is that the trajectory of the UAV affects the image rendering, which makes the method difficult to popularize. At present, the mainstream method for converting monocular aerial video to stereo video is: first, the depth information is obtained from the video; then the depth information is used to calculate the parallax of each scene; and finally the stereo image pair is generated according to the parallax [5]. The challenge of this method is to obtain accurate depth information from the monocular camera and to further optimize the acquired stereo pairs, for example by filling holes and smoothing depth maps [6]. Therefore, this paper studies UAV 3D image production based on an optical flow algorithm and image segmentation. The optical flow algorithm is used to calculate the motion vectors and obtain the depth information of different scenes in the image, and the image segmentation algorithm is combined to give different depths different parallaxes.
33.2 3D Conversion of the UAV 2D Images
33.2.1 Principle Analysis of 3D Conversion of the UAV 2D Images
The following analyzes the principle of converting 2D aerial images shot by a monocular camera into 3D images.
Fig. 33.1 Principle of stereo vision
In Fig. 33.1, there is a spatial three-dimensional coordinate system OXYZ. The left and right eye image plane coordinate systems are O1X1Y1 and O2X2Y2. Point P represents an object, and O_L and O_R represent the left and right eyes in horizontal position, respectively. Let D be the pupil spacing of the human eye, so it can be inferred that |OO_L| = |OO_R| = D/2 and |O1O_L| = |O2O_R| = f (f is the imaging focal length of the pupil). P1(x1, y1) and P2(x2, y2) are the projections of P(x, y, z) in the left and right eye image plane coordinate systems, respectively, so that y1 = y2, and the following can be obtained:

x1/(x + D/2) = x2/(x − D/2) = y1/y = y2/y = f/z   (33.1)
And the coordinates x1, x2 of the projection points of the spatial point P on the image planes can be obtained. Therefore, the parallax and visual angle of the two images are:

s = x1 − x2 = (f/z)(x + D/2) − (f/z)(x − D/2) = fD/z   (33.2)

θ = arctan[(x + D/2)/√(y² + z²)] − arctan[(x − D/2)/√(y² + z²)]   (33.3)
It can be seen from Eqs. (33.2) and (33.3) that the imaging parallax s and the visual angle θ of the object in the human eye are inversely proportional to the observation distance z [7]. The left and right eye parallax images are located in the two planes O1X1Y1 and O2X2Y2, and the parallax size is related to the perception of object depth. The depth information in a planar image can be extracted according to this feature. Combined with the original left view, a virtual right view can be generated according to the relationship between depth and parallax. The generated right view has a parallax with the original left view, and the two views can be combined into a side-by-side 3D image [8]. This technology is expected to solve the 2D-to-3D conversion problem in UAV photography, and the converted 3D images have a good stereoscopic effect.
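To make the depth-to-parallax step concrete, the sketch below shifts every pixel of the original (left) view by a parallax that grows with nearness, following s = fD/z in Eq. (33.2). It is a minimal illustration only, not the authors' implementation: the function name, the normalized inverse-depth convention, and max_shift are our assumptions.

```python
import numpy as np

def render_virtual_view(left, depth, max_shift=20):
    """Synthesize a virtual right view from the left view and a depth map.
    depth is assumed normalized to [0, 1] with 1 = nearest, so that the
    per-pixel shift mimics s = f*D/z: near pixels move the most."""
    h, w = depth.shape
    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    shift = np.round(max_shift * depth).astype(int)
    for y in range(h):
        for x in range(w):
            xr = x - shift[y, x]
            if 0 <= xr < w:
                right[y, xr] = left[y, x]
                filled[y, xr] = True
    # naive hole filling: propagate the nearest filled pixel from the left
    for y in range(h):
        for x in range(1, w):
            if not filled[y, x]:
                right[y, x] = right[y, x - 1]
    return right
```

The original view and the synthesized view can then be concatenated horizontally (e.g., with np.hstack) to form the side-by-side 3D image described above.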
33.2.2 Optical Flow Image Migration Technology and Generation of Stereoscopic Pairs
Most UAVs move along the direction of the camera's optical axis, so the camera lens of the aerial image moves forward. In this case, there is only front–rear parallax between adjacent frames, while stereoscopic display requires left–right parallax. The main work of this paper is to complete the
conversion between these two kinds of parallax, instead of simply using adjacent frames directly as stereoscopic pairs. The UAV aerial image is a continuous sequence taken at the same visual angle, so depth information can be obtained by analyzing the relationship between adjacent frames. This kind of motion parallax is generated by the relative motion of the observer (i.e., the camera) and the scene. In reality, an object close to the observer appears to move faster than a distant one. This information can be obtained by the optical flow method, as shown in Fig. 33.2a. According to the particularity of UAV aerial images, the optical flow matching algorithm is used to obtain the depth information of the scene from the image sequence. Then the depth map is optimized by the image segmentation algorithm. Finally, the virtual viewpoint map is generated by the depth matching algorithm, and together with the original image it can be combined into a stereoscopic pair. The basic framework of the algorithm is shown in Fig. 33.2b. Optical flow is the visual movement that the observer perceives in a moving world. For instance, when an observer looks out through the window of a moving car, he can see that the trees, the ground, the buildings, and the like are all receding, and this movement can be described by optical flow. The motion characteristic of these scenes is that the speed of motion is inversely proportional to the distance from the observer, so the distance between the scene and the observer can be estimated from the motion. For instance, distant clouds and mountains move slowly or even appear at rest, but relatively close objects, such as buildings and trees, move faster. Furthermore, the road markings, which are very close, move at a particularly fast speed.
(a) Image processing flow chart
(b) Algorithm basic flow chart
Fig. 33.2 Schematic diagram of algorithm system framework
The optical flow estimation algorithm was proposed by Horn and Schunck [9] and by Lucas and Kanade [10] in 1981, respectively. Objects moving continuously in space form a series of projections on the imaging plane under illumination. The optical flow is the velocity of the object on the projection plane, which can be calculated using the displacement and time difference of the objects between the projections. The principle is shown in Fig. 33.3. The accuracy of optical flow rests on three major assumptions: the brightness constancy assumption, the time continuity assumption, and the spatial consistency assumption. UAV aerial images are usually shot continuously with the same lens over successive time periods, which meets these three basic assumptions. Moreover, the motion of the camera itself is stable and larger than the motion of most scene elements; that is, it can be regarded as a moving lens capturing still scenes. There are many implementations of the optical flow algorithm: the basic Horn–Schunck algorithm, the Lucas–Kanade algorithm, the optimized pyramid optical flow algorithm, the regional optical flow algorithm, the characteristic optical flow method [11], and so on. Considering the speed of video processing, this paper uses the pyramid Lucas–Kanade optical flow algorithm [12] to extract depth information, supplemented by an image segmentation algorithm. The optical flow algorithm can be used to obtain the distance information between the scene and the observer (i.e., the depth information), so that a depth map can be obtained; the effect is shown in Fig. 33.4. A depth map gives the best matching results when optical flow matching is done on images that are separated by a certain number of frames.
Fig. 33.3 Projection of 3D motion in a 2D plane
(a) The original image
(b) The depth map
Fig. 33.4 The depth map obtained by optical flow algorithm
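As a sketch of the depth-from-flow step described above, the fragment below tracks feature points between two frames a few frames apart with OpenCV's pyramid Lucas–Kanade implementation and treats the motion magnitude of each point as an inverse-depth cue. The parameter values and names are illustrative assumptions, and a dense depth map such as the one in Fig. 33.4 would still have to be interpolated from these sparse samples.

```python
import cv2
import numpy as np

def pseudo_depth_from_flow(frame_a, frame_b):
    """Pyramid Lucas-Kanade flow between two UAV frames (about 3 frames
    apart); larger motion means a closer scene point (motion parallax)."""
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(gray_a, maxCorners=2000,
                                  qualityLevel=0.01, minDistance=7)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(gray_a, gray_b, pts, None,
                                              winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    p0, p1 = pts[ok].reshape(-1, 2), nxt[ok].reshape(-1, 2)
    mag = np.linalg.norm(p1 - p0, axis=1)
    inv_depth = mag / (mag.max() + 1e-6)   # normalized inverse-depth cue
    return p0, inv_depth                   # sparse samples to be densified
```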
In the actual use of UAV photography, inertial navigation and GPS measurements can be used to record the real-time speed and attitude information of the UAV during flight in order to establish a mathematical model. In this paper, the optimal frame interval is determined from the typical flight speed of the UAV and multiple experiments, because the aerial images used in the experiment had already been taken. The experimental results are shown in Fig. 33.5, which indicates that the best interval is about 3 frames for obtaining the best depth map by the optical flow method. A virtual viewpoint map can be generated from the best depth map and the original image, as shown in Fig. 33.6. It can be seen that the depth map has many bad points whose distribution is very uneven, which causes certain distortion and deformation at the boundaries of objects in the generated virtual viewpoint image (as shown in Fig. 33.6c) and affects the three-dimensional viewing effect. In order to solve this problem, this paper uses an image segmentation algorithm. It is assumed that pixels of the same type have the same motion vector (i.e., depth value), so that the depth map can be optimized by the image segmentation information of the original image based on the optimal depth map [13].
Fig. 33.5 Best depth map test results for 8 consecutive frame intervals
(a) The original image
(b) The depth map
(c) The virtual viewpoint map
(d) The stereoscopic pair
Fig. 33.6 The stereoscopic pair
(a) MeanShift image segmentation
(b) K-means image segmentation
(c) The virtual view map optimized by image segmentation
(d) The stereoscopic pair after modification
Fig. 33.7 The stereoscopic pair after modification
This paper combines the k-means algorithm [14] with the Mean Shift algorithm [15] for image segmentation, and the optimization effect is shown in Fig. 33.7. It can be seen that the distortion and deformation in the generated virtual viewpoint image are reduced considerably (as shown in Fig. 33.7c), which brings a better viewing experience to the viewer.
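A minimal sketch of this optimization, assuming k-means clustering on color alone as a stand-in for the paper's combined k-means and Mean Shift segmentation: every segment is assigned the median depth of its pixels, which suppresses isolated bad points while keeping object boundaries sharp.

```python
import cv2
import numpy as np

def refine_depth_by_segmentation(image, depth, k=8):
    """Enforce the assumption that pixels of the same segment share one
    depth value: segment the original image, then replace the depth of
    each segment by the median depth of its pixels."""
    pixels = image.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, _ = cv2.kmeans(pixels, k, None, criteria, 5,
                              cv2.KMEANS_PP_CENTERS)
    labels = labels.reshape(depth.shape)       # image and depth same size
    refined = depth.astype(np.float32).copy()
    for lbl in range(k):
        mask = labels == lbl
        refined[mask] = np.median(depth[mask])
    return refined
```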
33.3 Experiment and Result
In order to verify the effectiveness of the proposed method, objective and subjective evaluation criteria are combined. The peak signal-to-noise ratio (PSNR) is used as the objective criterion. The virtual viewpoint maps generated by the RW + GC method [16], the plain optical flow algorithm, and the method proposed in this paper are each compared with the original image. The aerial photographs used for the test are shown in Fig. 33.8, and the test results are shown in Table 33.1. It can be seen that the method proposed in this paper is superior to the other methods for the production of stereoscopic UAV aerial photography.
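For reference, the PSNR criterion used for Table 33.1 can be computed as below for 8-bit images; this is the standard definition, not code from the paper.

```python
import numpy as np

def psnr(reference, test):
    """Peak signal-to-noise ratio in dB between two 8-bit images."""
    mse = np.mean((reference.astype(np.float64) -
                   test.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)
```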
Fig. 33.8 The aerial photography used for the test
Table 33.1 The PSNR of the results (unit: dB)

Method         a         b         c         d
RW + GC        28.1425   14.9034   25.3456   21.3454
Optical flow   27.1680   14.4042   24.9507   20.7343
Our method     28.5740   14.9954   25.5251   21.9864
Fig. 33.9 3D video test on the naked eye stereoscopic display
The virtual viewpoint image obtained above and the original image are regarded as a side-by-side 3D image pair, and the same processing can then be performed on the entire aerial video to obtain a stereoscopic video [17]. The video is displayed on a naked-eye stereoscopic display and the viewing effect is tested, as shown in Fig. 33.9. According to the feedback of actual viewers, although the stereo image generated without the image segmentation optimization has a strong stereoscopic effect, some scenes (as shown in Fig. 33.6c) exhibit image distortion, which affects the stereoscopic viewing experience. In contrast, the stereo image optimized by the image segmentation algorithm improves the image quality and eliminates the distortion. In summary, 3D images with a good stereoscopic effect can be obtained quickly and effectively by this method, but some scenes still show slight deformation, which is caused by insufficient accuracy of the depth map and the image segmentation, and needs to be improved in later work.
33.4 Conclusion
Based on the characteristics of UAV aerial image sequences, this paper proposes a stable and effective method for generating side-by-side 3D image pairs from UAV
aerial sequence images, and introduces the basic method of UAV aerial image production based on optical flow matching and an image segmentation algorithm. The method uses an optical flow algorithm to match images and generate a depth map, then segments the image to obtain segmentation information for each object, optimizes the image migration strategy based on this information, and finally transforms each object in the original image based on the depth map and the segmentation information. The coordinate translation parameters of each segmented object in the virtual view are provided by the depth map. On the one hand, the method reduces the influence of dead pixels in the depth map; on the other hand, it ensures that the same object does not undergo excessive distortion and deformation, thereby guaranteeing a better visual viewing effect. The experimental results show that this method can not only convert UAV aerial images into 3D images but also achieve a strong 3D sense. The method can also be applied to multi-UAV aerial stereo photography in MANETs (Mobile Ad hoc Networks) [18]. The next phase of work will continue to improve the algorithm and use deep learning to generate stereoscopic visual effects with more realistic 3D immersion.
Acknowledgements This research is sponsored by the National Key R&D plan (2016YFB0401500) and the R&D plan of the Jiangsu Science and Technology Department (BE2016173). The authors would like to acknowledge the contributions of the Nanjing University stereo imaging technology (SIT) laboratory 3D imaging team.
References 1. Zhang, W.X.: The skillful application of aerial photography by unmanned aerial vehicle in TV works. Western Radio Telev. 12, 42–44 (2017) 2. Mukherjee, A., Dey, N., Kausar, N.: A disaster management specific mobility model for flying ad-hoc network. Int. J. Rough Sets Data Anal. 3(3), 72–103 (2016) 3. Samanta, S., Mukherjee, A., Ashour, A.S.: Log transform based optimal image enhancement using firefly algorithm for autonomous mini unmanned aerial vehicle: an application of aerial photography. Int. J. Image Graph. 18(4), 1850019 (2018) 4. Wang, F., Wang, C.S., Liu, X.J.: The principle of stereoscopic display. J. Eng. Graph. 31(5), 69–73 (2010) 5. Sun, C.: A probe into several stereo display technologies. Comput. Simul. 25(4), 213–217 (2008) 6. Chen, H.: Research on Depth Map Generation Technology Based on 2D to 3D. Beijing University of Posts and Telecommunications, Doctor (2015) 7. Hill, L., Jacobs, A.: 3-D liquid crystal displays and their applications. Proc. IEEE 94(3), 575–590 (2006) 8. Wang, Y.Q.: Research on the optical principle auto-stereo display base on grid. Modern Disp. 3, 29–32 (2003) 9. Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artif. Intell. 17(3), 185–203 (1981) 10. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: IJCAI, pp. 674–679. Morgan Kaufmann Publishers Inc (1981) 11. Tao, M., Bai, J., Kohli, P.: SimpleFlow: a non-iterative, sublinear optical flow algorithm. Comput. Graph. Forum 31(2), 345–353 (2012)
12. Li, Y.N.: Depth information extraction with pyramid Lucas-Kanade optical flow method based on image segmentation. J. China Railw. Soc. 1, 63–68 (2015) 13. Huang, Z.X.: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min. Knowl. Disc. 2(3), 283–304 (1998) 14. Meng, J.L., Shang, H.K., Bian, L.: The application on intrusion detection based on K-means cluster algorithm. In: 2009 International Forum on Information Technology and Applications, pp. 150–152 (2009) 15. Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002) 16. Phan, R., Androutsos, D.: Robust semi-automatic depth map generation in unconstrained images and video sequences for 2D to stereoscopic 3D conversion. IEEE Trans. Multimedia 16(1), 122–136 (2014) 17. Hong, H.K., Park, J., Lee, S.C.: Autostereoscopic multi-view 3D display with pivot function using the image display of the square subpixel structure. Displays 29(5), 512–520 (2008) 18. Tyagi, S., Som, S., Rana, Q.P.: Trust based dynamic multicast group routing ensuring reliability for ubiquitous environment in MANETs. Int. J. Ambient Comput. Intell. 8(1), 70–97 (2017)
Chapter 34
Design of Intelligent Feeding System Based on Computer Vision for Indoor Breeding Fixed Locations
Tao Chi and Yunjian Pang
Abstract Indoor factory aquaculture relies almost entirely on artificial feeding, with the feeding amount and feeding time basically determined by the individual experience of the farmers. This leads to problems such as poorly timed feeding, low ration precision, high labor intensity, wasted time and power, and poor results. This design uses computer vision, which offers high positioning accuracy and no dependence on the external environment. A CMOS camera is carried on the carrier equipment in the pond, and image information of the target location is collected. By fuzzy processing of the acquired image information, the target location is recognized in order to plan the route for the ship-borne equipment, and the feeding operation is then carried out. In this design, the relationship between the moments of the target image and the position information is analyzed in detail, and the accuracy with which the ship-borne equipment reaches the target location and the robustness of the system are analyzed experimentally, which is of great application value for the precise feeding of aquatic products.
34.1 Introduction
In aquaculture, freshwater fish production accounts for 69.1% of all fish production in China and over 60% of world freshwater fish production [1]. With the continuous rise of aquaculture feed prices and labor costs, how to improve the efficiency and quality of indoor aquaculture is an important issue in factory farming [2]. With the continuous development of computer information technology, optical imaging technology, pattern recognition technology, and digital image processing technology, control applications based on visual servoing have also developed well [3, 4],
T. Chi (B) · Y. Pang College of Information Technology, Shanghai Ocean University, Shanghai, China e-mail: [email protected]
T. Chi School of Computer Science and Technology, Kashgar University, Kashgar, China
which enhances the learning ability of smart devices with respect to the surrounding environment, enables them to understand the environment and complete assigned tasks, and offers unparalleled advantages in intelligent aquaculture feeding and medication [5]. In this paper, a monocular visual servo function for reaching a predetermined target is realized on an ARM9-based embedded platform [6, 7]. A CMOS camera is mounted on the shipboard device, a target object is set at the target location, and the image moment information of the target object is extracted [8, 9]. The camera carrier equipment tracks the target linearly and arrives at the destination when the real-time image moment information from the camera coincides with the preset target image information. Experiments show that the system has high measurement accuracy and good robustness.
34.2 System Design
In this design, a CMOS camera (OV9650) is used as the image acquisition module, and the ARM9-based S3C2440 processor is selected as the processor of the embedded system. The hardware structure includes the S3C2440 processor module, an LCD display module, a data storage module, a camera module, and a motor control module. The specific hardware structure shown in Fig. 34.1 is as follows:
• The microprocessor is Samsung's S3C2440; the system runs stably at 405 MHz, and the main frequency can reach 530 MHz.
• 64 Mbytes of SDRAM, consisting of two K4S561632 chips working in 32-bit mode.
• 256 Mbytes of NAND Flash, using a K9F2G08.
• A 4.3-inch TFT LCD screen.
• Camera module: a CMOS camera with the OV9650 as the core chip, 300,000 pixels.
• Motor control module: two 5 V DC motors.
Fig. 34.1 System block diagram
The CMOS camera is connected to the main board through a 9-pin interface; the collected 8-bit RGB signal is transmitted to the storage device, and the storage device sends the data to the microprocessor for processing through the memory controller.
34.3 Visual Tracking Algorithm
Because the motion of the camera during visual tracking blurs the target image, this design proposes a visual tracking algorithm based on moments of the blurred image for the dynamic tracking of a stationary target. The algorithm exploits the fact that the zero-order moment of the target image has a fixed relationship with the camera-to-target distance when the camera is at different distances. After calibrating the camera, this relationship is used to dynamically track the target.
34.3.1 Proof of Theoretical Relationship Derivation
To study the problem, the following assumptions and definitions are first made.
Assumptions:
• The camera's three-dimensional space coordinate system is {C}, the projected image coordinate system is {I}, and the target is a planar rigid body that is always perpendicular to the camera's optical axis;
• z_d is the ideal target depth; the current target depth is z(t);
• The ideal position of the projection of the target center in the coordinate system {I} is [(x_oce)_d, (y_oce)_d];
• At the current moment, the position of the projection of the target plane center in the coordinate system {I} is [x_oce(t), y_oce(t)].
Definitions:
• The so-called ideal target image refers to a target image such that: ➀ it is formed when the target is at the ideal target depth z_d; ➁ in the coordinate system {I}, the projection position of the target center on the image is [(x_oce)_d, (y_oce)_d]^T.
• Let the image be expressed by the function f(x, y); then the (p + q)-order image moment of the image f(x, y) is:

m_pq = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x^p y^q f(x, y) dx dy   (34.1)

The (p + q)-order central moment of the image f(x, y) is:
μ_pq = ∫_{-∞}^{∞} ∫_{-∞}^{∞} (x − x̄)^p (y − ȳ)^q f(x, y) dx dy   (34.2)
Among them, x̄ = m_10/m_00 and ȳ = m_01/m_00, where x̄ and ȳ are the coordinates of the center of gravity of the image; the central moment is a measure that reflects the distribution of the image around its center of gravity. To define the current target image as an ideal target image, two requirements must be met: ➀ in the coordinate system {I}, the target depth must be equal to z_d; ➁ the position coordinate of the target center projection in the coordinate system {I} must be equal to [(x_oce)_d, (y_oce)_d]. Regarding condition one, it is known from the literature [4] that if a target is placed at two different depths and the zero-order moments of its images are obtained, then the relationship z_{t_k}/z_d = √((m_00)_d/(m_00)_{t_k}) holds, with an error of 3.2–5.3%. Therefore, if the target is a planar rigid body that is always perpendicular to the optical axis of the camera, condition one is established at time t_k exactly when (m_00)_{t_k} = (m_00)_d. Regarding condition two, it follows from definition two that when the target image is a binary image, the coordinates of the center of the target projection region are equal to the quotient of the first-order image moments of the binary image and its zero-order moment. Also, when the target is a planar rigid body that is always perpendicular to the optical axis of the camera, the target center is projected exactly at the center of the target projection area. Therefore, the target center projection point coincides with the center of the target projection area, and its position coordinates can be estimated by: x_oce(t_k) = m_10(t_k)/m_00(t_k), y_oce(t_k) = m_01(t_k)/m_00(t_k), (x_oce)_d = (m_10)_d/(m_00)_d, (y_oce)_d = (m_01)_d/(m_00)_d. The image features [x_oce(t_k), y_oce(t_k)] are therefore replaced with the image features [m_10(t_k)/m_00(t_k), m_01(t_k)/m_00(t_k)]. The advantages of these features are: ➀ the uniqueness of the target center point eliminates the need for image feature matching for that point, thereby simplifying the image processing; ➁ image moments make the point insensitive to image noise. In summary, if the target is a planar rigid body that is always perpendicular to the optical axis of the camera, the basis for judging that the current target image equals the ideal target image may be: m_00(t_k) = (m_00)_d and [m_10(t_k)/m_00(t_k), m_01(t_k)/m_00(t_k)] = [(m_10)_d/(m_00)_d, (m_01)_d/(m_00)_d]; that is, m_00(t_k) = (m_00)_d, m_10(t_k) = (m_10)_d, and m_01(t_k) = (m_01)_d are satisfied.
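The sketch below illustrates the resulting arrival test with OpenCV's moment computation. The tolerance and all names are our assumptions (an exact equality test is impractical on real images), and the depth estimate follows the square-root relationship derived above.

```python
import cv2
import numpy as np

def reached_target(binary_img, m00_d, m10_d, m01_d, tol=0.02):
    """True when the zero- and first-order moments of the current binary
    target image match the ideal ones within a relative tolerance."""
    m = cv2.moments(binary_img, binaryImage=True)
    close = lambda a, b: abs(a - b) <= tol * max(abs(b), 1e-9)
    return (close(m['m00'], m00_d) and close(m['m10'], m10_d)
            and close(m['m01'], m01_d))

def depth_from_m00(m00_now, m00_d, z_d):
    """z(t_k) = z_d * sqrt((m00)_d / (m00)_{t_k}) for a planar target
    perpendicular to the optical axis."""
    return z_d * np.sqrt(m00_d / m00_now)
```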
34.3.2 Camera Calibration and Visual Tracking Algorithm
• Intrinsic parameter calibration
The calibration template used for calibrating the camera's intrinsic parameters is a 7 × 5 black/white checkerboard, with each grid of size 31 mm × 31 mm.
Multiple template images are acquired by the CMOS camera, and the intrinsic parameters of the camera are then calibrated using the Matlab toolbox developed by Bouguet [10, 11]. The intrinsic parameter model is denoted as follows:

[u, v, 1]^T = [[k_x, 0, u_0], [0, k_y, v_0], [0, 0, 1]] · [x_c/z_c, y_c/z_c, 1]^T = M_in · [x_c/z_c, y_c/z_c, 1]^T   (34.3)
In the formula, u_0 and v_0 are the image coordinates of the intersection of the optical axis with the imaging plane, k_x is the magnification factor in the x-axis direction, k_y is the magnification factor in the y-axis direction, [x_c, y_c, z_c] are the coordinates of the scene point in the camera coordinate system, and M_in is the intrinsic parameter matrix.
• Visual tracking algorithm
The monocular visual tracking algorithm is implemented using the theoretical derivation above. Assuming that the intrinsic parameters of the camera are known, the ideal image of the preset target is first acquired and the ideal distance z_d of the preset target is measured; the image is binarized to obtain the central projection point [(x_oce)_d, (y_oce)_d] of the preset target in the coordinate system {I}, and (m_00)_d, (m_01)_d, and (m_10)_d are calculated as comparison parameters. During the movement of the camera, only when, at some time t = t_k, the conditions m_00(t_k) = (m_00)_d, m_01(t_k) = (m_01)_d, and m_10(t_k) = (m_10)_d are satisfied is the current target image equal to the ideal image, and the predetermined target location has been reached. The algorithm flow is shown in Fig. 34.2.
34.4 Experimental Verification and Analysis
The system is tested in two respects: a measurement accuracy experiment and a system robustness experiment. The accuracy of the measurement results is analyzed by comparing the average distance at which the ship-borne equipment stops with the ideal target distance. The robustness of the system is tested by the proportion of trials in which the device accurately reaches the preset target point.
34.4.1 Camera Calibration Experiment
The calibration of the camera's intrinsic parameters is done with the toolbox [12, 13], which requires several images to be acquired in advance.
Fig. 34.2 Visual tracking algorithm
These images are obtained by shooting the calibration board from different angles with the camera to be calibrated. In this paper, a method of calibrating the camera parameters from such image pairs is used. Calibration with the toolbox is mainly divided into three steps: first, read in the images; second, extract the vertices of the mesh; third, calibrate. A screenshot of the mesh-vertex extraction during the calibration process is shown in Fig. 34.3. The result after calibration is:

M_in = [[326.827, 0, 157.98942], [0, 327.05155, 108.94480], [0, 0, 1]]   (34.4)
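For readers without the Matlab toolbox, an equivalent intrinsic calibration can be sketched with OpenCV as below. The image folder is hypothetical, and we assume the 7 × 5 figure denotes the checkerboard's inner-corner count; this is an illustration of the procedure, not the authors' code.

```python
import glob
import cv2
import numpy as np

pattern = (7, 5)                       # inner corners per row and column
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * 31.0

obj_pts, img_pts = [], []
for path in glob.glob('calib/*.jpg'):   # hypothetical template images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# rms is the reprojection error; M_in is the 3x3 matrix of Eq. (34.4)
rms, M_in, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts,
                                            gray.shape[::-1], None, None)
print(M_in)
```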
34.4.2 Visual Tracking Results
• Accuracy experiment
The comparison between the distance at which the lens stops and the ideal distance to the desired target location is shown in Table 34.1.
Fig. 34.3 Calibration block corner extraction interface
As seen from Fig. 34.4, the absolute error is within 0.4 m, and the accuracy is high.
• Robustness experiment
In this test, a total of ten trials were conducted. In nine of them the equipment accurately reached the preset target location, and in one it deviated from the preset target location, so the robustness of the system is good [14, 15].
34.5 Conclusion
In this paper, an ARM microprocessor is used as the embedded processor, and a monocular vision tracking and positioning system is designed with a CMOS camera as the image acquisition module. At present, the task the system can accomplish is position tracking of a preset stationary target object. The experimental results show that the monocular vision tracking and positioning system designed in this paper has high precision and strong robustness, and can meet the accuracy and stability requirements of practical applications. Currently, the system's functions are relatively simple; they will be gradually extended, for example to tracking, detecting, and recognizing static targets with multiple degrees of freedom [16, 17].
Table 34.1 Precision experimental data (unit: m)

No                1    2    3    4    5    6    7    8    9
Ideal distance    5    5    5    5    5    5    5    5    5
Arrival distance  5.1  4.7  5.2  4.8  5    5    5.3  4.9  4.6
Error             0.1  0.3  0.2  0.2  0    0    0.3  0.1  0.4
Fig. 34.4 Arrival distance comparison
Acknowledgements This work is partly supported by the National Natural Science Foundation of China (61561027) and the Shanghai Natural Science Foundation (16ZR1415100). Thanks to the national key R&D project (research, development, and demonstration of technical equipment for quality-guaranteed storage and transportation of livestock, poultry, and aquatic products), and the authors would also like to thank the Key Laboratory of Fisheries Information of the Ministry of Agriculture.
References 1. Nadarajah, S., Flaaten, O.: Global aquaculture growth and institutional quality. Mar. Policy 84, 142–151 (2017) 2. Wang, J.T., Li, X.Y., Han, T.: Effects of different dietary carbohydrate levels on growth. Aquaculture 459, 143–147 (2016) 3. Dong, G.Q., Zhu, Z.H.: Position-based visual servo control of autonomous robotic manipulators. Acta Astronaut. 115, 291–302 (2015) 4. Dong, G.Q., Zhu, Z.H.: Autonomous robotic capture of non-cooperative target by adaptive extended Kalman filter based visual servo. Acta Astronaut. 122, 209–218 (2016) 5. Lee, P.G.: Process control and artificial intelligence software for aquaculture. Aquaculture 23(1), 13–36 (2000) 6. Wu, W.Q., Wang, X.G., Huang, G., Xu, D.: Automatic gear sorting system based on monocular vision. Digit. Commun. Netw. 1, 284–291 (2015) 7. Said, Z., Sundaraj, K., Wahab, M.N.A.: Depth estimation for a mobile platform using monocular vision. Procedia Eng. 41, 945–950 (2012) 8. Karakasis, E.G., Amanatiadis, A., Gasteratos, A., Chatzichristofis, S.A.: Image moment invariants as local features for content based image retrieval using the Bag-of-Visual-Words model. Pattern Recognit. Lett. 55, 22–27 (2015) 9. Premaratne, P., Premaratne, M.: Image matching using moment invariants. Neurocomputing 137, 65–70 (2014) 10. Li, S.Q., Xie, X.P., Zhuang, Y.J.: Research on the calibration technology of an underwater camera based on equivalent focal length. Measurement 122, 275–283 (2018) 11. Davies, E.R.: Chapter 19—Image transformations and camera calibration. In: Davies, E.R. (ed.) Computer Vision, 5th edn., pp. 585–610. Academic Press (2018) 12. Mikhelson, I.V., Lee, P.G., Sahakian, A.V.: Automatic, fast, online calibration between depth and color cameras. J. Vis. Commun. Image Represent. 25, 218–226 (2014) 13. Lv, Y.W., Feng, J.L., Li, Z.K., Liu, W., Cao, J.T.: A new robust 2D camera calibration method using RANSAC. Optik—Int. J. Light Electron Opt. 126, 4910–4915 (2015)
14. de Moura França, L., Amigo, J.M.: Evaluation and assessment of homogeneity in images, Part 1: unique homogeneity percentage for binary images. Chemometr. Intell. Lab. Syst. 171, 26–39 (2017) 15. Guo, S.Y., Zhou, W.F., Wen, H., Liang, M.X.: Fast binary image set operations on a run-based representation. Pattern Recognit. Lett. 80, 216–223 (2016) 16. Cao, X., Sun, H.B., Jan, G.E.: Multi-AUV cooperative target search and tracking in unknown underwater environment. Ocean Eng. 150, 1–11 (2018) 17. Yao, P., Wang, H.L., Su, Z.K.: Cooperative path planning with applications to target tracking and obstacle avoidance for multi-UAVs. Aerosp. Sci. Technol. 54, 10–22 (2016)
Chapter 35
A Real-Time Detection Method of Substation Equipment Label Based on Computer Vision
Lin Li, Yabo Zhao, and Yu Li
Abstract In this article, the distribution characteristics of pasted label interfaces are studied, and a real-time method for detecting label alignment is put forward. An image is captured by the camera of the robot and then processed by image preprocessing algorithms such as gray-level conversion and filtering. Next, the features of the image region are analyzed, and the Sobel operator is used to extract the horizontal features of the label interface. Finally, a linear fitting method is used to analyze the characteristics of the interface, and the interface is judged according to the tilt angle of the fitted straight line. Experimental results show that the method can detect the interface position in real time and check whether the interface is aligned. In addition, the algorithm is fast and robust, and can meet the requirements of an industrial pipeline.
L. Li (B) · Y. Zhao · Y. Li Shandong Luneng Intelligence Technology Co., Ltd, Jinan, Shandong, China e-mail: [email protected]
35.1 Introduction
With the development of society, people place higher requirements on product label detection. Modern industry produces a wide variety of products; in order to control the production process and improve production efficiency and the rate of qualified products, modern industrial production emphasizes real-time and non-contact detection, which many traditional methods cannot achieve [1, 2]. With the wide application of new electronic technology and microcomputers, computer vision and artificial intelligence have gradually matured, and visual detection technology has emerged in this situation [3]. Computer vision is an emerging detection technology that adopts automation and intelligence and realizes non-contact, fast, and high-precision detection through computer identification and control [4, 5]. The labels pasted on substation equipment often contain relevant information about the equipment. If the label of a piece of substation equipment is not pasted in a standard way, the label image collected by the robot will bring difficulties to the recognition algorithm when
Fig. 35.1 Paste normal
Fig. 35.2 Paste abnormal
it is transmitted to the backend. Whether a label is pasted in a standard way therefore both influences the appearance of the product and is a manifestation of product quality, so studying whether the label interface is aligned has important application value. Figures 35.1 and 35.2 show the normal and abnormal label-pasting conditions.
35.2 Preprocessing
In the process of image acquisition, the image is susceptible to noise, which may come from the camera or the external environment. Detecting and removing this noise is therefore a key step.
35.2.1 Image Filter
Image filtering is an important step in image preprocessing; it suppresses the noise of the target image while retaining the details of the image as much as possible. In image processing, smoothing filtering technology is mainly used to eliminate the noise in the image so that the target region can be segmented well. Common methods
of removing noise include mean filtering, median filtering, Gaussian filtering, and so on. Mean filtering is a linear filtering method; it has the advantage of high efficiency, but its drawback is obvious: edge feature information is blurred and many features are lost when computing the mean. Gaussian filtering is also a linear filtering method. Median filtering is a nonlinear filtering method that is often used to remove noise; its biggest advantage is that it can effectively protect the edge information of the image. The idea is to examine the samples in the input signal within an observation window of odd size and decide whether the central sample represents the signal. Median filtering is a common step in image processing. It is particularly helpful against speckle noise and salt-and-pepper noise, and its edge-preserving property makes it useful in situations where blurred edges are not desirable [6].
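A minimal example of this filtering step with OpenCV; the file name and the 5 × 5 window size are illustrative, not the paper's parameters.

```python
import cv2

img = cv2.imread('label.png')        # hypothetical input image
denoised = cv2.medianBlur(img, 5)    # odd-sized observation window
```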
35.2.2 Image Segmentation
Image segmentation is a key image analysis technique whose purpose is to pick out the regions or objects of interest through analysis and study of the image. Image segmentation plays an important part in image processing: it is the key step from image processing to image analysis, and the basis of further image understanding. Generally, image segmentation methods can be summarized as follows: methods based on threshold values, methods based on boundary detection and edge linking, and methods based on special theories [7, 8]. The threshold-based segmentation method is called the threshold method for short. The basic idea is to compute one or more thresholds from the gray image and compare the gray value of each pixel in the image with the thresholds. Finally, the pixels are divided into the appropriate categories according to the comparison results. Therefore, the most critical step of this method is to solve for the optimal gray threshold according to a criterion function. In order to obtain a good segmentation effect for all images, an adaptive threshold algorithm is used to binarize the image in this paper. Figure 35.3 shows the image binarization effect.
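A sketch of the adaptive thresholding step with OpenCV: each pixel is compared with the mean of its local neighborhood, so no single global threshold is needed and uneven lighting is tolerated. The block size and offset are illustrative assumptions.

```python
import cv2

gray = cv2.cvtColor(cv2.medianBlur(cv2.imread('label.png'), 5),
                    cv2.COLOR_BGR2GRAY)
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 31, 5)
```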
35.3 Algorithm Design
35.3.1 Sobel Operator
The Sobel operator is a common method in edge detection; its template has two directions: one detects horizontal edges, and the other detects vertical edges.
Fig. 35.3 Binarization
Fig. 35.4 Horizontal template of Sobel
Compared with other operators, the Sobel operator can reduce edge ambiguity and achieve better results [9]. The Sobel operator is a form of filter operator that can be used to extract edges; at the same time, it can exploit fast convolution routines, making it simple and effective, so it is widely used in edge detection. Since this paper detects the alignment of the label interface, only the horizontal edge detection of the Sobel operator is used. The horizontal template is shown in Fig. 35.4, and the edges detected by the horizontal Sobel operator are shown in Fig. 35.5. Because of the experimental object and the design of the algorithm, this paper only detects horizontal edges, not vertical edges.
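The horizontal-edge extraction can be sketched with OpenCV as below; passing dx = 0, dy = 1 applies the horizontal template of Fig. 35.4, and the threshold value is an illustrative assumption.

```python
import cv2
import numpy as np

gray = cv2.imread('label.png', cv2.IMREAD_GRAYSCALE)  # hypothetical input
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)   # vertical gradient
edges = (np.abs(gy) > 80).astype(np.uint8) * 255  # keep horizontal edges
```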
35.3.2 Linear Fitting
As can be seen from the binarization in Fig. 35.5, the coordinate distribution of the detected edge points is concentrated and approximately linear. Therefore, the points can be described by a fitted straight line. Linear fitting is a form of curve fitting. As is well known, the least squares method is simple and commonly used for straight-line fitting.
Fig. 35.5 Horizontal edge detection
In short, the least squares method is an optimization method that obtains the optimal value of an objective function; it estimates the unknown parameters while minimizing the error between the fitted and the observed data [10, 11]. The linear equation of the edge can be formulated as follows:

y = ax + b   (35.1)
The values of x and y in Formula (35.1) are known; they represent the column and row coordinates of a boundary point, respectively. When fitting a straight line, the coordinates of six points are selected from the boundary at equal intervals. The coefficients a and b are the undetermined parameters, representing the slope and intercept of the line, respectively.
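A minimal least-squares fit of Eq. (35.1), assuming the six sampled boundary points are available as (column, row) pairs; the tilt angle of the fitted line is what the alignment decision uses.

```python
import numpy as np

def fit_edge_line(points):
    """Fit y = a*x + b to boundary points and return slope, intercept,
    and the tilt angle of the line in degrees."""
    x = np.array([p[0] for p in points], dtype=np.float64)
    y = np.array([p[1] for p in points], dtype=np.float64)
    a, b = np.polyfit(x, y, 1)
    return a, b, np.degrees(np.arctan(a))
```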
35.3.3 Algorithm Flow
The algorithm flow is shown in Fig. 35.6. Its steps are as follows.
Step 1: image preprocessing, mainly median filtering;
Step 2: binarization to extract the target;
Step 3: contour extraction to find the largest contour and obtain the ROI region;
Step 4: edge detection and noise removal;
Step 5: straight-line fitting;
Step 6: find the starting and ending points of the two lines and compute the difference value.
Fig. 35.6 The flow diagram of the algorithm
35.4 Experimental Result and Analysis
The camera is fixed on the robot, with the detection surface of the product facing the camera. The working environment is indoors, the illumination is constant, and only the label changes between images. Figures 35.7, 35.8, 35.9, and 35.10 illustrate the intermediate results.
Fig. 35.7 Region of interest
Fig. 35.8 Detection result
Fig. 35.9 Maximum contour and center of mass
Fig. 35.10 Linear fitting
The result of Hough-transform line detection is shown in Fig. 35.10. As can be seen from Fig. 35.10, multiple lines may be detected at the same position when the Hough transform is used to find lines. On the other hand, the Hough transform involves many parameters that cannot adapt automatically to the actual situation of the scene. In summary, the line-fitting method avoids these shortcomings of Hough-transform line detection (there is no problem of merging or choosing among multiple straight lines), so the algorithm in this paper has high efficiency and good real-time performance and can be used in actual scenes.
35.5 Conclusion
In this paper, a detection algorithm for substation equipment labels is proposed, which can determine whether the label interface is aligned. The algorithm uses digital image processing technology and straight-line fitting. The experiments show that the algorithm can be effectively applied to the real-time detection of the label interface state. The algorithm has stable performance, high accuracy, and strong robustness. Its successful implementation can promote non-contact detection in industry and advance the automation and intelligence of industrial inspection.
References 1. Huang, L., Zhang, Y.L., Hu, B., Ma, Z.M.: Automatic detection of liquid level in transparent bottle based on machine vision. Autom. Instrument. 27, 57–60 (2012) 2. Wu, P.F., Chang, J.M.: Label location detection based on computer vision. J. Jianghan Univ. Nat. Sci. Edition 64(4), 325–330 (2018) 3. Wu, X.: Oral Product Quality Machine Vision Detection Method. Hunan University, 1–9 (2013) 4. Liu, P., Wu, R.M., Yang, P.X., Li, W.J., Wen, J.P., Tong, Y., Hu, S., Ai, S.R.: Study of sensory quality evaluation of tea using computer vision technology and forest random method. Spectros. Spect. Anal. 39(1), 193–19 (2019) 5. Pena-Gonzalez, R.H., Ma, N.M.: Computer vision based real-time vehicle tracking and classification system. In: IEEE International Midwest Symposium on Circuits and Systems 57th, pp. 679–682. College Station, TX, USA (2014) 6. Ye, W.Z.: Optimality of the median filtering operator. Circuits Syst. Signal Process. 30, 1329– 1340 (2011) 7. Mageswari, S., UmaaMala, C.: Analysis and performance evaluation of various image segmentation methods. In: International Conference on Contemporary Computing and Informatics, pp. 469–473 (2015) 8. Peng, B., Zhang, L., Zhang, D.: A survey of graph theoretical approaches to image segmentation. Pattern Recognit.: J. Pattern Recognit. Soc. 46, 1020–1038 (2013) 9. Koyuncu, I., Cetin, O., Katircioglu, F., Tuna, M.: Edge detection application with FPGA based Sobel operator. In: Signal Processing and Communications Applications Conference 23rd, pp. 1829–1832. Malatya (2015)
10. Guo, S.Y., Zhai, J., Tang, Q., Zhu, Y.J.: Combining the Hough transform and an improved least squares method for line detection. Comput. Sci. 39(4), 196–198 (2012) 11. Qiao, Y.Q., Xiao, J.H., Huang, Y.H., Yin, K.Y.: Randomized Hough transform straight line detection based on least square correction. J. Comput. Appl. 35(11), 3312–3314 (2015)
Chapter 36
Target Model Construct Method Based on Sobel and Snake Algorithm
Xiao Guang Li, Qi Ming Dong, and Yu Hang Tan
Abstract The traditional target model is inaccurate. In order to solve this problem, this paper proposes a target model construction method based on the Sobel and Snake algorithms. First, edge information is obtained with the Sobel operator in the target model area, together with the gradient direction of each edge pixel; the edge pixels are used as control points. Second, the Snake algorithm is used to move the control points to obtain accurate contour points, and bilinear interpolation is used to construct the target contour. The experimental results show that the new modeling method is more accurate than the traditional method. The number of pixels is reduced to 10703, which is 50.9% of the traditional model, and the computation time is reduced to 31 ms when normalized cross-correlation is used to calculate the correlation coefficient, which provides the necessary model support for subsequent tracking.
36.1 Introduction
A good target appearance model is one of the decisive factors in tracking performance [1, 2]. The image can be divided into two parts, the foreground and the background, and the model is used to distinguish the foreground from the background. The target model is composed of two aspects: the shape of the target template and the characteristics of the target. Xi et al. [3] gave a survey of appearance models in visual object tracking, and Uyei [4] studied modeling in medical science. In actual tracking, the characteristics of the target determine the real-time performance and robustness of the tracking algorithm. Aiming at the shape of the template, the characteristics of the target and the update strategy of the model are analyzed and studied, and a target modeling method based on the Sobel and Snake algorithms is proposed.
X. G. Li (B) · Q. M. Dong · Y. H. Tan Electric Information College of Changchun Guanghua University, Changchun 130031, China e-mail: [email protected]
36.2 Traditional Modeling Method
The traditional target tracking system uses a rectangle or an ellipse to represent the object, as shown in Fig. 36.1. This essentially cannot remove the interference of the background, and when the chosen rectangle or ellipse is smaller than the target, as in the ellipse example in the figure, part of the target may be lost. Some scholars have therefore proposed a double-rectangle method to extract the target region: a rectangle is first placed along the target's major axis, a second rectangle is obtained by rotating it about the center by a certain angle, and the intersection of the two rectangles is taken as the target template, as shown in Fig. 36.2. Compared with the traditional rectangle or ellipse, this is a great improvement; but, as the extracted target templates in the figure show, the method still carries some background information or loses part of the target, so it is not the best shape. It can be seen that some background remains when the traditional modeling methods are used. In order to obtain the best target model, this paper proposes a contour extraction method based on the target's appearance, which represents the target with its true shape, completely removing background interference without losing target information.
Fig. 36.1 Extraction methods of the traditional target model
Fig. 36.2 Extraction target model with improved double rectangle method
36.3 Target Extraction Based on Edge Detection
Representing the target with its real contour allows the background to be removed completely while ensuring that no target information is lost. In order to enable the computer to extract the contour, a rough rectangular target area must first be given: the center point of the area should lie on the target, and the rectangle's length and width should be slightly larger than the target itself. Once the rough area is obtained, a series of operations is performed in the region to extract the contour. The specific steps are as follows: (1) edge detection; (2) extraction of the peripheral edge; (3) detection of line segments on the peripheral edge; (4) connection of the line segments into the target contour; and (5) taking the region inside the contour as the target template.
36.3.1 Edge Detection of the Target Area
The Sobel operator [5–7] is a derivative-based edge detection operator. The algorithm convolves the image with 3 × 3 templates centered on each pixel and then selects an appropriate threshold to extract the edges; there are templates for two directions, as shown in Fig. 36.2. Applying the operator for horizontal and vertical edge detection at any point of the image produces the corresponding gradient vector (or its normal vector), from which the point is judged to be an edge point or not. Bilinear interpolation is then used to fill in the data.
36.3.2 Snake Algorithm
The contour obtained by the previous method is rough. In order to make it more accurate and smooth, the edge points are moved to more precise positions. The Snake algorithm, widely studied since it was proposed by Kass, Witkin, and Terzopoulos in 1987 [8–10], views the contour as a continuous curve controlled by the image force, an external force, and an internal force. The image force, often derived from the image gray-level function or its gradient, pulls the curve toward the contour; the external force is an artificially added constraint that encodes preset properties of the contour; and the internal force comes from the curve itself. Traditional applications of the Snake algorithm need control points to be extracted first. Based on the work above, N points are uniformly sampled from the rough outline as the Snake control points and moved to locally optimal positions by the Snake algorithm, so that the outline represents the target more faithfully.
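As a sketch of this refinement step, scikit-image's active-contour implementation can stand in for the authors' Snake optimization; the smoothing and energy weights are illustrative assumptions, not values from the paper.

```python
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

def refine_contour(gray, init_points):
    """Move the Sobel control points to the true contour by minimizing
    the snake energy (internal + image forces) on a smoothed image."""
    snake = active_contour(gaussian(gray, sigma=2.0),
                           np.asarray(init_points, dtype=np.float64),
                           alpha=0.015, beta=10.0, gamma=0.001)
    return snake   # N refined contour points
```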
Fig. 36.3 Block of target outline template extract process
Fig. 36.4 Sketch of the rigid target template is extracted
36.4 Extraction Process of Target Outline Template
The whole contour extraction process is shown in Fig. 36.3, and the result is shown in Fig. 36.4. The steps seem complicated, but the final target template can be obtained in less than 1 ms. For this target, the NC correlation [11] is calculated using both the traditional target template and the new target template; the two cases are compared in Table 36.1.
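The normalized cross-correlation of Table 36.1 can be sketched as below; restricting the sums to pixels inside the contour template (the mask) is exactly what cuts the pixel count, and hence the computation time, roughly in half. The function name and masking interface are our assumptions.

```python
import numpy as np

def ncc(template, candidate, mask=None):
    """Normalized cross-correlation between a template and a candidate
    region of the same size; mask selects the pixels inside the contour."""
    t = template.astype(np.float64)
    c = candidate.astype(np.float64)
    if mask is not None:
        t, c = t[mask], c[mask]
    t, c = t - t.mean(), c - c.mean()
    return float((t * c).sum() /
                 (np.linalg.norm(t) * np.linalg.norm(c) + 1e-12))
```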
36.5 Conclusions
When the NC correlation is calculated with the outline template, the computational complexity is reduced. It is obvious that the new target template can not only match the target more accurately than the original template but also reduce the computational complexity of the algorithm. The new template ensures the real-time performance and reliability of the algorithm.
Table 36.1 Comparison of traditional template and new template

Example               Traditional template   New template
Pixel                 150*140 = 21000        10703
Process Time 1        51 ms                  31 ms
Process Time 2        61 ms                  31 ms
Process Time 3        59 ms                  32 ms
Process Time 4        58 ms                  31 ms
Process Time 5        52 ms                  31 ms
Average process time  56.2 ms                31.2 ms
The new modeling method is more accurate than the traditional method: the number of pixels is reduced to 10703, which is 50.9% of the traditional model, and the computation time is reduced to 31 ms when normalized cross-correlation is used. The work described in this paper was fully supported by the Jilin Province Development and Reform Commission (No. 2019C054-7).
References 1. Ping-yue, L., Chang-qing, L., Sheng-li, S.: Dim small moving target detection and tracking method based on spatial-temporal joint processing model. Infrared Phys. Technol. (5) (2019) 2. Bitencourt-Ferreira, G., de Azevedo Filgueira, W.: Homology modeling of protein targets with modeller. Methods Mol. Biol. (Clifton, N.J.) (9) (2019) 3. Xi, L., Weiming, H., Chunhua, S.: A survey of appearance models in visual object tracking. ACM Trans. Intell. Syst. Technol. (TIST) (4) (2013) 4. Uyei, J.: Setting ambitious targets for surveillance and treatment rates among patients with hepatitis C related cirrhosis impacts the cost-effectiveness of hepatocellular cancer surveillance and substantially increases life expectancy: a modeling study. PLoS One (9) (2019) 5. Ri-Gui, Z., Da-Qian, L.: Quantum image edge extraction based on improved Sobel operator. Int. J. Theor. Phys. (8) (2019) 6. Evans, B.J.: Response to Dreyfus and Sobel. Am. J. Hum. Genet. (7) (2018) 7. Kun, Z., Yuming, Z., Pu, W.: An improved Sobel edge algorithm and FPGA implementation. Procedia Comput. Sci. (6) (2018) 8. Bittenbinder, M.A., Dobson, J.S., Zdenek, C.N.: Differential destructive (non-clotting) fibrinogenolytic activity in Afro-Asian elapid snake venoms and the links to defensive hooding behavior. Toxicol. Vitr. (10) (2019) 9. Nicholas, Y.J., Christina, Z.N., James, D.S.: Mud in the blood: novel potent anticoagulant coagulotoxicity in the venoms of the Australian elapid snake genus Denisonia (mud adders) and relative antivenom efficacy. Toxicol. Lett. (12) (2018) 10. Domínguez-Pérez, D., Durban, J., Agüero-Chapin, G.: The Harderian gland transcriptomes of Caraiba andreae, Cubophis cantherigerus and Tretanorhinus variabilis, three colubroid snakes from Cuba. Genomics (12) (2018) 11. Benito-Arenas, R., Doncel-Pérez, E., Fernández-Gutiérrez, M.: A holistic approach to unravelling chondroitin sulfation: correlations between surface charge, structure and binding to growth factors. Carbohydr. Polym. (12) (2018)
Chapter 37
Parent Block Classification of Fractal Image Coding Algorithm Based on 'Shizi' Feature
Li Wang and Zengli Liu
Abstract Image coding technology plays an important role in data transmission, and fractal image compression coding has attracted much attention due to its high compression ratio and excellent reconstruction quality. Based on the basic fractal image compression algorithm, this paper proposes a fractal image compression algorithm with parent-block classification based on the 'shizi' (cross-shaped) feature. In the algorithm, the parent blocks are divided into three categories according to their 'shizi' features, forming three classes of codebooks; the composition of each codebook is determined by a partition threshold. Child blocks are divided into stationary blocks and feature blocks, and each searches for its best matching block in the corresponding codebook. This algorithm can effectively shorten the encoding time, and the image reconstruction quality is also very good.
37.1 Introduction With the continuous development of the era of big data, the transmission and storage of data is facing a huge test. Usually, the original image directly transmits or stores a large amount of storage space, and the image needs to be encoded by an appropriate encoding method to achieve a certain space resource storage or transfer more data. Natural images tend to have great repetitiveness. But it is these characteristics that are similar in structure and high in repetitiveness, which leads to the concept of fractals. The coding idea of fractal image compression comes from researchers such as Barnsley. Since its development, many improved coding algorithms have been produced [1]. These improved algorithms are all around a core issue: how to get high-quality recovery images within the shortest time. This led to two problems. (1) To change global search to local search. (2) To ensure the recovery of image effects [2, 3]. L. Wang (B) · Z. Liu Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650000, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_37
333
334
L. Wang and Z. Liu
The basic fractal image coding algorithm can achieve high compression ratio, but time is too long. In order to solve it, He et al. proposed a fast fractal image coding algorithm based on variance to improve the coding efficiency. Later, it was proposed to use the characteristics of the image itself as the classification basis. For example, the sub-block feature classification proposed by Zhou [2], and the classification parent block method proposed by Zhao and Wang are all shortened by using block feature classification [4, 5]. In order to ensure image restoration quality, a method combining wavelet transform [6–8] and DCT transform [9] is proposed. At present, the feature method is widely proposed, how to wait for the crossover features realized by [10, 11] and the double crossover and features proposed by Zhang et al. [12]. There are also ways to use the neighborhood to define thresholds [13, 14]. This paper proposes an eigenvalue method, which extracts eigenvalues for classification, takes the parent block as the center, and draws the range for searching.
37.2 Coding Algorithm In the traditional fractal coding, the global search method is used to search for the best parent block Dm(i, j) in the codebook . The distance between the child block and the parent block in the codebook is defined as E. Then the encoding process is the min E(R, Wk D), Wk represents process of finding E(Rr , Wk(i, j) Dm(i, j) ) = D∈,k∈[1,8]
the transformation process, Wk(i, j) Dm(i, j) represent the best search term. Record transformation parameter contrast factor s and brightness factor o and equidistant transformation number k. The process of image restoration is the inverse of coding, which is the process of minimizing the error between child blocks and iteratively recovering child blocks s · Wk Dm + o · I , we can also think E(Rr , s · Wk(i, j) Dm(i, j) + o · I ). E(Rr , s · Wk(i, j) Dm(i, j) + o · I ) =
min|| D∈,k∈[1,8]
Rr , s · Wk Dm + o · I ||
(37.1)
The most important and time-consuming part of the fractal coding process is the process of child blocks finding the best matching parent block Dm(i, j) . Figure 37.1 shows the trend change between the three features of the image. The key to feature classification in the encoding process is to characterize the parent block . The method of classifying the parent block is as follows: (1) Select the parent block D : di, j |i, j = 1, 2, 3 . . . n , the size is n × n,then calculate the gray mean μ and variance σ . (2) The ‘shizi’ feature is extracted for each parent block D, and the feature extraction principle is as follows:
37 Parent Block Classification of Fractal Image Coding Algorithm …
335
Fig. 37.1 Plot of the three feature contrast
⎤ ⎡ n/2+1 n n n/2+1 1 ⎣ μ, = di, j + di, j ⎦ 4 ∗ n i=1 j=n/2 i=n/2 j=1 ⎤ ⎡
n/2+1 n n n/2+1
1
⎣ σ, = (di, j − μ, )2 + (di, j − μ, )2 ⎦ 4 ∗ n i=1 j=n/2 i=n/2 j=1 ⎤ ⎡ n n 1 ⎣ μ, = di,n/2+1 + dn/2+1, j ⎦ 2 ∗ n i=1 j=1 ⎤ ⎡
n n
1
⎣ σ, = (di,n/2+1 − μ, )2 + (dn/2+1, j − μ, )2 ⎦ 2 ∗ n i=1 j=1
(37.2)
where, respectively, represents that n is an odd number and an even number,μ and σ are ‘shizi’ features, and they are used to divide the parent block into codebooks based on two eigenvalues. (3) Define thresholds T 1, T 2, T 3 where T 1 is the upper limit of the variance threshold of the stationary block, T 3 is the lower limit of the variance threshold of the feature block, and T 2 defines the range of the mean value of the block. Find the characteristics of the parent block and then divide the parent block into codebook as 1 , 2 and 3 . The parent block features in the codebook are 1 , 2 , 3 .
336
L. Wang and Z. Liu
Fig. 37.2 Plot of three gray scale change
1 : σ < T1 |di, j − μ| < T 2 i, j = 1, 2, 3 . . . n 2 : σ > T3 |di, j − μ| > T 2 i, j = 1, 2, 3 . . . n 3 : T1 > σ > T3 |di, j − μ| > T 2 i, j = 1, 2, 3 . . . n
(37.3)
The gray distribution of the parent block in codebook 1 is relatively uniform and can be seen as a gray distribution without strong transformation fluctuations. The gray distribution changes relatively strong in 3 . The gray level of the parent block in codebook 2 presents a regular change. Here, the blocks divided under the features 2 and 3 are called contour blocks (Fig. 37.2). The graphs a, b, and c represent the gray distribution of the parent block in the three types of codebooks 1 , 2 , and 3 . The darker the color, the larger the gray value. The classification of the codebook defines a range for the child block search matching block, performs the same feature classification for each child block, and searches for the best match in the corresponding codebook Dm(i,j) . Define the initial search radius d and the step size td, where t is greater than or equal to 1 positive number, and search for a more suitable parent block Dm(i,j) in the d + td range centering on the best match. The specific algorithm is as follows: Step 1 Determine the child block R and the parent block D size n × n and m × m (general case m = 2 × n), define the matching error e. Step 2 Child block and parent block division for a given original image, Ri ∩ R j = ∅ and Di ∩ D j = ∅, and the block position is recorded with the coordinates of the upper left corner (k, l). Step 3 Takes the parent block from the image and calculates the ‘shizi’ feature for each of it, then classifies according to the feature, and records the coordinates. Step 4 Take the child block from the image and calculate the feature φ. If the characteristics are met φγ , the child block R is considered to be a stationary block, and it does not participate in search matching, saves child block coordinates and mean values only. Otherwise, calculate child block feature values and matches corresponding codebooks.
37 Parent Block Classification of Fractal Image Coding Algorithm …
337
φγ : σ , < σ |ri, j − μ| < σ {ri, j |i, j = 1, 2, 3 . . . n}
(37.4)
Step 5 Scale change A and 8 kinds of stationary transforms N Nn , n ∈ [1, 8] for the parent block D in the codebook. The process of matching is the process of finding D · A · N Nn . Finally the error dp, the π(k, l), the scaling factor s, and offset o must be found. Step 6 If dp < e, encode π(k, l), transformation number n, factor s, and the offset o. (7) Repeat steps 4 to 6 until all child blocks have been matched.
37.2.1 Decoding Process First, take the image of the same size as the original image, and then using the stored π(k, l), transform number n, factor s, and the offset o. B = s · R + I · o, from the formula, W (B), W (W (B)),W (W (W (B))), until the restored image is close to the original image.
37.3 Results and Discussion 37.3.1 Threshold Effect The thresholds T 1, T 2, and T 3 play a role in limiting the codebook partition. According to the division of the gray distribution of the parent block, the threshold divides the parent block into three types of codebooks. Table 37.1 shows the influence of the threshold on the codebook partition. The changes of the thresholds T 1, T 2, and T 3 will affect the number of parent blocks in the codebook. As we can see the change of T 2 will make the parent block in 1 be classified as 3 . Table 37.1 The effect of thresholds T 1, T 2, and T 3 on codebook partitioning T1
T2
T3
1
2
3
0.5
100
5
20
436
568
0.7
100
5
53
436
535
0.7
100
4
53
470
501
338 Table 37.2 The affect of e
L. Wang and Z. Liu e
Article algorithm
Basic algorithm
e = 0.7
PSNR = 31.5469
PSNR = 31.1280
Coding time = 55.7 s
Coding time = 370.3 s
e = 0.5
e = 0.3
Decoding time = 1.8 s
Decoding time = 1.8 s
PSNR = 31.6824
PSNR = 31.1434
Coding time = 57.35 s
Coding time = 406.7 s
Decoding time = 1.7 s
Decoding time = 1.8 s
PSNR = 31.6084
PSNR = 31.1349
Coding time = 59.9 s
Coding time = 424.2 s
Decoding time = 1.7 s
Decoding time = 1.8 s
37.3.2 Matching Error e The choice of threshold e directly affects the search results and determines whether the search process stops. Table 37.2 shows the effect of different e thresholds on image recovery under the conditions of T 1 = 0.5, T 2 = 110 and T 3 = 4.
37.4 Decoding Result This paper divides the parent block into three categories based on the gray. The best matching block is searched in the corresponding codebook. As the search scope is limited, the matching time is greatly shortened. Time corresponds to the number of iterations. In the image of Cameraman we can see from Fig. 37.3 where e = Fig. 37.3 PSNR under different time
PSNR under different Time(e=0.5)
32
Basic algorithm Article algorithm
30 28
PSNR
26 24 22 20 18 16 14 12
4
5
6
7
Time
8
9
10
37 Parent Block Classification of Fractal Image Coding Algorithm …
Original image
e =0.5
e =0.3
e =0.7
e =0.5
Original image
e =0.7
e =0.3
Original image
Original image
339
e =0.3
e =0.3
Fig. 37.4 Plot of recovery images with different e values
0.5, compared with the basic algorithm, the PSNR tends to be stable and achieve a relatively good recovery effect at Time = 6. The basic algorithm stabilizes when Time > 8. Figure 37.4 shows the image restoration effect when e = 0.3, e = 0.5, e = 0.7 under the conditions of T 1 = 0.5, T 2 = 110, T 3 = 4 with the image Cameraman and Lena.
37.5 Conclusion Based on the traditional basic fractal coding, it has the disadvantages of high computational complexity and long coding time. According to the feature method, a classification method of classification parent block fractal image based on ‘cross’ feature vector is proposed. By using the ‘cross’ feature to convert the global search in fractal compression coding into a local search, the coding time is shortened, which has strong flexibility and broad application prospects.
References 1. Huang, X.Y., Zhang, Z.H., He, C.J.: The Theory and Application of Modern Intelligent Algorithms. Science Press (2005) 2. Zhou, Y.B.: The principle and development trend of fractal image coding. Mod. Electron. Tech. 8(151), 57–59 (2003) 3. Polvere, M.,. Nappi, M.: Speed-Up In fractal image coding: comparison of methods. IEEE Trans. Image Process. 9(6) (2000) 4. Zhao, X.L., Sun, W.: Adaptive Image Fractal Coding Algorithm Based on New Parent Block Library Classification, vol. 31(4), pp. 12–17. Tianjin University of Technology Press (2015) 5. Wang, W.W., Zhang, A.H., Tang, T.T.: Fast fractal coding algorithm based on classification of parent block library. Comput. Technol. Dev. 27(4), 51–59 (2017)
340
L. Wang and Z. Liu
6. Hui, Q.J., Li, H.A., Lu, Y.: Image compression method based on wavelet transform and human vision system. J. Electron. Meas. Instrum. 30(12), 1838–1844 (2016) 7. Wang, W.W.: Image Compression Coding Method Based on Fractal Theory and Wavelet Transform, Nanjing (2018) 8. Andreopoulos, I., Karayiannis, Y.A., Stouraitis, T.: A hybrid image compression algorithm based on fractal coding and wavelet transform. In: ISCAS 2000—IEEE, pp. 28–31 (2000) 9. Chang, C.-C., Chiang, C.-L., Hsiao, J.-Y.: A DCT-domain system for hiding fractal compressed images. IEEE Xplore (2005) 10. He, C.J., Huang, X.Y.: A fast fractal image coding algorithm based on image block crossing. Chin. J. Comput. 28(10), 1753–1759 (2005) 11. He, C.J., Shen, X.N.: Improved fractal image coding for crossing algorithm. Chin. J. Comput. 30(12), 2156–2163 (2007) 12. Zhang, J., Zhang, A.H., Wang, W.: Research on fast fractal image coding based on double intersection and features. Comput. Technol. Dev. 27(3), 159–162 (2017) 13. Gupta, R., Methrotra, D., Tyagi, R.K.: Comparative analysis of edge-based fractal image compression using nearest neighbor technique in various frequency domains. Alex. Eng. J. 57, 1525–1533 (2018) 14. Roy, S.K., Kumar, S., Chanda, B.: Fractal image compression using upper bound on scaling parameter. Chaos, Solitons Fractal. 106, 16–22 (2018)
Chapter 38
An Improved LBP Feature Extraction Algorithm Yongpeng Lin, Jinmin Zhang, and Siming Wang
Abstract In this paper, an improved LBP (Local Binary Patterns) feature extraction algorithm is proposed. According to the traditional LBP algorithm for noise sensitivity, an improved median filtering is adopted. For the LBP operator, the neighboring neighbor points are ignored when extracting feature information. The shortcomings of the relationship, the LBP algorithm is improved, firstly according to the distance between the neighborhood point and the central pixel point, the new field pixel value is obtained, and then the central pixel is compared with the new neighborhood pixel gray value. The operation takes its absolute value, and finally finds the average of all absolute values as an introduced threshold. The threshold improves the parameters of LBP encoding, and uses the improved algorithm to extract the railway fasteners features. Experiments show that the improved LBP algorithm extracts the railway fasteners. The effect is better.
38.1 Introduction LBP is an operator proposed by Finnish scientist T.Ojala in 1996 to describe the texture features of images [1]. It is used to measure and extract the local texture information of images. It has obtained a wide range of excellent rendering capabilities for image local texture features. The LBP feature has strong classification ability and high computational efficiency. The main idea is to respond to the relative gray level of a certain point and its domain pixels. It is this relative mechanism that makes the LBP operator invariant to monotonic gray-scale transformation [2–4]. In this paper, the basic LBP and the LBP in the circular domain are introduced firstly. Then the LBP improvement method and the improved LBP feature are proposed. Then the Y. Lin (B) · J. Zhang School of Mechanical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China e-mail: [email protected] S. Wang School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_38
341
342
Y. Lin et al.
bolt is used as the carrier to extract the features and compare the extraction effects of different LBP operators.
38.2 Basic LBP Figure 38.1 shows a basic LBP operator. The process of applying the LBP operator is similar to the template operation in the filtering process. The image is scanned progressively for each pixel in the image, the gray level of the point is recorded as gc , and the 8th neighborhood of 3 × 3 is around, and the gray value is recorded as gi (i = 1…8) to gc . As a threshold, if the gray value of the surrounding pixel is greater than or equal to the gray value of the point, this point is marked as 1, otherwise marked as 0, and binarized as follows: Si (gi ) =
1, gi ≥ gc 0, gi < gc
(38.1)
The result of the binarization is formed into an 8-bit binary number in a certain order, and the value of the binary number (0–255) is used as the encoding of the point. VLBP =
n
2i · Si (gi )
(38.2)
i=1
For example, for the center point of the 3 × 3 regions in Fig. 38.1, the gray value is assumed to be as follows, with its center gray value 90 as a threshold, the 8 neighborhoods are binarized, and clockwise from the upper left point (the specific order may be arbitrary, as long as uniformity) can be used to compose the result of binarization into a binary number 10011001, which is 153 in decimal, as the response of the center point. After the entire progressive scan process, an LBP response image will be obtained. The histogram of the response image is called the LBP statistical
g1
g2
g3
125
73
88
g8
gc
g4
111
90
92
1
g7
g6
g5
66
42
203
0
1
0
0
Threshold
Fig. 38.1 Basic LBP operator
1
0
1
38 An Improved LBP Feature Extraction Algorithm
343
histogram, or the LBP histogram, which is often used as a feature of subsequent recognition work, and is therefore also called LBP feature.
38.3 Round Field LBP The basic LBP operator only contains 3 × 3 pixels, and the 3 × 3 pixels involve a small area, the number of effective pixels is insufficient. Which has no obvious influence on the central pixel, and can be further promoted to use neighborhoods of different sizes and shapes. The use of a circular neighborhood combined with bilinear interpolation to obtain arbitrary radius and any number of neighborhood pixels, both of which can be set autonomously. This LBP operator is denoted as LBPP,R , where P denotes the P neighborhood and R denotes the radius of the circular neighborhood. The distribution of image local texture T can be regarded as the joint distribution density of pixel gray scale in a local area. The texture can be defined as: T = t (gc , go , . . . , g P−1 )
(38.3)
Where gc is the gray value of the central pixel of the local neighborhood of the image, and gi (i = 0, 1, … , P − 1) corresponds to the P equidistant distributions with the radius R and the center pixel as the center the gray value of the pixel on the circumference, and the distribution of the local neighborhood pixel corresponding to different P and R values is shown in Fig. 38.2. Under the premise of not losing texture information, subtracting the central pixel gray value gc from the neighborhood point gi , there are: T = t (gc , go − gc , . . . , g P−1 − gc )
(38.4)
Calculating the difference between all the pixels in the neighborhood can make the larger or smaller gray level uniform (especially under the condition of uneven
(a) P=4, R=1
(b) P=8, R=2
Fig. 38.2 Schematic diagram of LBP operator in circular domain
(c) P=16, R=4
344
Y. Lin et al.
illumination), the uniform illumination can be regarded as the translation of the gray range. With uniform brightness invariance, LBP local texture features have translational invariance in the gray range. The difference between the center pixel and the surrounding pixel is independent of the value of the center pixel, and the transition (38.5) is obtained: T = t (gc , go − gc , . . . , g P−1 − gc )
(38.5)
Since t(gc ) only describes the brightness of the image, it is irrelevant to the local texture characteristics of the image, it can be ignored. The conversion formula (38.6) is: T = t (s(go − gc ), . . . , s(g P−1 − gc ))
(38.6)
1, x ≥ 0 According to the formula (38.6), the weight 2i is assigned, 0, x < 0 and the code of the LBP can be obtained. The s(x) =
L B PP,R =
p−1
2i · s(gi − gc )
(38.7)
i=0
38.4 Improved LBP Algorithm In the process of image acquisition, due to the influence of external environment such as illumination, the acquired image will inevitably have certain noise, which will affect the subsequent feature extraction [5–9]. In this paper, for the problem, the image is analyzed by median filtering, Gaussian filtering, and median filtering. The optimal filtering algorithm is selected for the images collected in this paper. After a large number of contrast experiments, this paper adopts an improved median filtering to scan the image line by line. When processing each pixel, it is judged whether the pixel is the maximum or minimum value of the neighboring pixels covered by the filtering window. If so, the pixel is processed using normal median filtering; if not, it is not processed. The filtering effect is shown in Fig. 38.3. After the comparison, it is not difficult to find that the (e) diagram in Fig. 38.3 has better preservation of the image details while the noise is perfectly filtered out, and the other edges are more clearly. The original LBP algorithm only performs the difference calculation between the central pixel point and the gray value of the domain pixel point, and only considers the result of subtracting the gray value of the pixel point in the center of the image region from other neighboring pixel points. Does not take into account the role of the
38 An Improved LBP Feature Extraction Algorithm
(a) Noise image
(b) Gaussian filtering
(d) Median filtering
345
(c) Mean filtering
(e) Improved median filtering
Fig. 38.3 Different filtering algorithm results
central pixel and the relationship between the gray values of several other neighborhood pixels and the position of the distribution of points in each field [10], and the images are generally continuous, so the relationship between adjacent neighborhood points can better describe the image. The traditional LBP algorithm will inevitably lead to some important local feature information loss affecting the final classification recognition [11]. The traditional LBP algorithm will cause some important information loss in feature extraction. Based on the defects of traditional LBP operator, this paper proposes an improved LBP operator. The specific methods are as follows: (1) As shown in Fig. 38.4, since the neighboring points g1 , g2 , … g8 are different in distance from the central pixel point gc , each neighborhood point has a different effect on the central pixel point gc , and the window of 3 × 3 in Fig. 38.4 is seen. Fig. 38.4 The relationship between the central point and the domain point
g1
g2
g3
g8
gc
g4
g7
g6
g5
346
Y. Lin et al.
In a square, the central pixel point gc is centered, where in the neighborhood points g1 , g3 , g5 , and g7 are diagonal vertices, which are far from the center point, and the vertex, closer to remaining four neighborhood points g2 , g4 , g6 , and g8 are adjacent √ the center point. The distance between gc and g1 , g3 , g5, g7 is 2 times the distance between gc and g2 , g4 , g6 , g8 . Since the closer to the center pixel point, the more √ the center point can be expressed, so let g1 , g3 , g5 , g7 weighting value of is δ1 = 2/2, g2 , g4 , g6 , g8 weighting value of is δ2 = 1. Each neighborhood point g1 , g2 …g8 is weighted according to formula (38.8), and the corresponding value G i is obtained as the value of the corresponding neighborhood point. Gi =
δ1 gi , i = 1, 3, 5, 7 δ2 gi , i = 2, 4, 6, 8
(38.8)
(2) Secondly, a threshold value will be introduced. Specifically, the gray value of the pixel point at the center position and the weighted gray value of the adjacent pixel point is subtracted to take the absolute value, and each absolute value is added and then the average value is taken to the threshold, as shown in Eq. (38.9). T =
n 1 G i − gc n i=1
(38.9)
(3) After getting the introduced threshold T, the gray value G i of the weighted domain pixels and the gray value of the central pixel gc are subtracted in a certain order to obtain the absolute value, and then the absolute value and the threshold T are used to size by comparison. If the value is greater than or equal to, the code is 1; otherwise, the code is 0, as shown in Eq. (38.10). Si (gi ) =
1, |G i − gc | ≥ T 0, |G i − gc | < T
(38.10)
(4) Finally, the result of binarization is formed into an 8-bit binary number in a certain order, and the value of the binary number (0–255) is used as the encoding of the point, as shown in Eq. (38.11). VLBP =
n
2i · Si (gi )
(38.11)
i=1
38.5 Experimental Analysis In order to test the performance of the algorithm, in order to verify the effectiveness and superiority of the proposed algorithm, the processor is selected for the Intel
38 An Improved LBP Feature Extraction Algorithm
(a) Original image
(b) Improved LBP
347
(c) Round field LBP
(d) Basic LBP
Fig. 38.5 Different LBP feature extraction effects
Table 38.1 Detection accuracy of different algorithms
Algorithm
Detection accuracy (%)
Basic LBP
89.87
Round field LBP
90.53
Improved LBP
94.75
i5 CPU 3.5GHZ memory on the 8 GB platform using MATLABR 2016b under the Windows 7 system to build the experimental platform for algorithm writing and experiment. The self-collected bolt image is selected as the test sample, and compared with the traditional LBP algorithm, the experimental results are analyzed, as shown in Fig. 38.5. Comparing the different algorithms to extract the characteristics of railway fasteners, as shown in Table 38.1, it can be seen from Table 38.1 that the feature recognition rate of the improved LBP algorithm is slightly higher than that of the traditional LBP and the circular field LBP. Therefore, the improved LBP algorithm is better in this paper.
38.6 Conclusion In this paper, an improved LBP feature extraction algorithm is proposed. For the inevitable noise of the image acquisition process, after comparing various filtering methods of the image, an improved median filter is used to filter the image for the basic LBP. The operator ignores the shortcomings of the relationship between adjacent neighborhood points when extracting feature information. The LBP algorithm is improved. Firstly, the weighting operation is performed according to the distance between the neighborhood point and the central pixel point, and a new domain pixel value is obtained. Secondly, a new threshold is introduced, which is used as a parameter in the improved LBP encoding. The threshold is the difference between the central pixel and the new neighborhood pixel gray value, and the absolute value is obtained. Finally, an average of all absolute values is obtained. The value is used to extract the feature of the rail fastening bolt using the improved LBP feature extraction
348
Y. Lin et al.
algorithm. The experimental results show that the improved LBP algorithm has better feature extraction. Subsequent LBP feature extraction is applied to feature extraction in face motion, namely, feature extraction and 3D face model research.
References 1. Ojala T.F., Pietikanem, M.S., Harwood D.T.: A comparative study of texture measures with classification based on featured distributions. Pattern Recognit. J. Pattern Recognit. Soc. 29(1), 51–59 (1996) 2. Wang Qiang, F., Li Bailin, S., Hou Yun, T.: An improved lbp for rail fastener identification. J. Southwest Jiaotong Univ. 53(5), 893–899 (2018) 3. Yu Yafeng, F., Liu Guangshuai, S., Ma Ziheng, T.: Improved pairwise rotation invariant cooccurrence local binary pattern algorithm used for texture feature extraction. J. Comput. Appl. 36(12), 3389–3393 (2106) 4. Tian Siyang, F., Xu Ke, S., Guo Huizhao, T.: Application of local binary patterns to surface defect recognition of continuous casting slabs. Chin. J. Eng. 38(12), 1728–1733 (2016) 5. Xie Qinlan, F.: Adaptive Gaussian smoothing filter for image noise reduction. Comput. Eng. Appl. 45(16), 182–184 (2009) 6. An Jingyu, F., Ma Xianmin, S.: Image denoising for furnace flame based on median filter and wavelet transform. Comput. Eng. Sci. 38(8), 1702–1708 (2016) 7. Zhao Jiyin, F., Hao Zhicheng, S., Li Jianpo, T.: Image denoising using improved adaptive proportion-shrinking algorithm in Wavelet field. Optoelectron. Eng. 33(1), 81–84 (2006) 8. Qiao Naosheng, F., Zhang Fen, S., Li Xiaoqin, T.: Defect Image Preprocessing of Printed Circuit Board. Laser Optoelectron. Prog. 2, 142–147 (2015) 9. Wu Yihong, F., Xu Gang, S., Jiang Juanjuan, T.: Research on Workpiece Image Feature Recognition Based on LBP and SVM. J. Chongqing Univ. Technol. Nat. Sci. 30(1), 77–84 (2016) 10. Xu Jinlin, F.: Research on face recognition algorithm based on improved LBP operator. An Hui University of Science and Technology (2018) 11. Jia Lei, F., Lu Xutao, S., Sun Yunqiang, T.: Facial expression recognition based on improved local binary pattern algorithm. Theory Method 37(10), 35–39 (2018)
Chapter 39
QR Code Digital Watermarking Algorithm Based on Two-Dimensional Logistic Chaotic Map Xiangjuan Ran and De Li
Abstract QR (Quick Response) codes, originally designed to facilitate tracking parts in automobile manufacturers, have been increasingly widely used in various fields, and gradually become an indispensable tool in people’s lives. However, with the development of network and multimedia technology, the pure QR code cannot effectively resist the two-dimensional code forgery and tampering attacks. Aiming at this shortcoming of QR code, this paper introduces digital watermarking technology into the protection of QR code image and proposes a digital watermark anticounterfeiting algorithm based on QR code binary image. According to the characteristics of the QR code binary image, the algorithm combines the MD5 (Message-digest Algorithm 5) encryption algorithm and the two-dimensional logistic chaotic mapping algorithm to embed the watermark, which improves the security of the QR code based on the correct identification. Experimental results show that the watermarking algorithm has strong concealment, robustness, and large watermark capacity.
39.1 Introduction With the rapid development of network technology and digital storage technology, image transmission becomes more and more convenient and fast, but this also makes the security of images greatly threatened. As an effective means to protect the copyright and integrity of digital products, multimedia digital watermark has become a research hotspot in the field of image security [1]. However, most of the existing digital watermarking algorithms are mainly for grayscale or color images, and the research of binary image watermarking is relatively rare. In addition, with the popularity of smartphones, QR codes have gradually occupied an important position in people’s lives. Among many QR codes, the QR code stands out because of its large capacity, strong error correction capability, and high-speed 360° reading. It has become the most widely used QR code. Moreover, since the QR code itself does X. Ran · D. Li (B) Department of Computer Science, Yanbian University, Yanji, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_39
349
350
X. Ran and D. Li
not carry anti-counterfeiting information and is easily copied and forged, the digital watermark anti-counterfeiting method based on the QR code has become a key issue for its wide application. In order to solve this problem, many scholars have combined digital watermarking technology with two-dimensional codes. Literature [2] proposed a QR bar code digital watermarking method based on LSB and its improved algorithm, but the QR code is essentially a binary image. The algorithm is not strictly an LSB algorithm, and its improved algorithm is aimed at grayscale images. Such processing increases the size of the QR code image, which is detrimental to transmission and storage. The digital watermarking method proposed in [3] for QR code is to pre-process it and then embed the watermark in the discrete cosine transform (DCT) domain. This method directly converts the binary image into a grayscale image through fuzzy noise addition processing and then performs watermark embedding operation on the grayscale image. Strictly speaking, it does not directly perform the watermark embedding operation on the QR code binary image. This paper proposes a digital watermark anti-counterfeiting algorithm based on QR code with strong robustness and large capacity. The algorithm uses the strong error correction capability of the QR code itself and combines the MD5 encryption algorithm and the two-dimensional logistic chaotic mapping algorithm to directly embed the watermark information in the spatial domain of the QR code binary image.
39.2 Research on Related Technology In order to improve the security and embedding capacity of the watermarking algorithm, it is usually necessary to combine various technologies. In this chapter, two kinds of related techniques are introduced. As an effective means to protect the copyright and integrity of digital products, multimedia digital watermark has become a research hotspot in the field of image security.
39.2.1 Message-Digest Algorithm 5 (MD5) MD5 algorithm, the fifth version of the message-digest summary algorithm. It is an irreversible algorithm that takes a message file of arbitrary length as input and performs a cyclic nonlinear function operation on the input according to the process described in Fig. 39.1 [4], HMD5 is the function that handles the MD5 grouping, and the output of the last round of HMD5 is the final ciphertext [5].
39 QR Code Digital Watermarking Algorithm Based on Two-Dimensional …
351
Fig. 39.1 The MD5 encryption process
39.2.2 Two-Dimensional Logistic Chaotic Map Chaos refers to a random phenomenon that occurs in deterministic systems. It does not converge but is bounded, and has extremely sensitive dependence on initial values [6]. Therefore, chaotic phenomena can be used to construct a very good information encryption system. Based on this, this paper chooses the logistic chaotic map for watermark embedding. American ecologist May proposed the logistic chaotic map in 1976 [7]. It is derived from the model of the number of insects and is a classical nonlinear chaotic dynamic system. The equation is as follows: xn+1 = μxn (1 − xn ), n = 0, 1, 2, · · ·
(39.1)
When μ ∈ [3.5699456, 4], xk ∈ (0, 1), the logistic map presents a chaotic state. However, the one-dimensional logistic mapping is a very simple chaotic map, and its security is not high. The two-dimensional logistic mapping has both the simple form of one-dimensional logistic mapping and the multi-parameter and multiparameter chaotic system [8]. Therefore, this paper uses Two-dimensional logistic chaotic mapping for watermark embedding. Its equation is as follows:
xn+1 = μλ1 xn (1 − xn ) + γ yn yn+1 = μλ2 yn (1 − yn ) + γ xn
(39.2)
Among them, x0 and y0 are initial values. μ, λ1 , λ2 , and γ are parameters that control the dynamic behavior of two-dimensional logic mapping. When μ = 4, λ1 = λ2 = 0.89, γ = 0.1, the system enters a chaotic state.
352
X. Ran and D. Li
39.3 Design of Digital Watermarking Algorithm Based on QR Code The QR code is essentially a binary image. The digital watermark anti-counterfeiting method based on the QR code embeds the watermark information in the binary image. The binary image is difficult to adopt the transform domain watermarking method because of its simple structure and small information redundancy. In order to improve the anti-counterfeiting performance of the QR code, this experiment uses the strong error correction capability of the QR code to embed watermark information in the spatial domain of the QR code binary image. The algorithm has a large watermark capacity and can extract clear and distinct watermark. Information is used for verification.
39.3.1 QR Code Digital Watermark Embedding Algorithm The digital watermark embedding a process based on the QR code is shown in Fig. 39.2, which is described as follows: Step1: Through the MD5 algorithm, the generated information of the twodimensional code is generated into the MD5 watermarking sequence, which is then converted into a binary sequence Wi of length N. Step2: The QR code binary image matrix with size M ×M and pixel P(i, j), i, j = 0, 1, 2, · · · , M −1 is mapped into a one-dimensional sequence Q(n) by the following formula, which is used as the carrier sequence for watermarking embedding. Q(n) = P(i, j), n = i + M j
Fig. 39.2 Digital watermark embedding process based on QR code
(39.3)
39 QR Code Digital Watermarking Algorithm Based on Two-Dimensional …
353
Step3: Set a key k, use the parameter μ=4, λ1 =λ2 =0.89, γ =0.1 to generate two identical chaotic sequences X n and Yn according to the formula (39.2), and then use the X n sequence to generate a position sequence Si according to the formula (39.4). Si =
M 2 − 1 X i , i = 0, 1, 2, · · ·
(39.4)
‘[ ]’means rounding, the first N positions Si , i = 0, 1, 2, · · · N − 1 which do not coincide with each other are taken as the embedding positions of the watermarking. Step4: According to the position sequence Si , the watermark sequence Wi is embedded in the carrier sequence Q(n), and the embedding formula is as follows:
Q (n) =
Q(n), n ∈ / {Si } Wi , n = Si , i = 0, 1, 2, · · · , N − 1
(39.5)
Step5: Finally, according to the formula (39.6), the one-dimensional sequence Q (n) after embedding the watermark is mapped into the two-dimensional pixel matrix P , and the QR code embedded in the watermark is obtained. P (i, j) = Q (n), n = i + M j
(39.6)
39.3.2 QR Code Digital Watermark Extraction Algorithm First, before extracting the watermark information, we need to decode the QR code through the decoder to ensure that the QR code can be decoded normally. And then we extract the watermark. The extraction process of digital watermarking algorithm based on the QR code is shown in Fig. 39.3, which is described as follows: Step1: Mapping the QR code binary image of QR watermark to one-dimensional sequence w Q(n) according to the formula (39.3).
Fig. 39.3 Digital watermark extraction process based on QR code
354
X. Ran and D. Li
Step2: First, use chaotic key k to generate chaotic sequence according to formula (39.2), then the position sequence is generated according to formula (39.4), and finally take the first N items that are not repeated in the sequence to form a new position sequence Si . Step3: According to the known watermark length N , the watermarked image sequence wQ(n), and the position sequence Si , the hidden watermark sequence is inversely extracted by the formula (39.5), then converted this sequence into a sequence of strings W , which is the last extracted watermark information.
39.4 The Experimental Results and Analysis In order to verify the algorithm proposed in this paper, the experiment selected the information of “Yanbian University’s beautiful scenery” to generate QR code binary image, and carried out the simulation experiment in MATLAB R2014b. The generated QR code image is shown in Fig. 39.4. Its version number is 6, the error level is Q, and the size is 90 * 90 (pixels). Through MD5 algorithm, the information of QR code—“Yanbian University’s beautiful scenery” will be generated into MD5 sequence value “4F85F0BA0B7BC3012C256FD919F6FE6A”. Then, according to the watermark algorithm embedding process in Chap. 3, the chaotic key k = 0.4 is selected for watermark embedding, and the obtained QR code of the watermark is shown in Fig. 39.5. The most important performance indicators for multimedia digital watermarking are concealment, robustness, and watermarking capacity. However, considering the special external form of the two-dimensional code, the digital watermark evaluation index applied to the two-dimensional code has a certain difference from the multimedia digital watermark in the usual sense. The following is a comparison of the classical algorithm used in the reference with the three indicators of the algorithm. Fig. 39.4 The QR code for experiment
39 QR Code Digital Watermarking Algorithm Based on Two-Dimensional …
355
Fig. 39.5 The watermarked QR code
As can be seen from Table 39.1, the concealment of the digital watermark based on the QR code means that the embedding of the digital watermark cannot affect the normal decoding of the QR code. In the verification process, we not only need to extract the digital watermark to prove the authenticity of the QR code, but also identify the text information in the QR code to further prove the legitimacy of the document. Therefore, for QR code binary images, image distortion is not the most important, as long as it does not affect decoding. Table 39.1 Digital watermark concealment comparison
Evaluation index Watermarking algorithm
Evaluation index
Performance
Reference [9]
Watermarking invisibility
Watermark visible
QR code normal recognition
Correct identification
Reference [10] Reference [11] Proposed algorithm
Table 39.2 Digital watermark robust comparison
Watermark visible
Correct identification
Evaluation index watermarking algorithm
Evaluation index
Performance
Reference [9]
Image resistance to geometric attacks
Weak
Reference [10] Reference [11] Proposed algorithm
Anti-counterfeiting ability
Weak Very strong Very strong
356 Table 39.3 Digital watermark capacity comparison
X. Ran and D. Li Evaluation index Watermarking algorithm
Evaluation index
Performance
Reference [9]
Watermark capacity
64bits
Reference [10]
64bits
Reference [11]
128bits
Proposed algorithm
1024bits
It can be seen from Table 39.2 that the robustness of the digital watermarking method based on QR code anti-counterfeiting is slightly different from the multimedia digital watermarking method. The robustness here refers to the ability to prevent illegal tampering or forgery of QR codes. The illegal user of the general multimedia digital watermark destroys the digital watermark from the digital media so that it cannot be detected and extracted, but the illegal user of the digital watermark based on the QR code anti-counterfeiting is to falsify or forge the QR code. The illegally forged QR code can’t be distinguished only by the human eye, but the real QR code not only generates the information that is determined by the user, but also contains the encrypted digital watermark information. These two points make it difficult for illegal users to forge or even impossible to forge at all. Table 39.3 shows that in both the digital media watermark and the QR code digital watermark, the watermark capacity is the same measure, and it is hoped that the bigger the better, without affecting normal use. Under the premise of decoding the QR code normally, the proposed algorithm has a watermark capacity of up to 1024 bits, far exceeding other algorithms. In summary, the QR code watermarking algorithm based on two-dimensional hyperchaos proposed in this paper has excellent characteristics and is of value.
39.5 Conclusion In view of the characteristics of QR codes and binary images, this paper proposes an algorithm for directly embedding QR codes in spatial domain. The algorithm firstly preprocesses the watermark information and the QR code binary image to generate their corresponding one-dimensional sequences, and then uses the chaotic sequence generated by the two-dimensional logistic chaotic key K to determine the position sequence. Finally, the embedding of the watermark is accomplished by replacing the watermark information at the corresponding location of the onedimensional sequence generated by the QR code image. The experimental results show that the two-dimensional Logistic chaotic map greatly improves the security of the algorithm. In addition, the embedding capacity of the algorithm can reach 1024 bits and decode QR codes normally, which far exceeds the embedding capacity
39 QR Code Digital Watermarking Algorithm Based on Two-Dimensional …
357
of many classical binary image watermarking algorithms [8, 9], and it is robust compared to the classical algorithm. Acknowledgements This research project was supported by the National Natural Science Foundation of China (Grant No. 61262090)
References 1. Zhang, J., Zhang, C.T.: Summary of digital watermarking techniques for binary images. Comput. Eng. 31(3), 1–3 (2005) 2. He, X.L., An, H., Zhang, W.J.: QR barcode digital watermark based on improved LSB algorithm. Comput. Inf. Technol. 10, 1–4 (2010) 3. Li, L., Wang, R.L.: A digital watermarking algorithm for QR code. J. Hangzhou Dianzi Univ. 31(2), 46–49 (2011) 4. Ingale, S., Mehta, M.: Characterizing suspicious images in social media using exif metadata. Int. J. Adv. Comput. Eng. Netw. 50, 668 (2013) 5. Xu, R.: Research and application of MD5 encryption algorithms. China New Commun. 17(21), 72 (2015) 6. Xu, B., Yuan, L.: Research on digital image encryption algorithms based on improved logistic chaotic mapping. Comput. Meas. 22(7), 2157–2159 (2014) 7. Chen, S.X., Peng, J., Li, F.W.: Fragile digital watermarking based on two-dimensional logistic chaotic map DWT. Image Process. Multimed. Technol. 32(6), 38–43 (2013) 8. Zhu, H.G., Lu, X.J., Zhang, X.D., Tang, Q.S.: Image encryption based on two-dimensional logistic mapping and quadratic residual. J. Northeast. Univ. Nat. Sci. 35(1), 20–23 (2014) 9. Tzeng, C., Tsai, W.: A new approach to authentication of binary images for multimedia communication with distortion reduction and security enhancement. IEEE Commun. Lett. 7(9), 443–445 (2003) 10. Wu, M., Liu, B.: Data hiding in binary image for authentication and annotation. Trans. Multi. 6(4), 528–538 (2004) 11. Xie, R.S., Zhao, H.X., Chen, Y.M.: Digital Watermarking Method of anti-counterfeiting electronic ticket based on QR code. J. Xiamen Univ. Nat. Sci. 52(3), 338–342 (2013)
Chapter 40
A Software Watermarking Algorithm Based on Instruction Encoding and Resource Section Format Zhen Chen and De Li
Abstract In view of the poor robustness of the current software watermarking algorithm based on the portable executable files, a software watermarking scheme based on instruction encoding and resource section is proposed. The main idea is to use the code section instruction equivalent replacement principle and resource section format to hide secret information in the executable file. The paper gives the analysis of the code section and the resource section, and describes the process of embedding and extracting software watermarking. This paper analyzes and compares the concealment and the hidden capacity of this algorithm, and carries out deletion attack, optimization attack, and tampering attack. The algorithm can resist the first two kinds of attacks, and the probability of being tampered with is less than 4%.
40.1 Introduction Nowadays, the problem of software piracy is becoming more and more serious. In order to solve these problems, software watermarking is an attempt in this aspect. Software watermark is a kind of digital watermark, which can be used for copyright authentication by embedding important information such as copyright and ownership [1]. At present, the carrier of software information hiding is mostly the file source code, and there are a few researches on using executable format file as the carrier to hide watermark information. Literature [2] proposed an information hiding algorithm for rearranging the import table data structure of executable files. The executable file import table data structure has its default order, and changing its order will cause the attacker’s suspicion and the security is not high. Literature [3] proposed a watermarking algorithm based on the redundant space of executable files, and the hidden capacity can be infinite. But the concealment is not good. In order to improve the concealment of the watermark embedded in the executable file and resist various common attacks, this paper proposes a software watermark Z. Chen · D. Li (B) Department of Computer Science, Yanbian University, Yanji, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_40
359
360
Z. Chen and D. Li
algorithm based on instruction encoding and resource section format. Theoretical analysis and experimental results show that this algorithm not only has a large hidden capacity and good concealment, but also can resist a variety of common attacks.
40.2 Research on Related Technology The dynamic link library (dll) and executable (exe) files are constructed by Portable Executable (PE) file format. The PE file uses a flat address space, and all the code and data are combined to form a large organizational structure. The contents of the file are divided into different sections, which contain code and data. This algorithm is to embed hidden information in the code section and the resource section.
40.2.1 PE File Code Section Analysis The code section (.text) of the PE file is used to store the instructions generated by the compiled source code [4–6]. Choose some suitable instruction pairs, some of which represent binary 1, and some of which represent binary 0. By analyzing the characteristics of the instructions, we coded that “add” represents 1 and “sub” represents 0. Suppose you have the following code snippet:
05
00100000
add
eax
1000
83 EB
01
sub
ebx
1
According to the above coding scheme, the representative information is 10. If the watermark information to be embedded is 01, the code is replaced with
2D
00F0FFF
sub
eax
−1000$
83C3
FF
add
ebx
−1
This completes the embedding of the watermark information 01.
40 A Software Watermarking Algorithm Based on Instruction …
361
40.2.2 PE File Resource Section Analysis The resource section (.rsrc) is used to store all resources of the module, such as bitmap, dialog, icon, sound, cursor, and toolbar [7–9]. The structure of bitmaps, icons, and cursors is similar. The structure is shown in Fig. 40.1. The color table stores a color value every 4 bytes. The data is the index to the color table [10].
40.3 A Software Watermarking Algorithm Based on Instruction Encoding and Resource At present, software watermarking with an executable file as the carrier can be divided into two categories: the first is to use the rearrangement of the file structure to hide the watermark information [11], the second is to embed the watermark information into the redundant space of the file [12]. Among them, the first kind of software watermarking algorithm has good concealment, can resist a variety of common attacks, and has good robustness, but the embedding capacity is limited, and cannot resist tampering attacks. The amount of information embedded in the second type of software watermarking algorithm can be large enough, but the concealment is low and the robustness is poor. In order to improve the concealment of the watermark embedded in PE files and resist various common attacks, this paper proposes a software watermark algorithm based on instruction encoding and resource section format. When embedding, first encrypt the watermark information under the effect of the key K and transform it into the binary form. The key of the Data Encryption Standard (DES) algorithm is 64 bits, and the key of the Advanced Encryption Standard (AES) algorithm is at least 128 bits. In the algorithm of this paper, the watermark information is hidden in the code segment by means of instruction encoding. The concealment is very good and the possibility of being discovered is very low, so the general encryption method can meet the requirements. When important information such as the key is hidden in the resource segment, the increase of the key length will inevitably affect the concealment, so the DES algorithm is finally selected. Then disassemble the code Fig. 40.1 Resources structure
362
Z. Chen and D. Li
snippet. Through analysis, find the encoded instructions, read the sequence they appear in and change the sequence to achieve the watermark information embedding and extraction. Then hide the important information such as key K in the resource section. Select a resource in the resource section. Copy the color table to get an extended color table, hide the important information in the data area, hide 0 with the original index, and hide 1 with the extended index.
40.3.1 Watermark Embedding Algorithm The watermark embedding process is shown in Fig. 40.2, which is mainly divided into the following steps: Step1: Encrypt the watermark information by the DES algorithm under the control of the key K; Step2: In order to improve concealment, the watermark information is embedded from a random position in the code section. (1) Generate a random number: use time as a seed, use the “srand” function to generate a random number R, and determine the random position based on the random number. (2) N is the total number of all encoded instructions in the code section. L is the length of the watermark. “%” is the remainder symbol. The random starting position S is described as follows: S = R%(N − L)
(40.1)
Step3: Use c32asm to disassemble the PE file and find the encoded instructions “add”, “sub”;
Fig. 40.2 Watermark embedding algorithm for PE Files
40 A Software Watermarking Algorithm Based on Instruction … Table 40.1 Machine code table
363
Instruction
Opcode (single byte)
Opcode (double byte)
add
0000010?
100000?? ??000???
sub
0010110?
100000?? ??101???
Step4: Specify instruction encoding rules: Specify “add” for 1, “sub” for 0. Instruction replacement requires the use of the machine code table, as shown in Table 40.1. “Add”, “sub” have a single byte and double byte, “?” indicates that this position can be 0 or 1; the difference is as follows: “Add” and “sub” are replaced with each other as follows: Immediate = 0 −Immediate if (Opcode >= 0 × FF) Opcode[1]∧ = 0 × 28 else Opcode[0]∧ = 0 × 28
(40.2)
Step5: Read the resource section in the PE file, using the way of extending the color table to hide K, L, and S in the resource section. Copy a color table to get an extended color table, the “oplength” is the index value, the “wx” is the new data, and the “count” is the number of bits when K, L, and S are converted to binary numbers. The embedding process is as follows: for i = 1:count if msg = 1 wx(row(i), col(i)) = x(row(i), col(i)) + oplength else wx(row(i), col(i)) = x(row(i), col(i))
(40.3)
40.3.2 Watermark Extraction Algorithm The algorithm steps are as follows: Step1: Disassemble the PE file with watermark and read the resource data in the resource section. If the data is greater than the original index, get 1, and if less than the original index, get 0, and as shown, get the key K, the watermark length L, and the random position S embedded in the resource section. for i = 1 : count if x(row(i), col(i)) > oplength msg = 1 else msg = 0
(40.4)
Step2: Starting from the random position S of the code section, the following L encoded instructions are read to obtain the binary information flow.
364
Z. Chen and D. Li
Fig. 40.3 Cursor comparison before and after watermark embedding: a The original cursor; b Watermarked cursor
Step3: DES decryption is performed according to the key K, and the final watermark information is extracted.
40.4 Experimental Results and Analysis 40.4.1 Experimental Environment and Pretreatment In order to verify the feasibility of the algorithm, the selected PE file is QQPCDownload, and the simulation experiment was carried out under MATLAB R2010b and analysis software c32asm. The embedded digital watermark is “chenzhen”. The selected resource of the resource section is the cursor resource. Hide K, L, and S in the cursor resource of the resource section by using the extended color table. The cursor did not change before and after embedding, as shown in Fig. 40.3. And the software works fine.
40.4.2 Performance Analysis The evaluation of PE file information hiding technology mainly depends on three indexes: concealment, hidden capacity, and robustness.
40.4.2.1
Concealment
View the size of each section before and after embedding watermark with the section table analyzer, as shown in Table 40.2. The size of each section does not change, and the overall size of the PE file does not change, so the algorithm has good concealment.
40 A Software Watermarking Algorithm Based on Instruction …
365
Table 40.2 The information comparison of sections Section
Virtual size
Virtual offset
Original size
Original offset
Eigenvalues
Before
.text
0003D831
00001000
0003E000
00001000
60000020
After
.text
0003D831
00001000
0003E000
00001000
60000020
Before
.rsrc
00004F20
00054000
00005000
00050000
40000040
After
.rsrc
00004F20
00054000
00005000
00050000
40000040
Table 40.3 The algorithm performance comparison .text
.rsrc
Method
Encoding
Redundant
LSB
Unused colors
Extended color table
Capacity (bit)
2109
128
256
256
256
40.4.2.2
Hidden Capacity
There are 2109 encoded instructions in the code section, which can embed binary information stream of 2109 bits. This is sufficient for applications such as copyright authentication that does not require too much embedding capacity. K, L, and S are 120 bits in total. The capacities of several watermark embedding methods about cursor resources are shown in Table 40.3. They all meet the capacity requirements of hiding these important information.
40.4.2.3 Robustness
Common attacks against PE files include deletion attacks, optimization attacks, and tampering attacks.

(1) Deletion attack. In this algorithm, the watermark information is embedded in the code of the software itself. Deleting the watermark means deleting the logic of the software itself, so the software will not be able to operate or its functions will become abnormal, as shown in Fig. 40.4.

(2) Optimization attack. The optimization attack tool is PEditor. The hexadecimal comparison tool UltraCompare is used to compare the code section and resource section data before and after the attack; some data screenshots are shown in Fig. 40.5. Because the comparison tool did not mark any data in red, the data of the code section and the resource section did not change, and the watermark information was not destroyed.
Fig. 40.4 Abnormal operation of PE files
Fig. 40.5 Code segment data comparison
Table 40.4 The ratio table of encoded instructions

Test software  Size     All instructions  Encoded instructions  Proportion (%)
chongpai.exe   25.7 kb  904               32                    3.5
iku_setup      3.81 M   7547              101                   1.3
uc_chinese     43.4 M   10485             323                   3.1
(3) Tampering attack. Small, medium, and large software samples are examined, and the ratio of encoded instructions to the total number of instructions in the code section is analyzed, as shown in Table 40.4. The statistics show that the hit rate is below 4% even in the extreme case where the watermark length is approximately equal to the number of encoded instructions. The probability that such a random tampering attack damages the watermark is therefore very low.
40.5 Conclusion

This paper proposes a software watermarking scheme based on instruction encoding and the resource section format, derived from analysis of the code section and the resource section of a PE file. The algorithm implements watermark embedding through equivalent replacement of specific instructions in the .text section, and hides the key K, the watermark length L, and the random position S in the more concealed resource
section. Experimental results show that although the hidden capacity of this algorithm is smaller than that of the redundancy-based scheme, it is enough to meet the embedding capacity requirements of copyright authentication. The embedding position of the watermark in the code section is selected randomly, and after embedding, the overall size of the software and the size of each section do not change, so the concealment is good. The algorithm can resist deletion attacks and optimization attacks, and the probability that a random tampering attack damages the watermark is less than 4%. Acknowledgements This research project was supported by the National Natural Science Foundation of China (Grant No. 61262090).
Chapter 41
High-Security Image Watermarking with Multilevel Embedding Using Various Image Transform Techniques Ch. Hima Bindu
Abstract Digital image watermarking techniques are used to provide authentication for secure data transmission. In this paper, a multilevel embedding approach is proposed for high security. Initially, the cover image is converted into the YCbCr model. The Discrete Wavelet Transform (DWT) is then applied to the Y component of the cover image and to the watermark image. In further steps, the Discrete Fourier Transform (DFT) and Singular Value Decomposition (SVD) are applied sequentially to the LH components of the Y channel and of the watermark image for high security. Finally, the embedding is performed on these details using a scaling factor. At the extraction stage, the inverse of the embedding process is carried out to retrieve the watermark image. Qualitative and quantitative analysis shows that the proposed method outperforms existing methods such as DWT and DWT-SVD.
41.1 Introduction

A major problem in information technology is unauthorized access to data during transmission. This can be overcome by introducing a multi-layer authentication process to access the data. The original information is hidden within a cover image through several levels of processing; this technique is called image watermarking. Image watermarking hides a watermark (information) image inside a cover image, so that the image can be transmitted securely. The DWT-based approach is used in many image processing applications such as image watermarking and image compression. In DWT [1], the image is decomposed into four parts: LL, LH, HL, and HH. Singular value decomposition [2] decomposes a matrix into submatrices: for a matrix A, A = USV^T, where S is a rectangular diagonal matrix whose elements are the singular values of A, and U and V are the transformed coefficients of the image. DFT [3] decomposes an image into a sequence of equally spaced samples with different coefficients.
Ch. H. Bindu (B) Department of ECE, QIS College of Engineering & Technology, Ongole, AP, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_41
The important characteristics of watermarking are as follows: data payload, robustness, security, imperceptibility, cost, and fragility. Medical image watermarking needs extra care because the process must not affect the originality of the watermark image. Watermarking can be carried out in two ways, visible and invisible; the only difference between these techniques is the availability of the content at the retrieval side. The rest of the paper is organized into survey, proposed work, and results.
41.2 Literature Survey

Wang et al. [4] proposed a robust watermarking algorithm based on ASIFT against geometric attacks; it achieves a good balance between robustness and imperceptibility and also withstands attacks such as salt-and-pepper noise and JPEG compression. Haribabu et al. [5] proposed a watermarking model in the HSI color space with the wavelet transform, improving the DWT-based watermarking scheme through the scaling factor. Jain et al. [6] proposed a robust watermarking technique for textured images that provides security against unauthorized persons. Cui et al. [7] proposed an optimized wavelet-domain watermarking algorithm based on differential evolution for color images, using a new differential evolution algorithm to find correct scaling factors. Escalante-Ramírez et al. [8] proposed a perceptive approach to image watermarking using a brightness model and the Hermite transform, which provides accurate, robust, and imperceptible results. Selvakumari et al. [9] used visible and invisible watermarking algorithms for indexing medical images. Sharma et al. [10] proposed an invisible, reversible, robust, and efficient image watermarking algorithm based on an adaptive prediction method, which performs better than existing ones. Zarrabi et al. [11] proposed reversible image watermarking for health informatics systems using distortion compensation in the wavelet domain. AlShaikh et al. [12] proposed a CT scan image watermarking scheme in the DWT domain, which is superior in terms of capacity and distortion.
41.3 Proposed Method

This work presents the embedding and extraction processes through flowcharts (Fig. 41.1) and the algorithms below.
41.3.1 Embedding Algorithm

1. Consider the RGB cover image as (C) and the watermark image as (W).
2. Convert the RGB model cover image C into the YCbCr model.
3. Perform 1-level DWT on the converted YCbCr image and on the watermark image W to get the four sub-bands LL, LH, HL, and HH of each image:

   [LL1, LH1, HL1, HH1] = DWT(C), [LL2, LH2, HL2, HH2] = DWT(W)   (41.1)

4. Once again apply the same 1-level DWT to the sub-bands HL1 and HL2 obtained in step 3:

   [HL11, HL12, HL13, HL14] = DWT(HL1), [HL21, HL22, HL23, HL24] = DWT(HL2)   (41.2)

5. Apply the DFT to the sub-bands HL12 and HL22 of the cover and watermark images:

   [df1] = DFT(HL12), [df2] = DFT(HL22)   (41.3)

6. Transform the df1 and df2 coefficients of both images into singular values using the SVD:

   [U1 S1 V1] = SVD(df1), [U2 S2 V2] = SVD(df2)   (41.4)

7. Embed the singular values of the watermark image into the singular values of the cover image with a scaling factor α to get the watermarked coefficient SS:

   SS = S1 + α · S2   (41.5)

8. Apply the inverse SVD to the watermarked coefficient SS:

   A = U1 · SS · V1   (41.6)

9. Take the inverse DFT of the watermarked coefficient A:

   A1 = IDFT(A)   (41.7)

10. Apply the 2-level inverse DWT to the coefficient A1 to get the watermarked image:

    WD1 = IDWT(HL11, A1, HL13, HL14), WD = IDWT(LL1, LH1, WD1, HH1)   (41.8)
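For illustration, the ten steps above can be sketched with NumPy and PyWavelets as follows. The Haar wavelet and the value of α are illustrative choices not fixed by the paper, the watermark is assumed to be pre-resized so that the selected sub-bands match in size, and the variable names follow the paper's labels rather than PyWavelets' cA/cH/cV/cD convention:

```python
import numpy as np
import pywt  # PyWavelets

def embed(cover_y, watermark, alpha=0.05):
    """Sketch of embedding steps (41.1)-(41.8) on the Y channel."""
    # (41.1) 1-level DWT of cover Y channel and watermark
    LL1, (LH1, HL1, HH1) = pywt.dwt2(cover_y, 'haar')
    LL2, (LH2, HL2, HH2) = pywt.dwt2(watermark, 'haar')
    # (41.2) second-level DWT on the HL sub-bands
    HL11, (HL12, HL13, HL14) = pywt.dwt2(HL1, 'haar')
    HL21, (HL22, HL23, HL24) = pywt.dwt2(HL2, 'haar')
    # (41.3) DFT of the selected sub-bands
    df1, df2 = np.fft.fft2(HL12), np.fft.fft2(HL22)
    # (41.4) SVD of the Fourier coefficients
    U1, S1, V1t = np.linalg.svd(df1)
    U2, S2, V2t = np.linalg.svd(df2)
    # (41.5) embed the watermark singular values with scaling factor alpha
    SS = S1 + alpha * S2
    # (41.6) inverse SVD
    A = U1 @ np.diag(SS) @ V1t
    # (41.7) inverse DFT, keeping the real part
    A1 = np.real(np.fft.ifft2(A))
    # (41.8) 2-level inverse DWT to form the watermarked Y channel
    WD1 = pywt.idwt2((HL11, (A1, HL13, HL14)), 'haar')
    WD = pywt.idwt2((LL1, (LH1, WD1, HH1)), 'haar')
    side = (U2, V2t, S1, HL21, HL23, HL24, LL2, LH2, HH2)
    return WD, side   # side information kept for (non-blind) extraction
```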
41.3.2 Extraction Algorithm

1. Convert the RGB model watermarked image into the YCbCr model.
2. Take the 2-level DWT of the watermarked image:

   WD ∈ (LLW1, LHW1, HLW1, HHW1), HLW1 ∈ (LLW2, LHW2, HLW2, HHW2)   (41.9)

3. Apply the DFT to the LHW2 sub-band:

   df3 = DFT(LHW2)   (41.10)

4. Take the singular values of the Fourier transform coefficient df3:

   [U3 S3 V3] = SVD(df3)   (41.11)

5. The watermark singular values can be extracted by using the following equation:

   W3 = (S3 − S1)/α   (41.12)

6. Take the inverse SVD of the watermark coefficient:

   A2 = U2 · W3 · V2   (41.13)

7. Apply the inverse DFT to the watermark coefficient:

   A3 = IDFT(A2)   (41.14)

8. Take the 2-level inverse DWT of the above coefficients to get the original watermark image W:

   W1 = IDWT(HL21, A3, HL23, HL24), W = IDWT(LL2, LH2, W1, HH2)   (41.15)
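The inverse steps can be sketched in the same way. As Eq. (41.12) implies, the scheme is non-blind: the singular vectors U2, V2, the cover singular values S1, and the watermark's untouched sub-bands are assumed to be kept from embedding (the side tuple returned by the embedding sketch above):

```python
import numpy as np
import pywt

def extract(wd_y, side, alpha=0.05):
    """Sketch of extraction steps (41.9)-(41.15)."""
    U2, V2t, S1, HL21, HL23, HL24, LL2, LH2, HH2 = side
    # (41.9) 2-level DWT of the watermarked Y channel
    LLw1, (LHw1, HLw1, HHw1) = pywt.dwt2(wd_y, 'haar')
    LLw2, (LHw2, HLw2, HHw2) = pywt.dwt2(HLw1, 'haar')
    # (41.10)-(41.11) DFT of the embedded sub-band, then SVD
    df3 = np.fft.fft2(LHw2)
    U3, S3, V3t = np.linalg.svd(df3)
    # (41.12) recover the watermark singular values
    W3 = (S3 - S1) / alpha
    # (41.13)-(41.14) inverse SVD and inverse DFT
    A2 = U2 @ np.diag(W3) @ V2t
    A3 = np.real(np.fft.ifft2(A2))
    # (41.15) 2-level inverse DWT with the stored watermark sub-bands
    W1 = pywt.idwt2((HL21, (A3, HL23, HL24)), 'haar')
    return pywt.idwt2((LL2, (LH2, W1, HH2)), 'haar')
```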
41 High-Security Image Watermarking with Multilevel Embedding … Fig. 41.1 a Embedding process. b Extraction process
373
a Watermark image
Cover image
rgb2ycbcr Y
LL
LH
LL
LH2
HL
HL
HH
HH
LL
LH
LL
LH
HL
HL
HH
HH
DFT
DFT
SVD
SVD
SS=S1+ (α*S2)
ISVD
IDFT 2 level IDWT
Watermarked image
b
Cover image
Watermarked image
rgb2ycbcr
rgb2ycbcr
Y
Y LLw
LLw
LHw
LHw
HLw
HLw2
LL
LH
LL2
LH
HHw
HHw
DFT
HL
HL
DFT
SVD= (Uw, Sw, Vw)
SVD=(U1, S1, V1)
SS1=(Sw-S1)/α
ISVD
IDFT 2 level IDWT
Extracted Watermark
HH
HH2
374
Ch. H. Bindu
41.4 Experimental Results

This work was simulated in MATLAB. The cover images are 512 × 512 color images, and the watermark images are 256 × 256 gray-level images. The host, watermark, watermarked, and extracted images are displayed in Fig. 41.2. The graphs and tables of the performance measures (PSNR and MSE) are given in Fig. 41.3 and Table 41.1. The PSNR, given in dB, should be high when the watermarked image and cover image are similar, and the MSE should be minimal [5]:

   PSNR(dB) = 10 · log10( (255)^2 / ( (1/(M × N)) · Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} (C(m, n) − WD(m, n))^2 ) )   (41.16)

   MSE = (1/(M × N)) · Σ_{m=1}^{M} Σ_{n=1}^{N} [IN(m, n) − WD(m, n)]^2   (41.17)
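Equations (41.16) and (41.17) translate directly into a few lines of NumPy; the peak value 255 assumes 8-bit images:

```python
import numpy as np

def mse(ref, test):
    """Mean squared error, Eq. (41.17)."""
    ref, test = np.asarray(ref, float), np.asarray(test, float)
    return np.mean((ref - test) ** 2)

def psnr(cover, watermarked):
    """PSNR in dB, Eq. (41.16), for 8-bit images (peak value 255)."""
    return 10.0 * np.log10(255.0 ** 2 / mse(cover, watermarked))
```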
Fig. 41.2 a, e, i the cover images; b, f, j watermark images; c, g, k the watermarked images; and d, h, l watermark extraction images
Fig. 41.3 Graphical representation of PSNR and MSE values comparison with Refs. [5, 12] work
Table 41.1 Performance results in terms of PSNR and MSE

Images  Techniques        PSNR     MSE
Air     Reference [5]     42.6519  3.5310
        Reference [12]    53.6332  0.2
        Proposed method   70.5197  9.1436e-04
Lena    Reference [5]     45.0833  2.0172
        Reference [12]    59.4482  0.0409
        Proposed method   71.5101  0.0012
Bindu   Reference [5]     44.5579  2.2766
        Reference [12]    58.9159  0.0712
        Proposed method   70.7880  0.0017
41.5 Conclusion

The specialty of this watermarking scheme is that DWT, SVD, and DFT techniques are combined in the embedding and extraction processes. The proposed method achieves high security through multilevel embedding with various transforms (DWT, DFT, and SVD). Initially, the watermark and cover image contents are preserved by applying DWT; for more security, further transforms are applied to the resulting coefficients before embedding. In the embedding process, the scaling factor is varied randomly from 0 to 1. The proposed method is more effective than DWT alone, and the quality parameters prove the effectiveness of the watermarking process: the PSNR values in this paper are above 70 dB and the MSE values are near zero. Future work will focus on advanced transformation techniques to improve reliability and robustness; reliability can be further checked by applying various attacks to the technique.
376
Ch. H. Bindu
Acknowledgements The author wants to extend her sincere thanks to the AICTE for support through a Travel Grant (Ref: F.No.: 1-3561691684-AICTE: Travel grant) to attend the conference.
References 1. Thirumaraiselvia, C., Sudhakara, R., Priyadharshinia, G., Kalaivania, G.: Embedding patient information in medical images using watermarking approach for authentication. South Asian J. Eng. Technol. 2(18), 33–40 (2016) 2. Shaoli, J.: A novel blind color images watermarking based on SVD. Optik Int. J. Light Electron 2868–2874 (2014) 3. Ningombam, J., Hemachandran, K.: DFT based digital image watermarking: a survey. Int. J. Advan. Res. Comput. Sci. 9(2), 540–544 (2018) 4. Wang, C., Zhang, Y., Zhou, X.: Robust image watermarking algorithm based on ASIFT against geometric attacks. Appl. Sci. 1–19 (2018) 5. Haribabu, M., Hima Bindu, Ch., Veera swamy, K.: A secure & invisible image watermarking scheme based on wavelet transform in HSI color space. Proc. Comput. Sci. 93, 462–468 (2016) 6. Jain, P., Ghanekar, U.: Robust watermarking technique for textured images. Proc. Comput. Sci. 125, 179–186 (2018) 7. Cui, X., Niu, Y., Zheng, X., Han, Y.: An optimized digital watermarking algorithm in wavelet domain based on differential evolution for color image. PLOS One 1–15 (2018) 8. Escalante-Ramírez, B., Gomez-Coronel, S.L.: A perceptive approach to digital image watermarking using a brightness model and the hermite transform. Math. Probl. Eng. 1–20 (2018) 9. Selvakumari, J., Suganthi, J.: Using visible and invisible watermarking algorithms for indexing medical images. Int. Arab J. Inf. Technol. 15(4), 748–755 (2018) 10. Sharma, C., Singh, A., Kachera, M.S.: An invisible, reversible, robust and efficient image watermarking algorithm based on adaptive prediction method. Int. Res. J. Eng. Technol. 5(5), 3980–3984 (2018) 11. Zarrabi, H., Hajabdollahi, M., Soroushmehr, S.M.R., Karimi, N., Samavi, S., Najarian, K.: Reversible image watermarking for health informatics systems using distortion compensation in wavelet domain. In: International Conference on Engineering Medical Biology, pp. 798–801. IEEE (2018) 12. AlShaikh, M., Laouamer, L., Nana, L., Pascu, A.: A novel CT scan images watermarking scheme in DWT transform coefficients. Int. J. Comput. Sci. Netw. Secur. 16(1), 62–71 (2016)
Chapter 42
The Research and Design of Virtual Reality Technology-Based Teaching Cloud Platform Dacan Li, Yuanyuan Gong, Dezheng Li, Qingpei Huang, and Guicai Feng
Abstract At present, educational informatization has become an inevitable choice for the development of the times. As a new teaching method combining virtual reality technology with traditional education, "VR education" has been recognized and favoured by more and more of the educational community. This paper mainly analyses the design process of a VR-based teaching cloud platform. The platform provides a new way of thinking, new methods, and new means for transforming new educational approaches such as "interactive" learning and the "flipped classroom" into immersive practical teaching. It will certainly play a positive role in promoting the reform and development of teaching.
42.1 Introduction

The combination of VR technology and education is considered to be the development trend of the education industry. The advantages of VR technology in the field of education are embodied in time, space, educational resources, and ways of thinking. In the traditional education mode, due to long teaching times, small experimental operation spaces, and limited experimental teaching funds, teaching materials and reference models can only approximate reality [1, 2]. However, combining daily education with VR technology, through three-dimensional modelling and scene building, can provide learners with a nearly realistic teaching model and enable them to experience teaching scenes that are difficult to experience in the real world. Therefore, the education industry urgently needs to design and develop a teaching cloud platform based on virtual reality technology that achieves 3D simulation teaching through VR technology, giving learners an intuitive experience and thereby improving teaching efficiency.
D. Li · Y. Gong (B) · D. Li · Q. Huang · G. Feng Shiyuan College of Nanning Normal University, Nanning 530226, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_42
42.1.1 The Concept of Virtual Reality

Virtual reality is a three-dimensional virtual world generated by computer simulation, which provides users with visual and other sensory simulation [3], so that users feel as if they are experiencing the scene and can observe things in three-dimensional space in real time and without restrictions [4, 5]. The underlying belief motivating most virtual reality (VR) research is that this will lead to more natural and effective human–computer interfaces.
42.1.2 Virtual Reality Technology

With the development of computer technology, graphics processing, stereo image and audio technology, information synthesis and display technology, and many other technologies, virtual reality technology is gradually emerging. Virtual reality technology uses three-dimensional spatial representation and human–computer interaction to bring people an immersive experience, which has fundamentally changed the dull, rigid, and passive interaction between human and computer, and has created a new field for human–computer interaction [6]. Virtual reality is a synthesis of many technologies; the related technologies mainly include environmental modeling technology, three-dimensional interactive technology, image rendering technology, system integration technology, and stereoscopic display and sensing technology, as shown in Fig. 42.1.

Fig. 42.1 VR-related technology
42.1.3 The Characteristic of Virtual Reality Virtual reality technology creates a new interactive system by simulating the threedimensional world. It has the characteristics of 3I, namely, Imagination, Immersion and Interactivity [7], as shown in Fig. 42.2. Imagination: Imagination emphasizes that virtual reality technology has a broad imaginable space, which can broaden the scope of human cognition. It can not only reproduce the real environment, but also reproduce the non-existent environment [8]. Immersion: Immersion is that the user can completely immerse in the virtual environment generated by the computer, and the user has a sense of immersion in the virtual scene. What users see, hear, touch and even smell is exactly the same as what they feel in the real environment. This is the core element of virtual reality [9]. Interaction: Interaction refers to the ability of users to interact with various objects in a virtual scene, which is the key factor of human–computer harmony. After users enter the virtual environment, they interact with the multidimensional information environment through a variety of sensors [10–12]. When users perform necessary operations, the virtual environment responds accordingly, as in the real world.
42.2 Design of the VR Teaching Cloud Platform

The VR teaching cloud platform is an information management system applied to virtual reality classroom teaching. With HTC VIVE equipment, other hardware, and virtual reality teaching content, teachers can achieve global control, precise guidance, and effective evaluation of teaching.

Fig. 42.2 Three characteristics of VR
Fig. 42.3 VR teaching cloud platform content publishing model diagram
42.2.1 Framework of the VR Teaching Cloud Platform

The VR teaching cloud platform can share VR educational applications developed by teachers and third parties with schools around the world. The platform can give teachers real-time feedback on students' learning status, including usage data, interaction results, changes in focus of attention, and other data. Through the analysis of students' real-time behaviour and historical records, it can provide teachers with intelligent, customized, and personalized teaching suggestions. The user scenarios of the VR teaching cloud platform are virtual reality general course teaching, practical course teaching, and virtual reality innovation and entrepreneurship skills training. In addition, it also includes virtual reality teaching and general skills education (such as safety and labour) in K12 schools. The VR teaching cloud platform covers teachers' desktop software, students' desktop software, classroom central control server solutions, and so on. The cloud platform content publishing model is shown in Fig. 42.3.
42.2.2 Logical Architecture Design

The system architecture of the virtual reality teaching cloud platform mainly includes a display layer, a business logic layer, and a data layer, as shown in Fig. 42.4.

Display layer: It mainly provides the interactive pages between users and the cloud platform.

Business layer: It mainly realizes the functional requirements of teachers and students.
Fig. 42.4 Logical architecture design diagram
Data layer: It mainly includes the VR teaching cloud courseware database, the test question database, the teacher and student information databases, and other databases.
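As an illustration only (the paper specifies no implementation), the three-layer decomposition of Fig. 42.4 might be organized as in the following Python sketch; all class and method names are hypothetical:

```python
class DataLayer:
    """Data layer: courseware, teacher/student information, and question stores."""
    def __init__(self):
        self.cloud_courseware = {}   # courseware id -> VR course package
        self.teachers = {}           # teacher id -> profile
        self.students = {}           # student id -> profile and machine status
        self.questions = {}          # question id -> test item

class BusinessLayer:
    """Business layer: teacher-side and student-side functions."""
    def __init__(self, data):
        self.data = data
    def open_course(self, course_id):         # one-key synchronized start
        return self.data.cloud_courseware.get(course_id)
    def student_status(self, student_id):     # real-time machine monitoring
        return self.data.students.get(student_id, {}).get("status")

class DisplayLayer:
    """Display layer: login and other interactive pages over the business layer."""
    def __init__(self, business):
        self.business = business
```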
42.2.3 Functional Structure Design

The VR teaching cloud platform mainly includes teacher-side and student-side functions.

1. Teacher side

VR Cloud Courseware Library Management: Teachers choose VR cloud courseware to teach according to the actual content of the course. After selecting a course package, teachers can view all the courseware contents in this area.

Examination and Answer: The examination and answer function shows the learning effectiveness of the current class, including the answers of all students within the VR courseware.

Real-time monitoring of student machine status: This function helps teachers monitor the status of each student machine in real time during class, find problems in time, and deal with them. It is an intelligent teaching assistant for teachers.
One Key Synchronization Control of Teaching Progress: Teachers can synchronously open designated courses, suspend course operation, switch to designated scenes within a course, and finish the course on all students' head-mounted displays, so that teachers remain the dominant part of the whole teaching process in the virtual reality teaching environment.

Multi-Sound Source Acoustic Fusion in Teaching Scenes: In the process of virtual reality learning, students need to hear both the background sound of the VR content and the teacher's explanation. The VR teaching platform adopts multi-source volume equalization and sound effect fusion technology. It combines the teacher's explanatory voice with the background sound of the VR content, so that students can hear both clearly in the earphones at the same time, without delay or breaks.

Real-time capturing of students' head-mounted display pictures: The platform uses real-time capture of the virtual reality head-mounted display image and high-speed video transmission, which enables teachers to see the pictures currently seen by all students in their displays, and even to view the operation details of a single student in full screen.

2. Student side

Student-Side Panel Control: During the operation of the content, students can adjust the teacher volume and course volume in the volume adjustment area of the right-side panel of the interface to achieve a better multi-source volume balance.

Student-Side Curriculum Management: This mainly includes VR cloud courseware learning and VR courseware examination functions; students can learn and take tests according to their actual needs.
42.3 Design of Network Topology Structure

After developing the system, the hardware environment, such as the classroom, also needs to be arranged. Each standard classroom contains a teacher's computer, 20 students' computers, a server, and a switch. Each computer is connected to a set of head-mounted devices. The teacher's computer, the 20 students' computers, and the server are connected through the switch. The server retrieves data from the cloud. The network topology of the VR teaching cloud platform is shown in Fig. 42.5.
Fig. 42.5 Network topology structure design diagram

42.4 Conclusion

The immersive teaching products produced by the combination of the education industry and VR technology are the trend of the times. The VR teaching cloud platform designed by our team provides a platform for students to play their own roles. Compared with traditional teaching platforms, this new teaching method provides learners with a nearly realistic teaching model and enables them to experience teaching scenes that are difficult to experience in the real world. Besides, this method also provides a new way of thinking, new methods, and new means for transforming new educational approaches such as experimental "interactive" learning and the "flipped classroom" into immersive practical teaching. It will certainly play a positive role in promoting the reform and development of teaching, and it also has great significance for the development of the education industry.
References 1. Jiang, H.L.: Exploration of the integration of virtual reality technology and industry in higher vocational education. Comput. Knowl. Technol. 13(18), 210–211 (2017) 2. Shi, S.L.: Application of virtual reality in education. Course Educ. Res. 4(32), 205–206 (2017) 3. Liang, R.N.: Visualization analysis of research on virtual reality technology in the field of domestic education. J. Langfang Teach. Univ. Nat. Sci. Edit. 18(3), 29–35 (2018) 4. Yang, X.Z., Ren, Y.Q.: Development of virtual reality and EEG linkage system and exploration of its educational research function. J. Distance Educ. 37(1), 45–52 (2019) 5. He, J.H., Liang, R.N., Han, G.X., Xiao, X., Liang, Y.S.: Research on construction of deeper learning field model based on virtual reality. E-educ. Res. 40(1), 59–66 (2019) 6. Hu, W.Q.: Application of virtual reality technology in education. Comput. Knowl. Technol. 15(9), 233–234 (2019) 7. Mine, M.R., Brooks Jr., F.P., Sequin, C.H.: Moving objects in space: exploiting proprioception in virtual-environment interaction. In: Proceedings of SIGGRAPH, pp. 19–26 (1997) 8. Abulrub, A.H.G., et al.: Virtual reality in engineering education: the future of creative learning. In: 2011 IEEE Global Engineering Education Conference (EDUCON) (2011) 9. Mustafa, H., Carl, N.: The benefits of virtual reality in education—a comparison study. Bachelor dissertation, University of Gothenburg (2015)
10. Ch’ng, E., Cai, Y., Pan, Z., Thwaites, H.: Virtual and augmented reality in culture and heritage. Presence Spec. Issue Cult. Herit. (2017) 11. Fakotakis, D.: Virtual Reality Exploration of World Heritage Sites: Shaping the Future of travel, 23 July (2018) 12. Machidon, O., Duguleana, M., Carrozzino, M.: Virtual humans in cultural heritage ICT applications: a review. J. Cult. Herit. 33, 249–260 (2018)
Chapter 43
Research on Secure Keyword Query Based on Cloud Computing Ying Ren, Huawei Li, Yanhong Zhang, and Weiling Wang
Abstract Cloud computing is a virtual computer resource library that provides available, convenient, on-demand network access to a configurable, shared pool of computing resources. With the development of cloud computing technology, the storage of files in the cloud is being welcomed by more and more people. But on cloud computing platforms, threats, especially threats to three-dimensional data, are still not negligible, and they require enough attention from cloud computing providers. In order to safely and efficiently obtain files from the cloud server, this paper presents a more challenging and more practical cloud model and a deterrence concept, which deters the cloud server from operating illegally at a low cost in computing and communication. Once the cloud server acts illegally, the data user will detect the illegal operation with a large probability. The results of security keyword ranking queries in the cloud are verified, and the effectiveness of the scheme for data security, including 3D data security, is verified.
43.1 Introduction Cloud computing is based on the increasing use and interaction of Internet-related services, usually involving the provision of dynamic, scalable, and often virtualized resources through the Internet. With the development of cloud computing technology, we have easier access to data in the cloud, more freedom to manage resources, and more money-saving access to computing and storage resources. But cloud computing services are currently monopolized by private institutions, and the data in cloud computing is no secret Y. Ren (B) · Y. Zhang Naval Aviation University, Shandong, China e-mail: [email protected] H. Li Shan Dong Business Institute, Shandong, Yantai 264001, China W. Wang Yantai Geographic Information Center, Yantai 264003, China © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_43
to businesses that provide cloud computing. So business and government agencies have to consider some potential dangers when choosing cloud computing services. The existing schemes to efficiently query encrypted data are based on an ideal cloud server model, but cloud servers may perform some dishonest operations. Cloud servers may falsify query results, or cloud servers may return incomplete query results to reduce the system burden when they encounter performance bottlenecks [1]. In order to verify the ranking query results in the cloud, this paper proposes a more practical and more challenging cloud server model, and uses the model to verify the results of the cloud data security ranking query. When the data provider finds suspicious behavior in the cloud, the detection can be enhanced by updating the verification data.
43.2 Model

43.2.1 System Model

The system model shown in Fig. 43.1 includes three entities: the data provider user, the cloud server, and the data usage user. The data provider user Pi extracts keywords from its file collection and indexes them. For each keyword, Pi selects ω files from the file collection and records their file ID numbers and the correlation between each file and the keyword. Pi randomly exchanges its sampled file ID numbers and correlations as anchor data with n − 1 other data providers [2]. After the sample data and anchor data are obtained, each data provider user joins them into a string, and the ciphertext of the string is used as the verification data. Finally, each data provider user uploads its encrypted files, indexes, and validation data to the cloud. Before submitting a secure Rank-t ranking query, an authorized user first encrypts the keyword, generates the trapdoor, and submits it to the cloud server along with the variable t. Upon receiving a query request, the cloud server performs a secure
Fig. 43.1 The security key query result verification system model in the cloud
keyword ranking query and returns the t files most relevant to the query keyword to the data user. After users get the query results, they are free to choose whether the results need to be verified. If a user finds the query results suspicious, an encrypted authentication data buffer is constructed and submitted to the cloud server. The cloud performs some calculations on the encrypted authentication data buffer and returns the buffer. Finally, the user decrypts and recovers the authentication information from the encrypted authentication data buffer and uses it to verify the query results. If the query results fail validation, they are considered illegal [3].
43.2.2 Design Goal

A secure and efficient ranking query verification scheme needs to meet three design objectives: efficiency, security, and detectability. The proposed scheme should enable users to construct validation data efficiently, enable cloud servers to return validation data efficiently, and enable data users to complete the authentication operations efficiently [4]. The cloud server should be prevented from acquiring the contents of the authentication data buffer. The scheme deters the cloud server from operating the service in bad faith: once the cloud server acts illegally, the scheme can detect the illegal behavior with a very high probability.
43.3 Query Result Design

We propose a preventive scheme to deter dishonest behavior by cloud servers, making them afraid to operate illegally: once a cloud server performs an illegal operation, it will be detected with a very high probability. The deterrence measures include embedding sampled data and anchor data, constructing an encrypted verification data buffer so that the cloud server operates only on random encrypted data, and dynamically updating the verification data. The ultimate goal is to prevent the cloud server from committing illegal acts; once it does, the acts will be found with a high probability and punished with heavy penalties [5].
43.3.1 System Summary

Fig. 43.2 Verification process

As shown in Fig. 43.2, the steps of the verification scheme are as follows. First, each data provider user prepares the validation data. The data user submits an encrypted authentication request and specifies the size of the authentication data buffer that the cloud server should return. The cloud server computes on the encrypted data and returns the query results and the validation data buffer. The data user decrypts the returned query results and the validation data buffer, and verifies the query results to determine whether the cloud server has committed any wrongdoing.
43.3.2 Prepare Validation Data

43.3.2.1 Construction of Sampling Verification Data
Constructing the sampling validation data is divided into three parts: (1) the data provider user Pi selects files from the original file collection; (2) Pi extracts the ID numbers of the sampled files and their correlations with the keyword; (3) Pi binds its own ID to the IDs and correlations of the sampled files. Assume that the keyword Ki of the data provider user Pi is contained in m files. The process of sampling the m files of Pi is as follows [6]:

Step 1: Pi initializes the "header" of the sampled validation data to Ki||i, where Ki represents the keyword and i is the ID number of the data provider user Pi;
Step 2: The m files containing Ki are arranged in descending order of correlation;
Step 3: Pi connects FileID[0]||RT0,i to Ki||i, where FileID[0] is the ID number of the file File0 and RT0,i is the correlation between the file File0 and the keyword Ki;
Step 4: Pi samples ω − 1 files uniformly at random from the remaining m − 1 files;
Step 5: All ω sampled data are connected together with the header data Ki||i to form the sample validation data TDi;
Step 6: The algorithm returns the sample validation data TDi.
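A Python sketch of these sampling steps follows; files is assumed to be a list of (file ID, relevance) pairs for keyword Ki, and the "||" concatenation mirrors the paper's notation:

```python
import random

def sample_validation_data(keyword, provider_id, files, omega):
    """Build TD_i for one keyword of provider P_i (Steps 1-6)."""
    header = f"{keyword}||{provider_id}"                      # Step 1
    ranked = sorted(files, key=lambda f: f[1], reverse=True)  # Step 2
    picked = [ranked[0]]                                      # Step 3: File_0
    picked += random.sample(ranked[1:], omega - 1)            # Step 4
    body = "||".join(f"{fid}||{rel}" for fid, rel in picked)  # Step 5
    return header + "||" + body                               # Step 6: TD_i
```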
43.3.2.2 Exchange Anchor Data
The data provider user exchanges a portion of its data with other users before uploading the verification data to the cloud, so that links between different data provider users are present in the authentication data. In particular, each data provider
user randomly selects n − 1 other data provider users and exchanges the previously prepared ω sample verification data with them as anchor data. Because these operations take place among the data provider users, the cloud server cannot know which verification data is contained in a given data provider user's authentication data. After the exchange, each data provider user holds the authentication data of n − 1 other data providers. Therefore, even if a certain data provider user's file is not present in the query result, its verification data may still be present in the authentication data [7].
43.3.2.3 Combination Validation Data
Assume that the data provider user Pi receives the anchor data of the other n − 1 data provider users; Pi then uses its sampling verification data and the anchor data to generate the verification data. The specific steps are as follows:

Step 1: Pi extracts the file ID number and correlation from each of the n − 1 anchor data;
Step 2: Pi sorts the ω sample verification data and the (n − 1) · ω anchor data in descending order of correlation;
Step 3: Pi connects the n · ω data to form the verification data;
Step 4: Pi uses a symmetric encryption algorithm to encrypt the verification data. The key is Hs(i), where Hs(·) is the secret hash function mentioned earlier, shared between data users and data provider users, and i is the ID number of Pi. We denote the encryption result as Ei, the encrypted authentication data of Pi. Finally, Pi uploads Ei, together with its encrypted files and encrypted index, to the cloud server.
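Assembling and encrypting the verification data might look like the sketch below. The paper's symmetric cipher keyed by Hs(i) is modeled here with SHA-256 over a shared secret plus the provider ID, and Fernet (an AES-based scheme from the third-party cryptography package) stands in for the unspecified symmetric algorithm (the safety analysis in Sect. 43.4.1 mentions AES):

```python
import base64
import hashlib
from cryptography.fernet import Fernet  # pip install cryptography

def build_encrypted_validation(sample_data, anchor_data, shared_secret, i):
    """Assemble and encrypt P_i's verification data E_i (Steps 1-4).

    sample_data and anchor_data are lists of (file_id, relevance) pairs.
    """
    merged = sorted(sample_data + anchor_data,
                    key=lambda d: d[1], reverse=True)              # Steps 1-2
    plaintext = "||".join(f"{fid}||{rel}" for fid, rel in merged)  # Step 3
    # H_s(i): secret keyed hash shared by data users and providers
    digest = hashlib.sha256(f"{shared_secret}|{i}".encode()).digest()
    key = base64.urlsafe_b64encode(digest)                         # Fernet key
    return Fernet(key).encrypt(plaintext.encode())                 # Step 4: E_i
```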
43.3.3 Submit Fetch Validation Data Request

When an authorized data user desires to validate the query results, it needs to specify which data providers' encrypted authentication data the cloud server should return. In order to prevent the cloud server from knowing which encrypted authentication data is actually returned, we propose constructing a secret authentication data buffer. The construction process is as follows [8]:

Step 1: The data user randomly inserts the IDs of other data provider users so that the retrieval ID set becomes large, and submits this set when requesting validation data. To prevent the cloud server from learning the real IDs, the IDs can be encrypted.
Step 2: The data user appends a 1 to each actual ID that needs to be returned, and a 0 to every other ID.
Step 3: The data user encrypts these 0s and 1s using Paillier. Here we assume that data provider users and data users share a pair of Paillier encryption keys, that is, a public key PK and a private key SK. Thus, 0 is encrypted into G(PK, 0) and 1 is encrypted into G(PK, 1).
Step 4: The data user sends the expanded ID set and the attached encrypted data G(PK, 0), G(PK, 1) to the cloud server.
43.3.4 Return Validation Data

After receiving the user's request to retrieve authentication data, the cloud server prepares to return the authentication data. The cloud server first initializes an authentication data buffer with γ entries. Based on the expanded ID set provided by the data user, the cloud server finds the authentication data of each data provider user, computes on the encrypted data, and maps the result into the verification data buffer by using τ hash functions, where the output range of each hash function is [0, γ). When the cloud has processed all the IDs, it returns the validation data buffer to the data user [9].
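The request and buffer steps of Sects. 43.3.3 and 43.3.4 rest on Paillier's homomorphic properties, E(a) · E(b) = E(a + b) and E(b)^d = E(b · d): the cloud can raise each encrypted 0/1 flag to a provider's verification value and multiply the results into buffer slots without learning which flags are 1. The toy sketch below uses small well-known primes and trivial stand-in hash positions purely for illustration (Python 3.9+ is assumed for math.lcm and the three-argument pow):

```python
import math
import random

# Toy Paillier key: real deployments use primes of >= 1024 bits each.
p, q = 104729, 104723
n, n2, g = p * q, (p * q) ** 2, p * q + 1
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                       # valid because g = n + 1

def enc(m):
    r = random.randrange(2, n)             # random r, coprime to n w.h.p.
    return pow(g, m, n2) * pow(r, n, n2) % n2

def dec(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

def positions(item_id, tau, gamma):
    # stands in for the tau hash functions with output range [0, gamma)
    return [hash((item_id, j)) % gamma for j in range(tau)]

# Data user: expanded ID set, an encrypted 1 for real IDs, 0 for cover IDs.
request = {3: enc(0), 7: enc(1), 12: enc(0)}   # only ID 7 is really wanted

# Cloud server: E(flag)^value encrypts flag*value; fold into hashed slots.
verification = {3: 111, 7: 222, 12: 333}       # toy per-provider data
gamma, tau = 16, 2
buf = [enc(0)] * gamma
for i, flag in request.items():
    selected = pow(flag, verification[i], n2)  # encrypts 0 or verification[i]
    for pos in positions(i, tau, gamma):
        buf[pos] = buf[pos] * selected % n2

# Data user: collision-free slots decrypt to 0 or the wanted value.
print(sorted({dec(c) for c in buf}))           # typically [0, 222]
```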
43.3.5 Verify the Query Results

43.3.5.1 Restore Validation
Fig. 43.3 Decryption verification data buffer

After receiving the authentication data buffer DB returned by the cloud server, the data user uses its key SK to decrypt the authentication data buffer and obtain the authentication data. As can be seen from Fig. 43.3, the data user can recover Y1 and Y2 from the first and third entries of the DB, respectively. Because authorized data users can calculate in advance which locations of the buffer do not collide, they only need to decrypt the collision-free entries in the buffer rather than the entire authentication data buffer. To prevent the data user from using the authentication data in the buffer to detect cloud wrongdoing, the cloud server may try to destroy the contents of the authentication data buffer and falsely claim that the data cannot be recovered because of collisions. However, this approach will not succeed, because the data user can calculate the locations of collisions ahead of time. Therefore, once the cloud server destroys authentication data outside the collision locations, its illegal behavior is easily detected.
43.3.5.2 Validate Ranking Query Result
After getting the authentication data provided by the data provider users, the data user recovers the sample data and anchor data, and further uses them to verify that the returned results are correct. The validation process is completed in two steps: (1) the data user verifies the correctness of the query results from the same data provider user; (2) if the first step succeeds, the data user uses the anchor data to verify the correctness of the query results from different data provider users. Once the cloud server behaves illegally, the data user can discover the illegal behavior through the above method.
43.3.6 Update Validation Data

Because a data user may also be a data provider user (data users can query and validate their own data), if they find that some wrongdoing of the cloud server cannot be detected through the existing authentication data, they will update the validation data. The update operation consists of the following three steps: (1) downloading the validation data from the cloud server; (2) decrypting the verification data, updating the sampling data and the anchor data, and re-encrypting the verification data; (3) uploading the new validation data to the cloud.
43.4 Analysis

43.4.1 Safety Analysis

For the secure ranking query protocol, data provider users first encrypt files and indexes, and then upload the encrypted data to the cloud. When the data is queried, the user first encrypts the query keyword to generate a trapdoor, which is then submitted to the cloud server. For verification of query results, each data provider user constructs authentication data for each keyword, which is encrypted with AES; the verification data is secure as long as AES is secure. The verification request submitted by the data user is generated based on the Paillier encryption scheme, and the cloud simply computes on random ciphertext. The Paillier encryption scheme is proven to be semantically secure, so the authentication scheme is also secure.
43.4.2 System Deterrence Analysis

This paper proposes a deterrent that prevents the illegal operation of the cloud server at a small cost. During the validation of query results, the cloud server simply computes on random ciphertext. Thus, it does not know how much of the data providers' verification data is returned, nor which data provider's verification data can be used to verify the query results. The cloud server only knows that once it performs an illegal operation, it will be detected with a high probability and incur very severe penalties. Through this series of measures, the cloud server dares not operate illegally.
43.4.3 System Performance Analysis

43.4.3.1 Data Provider User Computing Overhead
The computational overhead of the data provider user mainly comes from constructing the validation data. In the process of sampling the validation data, the computation time is dominated by sorting the file collection for each keyword; its computational complexity is therefore O(m · log(m)). During the construction of the secret verification data, the data provider user needs to sort all ω · n sampled and anchor data, with complexity O(ω · n). The overall computational complexity of the data provider user is O(max{m · log(m), ω · n}) [10].
43.4.3.2 Data Provider User Communication Overhead
The communication overhead of the data provider user comes mainly from two aspects: exchanging anchor data and submitting verification data. During the anchor data exchange, each data provider user needs to transmit ω anchor data to the other n − 1 data provider users, so the communication overhead is O(ω · n); the data provider user's total communication overhead is therefore O(ω · n).
43.4.3.3 Cloud Server Compute Overhead
The computing overhead of the cloud server is mainly due to mapping validation data into the verification data buffer. In this process, the cloud server needs to map β validation data into the validation data buffer, and each validation data is mapped τ times. Therefore, the computing overhead of the cloud server is O(β · τ).
43.4.3.4 Cloud Server Communication Overhead
The communication overhead of the cloud server mainly comes from two aspects: receiving authentication data from the data provider users and transmitting the authentication data buffer to the data user. If there are S data provider users in the system, the communication overhead of the cloud server for accepting the authentication data is O(S · n · ω). At the same time, the communication overhead of the cloud server for returning the authentication data buffer is O(γ · ω · n). Therefore, the total communication overhead of the cloud server is O(max{γ · ω · n, S · n · ω}).
43.4.4 Probability Analysis of Illegal Behavior Detection

Once the cloud server commits wrongdoing, the system will detect it with a high probability. For a secure Rank-t ranking query, assume that ti of the t query results appear in the returned validation data. At the same time, assume that the system has S data provider users and that the data user recovers g different pieces of authentication data from the authentication data buffer. Let PBerror denote the probability that the data user detects the wrongdoing; then, with P(a, b) denoting the number of permutations of b items drawn from a,

    PBerror = 1 − P(S · m − g, t − ti) / P(S · m, t)    (43.1)
As you can see from Fig. 43.4a, when S equals 10, that is, when there are 10 data provider users in the system, and m equals 100, so that each keyword corresponds to an average of 100 files, the detection probability PBerror is still greater than 0.999 even if ti equals 1, i.e., only one query result appears in the validation data. At the same time, the detection probability PBerror increases linearly with the number of returned query results t. In addition, the greater g, the number of validation data entries recovered from the validation data buffer, the greater the detection probability PBerror. In Fig. 43.4b, we set m = 100, g = 10, and t = 50, and we find that the detection probability increases with ti; when ti > 2, the detection probability PBerror is very close to 1. In Fig. 43.4c, we set t = 50 and ti = 1, and the detection probability increases linearly with g. In addition, the greater S, the greater the detection probability PBerror.
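Equation (43.1) is easy to evaluate numerically; math.perm (Python 3.8+) supplies the permutation counts, and the sketch below reproduces the claim that PBerror exceeds 0.999 for S = 10, m = 100, g = 10, t = 50, and ti = 1:

```python
from math import perm

def pb_error(S, m, g, t, ti):
    """Detection probability of Eq. (43.1)."""
    return 1 - perm(S * m - g, t - ti) / perm(S * m, t)

print(pb_error(S=10, m=100, g=10, t=50, ti=1))   # > 0.999
```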
Fig. 43.4 Probability of illegal behavior detection

43.5 Conclusions

This paper presents a cloud server model that validates the results of security keyword ranking queries in cloud computing environments. Once the cloud server operates illegally, the data user will discover the illegal operation with a high probability. To achieve this goal, the data provider users systematically construct the sampled data and anchor data and assemble them into the verification data, so that the cloud server cannot know how much data is placed in the verification data; the data users construct the secret authentication buffer, so that the cloud server cannot know which data provider's authentication data is returned as anchor data. In addition, once a data provider user finds the cloud suspicious, it can dynamically update the validation data. In order to improve the detection of illegal cloud behavior as far as possible, we study the key parameters of the system and give the optimal solution under different parameter settings. Data users can configure the relevant parameter values according to their personal preferences or system requirements. Finally, the effectiveness of the scheme is verified by theoretical analysis and experiments.
References 1. Xu, Z., Kang, W., Li, R.: Efficient multi-keyword ranked query on encrypted data in the cloud. In: Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems (ICPADS), pp. 244–251. IEEE (2012) 2. Li, R., Xu, Z., Kang, W.: Efficient multi-keyword ranked query over encrypted data in cloud computing. Futur. Gener. Comput. Syst. 30, 179–190 (2014)
3. Sun, W., Wang, B., Cao, N.: Verifiable privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking. IEEE Trans. Parallel Distrib. Syst. 25(11), 3025–3035 (2014) 4. Xu, J., Zhang, W., Yang, C.: Two-step-ranking secure multi-keyword search over encrypted cloud data. In: Proceedings of the International Conference on Cloud and Service Computing (CSC), pp. 124–130. IEEE (2012) 5. Wang, B., Li, B., Li, H.: Oruta: privacy-preserving public auditing for shared data in the cloud. In: Proceedings of the 5th IEEE International Conference on Cloud Computing (CLOUD), pp. 295–302. IEEE (2012) 6. Li, J., Wang, Q., Wang, C.: Fuzzy keyword search over encrypted data in cloud computing. In: Proceedings of the IEEE INFOCOM, pp. 1–5. IEEE (2010) 7. Chuah, M., Hu, W.: Privacy-aware bedtree based solution for fuzzy multi-keyword search over encrypted data. In: Proceedings of the 31st International Conference on Distributed Computing Systems Workshops, pp. 273–281. IEEE (2011) 8. Wang, B., Yu, S., Lou, W.: Privacy-preserving multi-keyword fuzzy search over encrypted data in the cloud. In: Proceedings of the IEEE INFOCOM, pp. 2112–2120. IEEE (2014) 9. Wang, C., Ren, K., Yu, S., et al.: Achieving usable and privacy-assured similarity search over outsourced cloud data. In: Proceedings of the IEEE INFOCOM, pp. 451–459. IEEE (2012) 10. Paillier, P.: Public-key cryptosystems based on composite degree residuosity classes. In: Proceedings of the Advances in Cryptology-EUROCRYPT, pp. 223–238. Springer (1999)
Chapter 44
Three-Dimensional Computational Imaging with One Liquid Crystal Lens Camera and Refocusing from Extended Depth of Field Intensity and Depth Image Pair Xiaoxi Chen, Yalei Zhang, Jiacheng Ma, Liming Zheng, and Mao Ye Abstract This paper proposes a novel refocusing method that can effectively and digitally refocus objects on an arbitrary plane of a three-dimensional scene from an all-in-focus intensity and depth image pair captured with a liquid crystal (LC) lens camera. The LC lens imaging system combines the transient property of the LC lens with image processing to simultaneously obtain a large depth of field intensity image and a depth image. The system captures a set of constant-magnification images with different focus distances while switching between two LC lens states. The depth information is then calculated from the far- and near-focused images, and the large depth of field intensity image is fused from the image set.
44.1 Introduction

Computational photography and imaging is a rapidly developing, interdisciplinary research field that combines applied optics, image processing, computer vision, and other areas to enhance or extend the capabilities of digital photography and imaging, such as super-resolution imaging, high dynamic range imaging, 3D imaging, extended depth of field (DoF), and refocusing. 3D imaging technologies hold an important position among computational imaging applications, obtaining the depth or range of a scene from 2D imagery. Depth information provides more possibilities for image segmentation, target detection, object tracking, and other computer vision applications, and is applied in refocusing, DoF control, and 3D display. There are many kinds of 3D imaging technologies available, and they can be divided into active and passive methods according to whether the illumination environment or the imaging path is controlled. Active 3D imaging measures depth by actively projecting onto the observed object with an independent artificial light source. Passive 3D X. Chen (B) · Y. Zhang · J. Ma · L. Zheng · M. Ye University of Electronic Science and Technology of China, Chengdu 610054, Sichuan, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2020 R. Kountchev et al. (eds.), Advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology, Smart Innovation, Systems and Technologies 179, https://doi.org/10.1007/978-981-15-3863-6_44
imaging is composed of two or more spatial or temporal sensors, imitating biological eyes. The depth is calculated according to the principle of geometric relationship measurement, using the position of the observed object in each image sensor and the relative physical positions of the image sensors. Depth perception from binocular vision is based on two cameras imitating human eyes. Integral imaging and light-field imaging can record the direction of rays using a lens array [1, 2]. Depth from focus (DFF) selects the clearest image from an image sequence of the 3D scene and then calculates the depth according to the imaging formula [3, 4]. Depth from defocus (DFD) takes two or more defocused images of a scene and analyzes the depth by comparing the degrees of defocus [5–7]. The depth image can also be calculated from images taken with an LC lens of variable focal length. In this paper, we present the proposed imaging system with an LC lens to capture the all-in-focus and depth images, together with the experimental results.
44.2 Proposed Imaging System

Figure 44.1 presents a schematic configuration and the basic principle of the proposed imaging system using an LC lens camera. The compound lens consists of a main lens with a fixed focal length and an LC lens with a tunable focal length. The distance between the lens and the image plane v is kept at v0. When the LC lens is driven with 0 voltage, the focal length of the camera is fg and the system focuses on some plane, as shown in Fig. 44.1a. If the LC lens works as a negative lens, the focal length of the camera becomes longer than fg and the system focuses on a farther plane, as shown in Fig. 44.1b. If the LC lens works as a positive lens, the focal length of the camera becomes shorter than fg and the system focuses on a nearer plane, as shown in Fig. 44.1c. The aperture of the LC lens is used as the aperture stop of the compound lens and the diaphragm of the imaging system, so the magnification of the same object point remains unchanged under different LC lens powers. In this way, without mechanical movement or magnification changes, misalignment problems are avoided. If a series of images focused at different distances is taken, an extended DoF image can be obtained by wavelet-fusing the images in the series. If the powers of the LC lens required to focus at different distances are known, a depth image can be calculated by DFD or DFF. The resulting RGB-D image pairs enable post-acquisition refocusing at different distances or aperture reshaping.
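The focus-distance change can be estimated with the Gaussian relation 1/u + 1/v = Pmain + PLC, where the powers add because the gap between the two lenses is negligible. The sketch below is our thin-lens simplification (principal-plane effects are ignored), so it only roughly reproduces the near ~12 cm and far ~2 m planes reported in Sect. 44.3:

```python
def focus_distance_cm(p_lc, p_main=125.0, v_mm=8.32):
    """Object distance in focus for the compound lens (thin-lens sketch).

    p_lc and p_main are optical powers in m^-1 (the 8 mm main lens gives
    1000/8 = 125 m^-1); v_mm is the fixed lens-to-sensor distance.
    """
    inv_u = (p_main + p_lc) - 1000.0 / v_mm   # 1/u in m^-1
    return 100.0 / inv_u                      # u in cm

# Positive LC power focuses near, negative power focuses far:
print(focus_distance_cm(4.4), focus_distance_cm(-4.4))  # ~10.9 cm, ~245 cm
```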
44.3 Experimental Results
Figure 44.2 shows a photograph of the experimental setup, composed of a camera, a main lens, and the LC lens. The focal length of the main lens is 8 mm. The LC lens is attached to the main lens, and the gap between the LC lens and the main lens is negligible.
Fig. 44.1 Formation of focused and defocused images
Fig. 44.2 Experimental setup
We manufactured a 2 mm hole-patterned LC lens [8] with a 31 µm LC layer, as shown in Fig. 44.3. The wavefront of the LC lens is measured with an interferometer, and the optical powers and root mean square (RMS) aberrations of the LC lens at different voltages or times are calculated by Zernike polynomial fitting [9]. The power of the LC lens was then measured under different driving voltages (v1 and v2).
Fig. 44.3 Structure of the LC lens
Fig. 44.4 The lens states of LC lens switching
The maximum positive power of the LC lens is P+LC = 3.3 m⁻¹ when v1 = 37 V and v2 = 12 V, and the maximum negative power is P−LC = −4.4 m⁻¹ when v1 = 8 V and v2 = 65 V. Figure 44.4 shows the transient optical properties during switching between P+LC and P−LC. When the LC lens is in the non-lens state, the spacing between the combined lens and the camera sensor is v0 ≈ 8.32 mm, and the system is adjusted to focus on a plane at a distance of 20 cm. Thus, the focused plane changes from 12 cm (near) to about 200 cm (far) when the optical power of the LC lens switches between P+LC and P−LC (3.3 and −4.4 m⁻¹). The camera sensor has 2048 × 2048 pixels with a pixel size of 2.2 × 2.2 µm. Owing to capture-speed limits, the resolution of the captured images is set to 1024 × 960 pixels and the interval between adjacent frames to 20 ms. To verify that the magnification is preserved across the scene depth, the shift between adjacent images was computed using the Fourier transform phase principle; the magnification variation stays within one pixel, so the magnification of the system can be considered constant. We placed six objects at different distances in the scene, swept the optical power between P+LC and P−LC, and captured 30 fps video. The calculated DoF-improved image is shown in Fig. 44.5a, and the depth information of the scene calculated by DFD in Fig. 44.5b; the depth of the initial focus plane is near 128 mm (Fig. 44.6).
Fig. 44.5 Intensity and depth image pair a Depth map b Extended DoF image
Fig. 44.6 Digital refocused images a focused on far plane b focused on middle plane c focused on near plane
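The magnification check described above relies on the Fourier transform phase (phase correlation) principle. The sketch below is a generic implementation of integer-shift estimation between two frames, not necessarily the authors' exact code:

```python
import numpy as np

def phase_correlation_shift(img_a, img_b):
    """Estimate the integer (dy, dx) translation between two equally
    sized grayscale frames from the phase of the cross-power spectrum."""
    fa = np.fft.fft2(img_a)
    fb = np.fft.fft2(img_b)
    cross = fa * np.conj(fb)
    cross /= np.abs(cross) + 1e-12            # keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # map peaks beyond the half-size back to negative shifts
    if dy > img_a.shape[0] // 2:
        dy -= img_a.shape[0]
    if dx > img_a.shape[1] // 2:
        dx -= img_a.shape[1]
    return dy, dx
```

A shift magnitude below one pixel between adjacent frames supports treating the system magnification as constant.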
44.4 Conclusions
In this paper, a refocusing method using an LC lens is proposed. All of the optical devices are fixed, so the mechanical manipulation required in conventional imaging systems is avoided. In the proposed imaging system, different voltages applied to the LC layer switch the power of the system between two lens states, capturing an image series focused at different distances. The LC lens is designed to have a fast response time and good lens states with low aberrations during switching between the two lens states. The depth and DoF-improved images are calculated and saved from the images captured during LC lens switching in real time. Our experimental results prove that this is an effective way to reconstruct the depth and all-in-focus images simultaneously. The refocused images reconstructed from the depth and all-in-focus images confirm that the proposed method can be used for 3D photography with two 2D images and post-processing operations, including refocusing.
References
1. Xiao, X., Javidi, B., Martinez-Corral, M., Stern, A.: Advances in three-dimensional integral imaging: sensing, display, and applications. Appl. Opt. 52(4), 546–560 (2013)
2. Ng, R.: Digital light field photography. Ph.D. thesis, Stanford University, vol. 115(3), pp. 38–39 (2006)
3. Nayar, S.K., Nakagawa, Y.: Shape from focus. IEEE Trans. Pattern Anal. Mach. Intell. 16(8), 824–831 (1994)
4. Subbarao, M., Tyan, J.K.: Selecting the optimal focus measure for autofocusing and depth-from-focus. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 864–870 (1998)
5. Watanabe, M., Nayar, S.K.: Rational filters for passive depth from defocus. Int. J. Comput. Vis. 27(3), 203–225 (1998)
6. Ye, M., Chen, X., Li, Q., Zeng, J., Yu, S.: Depth from defocus measurement method based on liquid crystal lens. Opt. Express 26(22), 28413–28420 (2018)
7. Nilanjan, D., Chintan, B., Amira, S.A.: Big Data for Remote Sensing: Visualization, Analysis and Interpretation: Digital Earth and Smart Earth. Springer, Switzerland (2018)
8. Wang, B., Ye, M., Sato, S.: Liquid crystal lens with focal length variable from negative to positive values. IEEE Photon. Technol. Lett. 18(1), 79–81 (2006)
9. Chen, X., Bai, Y., Chao, C., Ye, M.: Driving liquid crystal lens to extend focus range. Jpn. J. Appl. Phys. 57 (2018)
Chapter 45
Application of Finite Element Analysis in Calibration of Non-contact Magnetic Piezoelectric Type of Displacement Sensor
Jin Gao
Abstract Aiming at the large workload and many error factors involved in calibrating the non-contact magnetic piezoelectric displacement sensor, a calibration method based on finite element simulation is designed. ANSYS is used to establish a two-dimensional model of the two magnets. After the material properties are defined, the finite element mesh is generated to simulate the force–displacement curve of the two magnets in an air medium. The accuracy of the simulation results is verified by experimental comparison.
45.1 Introduction
The non-contact magnetic piezoelectric displacement sensor is a new type of displacement sensor based on a combination of like-pole magnet repulsion and the piezoelectric effect. The sensor must be calibrated before use, but the force between the magnets is very complex and cannot be calculated by conventional mathematical methods, so calibration is commonly performed experimentally: the numerical correspondence between the input and output of the sensor is obtained through multiple experiments. This approach suffers from a large workload and many error factors. The finite element method decomposes the whole problem domain into a number of relatively simple subdomains and then computes the solution. With the finite element method, ANSYS can easily build a two-dimensional model of the magnets and simulate the force between them [1–5].
45.2 The Structure of the Non-contact Magnetic Piezoelectric Displacement Sensor
The mechanical structure of the displacement sensor is shown in Fig. 45.1. It mainly comprises a calibration bar, magnet 1, magnet 2, and a piezoelectric element. Magnets 1 and 2 are both cylindrical N48 Nd-Fe-B magnets [6] with a diameter of 50 mm and a thickness of 20 mm. The range of the piezoelectric element is 1000 N, with a linear signal voltage of 0–5 V. Magnet 1 is fixed through the calibration bar to one end of the displacement to be measured. The piezoelectric element is installed at the other end, and magnet 2 is attached to it. Magnet 1 and magnet 2 lie on the same axis, facing each other. The calibration bar consists of a metal sleeve, a bolt, and a threaded column. The metal sleeve and threaded column are connected by a pipe thread, and the distance between the two magnets is adjusted by rotating the threaded column to suit different ranges and facilitate calibration. The bolt is used to lock or release the metal sleeve and threaded column.
Fig. 45.1 Structure of non-contact magnetic piezoelectric displacement sensor
45.3 Sensor Modeling and Simulation
The magnetic field of a cylindrical Nd-Fe-B magnet is axisymmetric, so the magnet can be modeled by the vector potential method and analyzed by static magnetic field simulation.
45.3.1 Establishment of Parametric Model Based on the theory of 3D modeling and simulation [7], two-dimensional modeling of two cylindrical Nd-Fe-B magnets was conducted and divided into magnet unit area, medium unit area, as shown in Fig. 45.2. And define the parameters as follows: Magnet 1: Radius r 1 = 25 mm, Thickness δ 1 = 10 mm Magnet 2: Radius r 2 = 25 mm, Thickness δ 2 = 10 mm Initial distance between magnet 1 and 2 d =4 mm 2 Gap between magnet 1 and medium: d1 = 3 d2 + δ1 + r12 2 Gap between magnet 2 and medium: d2 = 3 d2 + δ2 + r22
Fig. 45.2 Two dimensional model of two magnets
Radius of the medium around the magnets: r3 = max(d1, d2)
Radius of the infinite element region: r4 = 2r3
45.3.2 Define Material Properties
Magnets 1 and 2 are both N48 Nd-Fe-B magnets, with remanent induction Br = 1.38 T and coercivity Hcb = 835 kA/m. According to the linear B–H relationship of the Nd-Fe-B magnet, the relative permeability of the magnet μr is calculated to be 1.32 [8]:

$$\mu_r = \frac{B_r}{\mu_0 H_{cb}} = \frac{1.38}{4\pi \times 10^{-7} \times 835 \times 10^{3}} = 1.32 \qquad (45.1)$$
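As a quick arithmetic check of (45.1), using the values from this section:

```python
import math

B_r = 1.38              # remanent induction, T
H_cb = 835e3            # coercivity, A/m
mu_0 = 4 * math.pi * 1e-7

mu_r = B_r / (mu_0 * H_cb)
print(round(mu_r, 2))   # -> 1.32
```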
45.3.3 Establish the Finite Element Model and Solve It
Given the B–H characteristics of the Nd-Fe-B magnet and the low-frequency nature of the simulation, the PLANE53 and INFIN110 elements are selected. PLANE53 is a two-dimensional 8-node magnetic solid element, used here to define the magnets and the medium (air). INFIN110 is a two-dimensional 8-node infinite quadrilateral element that can be combined with PLANE53 in planar and axisymmetric analyses; here it models the outer boundary [9]. In meshing the infinite element region, the radial line element density was set to 1, the overall element density to 32, and quadrilateral elements were selected; mapped meshing was used, yielding the INFIN110 far-field elements. In meshing the magnet and medium element regions, free meshing with smart element sizing (precision level 2) and triangular elements was used. The finite element model after meshing is shown in Fig. 45.3. After meshing, the following steps are carried out (see the scripted sketch after step (5)):
(1) Apply the INF flag: select all nodes on the far-field boundary and apply the infinite surface flag.
(2) Apply flux-parallel boundary conditions on the axis of symmetry.
(3) Modify the coordinate system: set the polarization direction of the magnets to the X-axis of the coordinate system.
(4) Define the magnet 1 and 2 elements as components, and apply Maxwell surface flags and virtual work boundary conditions via FMAGBC.
Fig. 45.3 Two-dimensional finite element model of two magnets
Fig. 45.4 The simulation curve
(5) The displacement step of the magnet is defined as 0.1 mm, with an initial separation of 4 mm and a maximum displacement of 60 mm. FMAGSUM is used to compute the force between the two magnets. The simulation curve is shown in Fig. 45.4.
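The APDL workflow above can be scripted. The sketch below outlines it through PyMAPDL (the ansys-mapdl-core package), issuing classic APDL commands via run(); geometry construction and entity selections are elided and the component name is a placeholder, so this is an illustrative outline rather than the author's exact input file:

```python
from ansys.mapdl.core import launch_mapdl  # assumes ansys-mapdl-core is installed

mapdl = launch_mapdl()
mapdl.run("/PREP7")
mapdl.run("ET,1,PLANE53")       # 2D 8-node magnetic solid (magnets, air)
mapdl.run("ET,2,INFIN110")      # 2D 8-node infinite element (far field)
mapdl.run("MP,MURX,1,1.32")     # relative permeability of the N48 magnets
mapdl.run("MP,MGXX,1,835000")   # coercive force along the X polarization axis
mapdl.run("MP,MURX,2,1")        # air
# ... build magnet/air/far-field areas; mapped mesh on the far field,
# ... free triangular mesh (smart sizing level 2) elsewhere ...
mapdl.run("SF,ALL,INF")         # infinite-surface flag on the selected far-field nodes
mapdl.run("CM,MAG1,ELEM")       # magnet-1 elements as a component (after selecting them)
mapdl.run("FMAGBC,'MAG1'")      # Maxwell surface / virtual-work force flags
mapdl.run("/SOLU")
mapdl.run("SOLVE")
mapdl.run("/POST1")
mapdl.run("FMAGSUM,'MAG1'")     # summed magnetic force on magnet 1
```

Sweeping the magnet position in 0.1 mm steps and repeating the solve yields the force–displacement curve of Fig. 45.4.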
Fig. 45.5 Comparison diagram of simulation and experimental data
45.4 Experimental Verification
To verify the accuracy of the simulation results, the non-contact magnetic piezoelectric displacement sensor was fixed in place and tested. In the experiment, the threaded column regulates the distance between magnet 1 and magnet 2, an inside micrometer measures the distance between the two magnets, and the piezoelectric element detects the pressure on magnet 2. The applied magnetic force is obtained by subtracting the weight of magnet 2. The experimental results and the simulation data are compared in Fig. 45.5. The comparison shows a certain error between the two, but the relative error is within 10% and the overall trend of the magnetic force is consistent.
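Once validated, the simulated force–displacement curve serves directly as the calibration table: a measured piezoelectric force reading is mapped back to displacement by interpolation. A minimal sketch with placeholder values (not the chapter's data):

```python
import numpy as np

# Simulated calibration curve: magnet separation (mm) vs. repulsive force (N).
# The inverse-square stand-in below is a placeholder for the FMAGSUM results.
sep_mm = np.arange(4.0, 60.1, 0.1)
force_n = 500.0 / sep_mm ** 2

def displacement_from_force(f_measured):
    """Invert the monotonically decreasing F(d) curve by interpolation.
    np.interp needs ascending x, so interpolate on the reversed arrays."""
    return np.interp(f_measured, force_n[::-1], sep_mm[::-1])

print(displacement_from_force(10.0))  # separation (mm) for a 10 N reading
```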
45.5 Conclusion
(1) This paper presents a new calibration method for the non-contact magnetic piezoelectric displacement sensor based on the finite element method.
(2) The finite element model of the sensor is simulated in ANSYS, and the accuracy and feasibility of the simulation are verified by experimental comparison, providing a new method and basis for sensor calibration.
Acknowledgements This research was supported by “Technical requirements and test procedures for autonomous operating vehicles” and “Research on integration of intelligent vehicle-road cooperation system and operational risk prevention system” (2019–0010).
References
1. Wei, Y.: Theoretical research and finite element analysis of the magnetic gear. Machinery 4(38), 21–24 (2011)
2. Niu, X.: Research on Magnetic Field of Permanent Magnet Governor Based on ANSYS. Chang'an University (2012)
3. Wang, Y., Pan, S., Wang, W.: Mathematical modeling and FEA of magnetic density for MR da. Machinery 5(33), 1–3 + 6 (2006)
4. Dey, N., Mukherjee, A.: Embedded Systems and Robotics with Open Source Tools, pp. 45–47. CRC Press (2017)
5. Mukherjee, A., Dey, N.: Smart Computing with Open Source Platforms, pp. 37–41. CRC Press (2019)
6. Guo, D., Cheng, M.: Design and analysis of electric–mechanical converter's magnetic circuit structure of pulsed gas supply valve. Machinery 10(40), 54–58 (2013)
7. Yu, Z., Zhang, G., Qiu, Q., Hu, L.: Numerical simulation of levitation characteristics of a cylindrical permanent magnet and a high-temperature superconductor based on the 3D finite-element method. Trans. China Electrotech. Soc., 32–37 (2015)
8. Li, H.: Preparation and Performance Study of Bonded NdFeB Magnet. Taiyuan University of Science and Technology (2011)
9. Sun, M., Hu, R., Cui, H.: ANSYS 10.0 Electromagnetic Finite Element Analysis Example Tutorial, pp. 14–16. China Machine Press (2007)
Chapter 46
Approaching Obstacle Detection by a Vehicle Fisheye Camera
Yu Pu Shi, Hai Ping Wei, and Hong Fei Yu
Abstract The fisheye camera captures rich information and has a low installation cost, so it plays an irreplaceable role in driver-assistance systems. This paper proposes a detection method based on the fisheye camera. First, a feature block filtering method based on gradient maxima is proposed, which distributes the features as much as possible along the outer and inner edges of the obstacle and along texture edges while limiting the number of selected features, so that much of the integrity of the obstacle region is preserved. The feature blocks are then clustered to obtain a complete obstacle region. Compared with currently popular moving target detection methods, the proposed algorithm preserves the integrity of the obstacle region more effectively and reduces the feature matching error rate. Moreover, the method can judge changes in the relative distance between an obstacle and the vehicle and thereby detect approaching obstacles. Experiments show that the proposed method is effective in many scenarios and offers improved accuracy and robustness.
46.1 Introduction
With traffic accident rates increasing, the automobile driver-assistance system has become an important research topic. The fisheye camera is becoming more and more important in the field of moving target detection because of its abundant information and low installation cost. Compared with target detection in a stationary environment [1–3], the difficulty of moving target detection in a vehicle driver-assistance system lies in the changing background. Because the driving environment is complex, many kinds of targets must be detected, which makes feature extraction harder. In addition, the characteristics of the fisheye camera seriously deform the image, which makes moving target detection more difficult still.
In the existing vehicle environment, there are two mainstream algorithms for moving object detection: the frame difference method and optical flow-based algorithms. The frame difference method compares pixel differences at corresponding positions in adjacent frames; where the difference exceeds a threshold, a moving object is assumed and marked [4–6]. Its disadvantages are incomplete detection regions, false detections, missed detections, and the like. The optical flow method converts the velocity information of three-dimensional space into two-dimensional calculations and analyzes the flow vector characteristics over time to detect motion regions in the image sequence [7–11]. In this paper, a fisheye camera is used. In earlier research, the usual approach was to undistort the fisheye image and then apply standard algorithms such as local binary patterns (LBP) [12] or DPM [13] to the undistorted image. However, this approach relies heavily on the camera's calibration parameters; the undistortion process usually impairs image quality, and the resulting image may differ from the original, adversely affecting subsequent processing. In this paper, a new feature-point-based method of moving object detection is proposed. The feature points are obtained by detecting local gradient maxima and then clustered to obtain the obstacle region. In addition, to handle the fisheye deformation, we project the fisheye image onto a spherical model and normalize it before extracting the moving object region. The advantages of this algorithm are that a relatively complete obstacle region is obtained and that it suits multi-object detection, improving accuracy and robustness compared with traditional algorithms.
46.2 Imaging Model of Fisheye Camera The coordinate system O-XYZ is established with the optical center of the camera as the origin, where OY is perpendicular to the ground, OZ is parallel to the ground, OZ is coincident with the camera optical axis, and OX and OY are parallel to the image plane ox- and oy-axes. In order to compensate for the deformation of the fisheye camera imaging, and at the same time it is easy to calculate, we use the spherical projection method to project the center point of the feature block into the spherical coordinates, the imaged point (x, y) in the image and the point in the space (X, Y, Z) have a corresponding function relationship, namely: (x, y) = P (X, Y, Z), where P is a mapping relationship function. Projecting the spatial coordinates (X, Y, Z) onto the spherical surface, there is a mapping relationship (x, y) = P (X, Y, Z). Then, any point P (X, Y, Z) outside the origin in the space is placed. The normalization process is performed to normalize the coordinates ps (xs , ys , z s ), and there is a one-to-one correspondence between the two which is
$$\begin{pmatrix} x_s \\ y_s \\ z_s \end{pmatrix} = \frac{1}{D}\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} \qquad (46.1)$$
where $D = \sqrt{X^2 + Y^2 + Z^2}$. The correspondence between the spatial point P(X, Y, Z) and the imaging-plane point p(u, v) can be written as

$$p(u, v) = G(P(X, Y, Z)) \qquad (46.2)$$
G is a point mapping function, which can be obtained from the literature. Equations (46.1) and (46.2) establish the mapping between the image point p(u, v) and the normalized spatial coordinates.
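A minimal sketch of the normalization in (46.1), plus a back-projection of an image point to the unit sphere. The chapter keeps the mapping G abstract, so the equidistant fisheye model (r = f·θ) and its parameters below are our assumptions, not the paper's calibration:

```python
import numpy as np

def normalize_to_sphere(P):
    """Eq. (46.1): map a spatial point P = (X, Y, Z), P != 0, to the unit sphere."""
    P = np.asarray(P, dtype=np.float64)
    return P / np.linalg.norm(P)

def fisheye_pixel_to_sphere(u, v, cx, cy, f):
    """Back-project pixel (u, v) to a unit-sphere direction, assuming an
    equidistant fisheye model r = f * theta with principal point (cx, cy)
    and focal length f (assumed parameters; the paper's G is abstract)."""
    x, y = u - cx, v - cy
    r = np.hypot(x, y)
    if r == 0:
        return np.array([0.0, 0.0, 1.0])   # optical axis
    theta = r / f                           # angle from the optical axis
    s = np.sin(theta) / r
    return np.array([x * s, y * s, np.cos(theta)])
```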
46.3 Approaching Object Detection
The approaching-object detection in this paper is based on feature points. According to the working principle of the camera and the principle that near objects appear larger and far objects smaller in the image, the flow of the detection method is shown in Fig. 46.1.
46.3.1 Feature Point Detection and Matching Based on Fisheye Image
To extract the feature blocks of interest, a gradient map of the current frame is first computed:

$$M_t(x, y) = \left|\frac{\partial I_t(x, y)}{\partial x}\right| + \left|\frac{\partial I_t(x, y)}{\partial y}\right| \qquad (46.3)$$
where $\frac{\partial I_t(x,y)}{\partial x} = \frac{I_t(x+1,y) - I_t(x-1,y)}{2}$ denotes the derivative of the current frame in the horizontal direction x, and $\frac{\partial I_t(x,y)}{\partial y} = \frac{I_t(x,y+1) - I_t(x,y-1)}{2}$ the derivative in the vertical direction y. Only points whose gradient values are local maxima of the gradient map are then preserved, yielding a new gradient map M′t; a point is kept if
$$\begin{cases} M(x, y) \ge M(x, y+1) \\ M(x, y) \ge M(x, y-1) \\ M(x, y+1) \ne M(x, y-1) \end{cases} \quad \text{or} \quad \begin{cases} M(x, y) \ge M(x+1, y) \\ M(x, y) \ge M(x-1, y) \\ M(x+1, y) \ne M(x-1, y) \end{cases} \qquad (46.4)$$
Fig. 46.1 Algorithm flowchart
The points in M′t are traversed with a predetermined fixed step size. At each position, a preset search window of size (w × w) is used to count the number n of points inside the window whose gradient exceeds a given threshold, where w controls the window size and the threshold controls the deviation. If the count is sufficient, a luminance feature block of size (w × w) is generated, centered on the maximum-gradient point within the window. After all positions have been traversed at the fixed step size, a set of feature blocks composed of the pixel luminance values of the image is obtained (Figs. 46.2 and 46.3).
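A sketch of the gradient map (46.3), the local-maxima map (46.4), and the windowed block selection just described. The step size, window size w, gradient threshold, and count requirement are tunable assumptions, and the plateau (≠) clause of (46.4) is omitted for brevity:

```python
import numpy as np

def select_feature_blocks(img, step=8, w=11, grad_thresh=20.0, min_count=5):
    """Return centre coordinates (row, col) of (w x w) luminance feature blocks."""
    I = img.astype(np.float64)
    gx = np.zeros_like(I); gy = np.zeros_like(I)
    gx[:, 1:-1] = (I[:, 2:] - I[:, :-2]) / 2.0       # central differences, Eq. (46.3)
    gy[1:-1, :] = (I[2:, :] - I[:-2, :]) / 2.0
    M = np.abs(gx) + np.abs(gy)

    Mp = np.zeros_like(M)                             # local maxima, Eq. (46.4)
    horiz = (M[1:-1, 1:-1] >= M[1:-1, 2:]) & (M[1:-1, 1:-1] >= M[1:-1, :-2])
    vert = (M[1:-1, 1:-1] >= M[2:, 1:-1]) & (M[1:-1, 1:-1] >= M[:-2, 1:-1])
    Mp[1:-1, 1:-1] = np.where(horiz | vert, M[1:-1, 1:-1], 0.0)

    centres = []
    h = w // 2
    for y in range(h, img.shape[0] - h, step):        # fixed-step traversal
        for x in range(h, img.shape[1] - h, step):
            win = Mp[y - h:y + h + 1, x - h:x + h + 1]
            if np.count_nonzero(win > grad_thresh) >= min_count:
                dy, dx = np.unravel_index(np.argmax(win), win.shape)
                centres.append((y - h + dy, x - h + dx))  # strongest maximum
    return centres
```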
46.3.1.1 Matching of Feature Points
This paper uses the Lucas–Kanade feature point tracking method to obtain matching feature points in adjacent frames [14]. The Lucas–Kanade optical flow method is popular among current tracking methods; it computes the movement of pixel positions between two frames. It is a differential method based on a Taylor-series expansion of the image signal, using partial derivatives with respect to the spatial and temporal coordinates.
Fig. 46.2 The original image of It
Fig. 46.3 The point block selection of the image It , where the red dot indicates the center position of the selected feature point block
The calculation assumes small motion with constant brightness and spatial coherence (Fig. 46.4).
Fig. 46.4 Feature point matching diagram
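Pyramidal Lucas–Kanade tracking is available off the shelf in OpenCV; a minimal sketch of matching block centers between consecutive frames (the window size and pyramid depth are our choices, not values given in the paper):

```python
import numpy as np
import cv2

def track_points(prev_gray, next_gray, centres):
    """Track feature block centres (row, col) from one grayscale frame to
    the next with pyramidal Lucas-Kanade optical flow."""
    pts = np.array([(x, y) for (y, x) in centres],
                   dtype=np.float32).reshape(-1, 1, 2)
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None, winSize=(15, 15), maxLevel=3)
    # keep only the successfully tracked pairs
    return [(tuple(p0.ravel()), tuple(p1.ravel()))
            for p0, p1, ok in zip(pts, nxt, status.ravel()) if ok == 1]
```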
46.3.1.2 Eliminating Mismatched Feature Points
Continuing with the Lucas–Kanade method, the matching feature block in frame It−2 is obtained, and tracking proceeds in the same way until frame It−k (k ≥ 3), giving a feature point sequence of length k + 1 formed by the feature block center positions. Because of the motion characteristics of the vehicle, the motion of points in the image should be smooth, so mismatched feature blocks are removed according to whether the feature point sequence is smooth. Smoothness is judged from the differences in angle and length of adjacent feature point vectors in the sequence. The specific calculation method is as follows. For the feature point sequence {(xi, yi) | t − k ≤ i ≤ t} formed by the feature matching block centers, the flow vector is solved as

$$\begin{bmatrix} V_x \\ V_y \\ V_z \end{bmatrix} = \begin{bmatrix} I_{x_i}^2 & I_{x_i} I_{y_i} & I_{x_i} I_{z_i} \\ I_{x_i} I_{y_i} & I_{y_i}^2 & I_{y_i} I_{z_i} \\ I_{x_i} I_{z_i} & I_{y_i} I_{z_i} & I_{z_i}^2 \end{bmatrix}^{-1} \begin{bmatrix} -I_{x_i} I_{t_i} \\ -I_{y_i} I_{t_i} \\ -I_{z_i} I_{t_i} \end{bmatrix} \qquad (46.5)$$

and the angle difference between the vectors formed by each two adjacent feature point pairs is

$$B = \left|\arctan\frac{y_i - y_{i-1}}{x_i - x_{i-1}} - \arctan\frac{y_{i-1} - y_{i-2}}{x_{i-1} - x_{i-2}}\right| \times \frac{180}{\pi} \qquad (46.6)$$
If, for each two adjacent feature point pairs, the differences A and B satisfy

$$A < T_1, \quad B < T_2 \qquad (46.7)$$

then the sequence is considered a correct matching sequence, and the corresponding matched blocks are the successfully matched feature blocks. T1 and T2 are the distance threshold and the angle threshold, respectively, which can be set flexibly according to the actual situation; T1 = 0.2 and T2 = 80° are used in this paper (Fig. 46.5).
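The smoothness test of (46.6)–(46.7) in code form. Since the length measure A is described only qualitatively here, the relative length change used below is an assumption; the angle difference follows (46.6), computed with atan2 for robustness:

```python
import math

def is_smooth(track, t1=0.2, t2=80.0):
    """Accept a tracked point sequence [(x, y), ...] if every pair of
    consecutive displacement vectors agrees in length (A < t1, assumed
    to be a relative length change) and direction (B < t2 degrees)."""
    for i in range(2, len(track)):
        dx1, dy1 = track[i][0] - track[i-1][0], track[i][1] - track[i-1][1]
        dx0, dy0 = track[i-1][0] - track[i-2][0], track[i-1][1] - track[i-2][1]
        l1, l0 = math.hypot(dx1, dy1), math.hypot(dx0, dy0)
        if max(l0, l1) > 0:
            a = abs(l1 - l0) / max(l0, l1)     # assumed form of A
            if a >= t1:
                return False
        b = abs(math.degrees(math.atan2(dy1, dx1) - math.atan2(dy0, dx0)))
        b = min(b, 360.0 - b)                   # wrap the angle difference
        if b >= t2:
            return False
    return True
```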
46.3.2 Feature Point Clustering
The clustering method in this paper connects the center points of the feature blocks in images It and It−1: for each matched block, the vector p^n_{t−1,t}, n ∈ {1, 2, …, N}, points from the feature block center at time t − 1 to the corresponding center at time t, where N is the total number of successfully matched feature blocks.
Fig. 46.5 Sequence smoothness diagram: a smooth sequence; b non-smooth sequence, with a large change of vector direction (red vector); c non-smooth sequence, with a large change of vector length (red vector)

$$p^n_{t-1,t} = \left(x^n_t - x^n_{t-1},\; y^n_t - y^n_{t-1}\right) \qquad (46.8)$$
1. Select any unmarked feature point block center position (x_t^i, y_t^i) as the seed point and mark it. If it has not yet been assigned a class number, assign a new class number; otherwise, use the class number it has already obtained.
2. Search the remaining unmarked and unclassified feature point block center positions, assigning the same class number to all feature point blocks satisfying (46.9)–(46.11). If there is no such feature point block, mark the current point as an isolated seed point. Based on experience:

$$|x^j_t - x^i_t| + |y^j_t - y^i_t| < R \qquad (46.9)$$
where R is the distance threshold, R = 7 pixels, and

$$\big|\,|x^i_t - x^i_{t-1}| + |y^i_t - y^i_{t-1}| - |x^j_t - x^j_{t-1}| - |y^j_t - y^j_{t-1}|