English Pages 183 Year 2020
Intelligent Systems Reference Library 182
Margarita N. Favorskaya Lakhmi C. Jain Editors
Computer Vision in Control Systems—6 Advances in Practical Applications
Intelligent Systems Reference Library Volume 182
Series Editors Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland Lakhmi C. Jain, Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology, Sydney, NSW, Australia; KES International, Shoreham-by-Sea, UK; Liverpool Hope University, Liverpool, UK
The aim of this series is to publish a Reference Library, including novel advances and developments in all aspects of Intelligent Systems in an easily accessible and well structured form. The series includes reference works, handbooks, compendia, textbooks, well-structured monographs, dictionaries, and encyclopedias. It contains well integrated knowledge and current information in the field of Intelligent Systems. The series covers the theory, applications, and design methods of Intelligent Systems. Virtually all disciplines such as engineering, computer science, avionics, business, e-commerce, environment, healthcare, physics and life science are included. The list of topics spans all the areas of modern intelligent systems such as: Ambient intelligence, Computational intelligence, Social intelligence, Computational neuroscience, Artificial life, Virtual society, Cognitive systems, DNA and immunity-based systems, e-Learning and teaching, Human-centred computing and Machine ethics, Intelligent control, Intelligent data analysis, Knowledge-based paradigms, Knowledge management, Intelligent agents, Intelligent decision making, Intelligent network security, Interactive entertainment, Learning paradigms, Recommender systems, Robotics and Mechatronics including human-machine teaming, Self-organizing and adaptive systems, Soft computing including Neural systems, Fuzzy systems, Evolutionary computing and the Fusion of these paradigms, Perception and Vision, Web intelligence and Multimedia. ** Indexing: The books of this series are submitted to ISI Web of Science, SCOPUS, DBLP and Springerlink.
More information about this series at http://www.springer.com/series/8578
Editors Margarita N. Favorskaya Reshetnev Siberian State University of Science and Technology Krasnoyarsk, Russia
Lakhmi C. Jain Faculty of Engineering and Information Technology, Technology Centre for Artificial Intelligence University of Technology Sydney Broadway, NSW, Australia
ISSN 1868-4394 ISSN 1868-4408 (electronic) Intelligent Systems Reference Library ISBN 978-3-030-39176-8 ISBN 978-3-030-39177-5 (eBook) https://doi.org/10.1007/978-3-030-39177-5 © Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This research book continues our previous books, which focus on recent advances in computer vision methodologies and technical solutions using conventional and intelligent paradigms:
• Computer Vision in Control Systems—1, Mathematical Theory, ISRL Series, Volume 73, Springer-Verlag, 2015
• Computer Vision in Control Systems—2, Innovations in Practice, ISRL Series, Volume 75, Springer-Verlag, 2015
• Computer Vision in Control Systems—3, Aerial and Satellite Image Processing, ISRL Series, Volume 135, Springer-Verlag, 2018
• Computer Vision in Control Systems—4, Real Life Applications, ISRL Series, Volume 136, Springer-Verlag, 2018
• Computer Vision in Control Systems—5, Advanced Decisions in Technical and Medical Applications, ISRL Series, Volume 175, Springer-Verlag, 2020
The main aim of this volume is to present a sample of recent practical applications of computer vision systems implemented by a number of researchers in the Russian Federation. The book is directed to Ph.D. students, professors, researchers, and software developers working in the field of computer vision technologies and their applications. We wish to express our gratitude to the authors and reviewers for their contributions. The assistance provided by Springer-Verlag is acknowledged.
Krasnoyarsk, Russia: Margarita N. Favorskaya
Broadway, Australia: Lakhmi C. Jain
Contents
1 Image Processing for Practical Applications
Lakhmi C. Jain and Margarita N. Favorskaya
  1.1 Introduction
  1.2 Chapters in the Book
  1.3 Conclusions
  References
2 New Methods of Forming and Measurement of Sub-pixel Shift of Digital Images
Yuriy S. Radchenko and Olga A. Masharova
  2.1 Introduction
  2.2 Shift Algorithm Based on Discrete Chebyshev Transformation
  2.3 Position Estimation in Noisy Images
  2.4 Analyze of Autocorrelation Function
  2.5 Shift’s Estimation by Using Discriminator
    2.5.1 Discriminator Structure
    2.5.2 Distribution Law of Estimation
    2.5.3 Robust Estimate of Signal Parameter
  2.6 Conclusions
  References
3 The Characteristics of the Phase-Energy Image Spectrum
Andrei V. Bogoslovsky, Irina V. Zhigulina, Vladimir A. Sukharev and Maksim A. Pantyukhin
  3.1 Introduction
  3.2 The Model of One-Dimensional Energy-Phase Spectrum
  3.3 The Model of Two-Dimensional Phase-Energy Spectrum
  3.4 Conclusions
  References
4 Detectors Fields
Andrei V. Bogoslovsky, Andrey V. Ponomarev and Irina V. Zhigulina
  4.1 Introduction
  4.2 The Primitive Detectors Field
  4.3 Drift of the Detectors Field
  4.4 Two-Dimensional Discrete Filtering of Detectors Fields for Output Signals
  4.5 Experimental Studies
  4.6 Using Detectors Field Filtering in Images Affected by Motion Blur
  4.7 Conclusions
  References
5 Comparative Evaluation of Algorithms for Trajectory Filtering
Konstantin K. Vasiliev and Oleg V. Saverkin
  5.1 Introduction
  5.2 Target Motion Models
  5.3 Trajectory Filtration Algorithms
  5.4 Body-Fixed Frame
  5.5 Comparative Analysis of Filtration Efficiency
  5.6 Conclusions
  References
6 Watermarking Models of Video Sequences
Margarita N. Favorskaya
  6.1 Introduction
  6.2 Related Work
  6.3 Watermarking Model of Videos in Uncompressed Domain
  6.4 Watermarking Models of Videos in Compressed Domain
    6.4.1 Watermarking Schemes for Compressed Video Sequences
    6.4.2 Watermarking Models for Three Strategies
  6.5 Basic Requirements for Watermarking Schemes
  6.6 Conclusions
  References
7 Experimental Data Acquisition and Management Software for Camera Trap Data Studies
Aleksandr Zotin and Andrey Pakhirka
  7.1 Introduction
  7.2 Related Work
  7.3 Camera Traps Data
  7.4 Proposed Software System
    7.4.1 Module of Data Management
    7.4.2 Module of Preliminary Analysis
    7.4.3 Module of Image Enhancement
    7.4.4 Module of Animal Detection
    7.4.5 Module of CNN Control
    7.4.6 Module of Semantic Description
  7.5 Conclusions
  References
8 Two-Stage Method for Polyps Segmentation in Endoscopic Images
Nataliia A. Obukhova, Alexander A. Motyko and Alexaner A. Pozdeev
  8.1 Introduction
  8.2 Related Work
  8.3 Proposed Two-Stage Approach for the Classification and Segmentation of Polyps
    8.3.1 The Idea of a Two-Stage Approach
    8.3.2 Databases
    8.3.3 Binary Classification Based on Global Features
    8.3.4 Segmentation Based on CNN
  8.4 Experimental Studies
  8.5 Conclusions
  References
9 Algorithms for Markers Detection on Facies Images of Human Biological Fluids in Medical Diagnostics
Victor Krasheninnikov, Larisa Trubnikova, Anna Yashina, Marina Albutova and Olga Malenova
  9.1 Introduction
  9.2 The Examples of Images of Biological Liquids Facies
  9.3 The Image Preprocessing
  9.4 Algorithms for Markers Detection and Recognition
  9.5 Statistical Tests of Algorithms
  9.6 Conclusions
  References
10 An Investigation of Research Activities in Intelligent Data Processing Using Data Envelopment Analysis
Andrey V. Lychev, Aleksei V. Rozhnov and Igor A. Lobanov
  10.1 Introduction
  10.2 The Foresight of Impending Smart Infrastructure from the Position of Pervasive Informatics
  10.3 Data Envelopment Analysis Background
  10.4 System Integration of Research Activities in Geosocial Networking Using Data Envelopment Analysis
  10.5 Conclusions
  References
11 Hybrid Optimization Modeling Framework for Research Activities in Intelligent Data Processing
Aleksei V. Rozhnov, Andrey V. Lychev and Igor A. Lobanov
  11.1 Introduction
  11.2 Intelligent Data Processing and Object-Based Image Analysis
  11.3 Hybrid Optimization Modeling Framework
    11.3.1 Functionality of Hybrid Optimization Modeling Framework
    11.3.2 Experimental Studies
  11.4 Conclusions
  References
12 Non-local Means Denoising Algorithm Based on Local Binary Patterns
S. K. Kartsov, D. Yu. Kupriyanov, Yu. A. Polyakov and A. N. Zykov
  12.1 Introduction
  12.2 Related Work
  12.3 Description of Non-local Means Algorithm
  12.4 Modified Non-local Means Algorithm
  12.5 Non-local Means Based on Local Binary Patterns
  12.6 Experimental Studies
  12.7 Conclusions
  References
13 The Object-Oriented Simultaneous Localization and Mapping on the Spherobot Platform
Vladimir A. Antipov, Vasilii P. Kirnos, Vera A. Kokovkina and Andrey L. Priorov
  13.1 Introduction
  13.2 Robot Construction
  13.3 Proposed Algorithm
    13.3.1 Data Acquisition and Synchronization
    13.3.2 Determining the Location of the Mobile Platform
    13.3.3 Construction of Three-Dimensional Map
  13.4 Results
  13.5 Conclusions
  References
About the Editors
Dr. Margarita N. Favorskaya is a Professor and Head of the Department of Informatics and Computer Techniques at Reshetnev Siberian State University of Science and Technology, Russian Federation. Professor Favorskaya has been a member of the KES organization since 2010, and has been an IPC member and the Chair of invited sessions of over 30 international conferences. She serves as an associate editor of Intelligent Decision Technologies Journal, International Journal of Knowledge-Based and Intelligent Engineering Systems, and International Journal of Reasoning-based Intelligent Systems, as an Honorary Editor of the International Journal of Knowledge Engineering and Soft Data Paradigms, and as a Guest Editor and Book Editor (Springer). She is the author or the co-author of 200 publications and 20 educational manuals in computer science. She has co-authored and co-edited several books for Springer. She has supervised nine Ph.D. candidates to completion and is presently supervising four Ph.D. students. Her main research interests are digital image and video processing, remote sensing, pattern recognition, fractal image processing, artificial intelligence, and information technologies.
Dr. Lakhmi C. Jain, Ph.D., ME, BE (Hons), Fellow (Engineers Australia) is with the University of Technology Sydney, Australia, and Liverpool Hope University, UK. Professor Jain founded KES International to provide a professional community with opportunities for publication, knowledge exchange, cooperation, and teaming. Involving around 5000 researchers drawn from universities and companies worldwide, KES facilitates international cooperation and generates synergy in teaching and research. KES regularly provides networking opportunities for the professional community through one of the largest conferences of its kind in the area of KES. http://www.kesinternational.org/organisation.php
Chapter 1
Image Processing for Practical Applications Lakhmi C. Jain and Margarita N. Favorskaya
Abstract The chapter presents a brief description of the chapters on image processing in different practical fields, from radar systems to medical applications. Images can be multidimensional, and the additional dimensions extend the possibilities of methods and applications.
Keywords Object detection · Kalman filter · Video watermarking · Camera trap · Medical diagnostic · Data envelopment analysis · Landmarks descriptors
1.1 Introduction
The current technical achievements of humanity have become possible due to the persistent work of scientists throughout the world. The main contributions of the book concern new techniques for forming and measuring the sub-pixel shift of digital images, the characteristics of the phase-energy image spectrum, detectors fields, a comparative evaluation of algorithms for trajectory filtering, watermarking models of video sequences, experimental data acquisition and management software for camera trap data studies, a two-stage method for polyps segmentation in endoscopic images, algorithms for markers detection in facies images of human biological fluids in medical diagnostics, research activities in intelligent data processing using data envelopment analysis, a hybrid optimization modeling framework for research activities in intelligent data processing, a non-local means denoising algorithm based on local binary patterns, and object-oriented simultaneous localization and mapping on the spherobot platform.
L. C. Jain
University of Technology Sydney, Sydney, Australia
Liverpool Hope University, Belle Vale, UK
M. N. Favorskaya
Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarsky Rabochy Ave., 660037 Krasnoyarsk, Russian Federation
© Springer Nature Switzerland AG 2020
M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems—6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_1
1.2 Chapters in the Book
Chapter 2 introduces new methods of forming digital images with a sub-pixel shift and of measuring this shift. A modification of the discrete Chebyshev transformation is suggested to create images with a given sub-pixel shift [1]. The proposed algorithm generates digital images with a given sub-pixel shift and arbitrary scale without additional oversampling and interpolation. The exact distribution of the estimates at the output of the discriminator in the presence of noise is obtained. This distribution is non-Gaussian and has heavy tails. Therefore, the authors propose applying limiters such as those of Tukey, Huber, and Hampel at the output of the discriminator. In this case, the algorithm produces robust and consistent estimates. Statistical modeling of the proposed estimates is performed, and the analytical distribution law of the estimates is found to coincide with the experimental one.
Chapter 3 proposes a novel method to determine oscillation based on the analysis of one- and two-dimensional power spectra of images or video sequences [2]. The authors assume that one of the features confirming object detection should be the oscillation of the object or its parts in the image or video. However, the information about oscillations is mainly contained in the 3D phase spectrum, from which it is difficult to extract. The phase-energy spectrum helps to solve this problem. The authors analyse the complete information about the amplitude and phase of image or video characteristics and obtain a real-valued function from the phase-energy spectrum (i.e. a weighted sum of cosines), which is convenient for video analysis.
Chapter 4 introduces a new term, the detectors field, defined as a set of two-zone structures (detectors). The approach is inspired by biologically motivated methods of video content analysis based on two-zone structures. A drift mechanism of the detectors field is also described.
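The robust limiting mentioned for Chapter 2 can be illustrated with a minimal sketch. It is not the authors' discriminator: it assumes a plain location estimate, Huber's limiter with the conventional tuning constant k = 1.345, unit noise scale, and a hypothetical function name `huber_location`. It only shows how limiting large residuals keeps a single heavy-tail outlier from dominating the estimate.

```python
import statistics


def huber_location(samples, k=1.345, iters=50):
    """Huber M-estimate of location by iteratively reweighted averaging.

    Residuals within k keep full weight; larger residuals are down-weighted
    to k / |residual|, which is equivalent to clipping them at +/- k.
    Assumption: the noise scale is 1, so no scale estimate is used.
    """
    mu = statistics.median(samples)  # robust starting point
    for _ in range(iters):
        weights = [
            1.0 if abs(x - mu) <= k else k / abs(x - mu)
            for x in samples
        ]
        mu = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return mu


# Five well-behaved estimates and one heavy-tail outlier (illustrative data).
estimates = [0.1, -0.2, 0.05, 0.0, -0.1, 50.0]
plain_mean = sum(estimates) / len(estimates)
robust = huber_location(estimates)
print(f"mean = {plain_mean:.3f}, Huber = {robust:.3f}")
```

For this illustrative data the sample mean is pulled to about 8.3 by the outlier, while the Huber estimate stays near 0.24.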
The use of detectors fields reduces the size of the processed image while keeping all significant features, which in turn reduces the information flow to the next step of image processing. A detectors field is an adaptive computational environment for object detection, and its distinguishing feature is the size reduction of the image. A detectors field consists of two-zone structures that can overlap and vary in size. The retinal receptive field can be considered a prototype of the detectors field. Redundancy reduction facilitates the implementation of quasi-optimal real-time filtering similar to the Wiener–Hopf method. Practical tasks such as motion blur compensation can be solved using this approach.
Chapter 5 is dedicated to the study of trajectory filtering algorithms based on the Kalman filter. A new algorithm for estimating trajectory parameters is synthesized based on a model in the body-fixed frame with observations in the spherical coordinate system. Mathematical modelling is performed using the known linear and nonlinear Kalman filters and the proposed algorithm. The proposed approach consists in the quasi-linearization of the equations for the projections of the new coordinates onto the axes of the Cartesian system [3]. A mathematical model built in the MATLAB environment is used for a comparative analysis of the effectiveness of the proposed modifications of the linear and nonlinear Kalman filters. Filtering with adjustment in body-fixed coordinates is found to be more effective than the algorithms based on the known Kalman filters.
Chapter 6 covers models for multilevel adaptive watermarking schemes [4, 5] of uncompressed and compressed video sequences in the light of the H.264/SVC standard. The architecture of the SVC encoder is analyzed, and three strategies for watermark embedding using the H.264/SVC standard are proposed. They are classified as watermark embedding before video encoding (Strategy 1), integrated watermark embedding and coding (Strategy 2), and compressed-domain embedding after encoding (Strategy 3). Different criteria for embedding in videos are also proposed as recommendation measures for multilevel protection. Several objective functions are considered, and their linear combination with weighted coefficients can serve as a generalized criterion for evaluating the extracted watermark.
Chapter 7 presents experimental data acquisition and management software for camera trap data studies. The study and preservation of biodiversity and the regulation of the impact of human activity on ecosystems involve the analysis of big data, which cannot be carried out without intelligent software tools. The workflow of a typical camera trap data management system includes three distinct stages: file management, data annotation, and data extraction. The analysis that follows data extraction is specific to a study and can be conducted by specialists. However, a software system can prepare the data for this analysis automatically. Since uneven illumination has a great influence on background model formation and on the visual understanding of images, a modified Multi-Scale Retinex (MSR) algorithm, which utilizes the wavelet transform to speed up the calculations [6], is used.
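As an illustration of the Retinex step mentioned for Chapter 7, the following is a minimal sketch of the classic Multi-Scale Retinex with Gaussian surrounds. It is not the wavelet-accelerated modification used in the chapter; the function names, the scale values, and the FFT-based blur with circular boundaries are assumptions made for brevity.

```python
import numpy as np


def gaussian_blur_fft(img, sigma):
    """Blur a 2-D float image with a Gaussian of std `sigma` (pixels).

    Implemented in the frequency domain, so image borders wrap around
    (an assumption chosen to keep the sketch dependency-free).
    """
    rows, cols = img.shape
    fy = np.fft.fftfreq(rows)[:, None]   # cycles per pixel, vertical
    fx = np.fft.fftfreq(cols)[None, :]   # cycles per pixel, horizontal
    # The Fourier transform of a Gaussian kernel is again a Gaussian.
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * transfer))


def multi_scale_retinex(img, sigmas=(15.0, 80.0, 250.0), eps=1.0):
    """Average of single-scale Retinex outputs log(I) - log(G_sigma * I)."""
    img = img.astype(np.float64) + eps   # eps avoids log(0) on dark pixels
    out = np.zeros_like(img)
    for sigma in sigmas:
        surround = np.maximum(gaussian_blur_fft(img, sigma), 1e-12)
        out += np.log(img) - np.log(surround)
    return out / len(sigmas)


# Illustrative check: a uniformly lit flat scene carries no reflectance detail.
flat = np.full((16, 16), 7.0)
print(np.abs(multi_scale_retinex(flat)).max())
```

Because each term is a difference of logarithms, a global change of illumination level cancels out (exactly so when `eps` is 0), which is why the method is useful for unevenly lit camera trap images.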
To automate annotation, the system introduces a module for automatic image importing with metadata extraction, a semi-automatic background model formation module [7, 8], and a semantic builder that utilizes CNN data. A convolutional neural network training module, which allows the dataset to be extended by augmentations [9], is also included in the system.
Chapter 8 develops a method for early detection of pathological changes in the stomach, which makes it possible to ensure timely treatment and avoid more serious consequences. The authors combine the traditional approach based on geometric primitives, color features, and texture descriptors with a CNN approach, which is able to capture various features and can also be used for the segmentation of polyps. Thus, an algorithm for the segmentation of polyps is proposed that takes into account the characteristics and specificity of the problem of polyp segmentation in endoscopic images. The main idea of the proposed algorithm is to combine the advantages of both approaches under conditions of a substantially limited training base. The implementation of the algorithm includes two consecutive steps. The binary classification step provides a preliminary analysis of global image features using traditional machine learning technologies. The result of the preliminary classification is a decision about the presence of a polyp in the image. The segmentation step uses a CNN to segment one or several polyps if their presence in the image was confirmed at the previous stage. The main experimental result is that using binary classification as a preliminary stage before segmentation increases the Dice score by more than 10% when the CNN is trained on a small database.
Chapter 9 describes a method for precise early diagnostics of different diseases based on the examination of human biological liquids (blood, tears, cervical mucus, urine, etc.) [10, 11]. A small drop of liquid is placed on a microscope slide and dried slowly, so that a thin dry film (facies) remains. In the process of fluid crystallization, characteristic patterns (markers) appear in the facies. Each marker is a highly definite sign of some pathology, even at an early stage of disease development. The authors develop algorithms for detecting several markers on facies images. First, the characteristic features of the markers (location, geometry, brightness, variation, spectrum, etc.) are identified by visual analysis. Then, methods for detecting these features are applied. The decision about the presence of a marker is made using a set of necessary characteristics. Tests of the algorithms showed that 86–98% of images are identified correctly.
Chapter 10 reports on the opportunities of intelligent data processing in object-based image analysis for location-based social networks. Intelligent transport systems and infrastructure technology solutions are of direct research interest. A pervasive space is characterized by the physical and informational interaction between the users and the designed environment. The work presents a vision of the intelligent data processing task and elaborates an efficiency evaluation using data envelopment analysis in research discussions on the investigation of advanced technology precursors. The core approach is a nonparametric method in operations research for the estimation of production frontiers for intelligent data processing tasks [12, 13].
The target setting corresponds to an initiative focused on a comprehensive discussion of geosocial networking formation issues and on the assessment of the quality of intelligent data processing. The assessment is based on conceptual models that integrate advanced computer vision technology with location-based social networks, and on the innovative potential of distributed computer vision and collaborative innovation networks.
Chapter 11 is aimed at the implementation of effective commons-based peer production of geosocial networking in the progressive movement of pervasive informatics, based on the investigation of geosocial networking using data envelopment analysis [14, 15]. The hybrid optimization modeling system includes a number of algorithms for efficiency analysis and multidimensional frontier visualization through the construction of two- and three-dimensional sections. To enhance the effect of geosocial networking analysis, a projection system is applied in which 3D sections of the frontier are generated using virtual reality. To create a visual stereo effect, two projectors with polarizing filters are used. Two images, for the left and right eyes, are formed simultaneously on a special screen. The screen has a special metallic surface that preserves the polarization of the images presented to each eye. Among the priority proposals on the use of the results of system integration in research activities and in innovations of distributed computer and telecommunication networks based on DEA technology, a generalization in the field of advanced computer vision for cyber-physical systems is outlined [16].
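The idea of measuring efficiency against a production frontier, as used in Chapters 10 and 11, can be sketched for the simplest case. The example assumes one input and one output per decision-making unit and constant returns to scale; in that special case the CCR efficiency of data envelopment analysis reduces to each unit's output/input ratio divided by the best observed ratio, so no linear programming solver is needed. The function name and the data are hypothetical, and the chapters themselves deal with multidimensional frontiers.

```python
def ccr_efficiency_single(inputs, outputs):
    """CCR efficiency scores for units with one input and one output.

    Under constant returns to scale the frontier is the ray through the
    unit with the highest output/input ratio, so each score is the unit's
    ratio relative to that best ratio (1.0 means the unit is on the frontier).
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Three hypothetical units: (input, output) = (2, 2), (4, 2), (5, 5).
scores = ccr_efficiency_single([2.0, 4.0, 5.0], [2.0, 2.0, 5.0])
print(scores)
```

With several inputs or outputs the same scores are obtained by solving one small linear program per unit, which is where multidimensional frontier sections of the kind described for Chapter 11 become useful.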
Chapter 12 describes an algorithm for converting old documents to digital format, which greatly simplifies their archiving and searching. This algorithm is a filter that divides an input image into fragments and then processes each fragment separately using a block-based method. Each image fragment contains many blocks. The blocks are processed separately, and the similarity of the blocks inside the fragment is measured on the basis of the Euclidean distance between the centers of the blocks and the brightness distance between the blocks. Blocks are compared within a fragment window rather than between adjacent pixels. In the course of this comparison, blocks with similar brightness levels receive more weight when a pixel value is averaged. This property allows the algorithm to be called a non-local method. Chapter 13 describes an unusual robot, called a spherobot, which looks like a ball and is controlled over Wi-Fi. A detailed construction of this robot is given. Using the wireless protocol, the robot streams the video data and the encoder data to the server. The sensor data broker on the robot is the Raspberry Pi Zero W. The robot's camera uses a 260° fisheye lens to capture as much information as possible. The axis of the camera is directed vertically up. The displacement map of the environment is obtained using two consecutive images from different viewpoints. In this manner, 3D scene reconstruction can be achieved using the special lens of the robot camera.
1.3 Conclusions
This chapter provides a brief description of the chapters included in the book, which present original algorithms and practical implementations in the field of image processing for technical, control, medical, and other tasks. The contents of this book reflect the main directions of current investigations, which form the basis of future intelligent systems.
References
1. Radchenko, Yu., Bulygin, A.: Methods for detecting of structural changes in computer vision systems. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems-1: Mathematical Theory, ISRL, vol. 75, pp. 59–90. Springer International Publishing, Switzerland (2015)
2. Bogoslovsky, A., Zhigulina, I., Maslov, I., Mordovina, T.: Frequency characteristics for video sequences processing. In: Damiani, E., Howlett, R.J., Jain, L.C., Gallo, L., De Pietro, G. (eds.) Smart Innovation, Systems and Technologies, SIST, vol. 40, pp. 149–160. Springer, Switzerland (2015)
3. Saverkin, O.V.: Comparative analysis of digital radar data processing algorithms. In: Proceedings 2nd International Workshop on Radio Electronics and Information Technologies, pp. 120–126 (2017)
4. Favorskaya, M., Pyataeva, A., Popov, A.: Texture analysis in watermarking paradigms. Procedia Comput. Sci. 112, 1460–1469 (2017)
L. C. Jain and M. N. Favorskaya
5. Favorskaya, M.N., Jain, L.C., Savchina, E.I.: Perceptually tuned watermarking using nonsubsampled shearlet transform. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems-3, ISRL, vol. 136, pp. 41–69. Springer International Publishing, Switzerland (2018)
6. Zotin, A.: Fast algorithm of image enhancement based on multi-scale retinex. Procedia Comput. Sci. 131, 6–14 (2018)
7. Zotin, A.G., Proskurin, A.V.: Animal detection using a series of images under complex shooting conditions. Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. XLII-2/W12, 249–257 (2019)
8. Favorskaya, M., Buryachenko, V.: Selecting informative samples for animal recognition in the wildlife. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies, SIST, vol. 143, pp. 65–75. Springer, Singapore (2019)
9. Favorskaya, M.N., Pakhirka, A.I.: Animal species recognition in the wildlife based on muzzle and shape features using joint CNN. In: Proceedings of the 23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, Budapest, Hungary (in print) (2019)
10. Krasheninnikov, V.R., Yashina, A.S., Malenova, O.E.: Markers detection on facies of human biological fluids. Procedia Eng. 201, 312–321 (2017)
11. Krasheninnikov, V.R., Trubnikova, L.I., Malenova, O.E., Yashina, A.S., Albutova, M.L., Marinova, O.A.: Algorithm for detecting block-like cracks in facies of human biological fluids. Image Process. Earth Remote. Sens. Information Technology and Nanotechnology 2018 (IPERS-ITNT 2018), 193–199 (2018)
12. Krivonozhko, V.E., Lychev, A.V.: Algorithms for construction of efficient frontier for nonconvex models on the basis of optimization methods. Dokl. Math. 96(2), 541–544 (2017)
13. Krivonozhko, V.E., Førsund, F.R., Lychev, A.V.: Measurement of returns to scale in radial DEA models. Comput. Math. Math. Phys. 57(1), 83–93 (2017)
14. Krivonozhko, V.E., Førsund, F.R., Lychev, A.V.: Measurement of returns to scale using a non-radial DEA model. Eur. J. Oper. Res. 232(3), 664–670 (2014)
15. Abrosimov, V., Ryvkin, S., Goncharenko, V., Rozhnov, A., Lobanov, I.: Identikit of modifiable vehicles at virtual semantic environment. In: International Conference on Optimization of Electrical and Electronic Equipment and Intl Aegean Conference on Electrical Machines and Power Electronics, pp. 905–910 (2017)
16. Ryvkin, S., Rozhnov, A., Lobanov, I., Chernyshov, L.: Investigation of the stratified model of virtual semantic environment for modifiable vehicles. In: 20th International Symposium on Electrical Apparatus and Technologies, pp. 1–4 (2018)
Chapter 2
New Methods of Forming and Measurement of Sub-pixel Shift of Digital Images Yuriy S. Radchenko and Olga A. Masharova
Abstract Methods of forming digital images with a sub-pixel shift and of measuring this shift are developed. A modification of the discrete Chebyshev transformation is suggested to create images with a given sub-pixel shift. The optimal grid of Chebyshev samples (secondary readings) in digital images is calculated. The samples are taken at the zeros of Chebyshev polynomials. Indicators of a non-integer shift in the form of discriminators are suggested. The discriminator algorithms approximate the Newton–Raphson method of maximum likelihood estimation. The distribution of the estimate for some types of discriminators in the presence of noise is obtained. We found that this distribution is non-Gaussian, with “heavy tails”. Limiters that suppress large output values and thus produce stable estimates are presented. Theoretical distributions of the stable estimates are obtained. Statistical modeling establishes the agreement between experimental and theoretical characteristics. The suggested algorithms are easy to compute.
Keywords Sub-pixel image shift · Discrete Chebyshev transformation · Image position estimation · Discriminators · Non-Gaussian distribution of statistics · “Heavy tails” of distribution · Statistical modeling
Y. S. Radchenko (B) · O. A. Masharova
Voronezh State University, 1, University Square, Voronezh 394018, Russian Federation
© Springer Nature Switzerland AG 2020
M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems-6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_2
2.1 Introduction
The problem of synthesis and analysis of algorithms for estimating the shift of images or visual objects in frames is relevant and in demand in practice. First, it is an important part of signal compression and recovery algorithms in video coding [1]. Second, it is central to estimating the position of an object in a frame in video surveillance systems [2, 3]. It also arises in digital compensation of camera shake and in the correction of turbulent distortions of visual objects in images [4, 5]. In these algorithms, the estimated shift is non-integer. In modern systems, an image shift should be estimated with sub-pixel accuracy [1, 4, 5]. Recently, the problem of super-resolution of images has been actively studied [6–11]. In this task, it is necessary to reconstruct a frame with a higher resolution
using a set of low-resolution frames. Among such algorithms, an important place is occupied by methods that use a sub-pixel estimation of image shift [5–8]. To assess the efficiency of sub-pixel estimation, the calculation of the mean square error is proposed in [7, 9–11]. This characteristic of estimation accuracy is applicable only in the case of Gaussian errors. However, a number of shift measurement methods lead to non-Gaussian estimates [4, 5, 7]. Thus, the development of new algorithms for forming an image frame with a sub-pixel shift, estimating such shifts, and finding the exact characteristics of the estimates are crucial problems of digital image and video processing. To solve these problems, a modification of the discrete Chebyshev transformation, or Generalized Discrete Cosine Transformation (GDCT) [12], is proposed and investigated. This algorithm allows us to generate digital images with a given sub-pixel shift and arbitrary scale without additional oversampling and interpolation. Several types of discriminators are proposed for estimating a non-integer shift. The discriminator algorithms approximate the Newton–Raphson numerical method for finding the maximum likelihood estimate. The method requires significantly fewer computational resources than search algorithms [1] or Kalman algorithms [7]. In this chapter, we obtain the exact distribution of the estimates at the output of the discriminator in the presence of noise. This distribution is non-Gaussian and has “heavy tails” [13]. Therefore, it is proposed to use limiters such as those of Tukey, Huber, and Hampel at the output of the discriminator [14, 15]. In this case, the algorithm generates robust and consistent estimates [16]. Statistical modeling of the proposed estimates is performed. It is established that the analytical distribution law of the estimates coincides with the experimental one. The remainder of the chapter is organized as follows. Section 2.2 presents a shift algorithm based on the discrete Chebyshev transformation. Position estimation in noisy images is given in Sect. 2.3. Section 2.4 provides an analysis of the autocorrelation function. Section 2.5 covers shift estimation using a discriminator. Section 2.6 concludes the chapter.
2.2 Shift Algorithm Based on Discrete Chebyshev Transformation
The implementation of a sub-pixel shift is necessary when constructing an algorithm that estimates image shift with fractional precision. One possible way to create images with a given fractional shift is to use GDCT, which is described in [12]. This chapter presents the application of GDCT to provide an arbitrary image shift and zoom. Let the signal $s(x, y)$, representing a fragment $u(x, y)I(x, y)$ of the image, be observed in the sub-domain $\{x, y\} \in \Omega_0$. Here, $I(x, y)$ is the indicator function of the sub-domain $\Omega_0$. We decompose the image using the classical orthogonal Chebyshev polynomials of the first kind. Let $a_x$, $a_y$ be the characteristic dimensions of the sub-domain $\Omega_0$, and let $z_1 = x/a_x$, $z_2 = y/a_y$. In this case, a pair of transformations can be written as

$$\begin{aligned}
C_{m,k} &= (d_m d_k)^{-1} \int_{-1}^{1} \frac{T_m(z_1)\,dz_1}{\sqrt{1 - z_1^2}} \int_{-1}^{1} \frac{T_k(z_2)\,dz_2}{\sqrt{1 - z_2^2}}\; s(z_1, z_2), \\
R(z_1, z_2, \vec{\tau}) &= \sum_{m,k}^{M} C_{m,k}\, T_m(z_1 + \tau_x)\, T_k(z_2 + \tau_y),
\end{aligned} \tag{2.1}$$

where $\vec{\tau} = (\tau_x, \tau_y)$ is the signal shift vector and $d_m = \begin{cases} \pi, & m = 0 \\ \pi/2, & m \neq 0 \end{cases}$ is the norm of the Chebyshev polynomial. As shown in [12], the integrals in the expression for $C_{m,k}$ can be computed by the Gauss–Chebyshev formula with $N$ nodes. These formulas have the highest algebraic degree of accuracy. To do this, the samples at the points $z_n = \cos\frac{\pi(2n+1)}{2N}$, $n = 0 \ldots N-1$, are taken. The weight functions $1/\sqrt{1 - z^2}$ then disappear.
Let us move on to discrete images. Assume that the signal $s(x, y)$ is uniformly sampled in a block of $N1 \times N1$ points (pixels). The ratio $\nu = N/N1$ will be called the sampling coefficient. The recovered signal $R(x, y, \vec{\tau})$ is considered to be uniformly discretized in a sub-domain of size $L \times L$. The ratio $\gamma = L/N1$ determines the geometric scaling of the restored block. If $\gamma < 1$, the restored frame is reduced. If $\gamma > 1$, the restored frame is enlarged. The pair of transformations in Eq. 2.1 then becomes the discrete transformations

$$C_{km} = g_k g_m \frac{2}{N} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} S_{ij} \cos\left(\pi k \frac{i + 0.5}{N}\right) \cos\left(\pi m \frac{j + 0.5}{N}\right), \tag{2.2}$$

$$R_{nl} = \frac{2}{N} \sum_{k=0}^{N-1} \sum_{m=0}^{N-1} g_k g_m C_{km} \cos\left(k \cdot \arccos\left(\frac{2n}{L-1} - \tau_n\right)\right) \cos\left(m \cdot \arccos\left(\frac{2l}{L-1} - \tau_l\right)\right), \tag{2.3}$$

where $g_\mu = \begin{cases} \sqrt{0.5}, & \mu = 0 \\ 1, & \mu > 0 \end{cases}$, $\mu = k, m$; $i, j = 0 \ldots N-1$; $k, m = 0 \ldots N-1$; $n, l = 0 \ldots L-1$. The parameters $\tau_n$, $\tau_l$ can take non-integer values. If $\tau_{n,l} \le 2/(L-1)$, then a sub-pixel shift is implemented. In the transformations (Eqs. 2.2 and 2.3), there are two conversion factors, $\nu$ and $\gamma$. Experiments show that scaling an image $R_{nl}$ with good quality, without quantizing the spectral coefficients $C_{km}$, is possible up to values $\gamma \le 4$–$5$. This method can be used to estimate the inter-frame shift of image fragments with sub-pixel accuracy in video coding. Scaling can be used to control the shift parameters $(\tau_n, \tau_l)$ in modeling. Experiments show the influence of the sampling coefficient $\nu$ on the quality of the restored block $R_{n,l}$. Two options of Chebyshev sampling of the primary sample grid are shown in Fig. 2.1. Here, the intersection points of the lines form the primary grid of samples. The points denote the positions of the zeros of Chebyshev polynomials, which form the new sample grid. In Fig. 2.1a, the parameter $\nu = 6/8$ represents sampling “down”. In Fig. 2.1b, the parameter $\nu = 10/8$ represents sampling “up”. In Chebyshev sampling, the values at the new sample positions can be calculated by interpolation over the four nearest pixels.
Fig. 2.1 Two options of Chebyshev sampling of the primary samples grid: a sampling “down”, b sampling “up”
The choice between sampling “down” and “up” is determined by the nature of the problem being solved. For information compression (video encoding), it is advisable to use $\nu < 1$. As experimental studies of codecs show, the additional image compression obtained by sampling “down”, together with quantization of the spectra, significantly reduces the entropy of the message. At the same time, the quality of the restored image is almost as good as that of the Discrete Cosine Transformation (DCT) at a high information transfer rate and surpasses DCT when image compression is strong. For a fractional shift of the fragment without loss of image quality, it is advisable to take $\nu \ge 1$. Figure 2.2 shows an example of recovery of the signal $s(z) = \cos az^2$ without shift (dashed line) and with shift (dash-dotted line). The initial signal on the chart is represented by points with an interval of half a pixel, and the shift of the signal also equals half a pixel. The size of the transformation matrix is N × N = (10 × 10).
Fig. 2.2 Example of signal recovery with shift
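A one-dimensional analogue of Eqs. 2.2 and 2.3 can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the samples are assumed to be available directly at the Chebyshev nodes, the one-dimensional normalization factor is taken as the square root of the two-dimensional factor 2/N, and the uniform output grid is assumed to be mapped onto [−1, 1] as 2n/(L − 1) − 1 before the shift τ is subtracted.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebvander


def chebyshev_nodes(N):
    """Zeros of T_N: z_i = cos(pi * (2i + 1) / (2N)), i = 0..N-1."""
    i = np.arange(N)
    return np.cos(np.pi * (2 * i + 1) / (2 * N))


def gdct_forward(samples):
    """1-D analogue of Eq. 2.2 for samples taken at the Chebyshev nodes."""
    N = len(samples)
    k = np.arange(N)
    g = np.where(k == 0, np.sqrt(0.5), 1.0)
    # T_k(z_i) = cos(pi * k * (i + 0.5) / N) at the Chebyshev nodes.
    basis = np.cos(np.pi * np.outer(k, np.arange(N) + 0.5) / N)
    return np.sqrt(2.0 / N) * g * (basis @ samples)


def gdct_shifted_inverse(coeffs, L, tau=0.0):
    """1-D analogue of Eq. 2.3 on L uniform points, shifted by tau."""
    N = len(coeffs)
    g = np.where(np.arange(N) == 0, np.sqrt(0.5), 1.0)
    # Assumed mapping of the output grid onto [-1, 1], then the shift.
    z = 2.0 * np.arange(L) / (L - 1) - 1.0 - tau
    # chebvander evaluates T_0..T_{N-1}; inside [-1, 1] this equals
    # cos(k * arccos(z)) as written in Eq. 2.3.
    T = chebvander(z, N - 1)
    return np.sqrt(2.0 / N) * (T @ (g * coeffs))


def signal(z):
    """Hypothetical test signal: a cubic polynomial."""
    return z ** 3 - 0.5 * z


N, L, tau = 8, 9, 0.1
coeffs = gdct_forward(signal(chebyshev_nodes(N)))
restored = gdct_shifted_inverse(coeffs, L, tau)
grid = 2.0 * np.arange(L) / (L - 1) - 1.0
max_error = np.max(np.abs(restored - signal(grid - tau)))
print(max_error)
```

For a polynomial of degree below N the expansion is exact, so the restored samples should match the analytically shifted signal up to rounding error; for general images the quality depends on N, as Table 2.1 indicates.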
Table 2.1 Quality metrics of the restored image

N1    N1/N     PSNR     MSSIM
8     8/6      34.17    0.85
8     8/8      35.67    0.89
8     8/9      35.9     0.89
8     8/10     36.33    0.89
10    10/8     34.52    0.864
10    10/10    35.51    0.9
10    10/12    36.41    0.91
10    10/14    36.59    0.914
12    12/12    35.66    0.91
12    12/14    36.33    0.93
12    12/16    37.06    0.94
Table 2.1 shows the quality metrics of the restored image for different block sizes N1 and values of the sampling parameter ν. It follows from Table 2.1 that sampling “up” provides a smaller error of the recovered signal. Suitable values of N1 and ν = N/N1 can be selected from Table 2.1. The optimal oversampling ratios N/N1 are robust to the type of image. Experiments with the images “Lenna”, “Landscape”, “City street (London)”, an aerial photograph, and others confirm this conclusion.
2.3 Position Estimation in Noisy Images
Let a two-dimensional field $\xi(\vec{r}) = s(\vec{r}, \vec{l}_0) + n(\vec{r})$ in the area $\vec{r} \in \Omega$ be given. It is a mixture of the useful signal $s(\vec{r}, \vec{l}_0)$, $\vec{r} \in \Omega_0$, $\Omega_0 \subset \Omega$, with unknown position $\vec{l}_0 = (l_{0x}, l_{0y})$, and Gaussian uncorrelated noise $n(\vec{r})$ with spectral power density $N_0/2$. It is necessary to estimate the vector $\vec{l}_0$. We apply the maximum likelihood algorithm to find the estimate $\vec{l}_m = (l_{mx}, l_{my})$. It is required to form the Logarithm of the Likelihood Ratio Functional (LLRF), which is provided by Eq. 2.4.

$$M(\vec{l}) = \frac{2}{N_0} \int_{\Omega} \xi(\vec{r})\, s(\vec{r}, \vec{l})\, d\vec{r} - \frac{1}{N_0} \int_{\Omega} s(\vec{r}, \vec{l})\, s(\vec{r}, \vec{l})\, d\vec{r} \tag{2.4}$$

If we neglect the change in the form of the signal when it is shifted within the region $\Omega$, then the second addend in Eq. 2.4 can be ignored. Then, in the LLRF denoted as $M(\vec{l})$, we separate the deterministic and fluctuation components. After Eq. 2.4 has been normalized to the signal/noise ratio $q = \sqrt{2E_s/N_0}$, we obtain

$$M(\vec{l}) = q\, S(l_x - l_{0x}, l_y - l_{0y}) + N(l_x, l_y), \tag{2.5}$$

where $S(l_x - l_{0x}, l_y - l_{0y})$ is the normalized Autocorrelation Function (ACF), $S(\vec{l} - \vec{l}_0) \le 1$, and $N(l_x, l_y)$ is the normalized noise function with zero mean value, unit variance, and correlation function $\langle N(\vec{l}_1) N(\vec{l}_2) \rangle = S(\vec{l}_1 - \vec{l}_2)$. If we denote the correlation interval of $M(\vec{l})$ as $\tau_{lx}$, $\tau_{ly}$, then we can write:

$$S(l_x - l_{0x}, l_y - l_{0y}) \equiv S\left(\frac{l_x - l_{0x}}{\tau_{lx}}, \frac{l_y - l_{0y}}{\tau_{ly}}\right) = S(\theta_x, \theta_y), \quad N(l_x, l_y) \equiv N(\theta_x, \theta_y),$$

$$\left(\frac{l_x - l_{0x}}{\tau_{lx}}, \frac{l_y - l_{0y}}{\tau_{ly}}\right) = (\theta_x, \theta_y), \quad \left(\frac{l_{fx} - l_{0x}}{\tau_{lx}}, \frac{l_{fy} - l_{0y}}{\tau_{ly}}\right) = (\theta_{fx}, \theta_{fy}),$$

$$\left(\frac{l_{mx} - l_{fx}}{\tau_{lx}}, \frac{l_{my} - l_{fy}}{\tau_{ly}}\right) = (\lambda_{mx}, \lambda_{my}), \quad \left(\frac{\Delta_x}{\tau_{lx}}, \frac{\Delta_y}{\tau_{ly}}\right) = (\delta_x, \delta_y),$$

where $(l_{mx}, l_{my}) \to (\lambda_{mx}, \lambda_{my})$ is the Maximum Likelihood Estimation (MLE), $(\theta_{fx}, \theta_{fy})$ is the normalized mismatch between the parameter of the observed signal and that of the receiver, and $(\Delta_x, \Delta_y)$ is the mistuning of the discriminator channels in the parameters $(l_x, l_y)$. Under this conversion, $\vec{l}_0$ goes into $\vec{\theta}_0 \equiv 0$. Accordingly, the LLRF provided by Eq. 2.4 can be written as:

$$M(\vec{\theta}) = q\, S(\vec{\theta}) + N(\vec{\theta}).$$
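The maximum likelihood search over the correlation term of Eq. 2.4 can be illustrated with a short one-dimensional Python sketch. It rests on assumptions that are not in the text: a hypothetical Gaussian pulse plays the role of the useful signal, the candidate positions lie on an integer grid, and the energy term and the factor 2/N0 are dropped because they do not move the maximum while the pulse stays inside the observation window.

```python
import numpy as np


def pulse(x, position, width=2.0):
    """Hypothetical useful signal s(r, l): a Gaussian pulse centred at l."""
    return np.exp(-0.5 * ((x - position) / width) ** 2)


def llrf_correlation(xi, x, candidates):
    """Correlation part of Eq. 2.4 evaluated for each candidate position."""
    return np.array([np.sum(xi * pulse(x, l)) for l in candidates])


rng = np.random.default_rng(0)
x = np.arange(64.0)
true_position = 30.0
# Observed field: useful signal plus weak uncorrelated Gaussian noise.
xi = pulse(x, true_position) + 0.02 * rng.standard_normal(x.size)

candidates = np.arange(10.0, 50.0, 1.0)
m = llrf_correlation(xi, x, candidates)
estimate = candidates[np.argmax(m)]
print(estimate)
```

A grid search of this kind resolves the position only to the grid step, which is the motivation for the discriminator-based refinement discussed in Sect. 2.5.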
2.4 Analysis of the Autocorrelation Function
Let us investigate the behavior of the normalized ACF $S(\vec{\theta})$. Consider one of the possible types of images. The test image Cityscape is given in Fig. 2.3a. It is necessary to determine the location of the object (the car in Fig. 2.3b) in a frame. Analysis of the object's ACF (Fig. 2.4) shows that, in the area of the maximum, the two-dimensional ACF can be factorized into the product of one-dimensional ACFs, $S(\theta_x, \theta_y) \approx S(\theta_x) S(\theta_y)$. However, for objects with an arbitrary foreshortening, such factorization is not directly applicable, since there are cross-products $\theta_x \cdot \theta_y$ in the ACF. This turn $\varphi$ can be calculated. For the ACF $S(\theta_x, \theta_y)$, there is a pair of transformations:

$$\begin{aligned}
G(\omega_x, \omega_y) &= \iint S(\theta_x, \theta_y) \exp\left(-j(\theta_x \omega_x + \theta_y \omega_y)\right) d\theta_x\, d\theta_y, \\
S(\theta_x, \theta_y) &= \left(\frac{1}{2\pi}\right)^2 \iint G(\omega_x, \omega_y) \exp\left(j(\theta_x \omega_x + \theta_y \omega_y)\right) d\omega_x\, d\omega_y.
\end{aligned} \tag{2.6}$$

Fig. 2.3 Test image: a Cityscape image, b object of interest (car)
Fig. 2.4 Object’s ACF in two projections
Fig. 2.5 Samples of images with factorizable ACF: a urban landscape, b landscape, c model field
Then, the rotation angle is [15, 16]:

$$\operatorname{tg}(2\varphi) = \frac{G_{12}}{G_{11} - G_{22}}.$$

From the pair of relations $S(\theta_x, \theta_y) \leftrightarrow G(\omega_x, \omega_y)$ in Eq. 2.6, it follows that

$$G_{11} = \iint \theta_x^2\, S(\theta_x, \theta_y)\, d\theta_x\, d\theta_y = \int \theta_x^2\, S(\theta_x)\, d\theta_x,$$

$$G_{22} = \iint \theta_y^2\, S(\theta_x, \theta_y)\, d\theta_x\, d\theta_y = \int \theta_y^2\, S(\theta_y)\, d\theta_y,$$

$$G_{12} = \iint \theta_x \theta_y \cdot S(\theta_x, \theta_y)\, d\theta_x\, d\theta_y.$$

Thus, we can move from the rotated image of the object to the view with the needed foreshortening. In this case, it is possible to bring the ACF to the form $S(\theta_x, \theta_y) \approx S(\theta_x) S(\theta_y)$ for a wide class of images. Samples of images with factorizable ACF are shown in Fig. 2.5. Figure 2.5c is a field simulated with Habibi's algorithm. The ACF of this field is factorized.
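Whether the ACF of a given fragment is close to factorizable can be checked numerically. The following Python sketch is one possible way to do it and makes assumptions that go beyond the text: the ACF is computed as a periodic (circular) autocorrelation through the FFT power spectrum, it is normalized so that S(0, 0) = 1, and the function names are hypothetical.

```python
import numpy as np


def normalized_acf(img):
    """Periodic ACF via the power spectrum, scaled so that S(0, 0) = 1."""
    spectrum = np.abs(np.fft.fft2(img)) ** 2
    acf = np.fft.ifft2(spectrum).real
    return acf / acf[0, 0]


def factorization_error(img):
    """Largest deviation of S(tx, ty) from S(tx, 0) * S(0, ty)."""
    s = normalized_acf(img)
    return np.max(np.abs(s - np.outer(s[:, 0], s[0, :])))


rng = np.random.default_rng(1)
# A separable image u(x) * v(y) has an exactly factorizable periodic ACF.
separable = np.outer(rng.random(16) + 0.5, rng.random(16) + 0.5)
# A diagonal line (a structure rotated by 45 degrees) does not.
diagonal = np.eye(16)

err_separable = factorization_error(separable)
err_diagonal = factorization_error(diagonal)
print(err_separable, err_diagonal)
```

A small error suggests that separate estimates along x and y are adequate, while a large error indicates cross-products of the kind that the rotation by the angle ϕ is meant to remove.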
2.5 Shift Estimation Using a Discriminator
In the general case, using the Newton–Raphson numerical algorithm, we can write the maximum likelihood estimate in the following form:

$$\theta_{mx} = \theta_{fx} - \left.\left( \frac{d_x(\vec{\theta})\, d_{yy}(\vec{\theta}) - d_y(\vec{\theta})\, d_{xy}(\vec{\theta})}{d_{xx}(\vec{\theta})\, d_{yy}(\vec{\theta}) - d_{xy}^2(\vec{\theta})} \right)\right|_{\vec{\theta} = \vec{\theta}_f},$$

$$\theta_{my} = \theta_{fy} - \left.\left( \frac{d_y(\vec{\theta})\, d_{xx}(\vec{\theta}) - d_x(\vec{\theta})\, d_{xy}(\vec{\theta})}{d_{xx}(\vec{\theta})\, d_{yy}(\vec{\theta}) - d_{xy}^2(\vec{\theta})} \right)\right|_{\vec{\theta} = \vec{\theta}_f},$$

where

$$d_x(\vec{\theta}) = \frac{\partial M(\vec{\theta})}{\partial \theta_x}, \quad d_y(\vec{\theta}) = \frac{\partial M(\vec{\theta})}{\partial \theta_y}, \quad d_{xx}(\vec{\theta}) = \frac{\partial^2 M(\vec{\theta})}{\partial \theta_x^2}, \quad d_{yy}(\vec{\theta}) = \frac{\partial^2 M(\vec{\theta})}{\partial \theta_y^2}, \quad d_{xy}(\vec{\theta}) = \frac{\partial^2 M(\vec{\theta})}{\partial \theta_x\, \partial \theta_y}. \tag{2.7}$$

For a factorizable ACF, the relation $S(\theta_x, \theta_y) \approx S(\theta_x) S(\theta_y)$ is satisfied. Therefore, we can assume that $M(\vec{\theta}) = M(\theta_x) M(\theta_y)$. In this case, the mixed derivatives in Eq. 2.7 become zero near the point of maximum, so two separate estimates are obtained. If the LLRF is not factorized, then the separate estimates are quasi-optimal. In what follows, we assume that separate estimates $\theta_{mx}$, $\theta_{my}$ of the position of the fragment are performed, and the coordinate indices x, y are omitted. Section 2.5.1 describes the discriminator structure. The distribution law of the estimate is considered in Sect. 2.5.2. The robust estimate of the signal parameter is given in Sect. 2.5.3.
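The two-dimensional Newton–Raphson step written above can be illustrated with a short Python sketch. The quadratic LLRF used here is a hypothetical example with its maximum placed at (0.2, −0.1); for a quadratic surface a single step from any starting point reaches the maximum, which makes the formulas easy to check.

```python
def newton_step_2d(theta_f, dx, dy, dxx, dyy, dxy):
    """One Newton-Raphson step using the derivatives defined in Eq. 2.7."""
    det = dxx * dyy - dxy ** 2
    theta_mx = theta_f[0] - (dx * dyy - dy * dxy) / det
    theta_my = theta_f[1] - (dy * dxx - dx * dxy) / det
    return theta_mx, theta_my


def derivatives(theta):
    """Derivatives of the hypothetical LLRF
    M = -(u**2) - 2 * v**2 - 0.5 * u * v, u = theta_x - 0.2, v = theta_y + 0.1.
    """
    u, v = theta[0] - 0.2, theta[1] + 0.1
    dx = -2.0 * u - 0.5 * v
    dy = -4.0 * v - 0.5 * u
    return dx, dy, -2.0, -4.0, -0.5  # dx, dy, dxx, dyy, dxy


theta_f = (0.0, 0.0)
theta_m = newton_step_2d(theta_f, *derivatives(theta_f))
print(theta_m)
```

When the mixed derivative is set to zero, the two expressions reduce to independent one-dimensional steps, which is the case of separate estimates considered in the rest of the section.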
2.5.1 Discriminator Structure
Consider typical discriminator algorithms that approximate the separate MLE, provided by Eqs. 2.8–2.11, where $k_1$, $k_2$, $k_3$, $k_4$ are some constants. They depend on the processing algorithm but do not affect the distribution of the estimate.
I. One-step Newton algorithm (“optimal discriminator”):

$$\lambda_m(\theta_f) = k_1 \frac{M'(\theta_f)}{M''(\theta_f)}. \tag{2.8}$$

II. Finite-difference discriminator:

$$\lambda_m(\theta_f, \delta) = k_2 \frac{M(\theta_f + \delta) - M(\theta_f - \delta)}{M(\theta_f + \delta) - 2M(\theta_f) + M(\theta_f - \delta)}. \tag{2.9}$$

III. Sum-difference discriminator:

$$\lambda_m(\theta_f, \delta) = k_3 \frac{M(\theta_f + \delta) - M(\theta_f - \delta)}{M(\theta_f + \delta) + M(\theta_f - \delta)}. \tag{2.10}$$

IV. Discriminator with automatic gain control:

$$\lambda_m(\theta_f, \delta) = k_4 \frac{M(\theta_f + \delta) - M(\theta_f - \delta)}{M(\theta_f)}. \tag{2.11}$$

Assume that in the algorithms provided by Eqs. 2.8–2.11 the SNR is $q \gg 1$. In this case, the noise component in Eqs. 2.8–2.11 can be neglected, which yields the deterministic discrimination characteristics. As an example, consider the two most common signal models: the truncated bell $s(x, a) = I(x/a) \exp(-x^2)$ and the uniform fragment with smoothed contour $s(x, a) = 1/\left(1 + (x/a)^{10}\right)$ at the value $a = 1.5$. Here, $I(z) = 1$, $|z| \le 1$. The study of the discriminator algorithms provided by Eqs. 2.8–2.11 shows that the sum-difference algorithm provided by Eq. 2.10 has the largest linearity zone at different values of the parameter $\delta$. Figure 2.6a gives the discrimination characteristic of the algorithm provided by Eq. 2.9 for a signal in the form of a truncated bell, with the discriminator channel mistuning $\delta = 0.5$. Figure 2.6b corresponds to a signal in the form of a uniform fragment with smoothed contour, with $\delta = 1$.
Fig. 2.6 Measurer discriminatory characteristics: a truncated bell, b uniform fragment with smoothed contour
In Figs. 2.7a and 2.8a, the model fragments of images are presented, and in Figs. 2.7b and 2.8b their ACFs are depicted. The fragment in Fig. 2.7a and the ACF in Fig. 2.7b correspond to the differentiable model field, while the fragment in Fig. 2.8a and the ACF in Fig. 2.8b correspond to the non-differentiable model field. In Fig. 2.9a–b, contour sections of the ACFs from Figs. 2.7b and 2.8b are shown, respectively. Here, the points denote the true non-integer positions of the ACF maxima. These positions were estimated using the algorithm provided by Eq. 2.9. Applying this algorithm to the discrete image gives an estimation error of the order of $10^{-2}$–$10^{-3}$ in the absence of noise, whereas searching for the maximum on a discrete grid gives an estimation accuracy of 1 pixel. It was found that for the non-differentiable model of the ACF (Fig. 2.8b), inter-pixel interpolation does not improve the accuracy of the estimate.
Fig. 2.7 Differentiable model: a image, b ACF of image
Fig. 2.8 Non-differentiable model: a image, b ACF of image
Fig. 2.9 Contour sections of ACF for: a differentiable model, b non-differentiable model
To estimate the shift vector using a discriminator, the number of operations is proportional to $4N^2$. Here, N × N is the block size in the frame. In H.264-265 video codecs, the number of operations for sub-pixel shift estimation is significantly higher [1]. Therefore, the discriminators provide a faster real-time estimation of the shift than existing codecs.
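The finite-difference and sum-difference discriminators of Eqs. 2.9 and 2.10 can be sketched in Python as follows. The constants are assumptions, since the text leaves them unspecified: k2 = −δ/2 is chosen here because it turns Eq. 2.9 into the usual central-difference approximation of the Newton step −M′/M″, and k3 is left as a free scale factor. The two test functions (a parabola and a Gaussian bell with the maximum at 0.3) are hypothetical noise-free LLRFs.

```python
import math


def finite_difference_discriminator(M, theta_f, delta):
    """Eq. 2.9 with the assumed constant k2 = -delta / 2."""
    m_plus, m_zero, m_minus = M(theta_f + delta), M(theta_f), M(theta_f - delta)
    k2 = -0.5 * delta
    return k2 * (m_plus - m_minus) / (m_plus - 2.0 * m_zero + m_minus)


def sum_difference_discriminator(M, theta_f, delta, k3=1.0):
    """Eq. 2.10; k3 is left as a free scale factor (assumption)."""
    m_plus, m_minus = M(theta_f + delta), M(theta_f - delta)
    return k3 * (m_plus - m_minus) / (m_plus + m_minus)


def parabola(theta):
    """Hypothetical noise-free LLRF with its maximum at theta = 0.3."""
    return 1.0 - (theta - 0.3) ** 2


def bell(theta):
    """Hypothetical Gaussian-shaped LLRF with its maximum at theta = 0.3."""
    return math.exp(-0.5 * (theta - 0.3) ** 2)


lam_fd = finite_difference_discriminator(parabola, 0.0, 1.0)
lam_sd = sum_difference_discriminator(bell, 0.0, 1.0)
print(lam_fd, lam_sd)
```

For the parabola the finite-difference discriminator recovers the offset of the maximum exactly, while for the Gaussian bell the sum-difference output with k3 = 1 is the hyperbolic tangent of δ times the offset, so it is nearly linear for small offsets and saturates for large ones.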
2.5.2 Distribution Law of Estimation
As can be seen from Eqs. 2.8–2.11, the discriminator statistics $\lambda_m$ can be presented in the form

$$\lambda_m = \frac{\xi_1}{\xi_2} = \frac{q M_1 + U_1}{q M_2 + U_2},$$

where $U_1 \sim N(0, D_1)$, $U_2 \sim N(0, D_2)$, $M_1$, $M_2$ are the deterministic components of the numerator and denominator of Eqs. 2.8–2.11, and $D_1$, $D_2$ are the variances of the random components $U_1$, $U_2$. For the algorithms provided by Eqs. 2.8–2.11, the Gaussian random variables $U_1$, $U_2$ are independent of each other; therefore, $\langle U_1 U_2 \rangle = 0$. According to Eqs. 2.8–2.11, the values $M_1$, $M_2$, $D_1$, $D_2$ are as follows:
• For the algorithm provided by Eq. 2.8: $M_1 = S'(\theta_f)$, $D_1 = 1$, $M_2 = S''(\theta_f)$, $D_2 = 3$.
• For the algorithm provided by Eq. 2.9: $M_1 = S(\theta_f + \delta) - S(\theta_f - \delta)$, $D_1 = 2(1 - S(2\delta))$, $0 \le D_1 \le 2$, $M_2 = S(\theta_f + \delta) - 2S(\theta_f) + S(\theta_f - \delta)$, $D_2 = 2[3 - 4S(\delta) + S(2\delta)]$, $0 \le D_2 \le 6$.
• For the algorithm provided by Eq. 2.10: $M_1 = S(\theta_f + \delta) - S(\theta_f - \delta)$, $D_1 = 2(1 - S(2\delta))$, $M_2 = S(\theta_f + \delta) + S(\theta_f - \delta)$, $D_2 = 2(1 + S(2\delta))$, $2 \le D_2 \le 4$.
• For the algorithm provided by Eq. 2.11: $M_1 = S(\theta_f + \delta) - S(\theta_f - \delta)$, $D_1 = 2(1 - S(2\delta))$, $0 \le D_1 \le 2$, $M_2 = S(\theta_f)$, $D_2 = 1$.
By introducing the notation

$$t = \frac{\xi_2}{\sqrt{D_2}}, \quad \mu = \frac{D_2}{D_1}, \quad \chi = \frac{M_2}{\sqrt{D_2}}, \quad \frac{M_1}{\sqrt{D_2}} = \frac{M_1}{M_2} \cdot \frac{M_2}{\sqrt{D_2}} = \chi \lambda_0, \quad \frac{M_1}{M_2} = \lambda_0,$$

we can write the probability density of $\lambda_m$ in the form of Eq. 2.12.

$$W(\lambda, q) = \frac{\sqrt{\mu}}{2\pi} \int_{-\infty}^{\infty} |t| \exp\left(-\frac{(t - q\chi)^2 + \mu(\lambda t - q\chi\lambda_0)^2}{2}\right) dt \tag{2.12}$$

To analyze the effect of the mismatch $\theta_f$ and the SNR value $q$ on the characteristics of the distribution of $\lambda_m$, the dependencies $W(\lambda, q)$ were constructed. Figure 2.10 presents the distribution of $\lambda_m$ for SNR $q = 2, 3, 5$ and for the parameter $\theta_f = 1$. The points of the experimental estimate of the probability density $Y(\lambda)$ can also be seen there. The experimental probability density $Y(\lambda)$ is the Parzen estimate [17] of the statistics $\lambda_m$ with a sample volume $n = 3000$. The function $K(u) = \left(1/\sqrt{2\pi h^2}\right) \exp\left(-u^2/2h^2\right)$ was used as a kernel, where $h = 0.04$. The correspondence of the distributions is shown not only graphically, but also by Kolmogorov’s criterion at $\alpha \le 0.1$.
Fig. 2.10 Distribution of estimation
For the distribution in Eq. 2.12, an explicit analytical form can be obtained in two cases: SNR $q = 0$ and $q \gg 1$. In the first case, $W(\lambda, q = 0) = \frac{\sqrt{\mu}}{\pi} \cdot \frac{1}{1 + \mu\lambda^2}$. This is the Cauchy distribution. In the second case, after asymptotic integration by the Laplace method [18], we obtain Eq. 2.13.

$$W(\lambda, q) = \frac{\sqrt{\mu}\, q\chi\, |1 + \mu\lambda\lambda_0|}{\sqrt{2\pi}\, (1 + \mu\lambda^2)^{3/2}} \cdot \exp\left(-\frac{1}{2} q^2 \chi^2 \cdot \frac{\mu(\lambda - \lambda_0)^2}{1 + \mu\lambda^2}\right) \tag{2.13}$$

From the analysis of Eqs. 2.12–2.13 and statistical modeling, it follows that at a finite SNR the distribution of the estimate has “heavy tails”. Therefore, the estimate has infinite variance and is inconsistent. Consider the behavior of the probability $P_a(q, h) = 1 - \int_{-h}^{h} W(\lambda, q)\, d\lambda$ that the statistics $\lambda_m$ fall outside the limits $[-h, h]$. The results of calculating $P_a(q, h)$ are shown in Fig. 2.11.
Fig. 2.11 Probability of finding statistics out of range [−h, h] at different signal/noise ratios
As seen from Fig. 2.11, two patterns can be traced in the behavior of the probability Pa(q, h). At small levels h ≤ 1, the probability values are determined by the central part of the distribution and decrease rapidly as h and q grow. At large values h > 1, the behavior of Pa(q, h) is determined by the “heavy tails”, which decrease slowly as h grows, whereas Pa(q, h) still decreases rapidly with increasing q. This raises the problem of limiting large values of the estimate λm. Consider a modified estimate.
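The two limiting cases of Eq. 2.12 discussed in this section can be checked numerically. The Python sketch below is illustrative only: the function names, the integration range, and the parameter values are assumptions, and the integral over t is approximated by a plain Riemann sum on a fine uniform grid.

```python
import numpy as np


def density_exact(lam, q, mu, chi, lam0, t_max=40.0, n=80001):
    """Eq. 2.12 evaluated by a Riemann sum over t in [-t_max, t_max]."""
    t = np.linspace(-t_max, t_max, n)
    dt = t[1] - t[0]
    a = q * chi
    exponent = -0.5 * ((t - a) ** 2 + mu * (lam * t - a * lam0) ** 2)
    return np.sqrt(mu) / (2.0 * np.pi) * np.sum(np.abs(t) * np.exp(exponent)) * dt


def density_cauchy(lam, mu):
    """Closed form of Eq. 2.12 at q = 0 (Cauchy distribution)."""
    return np.sqrt(mu) / (np.pi * (1.0 + mu * lam ** 2))


def density_laplace(lam, q, mu, chi, lam0):
    """Asymptotic form of Eq. 2.13 for q >> 1."""
    a = q * chi
    s = 1.0 + mu * lam ** 2
    return (np.sqrt(mu) * a * abs(1.0 + mu * lam * lam0)
            / (np.sqrt(2.0 * np.pi) * s ** 1.5)
            * np.exp(-0.5 * a ** 2 * mu * (lam - lam0) ** 2 / s))


# Hypothetical parameter values chosen only for the check.
w_zero_snr = density_exact(0.7, q=0.0, mu=2.0, chi=1.0, lam0=0.3)
w_high_snr = density_exact(0.5, q=10.0, mu=1.0, chi=1.0, lam0=0.5)
print(w_zero_snr, density_cauchy(0.7, 2.0))
print(w_high_snr, density_laplace(0.5, 10.0, 1.0, 1.0, 0.5))
```

Agreement in both cases supports the reconstruction of Eqs. 2.12 and 2.13 given above; at intermediate SNR only the integral form applies.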
2.5.3 Robust Estimate of Signal Parameter
Let us apply a nonlinear transformation $\eta = g(\lambda)$ to the statistics $\lambda_m$ at the output of the discriminator in order to suppress large values of $\lambda$. Let us use some recommended transformations [19, 20]. Estimated function of Tukey:

$$\eta = g(\lambda) = \begin{cases} \lambda, & |\lambda| \le a \\ 0, & |\lambda| > a \end{cases}, \quad a = 1. \tag{2.14}$$

Denote $P^+ = \int_{a}^{\infty} W_\lambda(u)\, du$, $P^- = \int_{-\infty}^{-a} W_\lambda(u)\, du$. Then

$$W_\eta(u) = (P^+ + P^-)\, \delta(u) + I(|u| < a)\, W_\lambda(u), \tag{2.15}$$

where $\delta(u)$ is the delta function. Estimated function of Huber:

$$\eta = g(\lambda) = \begin{cases} a \cdot \operatorname{sign}(\lambda), & |\lambda| \ge a \\ \lambda, & |\lambda| < a \end{cases}, \quad a = 1. \tag{2.16}$$

In this case, the distribution $W_\eta(u)$ has the form of Eq. 2.17.

$$W_\eta(u) = P^- \delta(u + a) + P^+ \delta(u - a) + I(u/a)\, W_\lambda(u) \tag{2.17}$$
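The Tukey and Huber limiters of Eqs. 2.14 and 2.16 are simple to express in code. The following Python sketch uses hypothetical function names and, as an assumed illustration of a heavy-tailed discriminator output, draws samples from a standard Cauchy distribution (the q = 0 case of Sect. 2.5.2 with μ = 1).

```python
import numpy as np


def tukey_limiter(lam, a=1.0):
    """Eq. 2.14: pass values with |lam| <= a, replace the rest by zero."""
    lam = np.asarray(lam, dtype=float)
    return np.where(np.abs(lam) <= a, lam, 0.0)


def huber_limiter(lam, a=1.0):
    """Eq. 2.16: pass values with |lam| < a, clip the rest to +/- a."""
    lam = np.asarray(lam, dtype=float)
    return np.where(np.abs(lam) < a, lam, a * np.sign(lam))


rng = np.random.default_rng(0)
raw = rng.standard_cauchy(100_000)  # heavy-tailed stand-in for lambda_m
limited_tukey = tukey_limiter(raw)
limited_huber = huber_limiter(raw)

# The sample variance of the raw statistics is dominated by rare outliers,
# while both limited outputs are bounded by a and have a finite variance.
print(np.var(raw), np.var(limited_tukey), np.var(limited_huber))
```

The Tukey limiter discards outliers, which produces the delta component at zero in Eq. 2.15, whereas the Huber limiter moves them to ±a, which produces the two delta components in Eq. 2.17.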
Estimated function of Hampel. The nonlinear transformation in the case of the Hampel function has the form of Eq. 2.18.
Fig. 3.7 Video signal of the row taking into account the horizontal changes in object’s image projections (signal levels a and b; marked positions −M, l + δn, r − δn, M)
The function $\varphi(0, n)$ for this type of motion is described by Eq. (3.9), where $\delta_n$ affects only the absolute value of the function.

$$\varphi(0, n) = \sum_{m=-M}^{l+\delta_n-1} 2ma^2 + \sum_{m=l+\delta_n}^{r-\delta_n} 2mb^2 + \sum_{m=r-\delta_n+1}^{M} 2ma^2 = \left(a^2 - b^2\right)(2\delta_n + l - r - 1)(l + r) \tag{3.9}$$
Figure 3.8 illustrates one frame of the real video signal. The video sequence contains an object that moves from right to left and is affected by motion blur. The video signal levels of the background and the object are a = 241 and b = 114, respectively, and the length of the frame row is 2M + 1 = 1281. The parameters l = 95 and r = 626 are determined from the frame of the video sequence in which the object reached the frame edge. The value d = 90 is found by frame subtraction. Figure 3.9 illustrates the function ϕ(0, n) for the following cases: the solid line represents the function calculated using Eq. (3.2); the dashed and dotted lines represent the functions obtained using Eqs. (3.4) and (3.7), respectively.
Fig. 3.8 An example of the frame with the moving object
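The closed form on the right-hand side of Eq. (3.9) can be verified against the three sums directly. The Python sketch below takes a, b, M, l, and r from the example in the text purely for illustration; the values of δn are hypothetical, and treating l and r as indices measured on the same axis as m is an assumption.

```python
def phi_direct(a, b, M, l, r, delta):
    """Left-hand side of Eq. (3.9): three sums over the row index m."""
    left = sum(2 * m * a ** 2 for m in range(-M, l + delta))
    middle = sum(2 * m * b ** 2 for m in range(l + delta, r - delta + 1))
    right = sum(2 * m * a ** 2 for m in range(r - delta + 1, M + 1))
    return left + middle + right


def phi_closed(a, b, l, r, delta):
    """Right-hand side of Eq. (3.9)."""
    return (a ** 2 - b ** 2) * (2 * delta + l - r - 1) * (l + r)


a, b, M, l, r = 241, 114, 640, 95, 626
for delta in (0, 10, 50):  # hypothetical values of delta_n
    print(delta, phi_direct(a, b, M, l, r, delta), phi_closed(a, b, l, r, delta))
```

The identity holds because the sum of 2m over the full symmetric range −M…M vanishes, so only the object interval contributes, with weight b² − a².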
3 The Characteristics of the Phase-Energy Image Spectrum
Fig. 3.9 The zero-phase amplitude functions of the phase-energy spectrum using different equations (vertical axis: ϕ(0, n) × 10^10; horizontal axis: n)
Analysis of the functions confirms that the equations presented in this study characterize the real processes satisfactorily. The abscissas of the maxima and minima coincide in all cases. The mismatch in the absolute values of the ordinates of the maxima and minima is caused by the motion blur of the analysed object. The differences between the functions obtained with and without taking motion blur into account can be explained by the wide area of transition pixels on the edge of the image affected by motion blur and by the non-uniformity of the object and background brightness. In general, motion blur may be neglected for the video sequence described above and, thus, a simpler model can be used.
3.3 The Model of Two-Dimensional Phase-Energy Spectrum

Let us generalize the phase-energy spectrum for the two-dimensional case [15]. Figure 3.10 illustrates the image, which contains (2M + 1) × (2N + 1) counts of the video sequence.

Fig. 3.10 The image of (2M + 1) × (2N + 1) pixels size (axes x and y, counts $f_{k,r}$ with $k \in [-M, M]$, $r \in [-N, N]$)

The 2D Fourier transform $\dot F(\varphi_x, \varphi_y)$ of this image can be written as Eq. (3.10), where $\varphi_x, \varphi_y \in [-\pi, \pi]$ are the normalized spatial frequencies and $i^2 = -1$.

$$\dot F(\varphi_x,\varphi_y) = A(\varphi_x,\varphi_y) + iB(\varphi_x,\varphi_y) = \sum_{k=-M}^{M}\sum_{r=-N}^{N} f_{k,r}\cos(k\varphi_x + r\varphi_y) + i\sum_{k=-M}^{M}\sum_{r=-N}^{N} f_{k,r}\sin(k\varphi_x + r\varphi_y) \tag{3.10}$$

In the previous study [3], the vector function $\vec I(\varphi_x, \varphi_y)$ was determined with components

$$\vec I(\varphi_x,\varphi_y) = \left\{ \frac{\partial \Phi(\varphi_x,\varphi_y)}{\partial \varphi_x}\, S(\varphi_x,\varphi_y);\ \frac{\partial \Phi(\varphi_x,\varphi_y)}{\partial \varphi_y}\, S(\varphi_x,\varphi_y) \right\},$$

where $\Phi(\varphi_x, \varphi_y)$ is the phase-frequency image spectrum and $S(\varphi_x, \varphi_y)$ is the power spectrum of the image. However, the components of this vector were considered only along some columns and rows and, thus, represented the phase-energy spectra of columns and rows. Figure 3.11 shows the vector $\vec v = \{k_0, r_0\}$. The phase-energy spectrum along an arbitrary direction is defined by Eq. (3.11), where $\partial \Phi(\varphi_x,\varphi_y)/\partial \vec v$ is the derivative of the function $\Phi(\varphi_x, \varphi_y)$ along the vector $\vec v$.

$$I_{\vec v}(\varphi_x,\varphi_y) = \frac{\partial \Phi(\varphi_x,\varphi_y)}{\partial \vec v}\cdot S(\varphi_x,\varphi_y)\cdot |\vec v| \tag{3.11}$$
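Equation (3.11) can be represented as a scalar product (Eq. 3.12).

$$I_{\vec v}(\varphi_x,\varphi_y) = \vec v \cdot \left( A(\varphi_x,\varphi_y)\,\operatorname{grad} B(\varphi_x,\varphi_y) - B(\varphi_x,\varphi_y)\,\operatorname{grad} A(\varphi_x,\varphi_y) \right) \tag{3.12}$$

The first factor of Eq. (3.12) coincides with one-dimensional PES if the direction of the vector $\vec v$ coincides with the direction of the $O\varphi_x$ or $O\varphi_y$ axis. Thus, one-dimensional PESs can be considered as components of the vector (Eq. 3.13), and the vector $\vec I(\varphi_x, \varphi_y)$ can be considered as the two-dimensional PES.

Fig. 3.11 Spatial frequency domain (axes $\varphi_x, \varphi_y \in [-\pi, \pi]$; vector $\vec v$ with components $k_0$, $r_0$)

$$\vec I(\varphi_x,\varphi_y) = A(\varphi_x,\varphi_y)\,\operatorname{grad} B(\varphi_x,\varphi_y) - B(\varphi_x,\varphi_y)\,\operatorname{grad} A(\varphi_x,\varphi_y) \tag{3.13}$$

In the general case, the components $I_1(\varphi_x, \varphi_y)$, $I_2(\varphi_x, \varphi_y)$ of the vector field are determined by Eq. (3.14).

$$\vec I(\varphi_x,\varphi_y) = \begin{cases} I_1(\varphi_x,\varphi_y) = \displaystyle\sum_{k=-M}^{M}\sum_{r=-N}^{N}\sum_{l=-M}^{M}\sum_{p=-N}^{N} l\, f_{k,r} f_{l,p} \cos\left((k-l)\varphi_x + (r-p)\varphi_y\right) \\[2ex] I_2(\varphi_x,\varphi_y) = \displaystyle\sum_{k=-M}^{M}\sum_{r=-N}^{N}\sum_{l=-M}^{M}\sum_{p=-N}^{N} p\, f_{k,r} f_{l,p} \cos\left((k-l)\varphi_x + (r-p)\varphi_y\right) \end{cases} \tag{3.14}$$

For example, the components of the vector field $\vec I(\varphi_x, \varphi_y)$ for the case of M = N = 1 can be found using Eqs. (3.15)–(3.16).

$$\begin{aligned}
I_1(\varphi_x,\varphi_y) ={}& f_{1,1}^2 + f_{1,0}^2 + f_{1,-1}^2 - f_{-1,1}^2 - f_{-1,0}^2 - f_{-1,-1}^2 \\
&+ 2\left(f_{1,1}f_{1,0} + f_{1,-1}f_{1,0} - f_{-1,1}f_{-1,0} - f_{-1,-1}f_{-1,0}\right)\cos\varphi_y \\
&+ \left(f_{1,1}f_{0,1} + f_{1,0}f_{0,0} + f_{1,-1}f_{0,-1} - f_{-1,1}f_{0,1} - f_{-1,0}f_{0,0} - f_{-1,-1}f_{0,-1}\right)\cos\varphi_x \\
&+ \left(f_{1,1}f_{0,0} + f_{1,0}f_{0,-1} - f_{-1,-1}f_{0,0} - f_{-1,0}f_{0,1}\right)\cos(\varphi_x + \varphi_y) \\
&+ \left(f_{0,1}f_{1,0} + f_{0,0}f_{1,-1} - f_{-1,0}f_{0,-1} - f_{-1,1}f_{0,0}\right)\cos(\varphi_x - \varphi_y) \\
&+ \left(f_{1,1}f_{0,-1} - f_{-1,-1}f_{0,1}\right)\cos(\varphi_x + 2\varphi_y) \\
&+ \left(f_{1,-1}f_{0,1} - f_{-1,1}f_{0,-1}\right)\cos(\varphi_x - 2\varphi_y) \\
&+ 2\left(f_{1,1}f_{1,-1} - f_{-1,1}f_{-1,-1}\right)\cos 2\varphi_y
\end{aligned} \tag{3.15}$$

$$\begin{aligned}
I_2(\varphi_x,\varphi_y) ={}& f_{1,1}^2 + f_{0,1}^2 + f_{-1,1}^2 - f_{-1,-1}^2 - f_{0,-1}^2 - f_{1,-1}^2 \\
&+ 2\left(f_{1,1}f_{0,1} + f_{-1,1}f_{0,1} - f_{1,-1}f_{0,-1} - f_{-1,-1}f_{0,-1}\right)\cos\varphi_x \\
&+ \left(f_{1,1}f_{1,0} + f_{0,1}f_{0,0} + f_{-1,1}f_{-1,0} - f_{1,-1}f_{1,0} - f_{0,-1}f_{0,0} - f_{-1,-1}f_{-1,0}\right)\cos\varphi_y \\
&+ \left(f_{1,1}f_{0,0} + f_{0,1}f_{-1,0} - f_{-1,-1}f_{0,0} - f_{0,-1}f_{1,0}\right)\cos(\varphi_x + \varphi_y) \\
&+ \left(f_{0,0}f_{-1,1} + f_{1,0}f_{0,1} - f_{0,0}f_{1,-1} - f_{0,-1}f_{-1,0}\right)\cos(\varphi_x - \varphi_y) \\
&+ \left(f_{1,1}f_{-1,0} - f_{-1,-1}f_{1,0}\right)\cos(2\varphi_x + \varphi_y) \\
&+ \left(f_{1,0}f_{-1,1} - f_{-1,0}f_{1,-1}\right)\cos(2\varphi_x - \varphi_y) \\
&+ 2\left(f_{1,1}f_{-1,1} - f_{1,-1}f_{-1,-1}\right)\cos 2\varphi_x
\end{aligned} \tag{3.16}$$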
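The relation between Eq. (3.13) and the quadruple sum of Eq. (3.14) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names and the array layout (`f[k + M, r + N]` holding the count $f_{k,r}$) are assumptions made only for illustration.

```python
import numpy as np

def _indices(f):
    # f has shape (2M+1, 2N+1); f[k + M, r + N] stores the count f_{k,r} (assumed layout)
    M, N = f.shape[0] // 2, f.shape[1] // 2
    return M, N, np.arange(-M, M + 1), np.arange(-N, N + 1)

def pes_field_direct(f, phx, phy):
    """I = A grad B - B grad A (Eq. 3.13), with A and B taken from Eq. 3.10."""
    _, _, k, r = _indices(f)
    kk, rr = k[:, None], r[None, :]
    theta = kk * phx + rr * phy
    c, s = f * np.cos(theta), f * np.sin(theta)
    A, B = c.sum(), s.sum()
    grad_A = np.array([-(kk * s).sum(), -(rr * s).sum()])
    grad_B = np.array([(kk * c).sum(), (rr * c).sum()])
    return A * grad_B - B * grad_A

def pes_field_sum(f, phx, phy):
    """Components I1, I2 computed from the quadruple sum of Eq. 3.14."""
    M, N, k, r = _indices(f)
    K, R, L, P = np.meshgrid(k, r, k, r, indexing="ij")
    w = f[K + M, R + N] * f[L + M, P + N] * np.cos((K - L) * phx + (R - P) * phy)
    return np.array([(L * w).sum(), (P * w).sum()])

rng = np.random.default_rng(0)
image = rng.random((3, 3))  # the case M = N = 1
print(pes_field_direct(image, 0.4, -1.1))
print(pes_field_sum(image, 0.4, -1.1))
```

Both functions should print the same pair of values for any test frequency, which is a quick way to validate an implementation of Eq. (3.14) before using it on larger images.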
The one-dimensional PES for the same case of M = N = 1 is described by the equation

$$I(\varphi) = f_1^2 - f_{-1}^2 + f_0\,(f_1 - f_{-1})\cos\varphi.$$

The curl of the vector field (Eq. 3.14) is the cross product (Eq. 3.17).

$$\operatorname{rot} \vec I(\varphi_x,\varphi_y) = 2\,\operatorname{grad} A \times \operatorname{grad} B \tag{3.17}$$
The curl of the flat field has only one non-zero component, which can be calculated by Eq. (3.18).
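$$\begin{aligned}
\operatorname{rot} \vec I(\varphi_x,\varphi_y) &= 2\sum_{k=-M}^{M}\sum_{r=-N}^{N}\sum_{l=-M}^{M}\sum_{p=-N}^{N} (lr - kp)\, f_{k,r} f_{l,p} \sin(k\varphi_x + r\varphi_y)\cos(l\varphi_x + p\varphi_y) \\
&= \sum_{k=-M}^{M}\sum_{r=-N}^{N}\sum_{l=-M}^{M}\sum_{p=-N}^{N} (lr - kp)\, f_{k,r} f_{l,p} \\
&\quad \times \left[ \sin\left((k-l)\varphi_x + (r-p)\varphi_y\right) + \sin\left((k+l)\varphi_x + (r+p)\varphi_y\right) \right]
\end{aligned} \tag{3.18}$$

Fig. 3.12 Circuit of integration in the spatial domain (axes $\varphi_x, \varphi_y \in [-\pi, \pi]$; square circuit L with side $\alpha$ starting at the origin)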
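The curl can be evaluated in two ways and compared. The sketch below is an illustration under assumptions (hypothetical function names, array layout `f[k + M, r + N]` for the count $f_{k,r}$): it computes the single non-zero curl component once as $2(\operatorname{grad} A \times \operatorname{grad} B)$ from Eq. (3.17) and once from the sum with the two sine terms in Eq. (3.18).

```python
import numpy as np

def curl_from_gradients(f, phx, phy):
    """Non-zero component of 2 * (grad A x grad B), Eq. 3.17."""
    M, N = f.shape[0] // 2, f.shape[1] // 2
    k = np.arange(-M, M + 1)[:, None]
    r = np.arange(-N, N + 1)[None, :]
    theta = k * phx + r * phy
    c, s = f * np.cos(theta), f * np.sin(theta)
    dA_dx, dA_dy = -(k * s).sum(), -(r * s).sum()
    dB_dx, dB_dy = (k * c).sum(), (r * c).sum()
    return 2.0 * (dA_dx * dB_dy - dA_dy * dB_dx)

def curl_closed_form(f, phx, phy):
    """Quadruple sum with the two sine terms, as in the last form of Eq. 3.18."""
    M, N = f.shape[0] // 2, f.shape[1] // 2
    k, r = np.arange(-M, M + 1), np.arange(-N, N + 1)
    K, R, L, P = np.meshgrid(k, r, k, r, indexing="ij")
    w = (L * R - K * P) * f[K + M, R + N] * f[L + M, P + N]
    sines = (np.sin((K - L) * phx + (R - P) * phy)
             + np.sin((K + L) * phx + (R + P) * phy))
    return (w * sines).sum()

rng = np.random.default_rng(0)
image = rng.random((3, 3))
print(curl_from_gradients(image, 0.5, 0.2), curl_closed_form(image, 0.5, 0.2))
```

Because every term is a sine of a linear combination of the frequencies, the curl is an odd function of $(\varphi_x, \varphi_y)$, which the same sketch can confirm by evaluating it at mirrored frequencies.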
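Figure 3.12 shows the closed circuit L. The circulation of the vector $\vec I(\varphi_x, \varphi_y)$ along this circuit is calculated using Eq. (3.19).

$$\begin{aligned}
C &= \oint_L I_1(\varphi_x,\varphi_y)\,d\varphi_x + I_2(\varphi_x,\varphi_y)\,d\varphi_y \\
&= 4\sum_{k=-M}^{M}\sum_{r=-N}^{N}\sum_{l=-M}^{M}\sum_{p=-N}^{N} (lr - kp)\, f_{k,r} f_{l,p} \\
&\quad \times \left[ \frac{1}{(k+l)(r+p)} \sin\frac{\alpha(k+l)}{2} \sin\frac{\alpha(r+p)}{2} \sin\frac{\alpha(k+l+r+p)}{2} \right. \\
&\qquad \left. + \frac{1}{(k-l)(r-p)} \sin\frac{\alpha(k-l)}{2} \sin\frac{\alpha(r-p)}{2} \sin\frac{\alpha(k-l+r-p)}{2} \right]
\end{aligned} \tag{3.19}$$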
In particular, for $\alpha = \pi$ the circulation is calculated using Eq. (3.20).

$$\begin{aligned}
\frac{C}{2\pi} ={}& \sum_{k=-M}^{M} k \left( \sum_{\substack{r+p=2n+1 \\ n=0,\pm 1,\ldots,\pm(N-1)}} f_{k,r} f_{-k,p} - \sum_{\substack{r-p=2n+1 \\ n=0,\pm 1,\ldots,\pm(N-1)}} f_{k,r} f_{k,p} \right) \\
&+ \sum_{r=-N}^{N} r \left( \sum_{\substack{k+l=2n+1 \\ n=0,\pm 1,\ldots,\pm(M-1)}} f_{k,r} f_{l,-r} - \sum_{\substack{k-l=2n+1 \\ n=0,\pm 1,\ldots,\pm(M-1)}} f_{k,r} f_{l,r} \right)
\end{aligned} \tag{3.20}$$

In the case of $\alpha = \pi$ and M = N = 1, Eq. (3.17) can be transformed to Eq. (3.21), which is used to calculate the circulation.

$$\frac{C}{2\pi} = f_{-1,0}\left(f_{-1,1} + f_{-1,-1}\right) - f_{1,0}\left(f_{1,1} + f_{1,-1}\right) - f_{0,1}\left(f_{1,1} + f_{-1,1}\right) + f_{0,-1}\left(f_{1,-1} + f_{-1,-1}\right) \tag{3.21}$$
The circulation along a circuit covering all spatial frequencies $\varphi_x, \varphi_y \in [-\pi, \pi]$ is equal to zero. Figure 3.13 illustrates an image part with a size of 7 × 7 counts (pixels) of the video signal. The pixels included in each product of Eq. (3.21) are connected by double arrows. The analysis of Fig. 3.13 shows that the central count $f_{0,0}$ is excluded from the circulation. The products of the counts lying on the axes are absent in Eq. (3.21). The circulation (Eq. 3.21) can be calculated using Eq. (3.22), where

$$S^{n}_{2i-1,0} = \sum_{m=-M}^{M-2i+1} f_{m,n} f_{m+2i-1,n}, \qquad S^{m}_{0,2j-1} = \sum_{n=-N}^{N-2j+1} f_{m,n} f_{m,n+2j-1}$$

are the counts of the autocorrelation coefficient for the row n and the column m, respectively.

$$\begin{aligned}
\frac{C}{2\pi} ={}& \sum_{n=1}^{N} n \left[ 2\sum_{i=1}^{M-1} \left( S^{-n}_{2i-1,0} - S^{n}_{2i-1,0} \right) + S^{-n}_{2M-1,0} - S^{n}_{2M-1,0} \right] \\
&+ \sum_{m=1}^{M} m \left[ 2\sum_{j=1}^{N-1} \left( S^{-m}_{0,2j-1} - S^{m}_{0,2j-1} \right) + S^{-m}_{0,2N-1} - S^{m}_{0,2N-1} \right]
\end{aligned} \tag{3.22}$$
The vector field of PES is the two-dimensional function of the frequencies ϕx , ϕ y . It contains complete information about the image. Thus, it is possible to use the full capacity of the vector analysis in image processing. The proposed approach can be intuitively generalized for video sequences.
Fig. 3.13 The image with a size of 7 × 7 pixels (M = N = 3). The weight coefficients of the products are colour-coded as follows: «6»: red, «4»: blue, «3»: green, «2»: yellow, «1»: black. The solid lines connect the counts whose product has a positive sign, while the dashed lines connect the counts whose product has a negative sign

3.4 Conclusions

The phase-energy spectrum is a real function with the features of both the phase-frequency and the power spectra. It is generalized for images and video sequences and contains complete information about the processed structures. However, in contrast to well-known characteristics, the phase-energy spectrum is a vector function of real arguments. So, it is possible to use the methods of field theory for its analysis and to calculate different parameters, e.g. vorticity. The use of the vector field expands the possibilities for processing and understanding video information.
References

1. Martynova, L.A., Koryakin, A.V., Lantsov, K.V., Lantsov, V.V.: Determination of coordinates and parameters of movement of object on the basic of processing of images. Comput. Opt. 36(2), 266–273 (in Russian) (2012)
2. Favorskaya, M.: Motion estimation for objects analysis and detection in videos. In: Kountchev, R., Nakamatsu, K. (eds.) Advances in Reasoning-Based Image Processing Intelligent Systems, ISRL, vol. 29, pp. 211–253. Springer, Berlin Heidelberg (2012)
3. Smirnov, P.V.: Allocation on sequence of images the area of moving object. Izv. SSC RAN 16(6), 595–599 (in Russian) (2014)
4. Vaca-Castano, G., Lobo, N.D., Shah, M.: Holistic object detection and image understanding. Comput. Vis. Image Underst. 181, 1–13 (2019)
5. Beymer, D., Poggio, T.: Image representations for visual learning. Science 272(5270), 1905–1909 (1996)
6. Ikeuchi, K.: Computer Vision. Springer, Boston (2014)
7. Granlund, G.H., Knutsson, H.: Vector and tensor field filtering. In: Granlund, G.H., Knutsson, H. (eds.) Signal Processing for Computer Vision, pp. 343–365. Springer, Boston (1995)
8. Bogoslovsky, A., Zhigulina, I., Maslov, I., Mordovina, T.: Frequency characteristics for video sequences processing. In: Damiani, E., Howlett, R.J., Jain, L.C., Gallo, L., De Pietro, G. (eds.) Smart Innovation, Systems and Technologies, SIST, vol. 40, pp. 149–160. Springer, Switzerland (2015)
9. Bogoslovsky, A.V.: Processing of Multidimensional Signals, vol. 1: Linear Multidimensional Discrete Processing of Signals. Methods of the Analysis and Synthesis. Moscow, Radiotekhnika (in Russian) (2013)
10. Bogoslovsky, A.V., Zhigulina, I.V.: Power approach to processing of signals and video sequences. In: 15th International Conference Digital Signal Processing and Its Applications, pp. 262–264. Moscow (in Russian) (2013)
11. Zhigulina, I.V.: Power characteristics of images and video sequences. In: 13th International Conference Television: Transfer and Processing of Images, pp. 128–131. St.-Petersburg (in Russian) (2016)
12. Shahshahani, M., Targhi, A.T.: A simple set of numerical invariants for the analysis of images. Imaging Syst Technol. 16, 240–248 (2006)
13. Bogoslovsky, A.V., Zhigulina, I.V., Sukharev, V.A., Ponomarev, A.V.: Correlation analysis of limited in space 1D signals. Radiotekhnika 12, 4–7 (in Russian) (2017)
14. Bogoslovsky, A.V., Sukharev, V.A., Zhigulina, I.V.: Video sequences analysis for periodic or quasi-periodic motion detection. Radiotekhnika 11, 7–12 (in Russian) (2018)
15. Bogoslovsky, A.V., Sukharev, V.A., Zhigulina, I.V.: A vector field of the phase-power spectrum of an image and a video sequence. Radiotekhnika 11, 13–17 (in Russian) (2018)
Chapter 4
Detectors Fields

Andrei V. Bogoslovsky, Andrey V. Ponomarev and Irina V. Zhigulina
Abstract In this chapter, we propose biologically inspired methods of video content analysis. The methods are based on two-zone structures. We also introduce a new term, the detectors field, which denotes a set of two-zone structures (detectors). Moreover, we propose and describe a drift mechanism of the detectors field. We show that the detectors field makes it possible to reduce the size of the processed image while keeping all significant features. Thus, it is possible to reduce the information flow to the next step of image processing. Redundancy reduction facilitates the implementation of quasi-optimal real-time filtering similar to the Wiener–Hopf method. It is shown that processing a distorted image by the detectors field makes it possible to determine the characteristics of motion blur and to compensate for it in the output image.

Keywords Detectors field · P-, n-detector · Drift · Image · Redundancy reduction · Digital filter · Object detection · Motion blur
A. V. Bogoslovsky · A. V. Ponomarev · I. V. Zhigulina (B) Military Educational and Scientific Center of the Air Force “N.E. Zhukovsky and Y.A. Gagarin Air Force Academy”, 54a, Staryh Bolshevikov Str., Voronezh 394064, Russian Federation © Springer Nature Switzerland AG 2020 M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems—6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_4

4.1 Introduction

Artificial neural networks inspired by the processing of visual information in the visual cortex [1–3] are capable of analysing not only image primitives but also higher-order image features. However, they are limited to a degree compared with human vision [4–6]. Retinal processing is usually skipped or at best reduced to ON and OFF centre-surround receptive field selection [7–9]. Thus, the important functions of the visual system relevant to low-level visual processing are often ignored in computer vision implementations. These functions might include:

• Data compression. Information flow to the retina is about 10 GB/s; however, already from the photoreceptors to the optic nerve this value reduces to 6 Mb/s, and only 10 kb/s arrive at the visual cortex [10, 11].
• Eye movement. The human eye performs involuntary movements such as ocular micro tremor, saccades, ocular drift, etc. Without relative target motion, stationary objects are not perceived by the human retina [12].
• Overlapping of ON and OFF subfields. The retina has a heterogeneous structure with high resolution in the centre and lower resolution on the periphery and, thus, provides detailed information with a relatively small size of the visual system and low energy consumption.

Owing to retinal processing, the human brain has an appropriate size and energy consumption [13–15], and this should also be considered in designing a technical system. Thus, pre-processing of the visual input can remarkably reduce the information flow while maintaining efficient coding during such complicated tasks as, e.g., object detection within an image or video sequence. One of the prospects in that regard is the development of biologically inspired methods based on moving elements with a two-zone structure.

The remainder of the chapter is organized as follows. Section 4.2 describes the primitive detectors field. Drift of the detectors field is considered in Sect. 4.3. Section 4.4 describes two-dimensional discrete filtering of detectors fields for output signals. Experimental studies are provided in Sect. 4.5. The use of detectors field filtering in images affected by motion blur is considered in Sect. 4.6. Conclusions are drawn in Sect. 4.7.
4.2 The Primitive Detectors Field

A detectors field is an adaptive computational environment, or a computational forming environment (in the case of implementation directly on a detector), for object detection. The distinguishing feature of a detectors field is that it reduces the size of the image. A detectors field consists of two-zone structures that are able to overlap and vary in size. The retinal receptive field can be considered as a prototype of the detectors field [16]. Figure 4.1 illustrates the structure of a simple detectors field with a size of (2M + 1) × (2N + 1) pixels. Each element of the detectors field (i.e. detector) has a double-zone structure similar to the ON- and OFF-centres of the retinal ganglion cells, which are activated or inhibited in response to visual stimuli. This is taken into account in the detectors field by the sign of the response. Let us consider central and peripheral rectangular zones with sizes $Z_c = m \times n$ and $Z_p = 3m \times 3n$. Let us assume that the central zone gives a negative response. Then, Eq. 4.1 describes the response of the (i, j) detector, where $f_{l,v}$ are the counts of the video signal image (i.e. its pixels), $l \in [-M; M]$, $v \in [-N; N]$.

$$\theta_{i,j} = \sum_{(l,v)\in Z_p} f_{l,v} - 8\sum_{(l,v)\in Z_c} f_{l,v} \tag{4.1}$$
Fig. 4.1 The structure of the detectors field: $Z_c$, $Z_p$ are the central and peripheral zones of the detector (axes X and Y; field extent M and N)

The detector response, like the response of ON- or OFF-centres, to uniform light should be equal to zero. It is important to take into account that the peripheral and central zones contain different numbers of pixels. Thus, the second sum in Eq. 4.1 has a coefficient. The response of each (i, j) detector can be written as a count of a new reduced-size image of $\frac{2M+1}{3m} \times \frac{2N+1}{3n}$ pixels. Generally, the orientation of an object edge relative to the detector is arbitrary (see Fig. 4.2). The size of the detectors is very small, so the object edge can be considered as a line. It is possible to define the position of the object relative to the detector by the two points of intersection with the edge of the detector. Moreover, we can assume that the signals a and b are constant within one detector. Then, the detector response can be found using Eq. 4.2, where $S_b$ and $S_a$ are the areas of all parts of the detector taken by the signals b and a, respectively, and $S_{bc}$ and $S_{ac}$ are the areas of the central part of the detector taken by the signals b and a, respectively.

Fig. 4.2 A possible contrast edge orientation relative to the detector (regions with signals a and b; areas $S_a$, $S_b$, $S_{ac}$, $S_{bc}$)

$$\theta = (S_b - 9S_{bc})\,b + (S_a - 9S_{ac})\,a = (b - a)(S_b - 9S_{bc}) = (a - b)(S_a - 9S_{ac}) \tag{4.2}$$

It follows from Eq. 4.2 that the detector responds to the contrast (b − a), as well as to the orientation of the brightness jump within the detectors field. Figure 4.3a illustrates a part of the detectors field consisting of 16 detectors. The dark region corresponds to a = 20, while the light region corresponds to b = 200. The response of the detectors is shown in Fig. 4.3b. The θ value and its coefficient indicate the edge in the detectors field. A primitive detectors field can miss edges and even objects. The response of the detector can be equal to zero not only under the uniform light condition. This case is shown in Fig. 4.3c, and the condition of a missing edge is described by Eq. 4.3.

Fig. 4.3 Example of detectors field: a part of the detectors field, b its response, c case of missing edge

$$(S_b - 9S_{bc}) = (S_a - 9S_{ac}) = 0 \tag{4.3}$$
The human visual system solves this problem with one of the involuntary eye movements, called ocular drift. In a computer system, it can be solved by moving the detectors field relative to the image, either programmatically or mechanically. The distinctive effect of the detectors field is edge detection. Importantly, the larger the aperture of the detector, the smaller the size of the output image. Nevertheless, the shapes of the objects remain the same.
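The primitive detectors field of this section can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, each detector is taken to occupy a block of 3n rows by 3m columns, and the response is computed as the sum over the whole block minus 9 times the sum over the central zone, which matches Eq. 4.2 and is equivalent to Eq. 4.1 if the peripheral sum there is read as covering only the ring around the centre.

```python
import numpy as np

def primitive_detectors_field(img, m, n):
    """Responses of non-overlapping detectors (3n rows x 3m columns each)."""
    rows, cols = img.shape[0] // (3 * n), img.shape[1] // (3 * m)
    blocks = img[:rows * 3 * n, :cols * 3 * m].astype(float)
    blocks = blocks.reshape(rows, 3 * n, cols, 3 * m)
    total = blocks.sum(axis=(1, 3))                            # whole detector
    centre = blocks[:, n:2 * n, :, m:2 * m].sum(axis=(1, 3))   # central zone
    return total - 9.0 * centre                                # zero for uniform light

def step_edge(height, width, edge_col, a=20.0, b=200.0):
    """Test image: signal a left of edge_col, signal b from edge_col onwards."""
    img = np.full((height, width), a)
    img[:, edge_col:] = b
    return img

# Two 6 x 6 detectors (m = n = 2) side by side
print(primitive_detectors_field(step_edge(6, 12, 2), 2, 2))  # edge inside the periphery
print(primitive_detectors_field(step_edge(6, 12, 3), 2, 2))  # edge through the middle
```

With the edge at column 2 the first detector responds with (b − a)(S_b − 9S_bc) = 180 · (24 − 36), while with the edge at column 3 the condition of Eq. 4.3 holds (S_b = 18, S_bc = 2) and the edge is missed, which is the situation the drift mechanism is meant to resolve.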
4.3 Drift of the Detectors Field

Ocular drift in the human visual system is a necessary mechanism for sharp edge detection. The eye performs a micro roaming motion. As a result, the image and the selectable edges shift forward within several receptive fields. This ensures constant edge visibility regardless of edge orientation [17]. A small shift of the detectors field in the horizontal or vertical direction allows the drift mechanism to be realized in computer vision systems. At the same time, it removes the negative effect described by Eq. 4.3. In this case, the important parameter is the offset. Research on the retina shows that the centres of the receptive fields of ganglion cells of the same type do not overlap. The centres of the receptive fields are placed at a distance of one central zone diameter [18]. Thus, the offset of the detectors field should be determined by the size of the detector central zone, which is {0; ±2n} pixels vertically and {0; ±2m} pixels horizontally. An example of the offset of one detector is shown in Fig. 4.4.

Fig. 4.4 The drift of the detectors field

The drift of the detectors field, with central zones adjacent to one another without overlapping, provides the counts needed to maintain the integrity of object edges in the image. The drift increases the number of counts by nine times compared with the primitive detectors field. The size of the output image is $\frac{2M+1}{m} \times \frac{2N+1}{n}$ pixels. Figure 4.5 illustrates the work of the detectors field for the one-dimensional case. Figure 4.6 shows an example of the standard test image “Lenna” with a resolution of 512 × 512 pixels, the results of edge detection using different edge detection methods [19], and the results of applying detectors fields. By visual comparison of the images in Fig. 4.6, one can conclude that detectors fields preserve the contours of the original image while dramatically reducing the image size. This solves the scaling problem in object recognition by a prototype. Moreover, the consecutive «MaxPooling» operations used in convolutional neural networks for feature map size reduction [20] are no longer required.
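The effect of the drift is easiest to see in the one-dimensional case with detectors of length 3 pixels. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the drifted positions are assumed to be interleaved so that neighbouring detector centres are one central-zone width (m pixels) apart, which is how the description above is read here.

```python
import numpy as np

def detector_response_1d(row, start, m=1):
    """1D two-zone detector of length 3m: whole window minus 3 times the centre."""
    window = row[start:start + 3 * m]
    return window.sum() - 3.0 * window[m:2 * m].sum()

def primitive_field_1d(row, m=1):
    """Non-overlapping detectors (step 3m): the primitive detectors field."""
    starts = range(0, len(row) - 3 * m + 1, 3 * m)
    return np.array([detector_response_1d(row, s, m) for s in starts])

def drifted_field_1d(row, m=1):
    """Detectors evaluated with step m: the field combined over its drift positions."""
    starts = range(0, len(row) - 3 * m + 1, m)
    return np.array([detector_response_1d(row, s, m) for s in starts])

row = np.array([20.0, 20.0, 20.0, 200.0, 200.0, 200.0])  # edge on a detector boundary
print(primitive_field_1d(row))  # the edge falls between two detectors
print(drifted_field_1d(row))    # shifted detectors straddle the edge
```

In this example the primitive field returns only zeros because each detector sees uniform brightness, whereas the drifted field contains non-zero responses of opposite sign on the two sides of the edge.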
Fig. 4.5 Image processing by the detectors field (one-dimensional case): a counts $f_{input}[m]$ of the input image row, b the position of one-dimensional detectors during drift (length is 3 pixels), c counts $f[m]$ provided by the detectors during drift, d output counts $f_{output}[m]$ of the detectors field

4.4 Two-Dimensional Discrete Filtering of Detectors Fields for Output Signals

The quasi-optimal filtering based on the measurement of energy signatures of the input and prototype images using the “image-background” applicative model has some limitations [21], for example, variance to image scale and the complicated calculations needed to determine the energy signatures, which depend on the input and prototype image sizes. These limitations can be resolved by pre-detecting the image. Thus, the filtering process can be divided into 4 steps. First, an array $S^o$ of prototype energy signatures is created (i.e. an array of counts of the autocorrelation coefficient). The array elements are calculated using Eq. 4.4, where $i, i + k_1 \in [-M_o, M_o]$, $j, j + k_2 \in [-N_o, N_o]$, $E_{input} = \sum_{i=-M}^{M}\sum_{j=-N}^{N} f_{i,j}^2$ is the energy of the input image, and $f^o_{i,j}$ are the counts of the prototype video signal.

$$s^o(k_1,k_2) = \begin{cases} \dfrac{1}{E_{input}} \displaystyle\sum_{i=-M_o}^{M_o-k_1}\ \sum_{j=-N_o}^{N_o-k_2} f^o_{i,j}\, f^o_{i+k_1,j+k_2}, & k_1, k_2 \ge 0 \\[3ex] \dfrac{1}{E_{input}} \displaystyle\sum_{i=-M_o}^{M_o-k_1}\ \sum_{j=-N_o-k_2}^{N_o} f^o_{i,j}\, f^o_{i+k_1,j+k_2}, & k_1 \ge 0,\ k_2 < 0 \end{cases} \tag{4.4}$$

For $k_1 < 0$, the array elements can be determined by knowing that the energy spectrum is centrosymmetric: $s^o(k_1, k_2) = s^o(-k_1, -k_2)$. To determine the counts of the impulse response, the array should consist of $s^o(k_1, k_2)$ elements. So, the array should be represented as a vector $\vec S^o$. If the aperture is known, “spiral scanning” is optimal [22] because the absolute values of the array indexes increase from the centre to the periphery. Thus, the vector for the prototype is given by Eq. 4.5:
$$\vec S^o = \left[ s^o(0,0);\ s^o(0,-1);\ s^o(-1,-1);\ s^o(-1,0);\ s^o(-1,1);\ \ldots \right]^T \tag{4.5}$$

where $s^o(k_1, k_2) = s^o(w)$, $k_1 \in (-2N, 2N)$, $k_2 \in (-2M, 2M)$, $w \in 0, 1, 2, \ldots$. The second step is choosing the filter aperture. The aperture size depends on the prototype image size. Better precision of object detection can be achieved when the aperture contains a larger number of elements related to the prototype. Hence, the aperture dimension can be found using Eq. 4.6, where $r_o = \max(M_o, N_o)$.

$$R_o = (2r_o + 1)^2 \tag{4.6}$$

Fig. 4.6 Examples of edge detection using different methods: a grayscale standard test image, b Robert’s cross, c Sobel filter, d Prewitt operator, e Canny edge detector, f wavelet transform, g convolutional neural network, h detectors field (detector 6 × 6), i detectors field (detector 9 × 9)
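The first two filtering steps can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, the array axes are assumed to correspond to the indices i (rows, shift $k_1$) and j (columns, shift $k_2$), and the energy used for normalization is passed in explicitly so that either image energy can be supplied.

```python
import numpy as np

def energy_signature(f, k1, k2, energy):
    """Count s(k1, k2) of Eq. 4.4: sum of f[i, j] * f[i + k1, j + k2] over the
    overlap, divided by the given energy. Negative k1 uses centrosymmetry."""
    if k1 < 0:
        k1, k2 = -k1, -k2                    # s(k1, k2) = s(-k1, -k2)
    rows, cols = f.shape
    if k1 >= rows or abs(k2) >= cols:
        return 0.0                           # no overlap between the shifted copies
    a = f[:rows - k1, max(0, -k2):cols - max(0, k2)]
    b = f[k1:, max(0, k2):cols - max(0, -k2)]
    return float(np.sum(a * b) / energy)

def aperture_dimension(Mo, No):
    """Aperture dimension R_o = (2 r_o + 1)^2 with r_o = max(Mo, No), Eq. 4.6."""
    ro = max(Mo, No)
    return (2 * ro + 1) ** 2

prototype = np.array([[1.0, 2.0], [3.0, 4.0]])
energy = float(np.sum(prototype ** 2))
print(energy_signature(prototype, 0, 1, energy))
print(energy_signature(prototype, 1, -1, energy))
print(aperture_dimension(2, 3))
```

Normalizing by the energy of the same image gives s(0, 0) = 1, which is why the main diagonal of the signature matrix used in the third step consists of ones.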
Fig. 4.7 Illustration of S-matrix elements index determination (the indexes of the first row and the first column are determined by deconvolution of the $\vec S^o$ vector; the indexes of the other elements are determined by $k_1$ and $k_2$)
The third step of the filtering process is the determination of the counts of the impulse response. For this, we need to solve the system of linear equations [22], which is shown in matrix form in Eq. 4.7, where S is a matrix of energy signatures s(l, p) and $\vec H$ is a column vector of the counts $h_{i,j}$ of the impulse response, whose elements are arranged according to Eq. 4.5.

$$S \cdot \vec H = \vec S^o \tag{4.7}$$

The order of the $s_{l,p}$ elements is shown schematically in Fig. 4.7. The rank of the S-matrix relates to the dimension of the column vector $\vec S^o$. The elements of the S-matrix are determined with Eqs. 4.4 and 4.5, where $f_{i,j}$ are the counts of the input image. The order of the matrix elements is as follows. The indexes of the first-row and first-column elements are the same as those of the elements of the $\vec S^o$-vector. The indexes l, p of all other S-matrix elements, which are symmetrical about the main diagonal of the matrix, are formed according to Eq. 4.8, where $k_i(w)$, i = 1, 2, are the $\vec S^o$-vector arguments.

$$\begin{cases} l = k_1(l) - k_1(p) \\ p = k_2(l) - k_2(p) \end{cases} \tag{4.8}$$

Thus, the matrix S = s(l, p) for the input image can be written as follows.

$$S = \begin{pmatrix}
1 & s(0,-1) & s(-1,-1) & s(-1,0) & s(-1,1) & \cdots \\
s(0,-1) & 1 & s(-1,0) & s(-1,1) & s(-1,2) & \cdots \\
s(-1,-1) & s(-1,0) & 1 & s(0,-1) & s(0,-2) & \cdots \\
s(-1,0) & s(-1,1) & s(0,-1) & 1 & s(0,-1) & \cdots \\
s(-1,1) & s(-1,2) & s(0,-2) & s(0,-1) & 1 & \cdots \\
\vdots & \cdots & \cdots & \cdots & \cdots & \cdots
\end{pmatrix}$$

Finally, the video image should be processed by the synthesized linear digital filter.
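The third step can be sketched as follows. This is a hedged illustration, not the method of [22]: only the first five offsets of the spiral scan listed in Eq. 4.5 are used, the helper names are hypothetical, and the index of each matrix element is taken as the difference of the offsets of its column and row, reduced by centrosymmetry, which reproduces the 5 × 5 part of the matrix printed above.

```python
import numpy as np

# First five offsets (k1, k2) of the spiral scan, as listed in Eq. 4.5
SPIRAL = [(0, 0), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def canonical(k1, k2):
    """Pick one representative of the pair s(k1, k2) = s(-k1, -k2)."""
    if k1 > 0 or (k1 == 0 and k2 > 0):
        return (-k1, -k2)
    return (k1, k2)

def index_matrix(offsets):
    """Arguments (l, p) of every S-matrix element (offset differences, cf. Eq. 4.8)."""
    return [[canonical(c[0] - r[0], c[1] - r[1]) for c in offsets] for r in offsets]

def signature(f, k1, k2, energy):
    """Autocorrelation count: sum of f[i, j] * f[i + k1, j + k2] over the overlap."""
    rows, cols = f.shape
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            if 0 <= i + k1 < rows and 0 <= j + k2 < cols:
                total += f[i, j] * f[i + k1, j + k2]
    return total / energy

def synthesize_filter(image, prototype, offsets=SPIRAL):
    """Build S and S^o and solve S H = S^o (Eq. 4.7) for the impulse response."""
    energy = float(np.sum(image ** 2))       # energy of the input image
    S = np.array([[signature(image, k1, k2, energy) for (k1, k2) in row]
                  for row in index_matrix(offsets)])
    So = np.array([signature(prototype, k1, k2, energy) for (k1, k2) in offsets])
    return S, So, np.linalg.solve(S, So)

rng = np.random.default_rng(0)
S, So, H = synthesize_filter(rng.random((9, 9)), rng.random((5, 5)))
print(index_matrix(SPIRAL)[1])
print(H)
```

A direct solver is used here only because the example system is tiny; for realistic aperture sizes the symmetric structure of S could be exploited, but that choice is outside what the text specifies.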
4.5 Experimental Studies

Figure 4.8 illustrates an input image obtained from a drone. The result of contour detection using the detectors field, in the form of a reduced-size image, is shown in Fig. 4.9. After image pre-processing by the detectors field, digital filtering should be performed to select the regions of interest. The filter aperture is based on the energy signatures. Personal cars were chosen as an example of a region of interest. The prototype is an image containing the main topological elements of the searched object, i.e. a minimal set of the object’s shape features, and is illustrated as a white contour on a black background. Figure 4.10a shows the results of image processing implemented with the filter whose aperture size is aligned with the size of the prototype image, while Fig. 4.10b illustrates the results of the subsequent threshold processing. After applying the threshold, the object can be defined as detected if the number of points within the prototype size indicates a non-random threshold crossing. Then, the centroid should be defined to determine the object position (Fig. 4.11a). Single false positives caused by a random threshold crossing are not taken into account. The resulting object detection is shown in Fig. 4.11b.

Fig. 4.8 Input image

Fig. 4.9 The result of detectors field processing (the image is enlarged five times)

Fig. 4.10 Region of interest selection as a result of image filtering: a result of filtering with aperture aligned with the size of the prototype image, b result of threshold implementation

Fig. 4.11 The result of localization: a centroid determination, b visualisation of the result (a) in the image
4.6 Using Detectors Field Filtering in Images Affected by Motion Blur

For a human observer, motion blur has a smaller effect on image understanding than it does for computer vision systems. A failure or lack of motion blur compensation can lead to significant errors in object detection. The detectors field consists of two types of detectors:

1. p-detector. The pixels of the central cell are added up with positive coefficients, while the pixels from the periphery are added up with negative coefficients.
2. n-detector. The pixels of the central cell are added up with negative coefficients, while the pixels from the periphery are added up with positive coefficients.

Figure 4.12 illustrates the test image affected by ideal horizontal motion blur. Figure 4.13 illustrates the results of applying the detectors field with p- and n-detectors, respectively. The size of the input image is (2M + 1) × (2N + 1) = 1200 × 800 pixels and the size of the processed images is $\frac{2M+1}{m} \times \frac{2N+1}{m}$ = 300 × 200 pixels. Figure 4.13 suggests that both p- and n-detectors fully offset ideal motion blur. The processed images allow determining the orientation and the size of the motion blur, which is d = d′ × m = 70 pixels in our example (d′ being the blur extent in the processed image). However, the blur areas in the output images (see Fig. 4.13) differ significantly. The result of the detectors field edge detection depends on the detector type. Hence, it is expedient to use detectors fields with p- and n-detectors together for object analysis within the image. Figure 4.14 shows an image with different motion blur effects. The parts of the real scene that were moving during image acquisition have different brightness in the obtained image.

Fig. 4.12 The input images affected by motion blur

Fig. 4.13 Processed images formed by detectors field with: a p-detector, b n-detector

Fig. 4.14 Image with different motion blur effects

Fig. 4.15 The result of applying detectors with: a p-detector, b n-detector

Figure 4.15 shows that detectors fields with different detectors select different image edges. The difference in edge detection is caused by the image content as well as by the drift mechanism of the detectors field. Thus, the detectors field method makes it possible to compensate for different types of motion blur and to determine its characteristics. P- and n-detectors can be used efficiently in computer vision, similarly to the work of ON- and OFF-centres of receptive fields in the human retina.
4.7 Conclusions

Digital filtering using detectors fields reduces the size of the image at the pre-processing step and, at the same time, detects all significant edges in the image. The proposed approach is scale invariant and makes real-time image processing possible thanks to image size reduction. The testing results of the detectors fields approach let us draw two main conclusions:

• The double-zone basis of the detectors field provides edge detection in an image while retaining all significant characteristics of the image structure.
• Using the working principle of the ON- and OFF-centres of retinal ganglion cells with non-overlapping receptive fields makes it possible to achieve a valid structure of the detectors field.

This filter can be applied in an image pre-processing step to reduce image size. An advantage of this filter is the reduced number of computations in the software implementation of the detectors field drift. Moreover, it is possible to reduce computing costs by several orders of magnitude by mechanically offsetting parts of the analysed scene relative to the detectors field (i.e. similarly to ocular drift).
References

1. Hubel, D.H., Wiesel, T.N.: Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 106(1), 106–154 (1962)
2. Hubel, D.H., Wiesel, T.N.: Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. J. Neurophysiol. 28(2), 229–289 (1965)
3. Haykin, S.S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall International Editions Series (1999)
4. Gonzalez, P.R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Prentice Hall, Upper Saddle River (2002)
5. Chochia, P.A.: Some object detection algorithms based on a two-scale model of the image. Inf. Process 14(2), 117–136 (in Russian) (2014)
6. Ifeachor, E.C., Jervis, B.W.: Digital Signal Processing: A Practical Approach, 2nd edn. Prentice Hall (2002)
7. Fukushima, K., Miyake, S., Takayuki, I.: Neocognitron: a neural network model for a mechanism of visual pattern recognition. IEEE Trans. Syst. Man Cybern. 13(5), 826–34 (1983)
8. Boahen, K.A.: A retinomorphic vision system. IEEE Micro 16(5), 30–39 (1996)
9. Boahen, K.A., Andreou, A.G., Boahen, K.A.: A contrast sensitive silicon retina with reciprocal synapses. In: Advances in Neural Information Processing Systems, vol. 4, pp. 764–772. Morgan Kaufmann, CA (1992)
10. Changizi, M.: The Vision Revolution: How the Latest Research Overturns Everything We Thought We Knew About Human Vision. BenBella Books (2009)
11. Shulgovsky, V.V.: Basics of Neurophysiology. Aspect Press, Moscow (in Russian) (2000)
12. Helmholtz, H.L.: About human vision. Recent advances in the theory of vision. Librokon, Moscow (in Russian) (2011)
13. Hubel, D.H.: Eye, Brain, and Vision, 2nd edn. (Scientific American Library, No 22). W. H. Freeman (1995)
14. Hubel, D.H., Wiesel, T.N.: Brain and visual perception: the Story of a 25-year collaboration. Oxford University Press, USA (2005)
15. Faugeras, O.D.: Digital color image processing within the framework of a human visual model. IEEE Trans. ASSP 27(4), 380–393 (1979)
16. Ponomarev, A.V., Bogoslovskiy, A.V., Zhigulina, I.V.: Detector fields. Radiotec. 7, 129–136 (in Russian) (2018)
17. Redkozubov, A.: Consciousness logic. Available at: http://www.aboutbrain.ru. Accessed 12 May 2019
18. Packer, O., Dacey, D.: Receptive field structure of H1 horizontal cells in macaque monkey retina. J. Vis. 2(4), 279–292 (2002)
19. Zemlyanoy, I.S.: The study of methods for the selection and use of support contours for the purpose of face recognition, in The 58th Scientific Conference with International Participation, pp. 1–4 (in Russian) (2015)
20. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Back propagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
21. Bogoslovskiy, A.V., Pantukhin, M.A., Zhigulina, I.V.: Selection the filter’s aperture to high spatial frequencies filtering matched with an input image’s properties. Radiotec, Moscow (in Russian) (2016)
22. Bogoslovsky, A.V.: Processing of Multidimensional Signals, vol. 1: Linear Multidimensional Discrete Processing of Signals. Methods of the Analysis and Synthesis. Radiotekhnika Publ., Moscow (in Russian) (2013)
Chapter 5
Comparative Evaluation of Algorithms for Trajectory Filtering

Konstantin K. Vasiliev and Oleg V. Saverkin
Abstract The work is dedicated to the study of trajectory filtering algorithms based on the Kalman filter. A new algorithm for estimating trajectory parameters is synthesized. The algorithm is based on a model in the body-fixed frame with observations in the spherical coordinate system. Mathematical modelling in the MATLAB environment was performed to obtain and analyse the results of trajectory filtering using the known linear and nonlinear Kalman filters and the proposed algorithm. It is established that filtering with adjustment in body-fixed coordinates is more effective than the algorithms based on the known Kalman filters. This result is explained by the fact that the proposed algorithm takes the nature of the motion of the tracked object into account more fully. At the same time, it combines the simplicity of linear filtering with a separate estimation for each coordinate. Thus, it is preferable for practical application.

Keywords Digital signal processing · Trajectory processing · Kalman filter · Linear filter · Nonlinear filter · Body-fixed frame · Mathematical modelling
5.1 Introduction
A number of trajectory filtration methods are known, the main ones being based on modifications of Kalman vector estimation algorithms. The nonlinear Kalman Filter (NF) can come close to the optimal solution because observations are made in a spherical coordinate system while the parameters of the tracked trajectories are estimated in a rectangular coordinate system. However, this approach requires additional computational costs and is very challenging to implement [1–5]. In [4, 5], the efficiency of trajectory filtration algorithms applied to a two-coordinate radar was studied, as was an algorithm based on the nonlinear Kalman filter applied to a three-coordinate radar. However, a comparative modelling of the nonlinear and linear Kalman filter algorithms for a three-coordinate radar was not considered. The aim of this research is a comparative analysis of the effectiveness of the proposed modifications of linear and nonlinear Kalman filters for various types of trajectories of radar targets.
This chapter is organized as follows. Section 5.2 provides the target motion models. Trajectory filtration algorithms are discussed in Sect. 5.3. The proposed body-fixed frame algorithm is given in Sect. 5.4, while the comparative analysis of filtration efficiency is presented in Sect. 5.5. Section 5.6 concludes the chapter.
K. K. Vasiliev · O. V. Saverkin (B) Ulyanovsk State Technical University, 32, Severny Venetz Str., Ulyanovsk 432027, Russian Federation
© Springer Nature Switzerland AG 2020 M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems—6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_5
5.2 Target Motion Models
Most trajectory tracking algorithms rely on mathematical models that approximate the actual motion of the target and the process of its observation by the radar. The measurements obtained are then refined during filtration by assessing how well they fit the model. The combination of optimally selected models of motion and observation underlies most methods of trajectory tracking. Consider the mathematical models of object motion and observation as applied to a three-coordinate radar station. As a model of motion of the tracked object, we use a Markov random sequence given by the stochastic equation [1–3]:

$$\bar{x}_i = \wp_i \bar{x}_{i-1} + \bar{\xi}_i, \quad i = 1, 2, \ldots, k, \tag{5.1}$$

where $\bar{x}_i = (x_i \;\; y_i \;\; z_i \;\; v_{xi} \;\; v_{yi} \;\; v_{zi})^T$, $x_i$, $y_i$, $z_i$ are the Cartesian coordinates of the object position, $v_{xi}$, $v_{yi}$, $v_{zi}$ are the projections of the velocity onto the X, Y, and Z axes, respectively,

$$\wp_i = \begin{pmatrix} 1 & 0 & 0 & T_i & 0 & 0 \\ 0 & 1 & 0 & 0 & T_i & 0 \\ 0 & 0 & 1 & 0 & 0 & T_i \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix},$$

$T_i$ is the time during which the object position has changed, $\bar{\xi}_i = (0 \;\; 0 \;\; 0 \;\; \xi_{vxi} \;\; \xi_{vyi} \;\; \xi_{vzi})^T$ is the white Gaussian noise with covariance matrix:

$$V_{\xi i} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \gamma^2 \sigma_{xi}^2 & 0 & 0 \\ 0 & 0 & 0 & 0 & \gamma^2 \sigma_{yi}^2 & 0 \\ 0 & 0 & 0 & 0 & 0 & \gamma^2 \sigma_{zi}^2 \end{pmatrix},$$

$\gamma$ is the relative average change in target speed during the time of flight through the radar coverage area, $\sigma_{xi}$, $\sigma_{yi}$, $\sigma_{zi}$ are the Root Mean Square (RMS) deviations of the projections of velocity, with $\sigma_{xi}^2 = v_{x0}^2 T_i / T$, $\sigma_{yi}^2 = v_{y0}^2 T_i / T$, $\sigma_{zi}^2 = v_{z0}^2 T_i / T$, $v_{x0}$, $v_{y0}$, $v_{z0}$ are the initial values of the projections of velocity, and $T$ is the radar scan time. This model makes it possible to simulate object motion of various degrees of complexity, from uniform rectilinear motion to motion with large accelerations and maneuvering. If $\gamma = 0$, then $v_i = v_0$, that is, the speed does not change. At $\gamma = 0.01$, the speed changes by 1%, and so on. Thus, the operation of the filter can be evaluated under different conditions. As a model for observing an object with a three-coordinate radar, we use the following expression:

$$\bar{z}_i = h(\bar{x}_i) + \bar{n}_i, \tag{5.2}$$

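where $\bar{z}_i = (z_{Ri} \;\; z_{\alpha i} \;\; z_{\beta i})^T$, $z_{Ri}$ are the distance observations, $z_{\alpha i}$ are the bearing observations, $z_{\beta i}$ are the elevation observations,

$$h(\bar{x}_i) = \begin{pmatrix} \sqrt{x_i^2 + y_i^2 + z_i^2} \\ \arctan \dfrac{y_i}{x_i} \\ \arctan \dfrac{z_i}{\sqrt{x_i^2 + y_i^2}} \end{pmatrix},$$

$\bar{n}_i = (n_{Ri} \;\; n_{\alpha i} \;\; n_{\beta i})^T$ is the additive noise with zero mean and covariance matrix:

$$V_n = \begin{pmatrix} \sigma_R^2 & 0 & 0 \\ 0 & \sigma_\alpha^2 & 0 \\ 0 & 0 & \sigma_\beta^2 \end{pmatrix},$$

$\sigma_R$, $\sigma_\alpha$ and $\sigma_\beta$ are the RMS deviations of the observation errors in distance, bearing and elevation, respectively.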
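The motion and observation models of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB model: the function names, the starting state, the fixed random seed, and the choice of velocity RMS values equal to the initial speed components are all hypothetical, and `atan2` is used as a quadrant-aware stand-in for the arctangent in the observation function.

```python
import math
import random

def predict(state, T):
    """Noise-free part of Eq. (5.1): positions advance by velocity times T."""
    x, y, z, vx, vy, vz = state
    return [x + vx * T, y + vy * T, z + vz * T, vx, vy, vz]

def step(state, T, gamma, sigmas, rng):
    """Eq. (5.1): the random term perturbs the three velocity components only."""
    nxt = predict(state, T)
    for k, sigma in enumerate(sigmas):
        nxt[3 + k] += rng.gauss(0.0, gamma * sigma)
    return nxt

def observe(state, rms, rng):
    """Eq. (5.2): noisy range, bearing and elevation of the current position."""
    x, y, z = state[:3]
    truth = (math.sqrt(x * x + y * y + z * z),
             math.atan2(y, x),                  # quadrant-aware arctan(y/x)
             math.atan2(z, math.hypot(x, y)))
    return [t + rng.gauss(0.0, s) for t, s in zip(truth, rms)]

rng = random.Random(0)                          # fixed seed, illustrative only
state = [20000.0, 10000.0, 5000.0, -300.0, 0.0, 0.0]   # made-up starting state
sigmas = [abs(v) for v in state[3:]]            # assumed velocity RMS values
track = [state]
for _ in range(10):
    track.append(step(track[-1], T=1.0, gamma=0.01, sigmas=sigmas, rng=rng))

rms = (150.0, math.radians(1.0), math.radians(1.0))    # range in m, angles in rad
observations = [observe(s, rms, rng) for s in track]
print(observations[-1])
```

Setting `gamma=0.0` reproduces uniform rectilinear motion, which is a convenient sanity check before noise is switched on.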
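5.3 Trajectory Filtration Algorithms
Estimating the trajectory of a moving object consists in determining the numerical values of its motion parameters. Consider the NF algorithm [3–5]. Using the observations (Eq. 5.2), we calculate the estimate of the motion parameters of the target:

$$\hat{\bar{x}}_i = \hat{\bar{x}}_{ei} + P_i H_i^T V_n^{-1} \left( \bar{z}_i - \bar{h}_{ei} \right), \tag{5.3}$$

where $\hat{\bar{x}}_{ei} = \wp_i \hat{\bar{x}}_{i-1}$ is the vector of prediction values in Cartesian coordinates at the ith step and $\bar{h}_{ei}$ is the vector of prediction values in spherical coordinates at the ith step (the subscript e marks extrapolated, that is, predicted, quantities). The covariance matrix of estimation errors is calculated by the formula [3]:

$$P_i = P_{ei} - P_{ei} H_i^T \left( H_i P_{ei} H_i^T + V_n \right)^{-1} H_i P_{ei}, \tag{5.4}$$

where $P_{ei} = \wp_i P_{i-1} \wp_i^T + V_{\xi i}$ is the covariance matrix of prediction errors,

$$H_i = \frac{dh(\bar{x}_i)}{d\bar{x}_i} = \begin{pmatrix} \dfrac{x_i}{\sqrt{x_i^2 + y_i^2 + z_i^2}} & \dfrac{y_i}{\sqrt{x_i^2 + y_i^2 + z_i^2}} & \dfrac{z_i}{\sqrt{x_i^2 + y_i^2 + z_i^2}} & 0 & 0 & 0 \\ -\dfrac{y_i}{x_i^2 + y_i^2} & \dfrac{x_i}{x_i^2 + y_i^2} & 0 & 0 & 0 & 0 \\ -\dfrac{x_i z_i / \sqrt{x_i^2 + y_i^2}}{x_i^2 + y_i^2 + z_i^2} & -\dfrac{y_i z_i / \sqrt{x_i^2 + y_i^2}}{x_i^2 + y_i^2 + z_i^2} & \dfrac{\sqrt{x_i^2 + y_i^2}}{x_i^2 + y_i^2 + z_i^2} & 0 & 0 & 0 \end{pmatrix}.$$
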
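Because the linearization matrix is easy to get wrong, it can be worth checking it numerically. The sketch below, with hypothetical function names and an arbitrary test position, differentiates the observation function of Eq. (5.2) analytically and by central differences and reports the largest discrepancy. Only the three position columns are computed, since the observation does not depend on velocity.

```python
import math

def h(p):
    """Observation function of Eq. (5.2) for a position p = (x, y, z)."""
    x, y, z = p
    return [math.sqrt(x * x + y * y + z * z),
            math.atan2(y, x),
            math.atan2(z, math.hypot(x, y))]

def jacobian(p):
    """Analytic position block (3 x 3) of the Jacobian; velocity columns are zero."""
    x, y, z = p
    d2 = x * x + y * y          # squared ground range
    d = math.sqrt(d2)
    r2 = d2 + z * z             # squared slant range
    r = math.sqrt(r2)
    return [[x / r, y / r, z / r],
            [-y / d2, x / d2, 0.0],
            [-x * z / (d * r2), -y * z / (d * r2), d / r2]]

def numeric_jacobian(p, eps=1e-3):
    """Central-difference approximation of the same 3 x 3 block."""
    cols = []
    for k in range(3):
        hi, lo = list(p), list(p)
        hi[k] += eps
        lo[k] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(h(hi), h(lo))])
    # cols[k][m] holds d h_m / d p_k, so transpose into row-per-observation form
    return [[cols[k][m] for k in range(3)] for m in range(3)]

p = [20000.0, 10000.0, 5000.0]   # arbitrary test position in metres
A, N = jacobian(p), numeric_jacobian(p)
max_diff = max(abs(A[m][k] - N[m][k]) for m in range(3) for k in range(3))
print(max_diff)
```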
The described algorithm is the most challenging in both implementation and adjustment [4, 5]. However, this approach makes it possible to take fuller account of the nature of the observation models. One of the important simplifications built into the Kalman filter is the assumption that the equations of motion and observation are linear. To reduce computational costs, it is proposed to use a linear filter whose input is the linearized observations. To take the nonlinear dependencies into account and achieve adequate accuracy, we perform the following transformation:

$$\bar{z}_i = \begin{pmatrix} z_{xi} \\ z_{yi} \\ z_{zi} \end{pmatrix} = \begin{pmatrix} z_{Ri} \cos z_{\alpha i} \cos z_{\beta i} \\ z_{Ri} \sin z_{\alpha i} \cos z_{\beta i} \\ z_{Ri} \sin z_{\beta i} \end{pmatrix},$$

where $z_{xi}$, $z_{yi}$ and $z_{zi}$ are the observations of Cartesian coordinates. As a result of the transformation, the observation model takes the following form:

$$\bar{z}_i = C \bar{x}_i + \bar{n}_i, \tag{5.5}$$

where $C = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}$ is the conversion matrix, $\bar{n}_i = (n_{xi} \;\; n_{yi} \;\; n_{zi})^T$ is the additive noise with zero mean and covariance matrix:
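
$$V_{ni} = \begin{pmatrix} \sigma_{nxi}^2 & B_{xyi} & B_{xzi} \\ B_{xyi} & \sigma_{nyi}^2 & B_{yzi} \\ B_{xzi} & B_{yzi} & \sigma_{nzi}^2 \end{pmatrix},$$

where $B_{xyi}$, $B_{xzi}$, and $B_{yzi}$ are the covariances of the observation errors,

$$B_{xyi} = M\{n_{xi} n_{yi}\} = \frac{1}{2} \sin 2z_{\alpha i} \left( \sigma_R^2 \cos^2 z_{\beta i} + z_{Ri}^2 \sigma_\beta^2 \sin^2 z_{\beta i} - z_{Ri}^2 \sigma_\alpha^2 \cos^2 z_{\beta i} \right),$$

$$B_{xzi} = \frac{1}{2} \cos z_{\alpha i} \sin 2z_{\beta i} \left( \sigma_R^2 - \sigma_\beta^2 z_{Ri}^2 \right),$$

$$B_{yzi} = \frac{1}{2} \sin z_{\alpha i} \sin 2z_{\beta i} \left( \sigma_R^2 - \sigma_\beta^2 z_{Ri}^2 \right),$$

and the variances of the observation errors in Cartesian coordinates are:

$$\sigma_{nxi}^2 = \sigma_R^2 \cos^2 z_{\alpha i} \cos^2 z_{\beta i} + z_{Ri}^2 \sigma_\alpha^2 \sin^2 z_{\alpha i} \cos^2 z_{\beta i} + z_{Ri}^2 \sigma_\beta^2 \sin^2 z_{\beta i} \cos^2 z_{\alpha i},$$

$$\sigma_{nyi}^2 = \sigma_R^2 \cos^2 z_{\beta i} \sin^2 z_{\alpha i} + z_{Ri}^2 \sigma_\alpha^2 \cos^2 z_{\alpha i} \cos^2 z_{\beta i} + z_{Ri}^2 \sigma_\beta^2 \sin^2 z_{\beta i} \sin^2 z_{\alpha i},$$

$$\sigma_{nzi}^2 = \sigma_R^2 \sin^2 z_{\beta i} + z_{Ri}^2 \sigma_\beta^2 \cos^2 z_{\beta i}.$$
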
To estimate the motion parameters of the objects based on the next observation, we use the following expression:

$$\hat{\bar{x}}_i = \hat{\bar{x}}_{ei} + P_i C^T V_{ni}^{-1} \left( \bar{z}_i - C \hat{\bar{x}}_{ei} \right). \tag{5.6}$$

In this case, the covariance matrix of estimation errors can be found from the following expression:

$$P_i = P_{ei} \left( E + C^T V_{ni}^{-1} C P_{ei} \right)^{-1},$$

where $E$ is the unit matrix and $\hat{\bar{x}}_{ei}$, $P_{ei}$ denote the predicted state vector and the covariance matrix of prediction errors. This approach is easier to implement and adjust and requires $O(N^N)$ fewer multiplication operations, where $N$ is the number of observed parameters.
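As a quick plausibility check, the covariance update written above can be compared with the more familiar gain form of the Kalman filter. The sketch below does this for a single coordinate with a two-element state (position and velocity) and a scalar position observation. The simplification to one coordinate, the helper names, and the numbers are assumptions made for illustration, with `Pe` standing for the covariance matrix of prediction errors.

```python
def matmul(A, B):
    """Product of two small matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def update_information_form(Pe, r):
    """P = Pe (E + C^T V^-1 C Pe)^-1 with C = [1 0] and scalar variance V = r."""
    CtVC = [[1.0 / r, 0.0], [0.0, 0.0]]
    M = matmul(CtVC, Pe)
    M = [[M[0][0] + 1.0, M[0][1]], [M[1][0], M[1][1] + 1.0]]
    return matmul(Pe, inv2(M))

def update_gain_form(Pe, r):
    """P = Pe - K C Pe with gain K = Pe C^T / (C Pe C^T + r)."""
    s = Pe[0][0] + r
    K = [Pe[0][0] / s, Pe[1][0] / s]
    return [[Pe[i][j] - K[i] * Pe[0][j] for j in range(2)] for i in range(2)]

Pe = [[400.0, 60.0], [60.0, 25.0]]   # illustrative prediction covariance
r = 150.0 ** 2                       # variance of a 150 m RMS observation error
P1 = update_information_form(Pe, r)
P2 = update_gain_form(Pe, r)
print(P1)
print(P2)
```

Both functions return the same matrix up to rounding, which is the matrix inversion lemma at work. With a scalar observation the gain form needs only a division, not a matrix inverse.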
5.4 Body-Fixed Frame
Consider the structural features and comparative efficiency of the proposed linear Kalman Filter (LF) and of the algorithm for trajectory filtration in the body-fixed frame. Note that in the LF algorithm, the constant values of acceleration along the three coordinates $a_x$, $a_y$, $a_z$ enter the formula for the covariance matrix of prediction errors $P_{ei} = \wp_i P_{i-1} \wp_i^T + \vartheta_i V_{\xi i} \vartheta_i^T$. These constants are used in calculating the gain coefficients $B_i$. Now assume that the target motion is described by nonlinear stochastic equations with body-fixed coordinates included in the state vector $\bar{x}_i = (x_i \;\; y_i \;\; z_i \;\; V_i \;\; K_i \;\; \varphi_i)^T$. The proposed approach consists in the quasi-linearization of the equations for the projections of the new coordinates onto the axes of the Cartesian system [6]. After the transformations, and taking into account the small values of the random additives $a T_i \xi_{Vi}$, $v_K T_i \xi_{Ki}$, $v_\varphi T_i \xi_{\varphi i}$, we obtain the following nonlinear autoregression equations:

$$\begin{aligned} x_i &= x_{i-1} + v_{x(i-1)} T_i, \\ y_i &= y_{i-1} + v_{y(i-1)} T_i, \\ z_i &= z_{i-1} + v_{z(i-1)} T_i, \\ \bar{v}_{xi} &= \bar{v}_{x(i-1)} + I_{xK}\left(\bar{v}_{x(i-1)}\right) \bar{\xi}_{Ki}, \end{aligned} \tag{5.7}$$

where $\bar{v}_{xi} = (v_{xi} \;\; v_{yi} \;\; v_{zi})^T$, $\bar{\xi}_{Ki} = (a T_i \xi_{Vi} \;\; v_K T_i \xi_{Ki} \;\; v_\varphi T_i \xi_{\varphi i})^T$, and $I_{xK}(\cdot)$ is a $3 \times 3$ matrix function of the velocity vector.
The proposed equations can also be written as a single vector nonlinear stochastic equation:

$$\bar{x}_i = \wp_i \bar{x}_{i-1} + \vartheta_i\left(\bar{x}_{i-1}\right) \bar{\xi}_i, \quad i = 1, 2, \ldots, k,$$

where $\vartheta_i\left(\bar{x}_{i-1}\right) = \begin{pmatrix} 0 \\ I_{xK}\left(\bar{v}_{x(i-1)}\right) \end{pmatrix}$ is the $6 \times 3$ matrix function of the state vector $\bar{x}_{i-1}$ and $0$ is the $3 \times 3$ zero matrix.
In this case, for the new Body-Fixed (BF) algorithm, only the covariance matrix of prediction errors changes in the equations describing the linear Kalman filter:

$$P_{ei} = \wp_i P_{i-1} \wp_i^T + I_{xK}\left(\hat{\bar{v}}_{x(i-1)}\right) V_{\xi i} I_{xK}^T\left(\hat{\bar{v}}_{x(i-1)}\right). \tag{5.8}$$

Under this approach, the values of the accelerations $a_x$, $a_y$, $a_z$ are recalculated from the estimated values of the ground speed, course angle and climb angle at the previous filtration step:

$$\hat{a}_{xi} = \sqrt{a^2 \cos^2 \hat{K}_{i-1} \cos^2 \hat{\varphi}_{i-1} + v_K^2 \hat{V}_{i-1}^2 \sin^2 \hat{K}_{i-1} \cos^2 \hat{\varphi}_{i-1} + v_\varphi^2 \hat{V}_{i-1}^2 \cos^2 \hat{K}_{i-1} \sin^2 \hat{\varphi}_{i-1}},$$

$$\hat{a}_{yi} = \sqrt{a^2 \sin^2 \hat{K}_{i-1} \cos^2 \hat{\varphi}_{i-1} + v_K^2 \hat{V}_{i-1}^2 \cos^2 \hat{K}_{i-1} \cos^2 \hat{\varphi}_{i-1} + v_\varphi^2 \hat{V}_{i-1}^2 \sin^2 \hat{K}_{i-1} \sin^2 \hat{\varphi}_{i-1}},$$

$$\hat{a}_{zi} = \sqrt{a^2 \sin^2 \hat{\varphi}_{i-1} + v_\varphi^2 \hat{V}_{i-1}^2 \cos^2 \hat{\varphi}_{i-1}}.$$

The introduction of these estimates can be considered as an adaptation of the parameters $a_x$, $a_y$, $a_z$ at each filtration step.
It should be noted that if only the diagonal elements of the covariance matrix $\vartheta_i\left(\hat{\bar{x}}_{i-1}\right) V_{\xi i} \vartheta_i^T\left(\hat{\bar{x}}_{i-1}\right)$ are nonzero, the resulting algorithm uses linear equations that correspond to the model of a simple linear filter with a separate estimation for each coordinate. If the target velocity components are not observed, no matrix inversion is required when calculating the coefficients $B_i$. This makes it possible to implement the trajectory filtration system with minimal computational costs [6]. Consequently, the BF algorithm combines the advantages of both algorithms.
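The adaptation step of the BF algorithm can be sketched as follows. The code assumes the usual relation between Cartesian velocity and body-fixed quantities, v_x = V cos K cos φ, v_y = V sin K cos φ, v_z = V sin φ, which is consistent with the structure of the acceleration expressions but is stated here as an assumption. It also assumes that the per-axis values are root-sum-square combinations of the three contributions. Function and parameter names are hypothetical.

```python
import math

def body_fixed_from_velocity(vx, vy, vz):
    """Speed V, course angle K and climb angle phi from Cartesian velocity
    (assumed convention: vx = V cos K cos phi, vy = V sin K cos phi, vz = V sin phi)."""
    V = math.sqrt(vx * vx + vy * vy + vz * vz)
    K = math.atan2(vy, vx)
    phi = math.atan2(vz, math.hypot(vx, vy))
    return V, K, phi

def adapted_accelerations(V, K, phi, a, v_K, v_phi):
    """Per-axis acceleration RMS values from the previous-step estimates.
    a is the along-track acceleration RMS; v_K and v_phi are the RMS rates
    of the course and climb angles (rad/s)."""
    cK, sK = math.cos(K), math.sin(K)
    cp, sp = math.cos(phi), math.sin(phi)
    ax = math.sqrt((a * cK * cp) ** 2 + (v_K * V * sK * cp) ** 2
                   + (v_phi * V * cK * sp) ** 2)
    ay = math.sqrt((a * sK * cp) ** 2 + (v_K * V * cK * cp) ** 2
                   + (v_phi * V * sK * sp) ** 2)
    az = math.sqrt((a * sp) ** 2 + (v_phi * V * cp) ** 2)
    return ax, ay, az

# Illustrative numbers: level flight along X at 300 m/s.
V, K, phi = body_fixed_from_velocity(300.0, 0.0, 0.0)
print(adapted_accelerations(V, K, phi, a=5.0, v_K=0.01, v_phi=0.02))
```

For level flight along the X axis, the along-track term appears only in the x component, while turning and climbing contribute only to y and z. The same parameters applied to a 30 m/s or 900 m/s target scale the lateral terms with speed, which is the adaptation the text describes.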
5.5 Comparative Analysis of Filtration Efficiency
To compare and study the effectiveness of the proposed modifications of linear and nonlinear Kalman filters, a mathematical model was constructed in the MATLAB environment. The model makes it possible to:
• Simulate different types of object motion.
• Simulate observations from the radar with specified accuracy characteristics.
• Plot the original trajectory of the target, the observations from the simulated radar, and the trajectory reconstructed from the results of processing the radar data.
The observations were simulated in the spherical coordinate system for a radar with an RMS range error of 150 m and RMS errors in the course and climb angles of 1° [7]. Each filter was tuned to track an object moving at an average speed of 300 m/s. The radar is at the origin. Figure 5.1 shows the time dependence of the root mean square error of estimating the coordinate x of the tracked object for an object moving at an initial speed of 300 m/s (Fig. 5.1a), 30 m/s (Fig. 5.1b), and 900 m/s (Fig. 5.1c). In Fig. 5.1a, all of the filters have close efficiency. However, it should be noted that the BF algorithm has slightly lower accuracy. The most representative results are found when the tracked object has motion parameters different from those the filters were tuned to (Fig. 5.1b, c). The presented graphs show that the estimates of the coordinate x from BF are up to about 25% more accurate than the estimates from NF and LF. Figure 5.2 shows the time dependence of the root mean square error of estimating the speed of the tracked object. The presented graphs show that BF exceeds LF and NF in accuracy.
Fig. 5.1 The time dependent behaviour of the square root values of the mean square error of estimating the coordinate x of the tracked object with the initial speed of: a 300 m/s, b 30 m/s, c 900 m/s
Fig. 5.2 The time dependent behavior of the square root values of the mean square error of estimating the speed of the tracked object with the initial speed of: a 300 m/s, b 30 m/s, c 900 m/s
For the BF algorithm, the root mean square error of the speed estimate is less than half of that for NF and LF, especially when the tracked object has motion parameters different from those the filters were tuned to (Fig. 5.2b, c).
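Curves such as those in Figs. 5.1 and 5.2 are typically obtained by averaging squared errors over repeated simulation runs at each time step. The helper below is a minimal sketch of that computation. The function name and the toy numbers are hypothetical, and the source does not state how many runs were averaged.

```python
import math

def rmse_per_step(truth, runs):
    """Root mean square error at each time step.
    truth: true values, one per step.
    runs: list of simulation runs, each a list of estimates, one per step."""
    n_runs = len(runs)
    return [math.sqrt(sum((run[i] - truth[i]) ** 2 for run in runs) / n_runs)
            for i in range(len(truth))]

# Toy data: two runs, two time steps.
truth = [0.0, 0.0]
runs = [[3.0, 1.0], [4.0, -1.0]]
print(rmse_per_step(truth, runs))
```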
5.6 Conclusions
The obtained results allow us to conclude that trajectory filtration in the body-fixed frame is slightly inferior in coordinate estimation accuracy to the NF and LF algorithms only when the average speed of the target coincides with the filter setting. Under other conditions, BF is more efficient: it exceeds both the LF and NF algorithms in the accuracy of estimating the coordinates and speed of the target and, at the same time, combines simplicity of implementation with the performance of a filter with a separate estimation for each coordinate. In addition, filtering with adjustment in the body-fixed frame makes it possible to better take into account the intensity of maneuvering and, therefore, to track objects in a larger speed range using a single filter as a part of the trajectory parameter estimation algorithm.
References
1. Bar-Shalom, Y., Li, X.R., Kirubarajan, T.: Estimation with Applications to Tracking and Navigation. Wiley & Sons (2001)
2. Li, R.X., Jilkov, V.P.: Survey of maneuvering target tracking. Part I: Dynamic models. IEEE Trans. Aerosp. Electr. Syst. 39(4), 1333–1364 (2003)
3. Vasil’ev, K.K.: Optimal Signal Processing in Discrete Time: Textbook. Moscow: Publishing House Radiotekhnika (in Russian) (2016)
4. Vasil’ev, K.K., Luchkov, N.V.: Trajectory processing based on nonlinear filtering. Autom. Control Process. 1(47), 4–9 (in Russian) (2017)
5. Saverkin, O.V.: Comparative analysis of digital radar data processing algorithms. CEUR Workshop Proceedings, REIT 2 2017 - Proceedings of the 2nd International Workshop on Radio Electronics and Information Technologies, 2005, pp. 120–126 (2017)
6. Vasilyev, K.K., Mattis, A.V.: Associated stochastic models of radar target motion. Autom. Control Process. 4(50), 14–18 (in Russian) (2017)
7. Noor Pramadi, M.M., Lestari, A.A.: Radar dan stealth. Untuk Kalangan Terbatas, Tentara Nasional Indonesia, Edisi 2 (2019)
Chapter 6
Watermarking Models of Video Sequences Margarita N. Favorskaya
Abstract Video watermarking involves several cases because various compression techniques are applied to practically all types of video content. In this chapter, we build models for multilevel adaptive watermarking schemes of uncompressed and compressed video sequences in the context of the H.264/SVC standard. We analyze the architecture of the SVC encoder as well as three strategies for watermark embedding using H.264/SVC: watermark embedding before video encoding (Strategy 1), integrated watermark embedding and coding (Strategy 2), and compressed-domain embedding after encoding (Strategy 3). Basic requirements for watermarking schemes of video sequences are also discussed in detail.
Keywords Video watermarking · Transmitting specification · Authentication · Copyright protection · Adaptive watermarking · Multilevel protection · Geometric attacks
6.1 Introduction
The growing volume of multimedia content transmitted through unprotected Internet channels makes copyright protection, content authentication, and ownership identification necessary. Information encryption and information hiding are the two main branches of security systems. Information encryption is based on cryptography, public or private, while information hiding includes two sub-branches: steganography (linguistic and technical) and watermarking (robust and fragile). The main purposes of watermarking applications are the authentication and copyright protection of content. A logo, tag, label, or technical information (for example, about shooting) can be embedded into videos according to blind, semi-blind, or non-blind techniques. The blind technique, which does not require the original video, prevails. The semi-blind technique requires some associated information, while the non-blind technique is the most robust to attacks but has the highest transmission cost because the original videos have to be kept for watermark extraction. Various watermarking schemes for videos in the uncompressed and compressed domains have been proposed since the 2000s [1–3]. Traditionally, the frames suitable for watermarking are selected based on scene change detection [4] and motion vector estimation [5] for the uncompressed and compressed domains, respectively. However, robustness to common image processing attacks, geometric attacks, and attacks specific to video sequences is not well addressed in the watermarking schemes. Problems appear when a watermarked video sequence is compressed by an unknown codec or re-compressed by intentional attacks. In this chapter, we propose several watermarking models of adaptive embedding for video sequences in the uncompressed and some compressed domains. We also discuss different criteria for embedding as recommendation measures for multilevel protection.
The remainder of this chapter is organized as follows. Section 6.2 provides the related work in video watermarking. Sections 6.3 and 6.4 present the watermarking models of videos in the uncompressed and compressed domains, respectively. Basic requirements for the watermarking schemes are discussed in Sect. 6.5. Section 6.6 concludes the chapter.
M. N. Favorskaya (B) Institute of Informatics and Telecommunications, Reshetnev Siberian State University of Science and Technology, 31, Krasnoyarsky Rabochy Ave., Krasnoyarsk 660037, Russian Federation
© Springer Nature Switzerland AG 2020 M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems—6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_6
6.2 Related Work
There are different schemes for embedding/extraction in compressed videos. The first watermarking algorithms based on motion vector estimation for videos compressed by MPEG-4 have been proposed since 1997 [6]. For this purpose, a single macroblock, a component with large magnitude and small phase difference [7], or four neighboring blocks defined by a block-matching algorithm [8] were used as the embedding location. Later, more complex algorithms were proposed, for example in [9]. These authors partitioned a video sequence into a series of shots. The motion vectors of predicted frames (P-frames or P-slices) were decoded and classified according to their histograms. Then a watermark was embedded into the classified motion vectors, which were processed separately in different shots. Finally, the modified motion vectors were re-coded to form the watermarked video. An original procedure called Swarm Intelligence based Fuzzy C-Means (SI-FCM) for embedding into the motion vectors of compressed-domain video bit streams was suggested in [10]. The motion vectors are trained by SI-FCM clustering in order to select the appropriate cluster centroid of each motion vector. The embedding was implemented in the magnitudes of the selected motion vectors using spatial watermarking techniques, viz. the most significant bit or the least significant bit. Sometimes, a watermark is embedded into the phase angle between two consecutive candidate motion vectors, which are selected using their magnitudes [11].
However, a watermark embedded in the motion vectors is fragile and can be easily removed. Robustness is achieved when a watermark is embedded in the Discrete Cosine Transform (DCT) domain of the I-slices. A DCT applied to the smaller block size of 4 × 4 elements reduces ringing artifacts but becomes more sensitive to various attacks. In [12], a genetic algorithm was applied to select, from previously extracted key frames, those with the best quality after embedding. The objective function was the mean absolute difference between the frames. The 25 frames of a one-second video were modelled as chromosomes. Two operations, crossover and mutation, provided randomness in the optimization. As a result, some unproductive chromosomes were removed, and some chromosomes with a high fitness function value were selected for the next generation. Then a watermark was inserted randomly into the selected sub-bands according to a secret watermarking key. Transform domain methods for compressed-domain video watermarking, focused on the MPEG-X or H.26X encoding standards, prevail over spatial domain methods. The transform domain methods are complex and time-consuming but provide better quality after embedding and better robustness to common processing and geometric attacks. Recently, DCT and other transforms have been successfully applied in the uncompressed and compressed domains.
6.3 Watermarking Model of Videos in Uncompressed Domain
Let us consider the case close to the watermarking scheme of still images, viz. the uncompressed video, as the basic case, taking into consideration adaptive watermarking and watermarking oriented to the human visual system. The generalized model for information hiding in the uncompressed video $M_{UCOM}$ (in each frame) includes three components: a set of watermarks $S_{WM}$, a set of available areas for embedding $S_{AR}$, and a set of operators $S_{OP}$:

$$M_{UCOM} = (S_{WM}, S_{AR}, S_{OP}). \tag{6.1}$$

Usually, several types of watermarks are embedded in the framework of multilevel protection. This process requires a careful distribution of the areas for embedding. A fragile watermark $WM_{FR}$ shows that Internet attacks were applied to the transmitted frames. It is commonly assumed that a fragile watermark is distorted under any type of Internet attack. However, this is not correct because some cases of translation, cropping, and frame re-ordering do not distort the fragile watermark. Thus, an additional check for the presence of attacks is necessary. The hidden watermarks can be visual $WM_{VS}$ and textual $WM_{TX}$. Also, an encryption EN may be applied to these watermarks, giving $WM_{VS}^C$ and $WM_{TX}^C$, respectively, as an additional level of protection, with the subsequent decryption DE. Each hidden watermark has to be converted to a binary code. However, the simplest conversion of a textual watermark to a binary code is very vulnerable to Internet attacks. That is why representing a textual watermark as a low-resolution image is a better decision in the sense of robustness. The main requirement is that the areas of different watermarks (their number is defined by the task being solved and the volume of initial information for embedding) cannot overlap. The areas for embedding $AR_{PR}$ are chosen according to the main rules of invisibility [13, 14]:
• Choose the stable area for $WM_{FR}$.
• Choose the high textural areas for $WM_{TX}$ and $WM_{VS}$.
• Choose the areas with blue component domination for $WM_{TX}$ and $WM_{VS}$.
• Do not choose the salient regions.
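Two of these rules, high texture and blue dominance, are easy to prototype. The following sketch scans a frame in 8 × 8 blocks, uses the variance of luminance as a stand-in for "high textural", and compares channel means for blue dominance. The variance threshold, the function names, and the synthetic frame are assumptions made for illustration. The chapter does not prescribe a specific texture measure, and saliency is not modelled here.

```python
def block_stats(frame, top, left, size=8):
    """Return (luminance variance, blue-dominance flag) for one block.
    frame is a list of rows, each row a list of (r, g, b) tuples."""
    pixels = [frame[r][c] for r in range(top, top + size)
              for c in range(left, left + size)]
    n = len(pixels)
    luma = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]  # BT.601 weights
    mean = sum(luma) / n
    variance = sum((v - mean) ** 2 for v in luma) / n
    mean_r = sum(p[0] for p in pixels) / n
    mean_g = sum(p[1] for p in pixels) / n
    mean_b = sum(p[2] for p in pixels) / n
    return variance, mean_b > max(mean_r, mean_g)

def candidate_blocks(frame, size=8, min_variance=100.0):
    """Top-left corners of blocks that are both textured and blue-dominant."""
    rows, cols = len(frame), len(frame[0])
    chosen = []
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            variance, blue = block_stats(frame, top, left, size)
            if variance >= min_variance and blue:
                chosen.append((top, left))
    return chosen

def pixel(r, c):
    """Synthetic 16 x 16 frame: flat grey on the left, bluish checkerboard on the right."""
    if c < 8:
        return (128, 128, 128)
    return (10, 10, 200) if (r + c) % 2 else (60, 60, 255)

frame = [[pixel(r, c) for c in range(16)] for r in range(16)]
print(candidate_blocks(frame))
```

The returned list of block coordinates is the kind of information that would have to travel with the secret key, since the extractor must visit the same cells.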
Such adaptive watermarking has one main disadvantage: the chains of image cell coordinates have to be formed as a part of the secret key $K_A$. The recommended cell size depends on the frame resolution but is not less than 8 × 8 elements; otherwise, a digital frequency transform will be too coarse and cannot provide enough coefficients for embedding. The main operators of any watermarking model are: preparing a watermark in the appropriate form for embedding/extraction PR, embedding a watermark into a frame EM, extracting a watermark from the watermarked frame after its transmission through the Internet channels EX, and evaluating the extracted watermarks EV. Thus, the generalized model (Eq. 6.1) can be rewritten in the form of Eq. 6.2:

$$M_{UCOM} = \left( \{ S_{WM}, AR_{PR}(S_{WM}, K_A) \}, \{ PR(S_{WM}, EN(S_{WM})), EM(S_{WM}^B), EX(S_{WM}^E), EV(S_{WM}^D), DE(S_{WM}^D) \} \right), \tag{6.2}$$

where

$$S_{WM} = \{ WM_{FR}, WM_{VS}, WM_{TX} \}, \tag{6.3}$$

$S_{WM}$ is the set of original watermarks, $S_{WM}^C$ is the set of encrypted watermarks, $S_{WM}^B$ is the set of watermarks in a binary view, $S_{WM}^E$ is the set of embedded watermarks in the frequency coefficients, and $S_{WM}^D$ is the set of distorted watermarks.
The watermark representations are changed by the following manipulations, where $K_V$ and $K_T$ are the parts of the secret keys for the visual and textual watermarks, and $\psi$ and $\psi^{-1}$ are the functions of direct and inverse encryption:
1. From the visual or textual view to the encrypted view (optionally):

$$EN = \psi\left( K_V(WM_{VS}), K_T(WM_{TX}) \right) \rightarrow WM_{VS}^C, WM_{TX}^C. \tag{6.4}$$

2. From the visual, textual, or encrypted view to the binary view:

$$PR = \left( WM_{FR}, WM_{VS} \cap WM_{VS}^C, WM_{TX} \cap WM_{TX}^C \right) \rightarrow WM_{FR}^B, WM_{VS}^B, WM_{TX}^B. \tag{6.5}$$

3. From the binary view to the embedded view:

$$EM = \left( WM_{FR}^B, WM_{VS}^B, WM_{TX}^B \right) \rightarrow WM_{FR}^E, WM_{VS}^E, WM_{TX}^E. \tag{6.6}$$

4. From the embedded view to the distorted view (under Internet attacks):

$$EX = \left( WM_{FR}^E, WM_{VS}^E, WM_{TX}^E \right) \rightarrow WM_{FR}^D, WM_{VS}^D, WM_{TX}^D. \tag{6.7}$$

5. From the distorted view to the visual, textual, or encrypted view:

$$EV = \left( WM_{FR}^D, WM_{VS}^D, WM_{TX}^D \right) \rightarrow \widetilde{WM}_{FR}, \widetilde{WM}_{VS} \cap WM_{VS}^D, \widetilde{WM}_{TX} \cap WM_{TX}^D. \tag{6.8}$$

6. From the encrypted view to the visual or textual view (optionally):

$$DE = \left( WM_{VS}^D, WM_{TX}^D \right) \rightarrow \psi^{-1}\left( K_V(\widetilde{WM}_{VS}), K_T(\widetilde{WM}_{TX}) \right). \tag{6.9}$$

Here, $\widetilde{WM}_{VS}$ and $\widetilde{WM}_{TX}$ are the extracted and decrypted watermarks. For blind schemes, watermark restoration is executed by complex algorithms without the use of the host frames [14].
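The chain of representations in steps 1 to 6 can be walked through with a toy example. Everything concrete below is a stand-in chosen for illustration and is not taken from the chapter: ψ is replaced by a repeating-key XOR (which is its own inverse), the embedding is a simple odd-even quantization of coefficients (a basic form of quantization index modulation), the "attack" is a small additive offset, and the coefficient values and key are made up.

```python
def encrypt(data: bytes, key: bytes) -> bytes:
    """Stand-in for psi and psi^-1: XOR with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def to_bits(data: bytes) -> list:
    """PR step: bytes to a list of bits, most significant bit first."""
    return [(byte >> k) & 1 for byte in data for k in range(7, -1, -1)]

def from_bits(bits) -> bytes:
    """Inverse of to_bits."""
    return bytes(sum(bit << (7 - k) for k, bit in enumerate(bits[i:i + 8]))
                 for i in range(0, len(bits), 8))

def embed(coeffs, bits, step=8.0):
    """EM step: each bit is carried by the parity of a quantization index."""
    out = list(coeffs)
    for i, bit in enumerate(bits):
        q = round(out[i] / step)
        if q % 2 != bit:
            q += 1
        out[i] = q * step
    return out

def extract(coeffs, n_bits, step=8.0):
    """EX step: read the parity of the nearest quantization index."""
    return [round(c / step) % 2 for c in coeffs[:n_bits]]

def bit_error_rate(sent, received):
    """EV step: fraction of bits that differ."""
    return sum(a != b for a, b in zip(sent, received)) / len(sent)

watermark, key = b"ID", b"\x5a\xc3"                    # hypothetical payload and key
bits = to_bits(encrypt(watermark, key))                # EN followed by PR
coeffs = [float(13 * i - 40) for i in range(16)]       # made-up frequency coefficients
marked = embed(coeffs, bits)                           # EM
distorted = [c + 1.5 for c in marked]                  # mild distortion in the channel
received = extract(distorted, len(bits))               # EX
print(bit_error_rate(bits, received))                  # EV
print(encrypt(from_bits(received), key))               # DE
```

The watermark survives here only because the distortion is smaller than half the quantization step. A larger step buys robustness at the price of visibility, which is the trade-off the embedding rules try to manage.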
6.4 Watermarking Models of Videos in Compressed Domain
Lossy video compression implies one of the codec versions based on Moving Picture Experts Group (MPEG) standards, such as H.262 MPEG-2 Video, H.263, MPEG-4 Visual, and H.264/AVC (Advanced Video Coding). H.264/AVC includes two layers: the Network Abstraction Layer (NAL) and the Video Coding Layer (VCL). NAL is organized in units, each of which starts with a one-byte header indicating the type of payload data, while the remaining bytes store the content. VCL NAL units include the coded slices, and non-VCL NAL units contain additional information about parameter sets and supplemental enhancement information assisting the decoding process. The VCL of H.264/AVC is similar to that of the prior video coding standards, such as H.261, MPEG-1 Video, H.262 MPEG-2 Video, H.263, or MPEG-4 Visual, but has the useful properties of flexibility and adaptability. H.264/AVC supports the traditional concept of subdivision into macroblocks and slices. H.264/SVC provides three types of scalability: temporal, spatial, and quality, the last expressed by the signal-to-noise ratio. The spatial scalability supports the traditional approach of multilayer coding, in which the motion-compensated prediction and intra-prediction are implemented as in single-layer coding. At the same time, new formats such as High Definition (HD), Full HD, and Ultra HD are based on the High Efficiency Video Coding (HEVC) format. HEVC, or H.265/HEVC, achieves a significant improvement in coding efficiency, especially in perceptual quality, with a bit-rate saving of more than 50% with respect to H.264/AVC [15]. This improvement in coding efficiency is obtained at the cost of intensive computational complexity. The macroblocks are organized into three basic slice types, each parsed independently of other slices:
• I-slice: intra-picture predictive coding using spatial prediction from neighboring regions.
• P-slice: intra-picture predictive coding and inter-picture predictive coding with one prediction signal for each predicted region. P-slices and B-slices represent a motion-compensated prediction with multiple reference pictures using variable block sizes.
• B-slice: intra-picture predictive coding, inter-picture predictive coding, and inter-picture bi-predictive coding with two prediction signals that are combined with a weighted average to form the region prediction.
Video is ordered into Groups Of Pictures (GOPs), whose slices can be encoded in the sequence [I, B, B, P, B, B, P, B, B]. The temporal redundancy between the slices is exploited using block-based motion estimation that is applied to the macroblocks in a P-slice or B-slice and searched in the target slice or slices. The watermarking pipeline for compressed video sequences implies three basic strategies, which are considered in Sect. 6.4.1. The corresponding models for these strategies are proposed in Sect. 6.4.2.
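The one-byte NAL unit header mentioned above has a fixed layout in H.264/AVC: one forbidden zero bit, two bits of `nal_ref_idc`, and five bits of `nal_unit_type`. The sketch below decodes that byte and separates VCL from non-VCL units. It covers only the basic H.264/AVC header and leaves out the SVC-specific unit types, which carry additional header bytes. The helper names are hypothetical.

```python
def parse_nal_header(byte: int) -> dict:
    """Split the first byte of an H.264/AVC NAL unit into its three fields."""
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x1,
        "nal_ref_idc": (byte >> 5) & 0x3,
        "nal_unit_type": byte & 0x1F,
    }

def is_vcl(nal_unit_type: int) -> bool:
    """In H.264/AVC, unit types 1 to 5 carry coded slice data (VCL NAL units)."""
    return 1 <= nal_unit_type <= 5

# Typical header bytes: IDR slice, non-IDR slice, SPS, PPS, SEI.
for header in (0x65, 0x41, 0x67, 0x68, 0x06):
    fields = parse_nal_header(header)
    kind = "VCL" if is_vcl(fields["nal_unit_type"]) else "non-VCL"
    print(hex(header), fields, kind)
```

For compressed-domain watermarking this distinction matters because only VCL units contain slice data that can carry a watermark, while parameter sets and supplemental enhancement information must pass through untouched.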
6.4.1 Watermarking Schemes for Compressed Video Sequences
Let us consider the watermarking schemes in the context of the SVC encoder. H.264/SVC is an extension of H.264/AVC and involves several layers within the encoded stream. Similar to the underlying H.264/AVC standard, the H.264/SVC architecture includes the VCL and NAL layers. VCL represents the coded source content, while NAL units are classified into VCL NAL units with coded video data and non-VCL NAL units with associated additional information. The architecture of the SVC encoder involves a base layer (Layer 0) and several enhancement layers (Layers 1, 2, …, n). The simplified architecture is depicted in Fig. 6.1. Meerwald and Uhl [16] were the first to formulate three strategies of watermark embedding using SVC spatial scalability (Fig. 6.2).
Fig. 6.1 The simplified architecture of SVC encoder
Fig. 6.2 Three strategies for watermark embedding using H.264/SVC: a watermark embedding before video encoding (Strategy 1), b integrated watermark embedding and coding (Strategy 2), c compressed-domain embedding after encoding (Strategy 3)
Strategy 1 (Fig. 6.2a) permits the use of conventional watermarking schemes but cannot provide control over the resulting bit-stream. It is difficult to detect a watermark in the compressed domain because the lossy compression and downsampling of the full-resolution video affect the embedded watermark significantly. A robust watermarking scheme resilient to H.264/SVC but without scalability support was introduced in [17].
Strategy 2 (Fig. 6.2b) is the most promising for watermarking because integrated H.264/SVC video encoding and watermarking offers control over the bit-stream. Park and Shin [18] presented a combined scheme of encryption and watermarking to provide the access right and the authentication of the video simultaneously. The proposed scheme protected the data content in a more secure way since the encrypted content was decrypted after the watermark detection. In [19], the proposed authentication scheme used the prediction modes and residuals, while the proposed copyright protection scheme utilized the DCT domain of uncompressed I-frames. Note that I-frames in H.264/SVC include the most important information and are transmitted to all content users.
Strategy 3 (Fig. 6.2c) is the most complex to implement because of the inter-layer prediction structure of H.264/SVC. This approach requires drift compensation to minimize error propagation and employs three types of embedding, viz. in the motion-compensated intra frames or residual frames, in the motion vectors, or by modifying the encoded bit streams. Embedding in the compressed domain modifies the video coding pipeline by inserting the watermark modules within it. Therefore, these schemes always depend on the given video coding algorithms and provide less flexibility.
6.4.2 Watermarking Models for Three Strategies

The watermarking Strategy 1 does not differ significantly from the scheme for watermarking uncompressed video, except for two additional operators, Coder and Decoder. Equation 6.2 can be rewritten as follows:

$$M_{COM1} = \left(\{S_{WM},\ AR_{PR}(S_{WM}, K_A)\},\ \{PR(S_{WM}, EN(S_{WM})),\ EM_{S^{B}_{WM}}\big|_{I},\ Coder,\ EX_{S^{E}_{WM}},\ Decoder,\ EV_{S^{D}_{WM}},\ DE_{S^{D}_{WM}}\}\right) \tag{6.10}$$

where $M_{COM1}$ is the model for Strategy 1 and $EM_{S^{B}_{WM}}\big|_{I}$ is the embedding operator applied only to I-slices. The main operators (Eqs. 6.3–6.9) remain the same as in the model for the uncompressed video sequence, with the two operators added. The model $M_{COM2}$ for Strategy 2 takes the form of Eq. 6.11.

$$M_{COM2} = \left(\{S_{WM},\ AR_{PR}(S_{WM}, K_A)\},\ \{PR(S_{WM}, EN(S_{WM})),\ Coder \cap EM_{S^{B}_{WM}}\big|_{I},\ Decoder,\ EX_{S^{E}_{WM}},\ EV_{S^{D}_{WM}},\ DE_{S^{D}_{WM}}\}\right) \tag{6.11}$$
The model $M_{COM3}$ for Strategy 3 first performs video compression and then embeds the watermarks in the compressed video sequence. Here, all three types of slices (I-slice, P-slice, and B-slice) can be used for embedding (Eq. 6.12).

$$M_{COM3} = \left(\{S_{WM}\},\ \{PR(S_{WM}, EN(S_{WM})),\ Coder,\ EM_{S^{B}_{WM}}\big|_{I,P,B},\ Decoder,\ EX_{S^{E}_{WM}},\ EV_{S^{D}_{WM}},\ DE_{S^{D}_{WM}}\}\right) \tag{6.12}$$

where $EM_{S^{B}_{WM}}\big|_{I,P,B}$ denotes that watermarks can be embedded in all types of slices. Watermarking of compressed video sequences (Strategies 1 and 2) gives worse results than watermarking of uncompressed video sequences, especially when lossy compression techniques are applied. Compression may damage the embedded information randomly. In this sense, Strategy 3 preserves the watermarks better. However, the visual quality of the video sequence can suffer.
6.5 Basic Requirements for Watermarking Schemes

Quality of the watermarking schemes is evaluated according to the basic requirements mentioned below:
• Imperceptibility.
• Robustness.
• Payload capacity.
• Complexity.
• Security.
Imperceptibility, or invisibility, means the perceptual similarity between the watermarked frames and the original frames. Perceptual quality is evaluated quantitatively using the Peak Signal to Noise Ratio (PSNR) and/or Structural Similarity Index Measure (SSIM) metrics. The PSNR metric is calculated by Eq. 6.13, where $f(x, y)$ and $\tilde{f}(x, y)$ are the intensity values at position (x, y) of the host and watermarked images, respectively, each of size M × N pixels.

$$PSNR = 10 \log_{10} \frac{\max f^{2}(x, y)}{\dfrac{1}{M \times N} \sum\limits_{x=1}^{N} \sum\limits_{y=1}^{M} \left(f(x, y) - \tilde{f}(x, y)\right)^{2}} \tag{6.13}$$
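As an illustration, Eq. 6.13 can be sketched in a few lines of Python. This is a minimal sketch, not the author's implementation: the function name `psnr` and the representation of a frame as a list of rows are assumptions made here, and the peak value is taken from the host frame as in Eq. 6.13 (many tools use a fixed peak such as 255 instead).

```python
import math


def psnr(host, marked):
    """PSNR in dB between two equally sized grayscale frames (lists of rows)."""
    rows, cols = len(host), len(host[0])
    # Mean squared error over all pixels (the denominator of Eq. 6.13).
    mse = sum(
        (host[r][c] - marked[r][c]) ** 2
        for r in range(rows)
        for c in range(cols)
    ) / (rows * cols)
    if mse == 0:
        return math.inf  # identical frames: no distortion at all
    peak = max(max(row) for row in host)  # max f(x, y) of the host frame
    return 10 * math.log10(peak ** 2 / mse)


host = [[255, 0], [0, 255]]
marked = [[254, 0], [0, 255]]  # one pixel changed by one intensity level
print(round(psnr(host, marked), 2))
```

Higher values indicate a less visible watermark; identical frames give an infinite PSNR in this sketch.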
The SSIM and Mean SSIM (MSSIM) are estimated by Eqs. 6.14 and 6.15, where $\mu_x$ and $\mu_y$ denote the means of x and y, $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y, $\sigma_{xy}$ denotes their covariance, $C_1$ and $C_2$ are small positive constants, and R is the number of regions in which the SSIM is calculated. The value of the mean SSIM lies in the interval [0, 1], and SSIM = 1 means that there is no difference between the two images.

$$SSIM = \frac{2\mu_x \mu_y + C_1}{\mu_x^{2} + \mu_y^{2} + C_1} \cdot \frac{\sigma_{xy} + C_2}{\sigma_x \sigma_y + C_2} \tag{6.14}$$

$$MSSIM = \frac{1}{R} \sum_{i=1}^{R} SSIM(x_i, y_i) \tag{6.15}$$
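The two formulas can be sketched as follows. This is only an illustrative sketch of Eqs. 6.14 and 6.15 as printed, not a reference SSIM implementation: the function names, the flat pixel lists used for regions, and the default constants (the values commonly used for 8-bit images) are all assumptions introduced here.

```python
import math


def ssim_region(x, y, c1=6.5025, c2=58.5225):
    """SSIM of two equally long pixel lists, following Eq. 6.14 as printed.

    c1 and c2 are assumed defaults, (0.01 * 255)^2 and (0.03 * 255)^2.
    """
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    structure = (cov + c2) / (math.sqrt(var_x) * math.sqrt(var_y) + c2)
    return luminance * structure


def mssim(regions_x, regions_y):
    """Mean SSIM over R corresponding regions (Eq. 6.15)."""
    scores = [ssim_region(x, y) for x, y in zip(regions_x, regions_y)]
    return sum(scores) / len(scores)
```

In practice the regions would be sliding windows over the host and watermarked frames; how the windows are chosen is left open here.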
Robustness describes the resistance of the watermark to distortions under intentional or unintentional (accidental) attacks. The intentional attacks are directed against a single frame or a video sequence and are divided into common image processing attacks and geometric attacks. However, various permutation attacks, which do not belong to these main types, are also possible. The unintentional attacks are common processing attacks against a video sequence, including lossy copying, JPEG compression, change of frame rate, and change of resolution. Watermark robustness is verified using the Normalized Cross Correlation (NCC), Bit Error Rate (BER), or Bit Correct Ratio (BCR) metrics for the embedded and extracted watermarks. The NCC metric is calculated by Eq. 6.16, where $W(i, j)$ and $\tilde{W}(i, j)$ denote the pixel values at coordinates (i, j) in the original and extracted watermarks, respectively, each of size K × L.

$$NCC = \frac{\sum\limits_{i=1}^{K} \sum\limits_{j=1}^{L} W(i, j) \cdot \tilde{W}(i, j)}{\sum\limits_{i=1}^{K} \sum\limits_{j=1}^{L} W^{2}(i, j)} \tag{6.16}$$
The NCC value ranges in [0, 1]. The higher the NCC value, the stronger the robustness of the watermarking scheme. The BER metric shows the error rate between the extracted watermark $\tilde{W}$ and the original watermark $W$. The smaller the BER, the better the robustness. It is computed by Eq. 6.17.

$$BER = \frac{\sum\limits_{i=1}^{K} \sum\limits_{j=1}^{L} \left| W(i, j) - \tilde{W}(i, j) \right|}{K \times L} \cdot 100\% \tag{6.17}$$
The BCR metric estimates the agreement between the original watermark $W$ and the extracted watermark $\tilde{W}$. The BCR value ranges in [0, 1], where values close to 1 mean that the extracted watermark is similar to the original watermark and the robustness is high. The BCR metric is calculated by Eq. 6.18, where the sign ⊕ denotes the exclusive-or operation and the overbar denotes its logical negation.

$$BCR = \frac{\sum\limits_{i=1}^{K} \sum\limits_{j=1}^{L} \overline{W(i, j) \oplus \tilde{W}(i, j)}}{K \times L} \tag{6.18}$$
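For binary watermarks, the three robustness metrics can be sketched as below. This is a minimal sketch under stated assumptions: watermarks are lists of rows holding 0/1 values, the function names are chosen here for illustration, and BCR is computed as the share of matching bits so that a value of 1 means a perfect match, as the text describes.

```python
def ncc(original, extracted):
    """Normalized cross correlation (Eq. 6.16); assumes a non-zero original."""
    num = sum(o * e for ro, re_ in zip(original, extracted) for o, e in zip(ro, re_))
    den = sum(o * o for row in original for o in row)
    return num / den


def ber(original, extracted):
    """Bit error rate in percent (Eq. 6.17)."""
    total = sum(len(row) for row in original)
    errors = sum(abs(o - e) for ro, re_ in zip(original, extracted) for o, e in zip(ro, re_))
    return 100.0 * errors / total


def bcr(original, extracted):
    """Bit correct ratio: share of bits for which the XOR is 0."""
    total = sum(len(row) for row in original)
    matches = sum(1 - (o ^ e) for ro, re_ in zip(original, extracted) for o, e in zip(ro, re_))
    return matches / total


w = [[1, 0], [1, 1]]
w_ext = [[1, 1], [1, 0]]  # two of the four bits were flipped by an attack
print(ncc(w, w_ext), ber(w, w_ext), bcr(w, w_ext))
```

With these definitions, BER (as a fraction) and BCR always sum to one for binary watermarks, so either can be reported.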
Payload capacity is the number of bits that can be embedded into the host frame without corrupting its visual quality. The imperceptibility and robustness of a frame
Fig. 6.3 Relationship among imperceptibility, capacity, and robustness
conflict with each other under high-payload scenarios. Achieving a suitable tradeoff between payload, fidelity, and robustness is a fundamental problem. High-capacity watermark embedding can be obtained by compromising either fidelity or robustness (Fig. 6.3). Generally speaking, it is impossible to design a watermarking algorithm that optimizes all three parameters at the same time. The compromise is defined by the practical application. In audio data, the payload is measured by the number of bits that can be encoded in one second, i.e. the number of bits per second (bps). The details can be found in [20].

Complexity. It is desirable that a watermarking scheme has a reasonable computational cost. In recent years, the complexity of watermarking schemes has increased, especially in the frequency domain, which makes real-time execution of a watermarking algorithm unattainable.

Security. The security of a watermarking scheme is provided by encryption techniques, including the choice of a secret key. Arnold's transform remains a popular approach among the scrambling methods, which transform a meaningful watermark into disordered and unsystematic patterns. Arnold's transform belongs to the chaotic transforms [21]. It increases the robustness against cropping attacks. The 2D Arnold's transform takes the form of Eq. 6.19, where (x, y) are the pixel coordinates in the original watermark, (x′, y′) are the pixel coordinates in the scrambled watermark, a and b are the coefficients of the transform, and K = L, i.e. the watermark is square.

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & a \\ b & ab + 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \pmod{K} \tag{6.19}$$

Often Eq. 6.19 is simplified to the so-called Arnold's cat map:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \pmod{K} \tag{6.20}$$
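A minimal sketch of the cat map of Eq. 6.20 is given below. It is an illustration under stated assumptions rather than the chapter's implementation: the watermark is a square list of rows indexed as `image[y][x]`, and the helper names `cat_map` and `arnold_period` are introduced here. Because the map is periodic, a scrambled image can be restored by continuing the iteration until the full period is reached.

```python
def cat_map(image, iterations=1):
    """Scramble a square K x K image (list of rows) with Arnold's cat map."""
    k = len(image)
    for _ in range(iterations):
        scrambled = [[None] * k for _ in range(k)]
        for y in range(k):
            for x in range(k):
                # Eq. 6.20: x' = x + y, y' = x + 2y (mod K)
                nx, ny = (x + y) % k, (x + 2 * y) % k
                scrambled[ny][nx] = image[y][x]
        image = scrambled
    return image


def arnold_period(k):
    """Smallest n such that the n-th power of the cat-map matrix is the identity mod k."""
    a, b, c, d = 1, 1, 1, 2  # entries of the current matrix power, starting at power 1
    n = 1
    while (a % k, b % k, c % k, d % k) != (1 % k, 0, 0, 1 % k):
        # Multiply the current power by [[1, 1], [1, 2]] from the left.
        a, b, c, d = (a + c) % k, (b + d) % k, (a + 2 * c) % k, (b + 2 * d) % k
        n += 1
    return n


watermark = [[r * 4 + c for c in range(4)] for r in range(4)]
key = 2  # number of scrambling iterations, used as the secret key
scrambled = cat_map(watermark, key)
restored = cat_map(scrambled, arnold_period(4) - key)
print(restored == watermark)
```

The period grows irregularly with K, so for realistic watermark sizes it is usually computed once and stored alongside the key.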
The transformation includes three steps during one iteration:
• Shear in the OX direction by a factor of 1.
• Shear in the OY direction by a factor of 1.
• Evaluate the modulo.

Arnold's transform is a periodic and invertible mapping. The number of iterations that returns the original image is known as Arnold's period. The chosen number of iterations serves as a secret key. During decryption, the scrambled image is iterated a predefined number of times (Arnold's period minus the value of the secret key). As a result, the original image is reconstructed successfully. For color watermarks, the YCbCr color space (Y is the luminance, Cb and Cr are the chrominance components) is recommended, with only the Y component processed by Arnold's transform. Another approach uses the Fibonacci-Lucas transform, which is robust against statistical attacks as well as noise, cropping, and compression attacks. The Fibonacci-Lucas transform is defined as

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} F_i & F_{i+1} \\ L_i & L_{i+1} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \pmod{K} \tag{6.21}$$

where $F_i$ is the term of the Fibonacci series

$$F_n = \begin{cases} 1 & \text{if } n = 1 \\ 1 & \text{if } n = 2 \\ F_{n-1} + F_{n-2} & \text{otherwise} \end{cases}$$

and $L_i$ is the term of the Lucas series

$$L_n = \begin{cases} 2 & \text{if } n = 0 \\ 1 & \text{if } n = 1 \\ L_{n-1} + L_{n-2} & \text{otherwise} \end{cases}$$

For watermark encryption, chaotic maps, logistic chaotic maps, pseudorandom number generators, encoding, and the spread spectrum technique can be applied. However, these transforms suffer from high computational complexity and low operational efficiency, as well as difficulties of deciphering [22]. Some watermarking algorithms for hiding information in the sound track of video sequences were proposed [23]. The preparation of a watermark from speech consists of several stages:
• Compression of the speech signal to shorten the watermark.
• Speech signal enhancement to remove noise in the frequency domain.
• Speech signal conversion into a bit stream.
• Watermark generation using the copyright message of the holder.
• Converted speech signal.
The bit-level speech marking is similar to the image-based watermarking technique [24].
6.6 Conclusions

The proposed video watermarking models allow one to choose suitable schemes among the three basic watermarking strategies for compressed video sequences. An unsolved problem remains when a compressed video is additionally re-compressed by some codec. In this case, the stable regions disappear randomly. A special issue is the evaluation of an extracted watermark that has been damaged by unknown types of attacks. Some objective functions are considered, and their linear combination with weighted coefficients can serve as a generalized criterion for evaluating the extracted watermark.

Acknowledgements The reported study was funded by the Russian Fund for Basic Researches according to the research project № 19-07-00047.
References

1. Cox, I., Kilian, J., Leighton, T., Shamoon, T.: Secure spread spectrum watermarking for multimedia. IEEE Trans. Image Process. 6(12), 1673–1687 (1997)
2. Hartung, F., Girod, B.: Watermarking of uncompressed and compressed video. Sig. Process. 66, 283–301 (1998)
3. Maes, M., Kalker, T., Linnartz, J.P.M.G., Talstra, J., Depovere, G.F.G., Haitsma, J.: Digital watermarking for DVD video copyright protection. IEEE Signal Process. Mag. 17(5), 47–57 (2000)
4. Venugopala, P., Sarojadevi, H., Chiplunkar, N.N., Bhat, V.: Video watermarking by adjusting the pixel values and using scene change detection. In: IEEE 5th International Conference on Signal and Image Processing, pp. 259–264 (2014)
5. Aly, H.A.: Data hiding in motion vectors of compressed video based on their associated prediction error. IEEE Trans. Inform. Forensics Secur. 6(1), 14–18 (2011)
6. Jordan, F., Kutter, M., Ebrahimi, T.: Proposal of a watermarking technique for hiding/retrieving data in compressed and decompressed video. Technical report M2281, ISO/IEC document, JTC1/SC29/WG11, Stockholm, Sweden (1997)
7. Zhang, J., Li, J.G., Zhang, L.: Video watermark technique in motion vector. In: Proceedings of XIV Brazilian Symposium on Computer Graphics and Image Processing, pp. 179–182 (2001)
8. Bodo, Y., Laurent, N., Dugelay, J.L.: Watermarking video, hierarchical embedding in motion vectors. In: International Conference on Image Processing, vol. 2, pp. 739–742 (2003)
9. Zhang, L.-H., Xu, G., Zhou, J.-J., Shen, L.-J.: A video watermarking scheme resistant to synchronization attacks based on statistics and shot segmentation. In: IEEE 7th International Conference on Intelligent Systems Design and Applications, pp. 603–608 (2007)
10. Lin, D.-T., Liao, G.-J.: Swarm intelligence based fuzzy C-means clustering for motion vector selection in video watermarking. Int. J. Fuzzy Syst. 10(3), 185–194 (2008)
11. Fang, D.-Y., Chang, L.-W.: Data hiding for digital video with phase of motion vector. In: International Symposium on Circuits and Systems, pp. 1422–1425 (2006)
12. Shamim Hossain, M., Muhammad, G., Abdul, W., Song, B., Gupta, B.B.: Cloud-assisted secure video transmission and sharing framework for smart cities. Fut. Gener. Comput. Syst. 83, 596–606 (2018)
13. Favorskaya, M., Pyataeva, A., Popov, A.: Texture analysis in watermarking paradigms. Procedia Comput. Sci. 112, 1460–1469 (2017)
14. Favorskaya, M.N., Jain, L.C., Savchina, E.I.: Perceptually tuned watermarking using nonsubsampled shearlet transform. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems-3, ISRL, vol. 136, pp. 41–69. Springer International Publishing, Switzerland (2018)
15. Ohm, J.-R., Sullivan, G.J., Schwarz, H., Tan, T.K., Wiegand, T.: Comparison of the coding efficiency of video coding standards—including high efficiency video coding (HEVC). IEEE Trans. Circuits Syst. Video Technol. 22(12), 1669–1684 (2012)
16. Meerwald, P., Uhl, A.: Robust watermarking of H.264/SVC-encoded video: quality and resolution scalability. In: Kim, H.-J., Shi, Y., Barni, M. (eds.) Digital Watermarking, LNCS, vol. 6526, pp. 159–169. Springer International Publishing (2010)
17. Van Caenegem, R., Dooms, A., Barbarien, J., Schelkens, P.: Design of an H.264/SVC resilient watermarking scheme. In: Proceedings of SPIE, Multimedia on Mobile Devices, vol. 7542 (2010). https://doi.org/10.1117/12.838589
18. Park, S.W., Shin, S.U.: Combined scheme of encryption and watermarking in H.264/scalable video coding (SVC). In: Tsihrintzis, G.A., Virvou, M., Howlett, R.J., Jain, L.C. (eds.) New Directions in Intelligent Interactive Multimedia, SCI, vol. 142, pp. 351–361. Springer-Verlag Berlin, Heidelberg (2008)
19. Park, S.-W., Shin, S.U.: Authentication and copyright protection scheme for H.264/AVC and SVC. J. Inf. Sci. Eng. 27, 129–142 (2011)
20. Ortiz, M.A.M., Feregrino-Uribe, C., García-Hernández, J.J.: Reversible watermarking scheme with payload and signal robustness for audio signals. Technical Report No. CCC-15–006 (2015)
21. Arnol'd, V.I., Avez, A.: Ergodic Problems of Classical Mechanics. Mathematical Physics Monograph Series, New York, Benjamin (1968)
22. Yu, X., Wang, C., Zhou, X.: A survey on robust video watermarking algorithms for copyright protection. Appl. Sci. 8, 1891.1–1891.26 (2018)
23. Wang, H., Cui, X., Cao, Z.: A speech based algorithm for watermarking relational databases. In: IEEE International Symposium on Information Processing, pp. 603–606 (2008)
24. Hu, Z., Cao, Z., Sun, J.: An image based algorithm for watermarking relational databases. In: IEEE 2009 International Conference on Measuring Technology and Mechatronics Automation, pp. 425–428 (2009)
Chapter 7
Experimental Data Acquisition and Management Software for Camera Trap Data Studies

Aleksandr Zotin and Andrey Pakhirka

Abstract At present, the use of camera traps is widespread, and their importance in wildlife studies is well understood. Camera trap studies produce a vast amount of images, and there is a need for software that is able to manage these data and make automatic annotations to help the studies. The chapter describes the experimental data management system used to process the images obtained from camera traps, including the modules of the system and the methods used during software implementation. The proposed software system can automatically extract metadata from images and associate customized metadata with the images in a database. Additional metadata are formed by a set of algorithms, which in automated mode allow us to detect empty images, conduct animal detection with species classification, and make a simple semantic description of the observed scene.

Keywords Camera traps · Data management · Image processing · Image analysis · Multi-Scale Retinex · Animal detection · CNN · Semantic description
A. Zotin (corresponding author) · A. Pakhirka
Reshetnev Siberian State University of Science and Technology, Institute of Informatics and Telecommunications, 31, Krasnoyarsky Rabochy Ave., Krasnoyarsk 660037, Russian Federation
© Springer Nature Switzerland AG 2020. M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems—6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_7

7.1 Introduction

One of the main goals of ecological monitoring is the study and preservation of biodiversity and the regulation of the impact of human activity on ecosystems. The vast areas of national parks and conservation areas in Russia make it difficult to monitor wild animals and birds by traditional methods. Estimating the number, species composition, and other characteristics of the fauna on the basis of periodic expeditions that study traces of animal life and animal behavior is very expensive. At the same time, the obtained results are usually subjective and unreliable. Studying and monitoring wildlife can be achieved by means of non-invasive sampling techniques, such as
the camera trap approach. The use of camera traps, which record events with a high probability of animal appearance, provides unique information that is not available to other monitoring methods. Camera traps are an ideal instrument for scientific observation of wild species because they are relatively non-intrusive in the local environment, apply infrared flashes that do not affect animal behavior, and provide large quantities of data and metadata in image format. Although camera trapping is a useful methodology, it generates a large volume of images. Therefore, processing the recorded images is a big challenge, and it is even harder if biologists want to identify all photographed species. At present, a huge amount of visual information obtained from camera traps has been accumulated. Although camera traps should capture only animal images, the method generates many false positive captures (images without animals). For example, in the Snapshot Serengeti database [1] only around 26.8% of the images contain animals. As a result, wildlife scientists must analyze thousands of photographs that do not show wildlife. Currently, no automatic approach is used to identify species from camera trap images. Image content identification and information management, known as image interpretation or annotation, is often a very time-consuming process. Thus, the problem of analyzing the accumulated images and forming unified information resources for the ecological monitoring of Krasnoyarsk region becomes urgent. At the stage of image analysis, tasks such as the following must be solved: selection of events from a series of images, marking of non-informative images, and identification of observed objects (animals) by species. The generated image description forms the basis of the data necessary for ecological monitoring.
Image annotation can be performed completely manually, but it is increasingly facilitated by techniques such as extraction of metadata from images for certain data (e.g., date and time). Animal detection and semantic description of the observed scene can be done with the help of artificial neural networks. The remainder of this chapter is organized as follows. In Sect. 7.2, the related work is analyzed. Section 7.3 describes the camera trap data. Section 7.4 presents the proposed experimental software system. The conclusions are given in Sect. 7.5.
7.2 Related Work

Traditionally, camera trap data have been processed manually by entering data into a spreadsheet. This is a time-consuming process prone to human error, and data management may be inconsistent between projects, hindering collaboration. Recently, the vast amount of data generated by camera trap studies around the world has led to an increasing number of programs for managing and processing the generated image data. At present, software systems such as eMammal [2], TRAPPER [3], Camelot [4, 5], Camera Base [6], and Aardwolf [7] are used for specific research. Most of the programs were developed for specific projects, and accordingly, image classification is tailored to the respective project focus. Thus, many
researchers conduct experiments with different approaches to animal detection and species classification. During the development of animal monitoring systems, various approaches can be used to implement animal detection. One approach is to use a deep neural network, which makes it possible to determine the presence of animals [8] but requires a sufficiently large amount of training for its full use. The second approach utilizes frame differencing to find motion areas. Some of these areas are detected because of changes in lighting and swaying vegetation, or correspond to the position of an animal in the previous image. A trained classifier is used to distinguish between such areas and areas with animals [9]. The third approach to animal detection is based on background modeling of the scene using a series of images. The approach based on the background model is the simplest, since it makes it possible to detect an animal without using classifiers [10–12]. One of the challenging tasks is species identification and forming a description of the observed scene. In recent years, various methods have been developed for wild animal monitoring, for example sparse coding spatial pyramid matching and Convolutional Neural Networks (CNN). Spatial pyramid matching was used for this task in [13]. This approach made it possible to recognize 18 species of animals with 82% accuracy [13]. According to the research of Swanson et al., human accuracy lies around 96.6% [1]. Spatial pyramid matching uses scale-invariant feature transform descriptors and local binary patterns as local features. The images are classified using a linear support vector machine. In order to use such methods, the camera-trap images should be preprocessed by cropping the animal's body, removing images without animals, and selecting images showing the whole body of the animal. One of the first attempts to use CNNs for classifying animal species from camera-trap images was conducted by Chen et al. [14].
In that research, a relatively small dataset (around 23,000 images) of the University of Missouri, which includes 20 species, was used. At that time, the CNN demonstrated 38% accuracy. However, the potential of CNNs is much higher. Later, a deep CNN was used for automatically identifying animal species on the Snapshot Serengeti dataset with 26 species and 780,000 images and provided high accuracies of around 90% with manual preprocessing [15]. Norouzzadeh et al. [16] reported that the accuracy of automatically classifying 48 species reached 94% on the Snapshot Serengeti dataset, which contains 3.2 million images. In this study, classification into two classes (animal and non-animal) provided 96.8% accuracy. Apart from these methods, many approaches use very deep CNNs (AlexNet [17], VGGNet [18], GoogLeNet [19], and ResNets [20]) that have a higher learning capacity.
7.3 Camera Traps Data

During software development, the characteristics of the camera traps in use were taken into account. Thus, sets of images captured by camera traps in different regions of Ergaki Natural Park, Krasnoyarsky Kray, Russia, in 2012–2018 were used. There are camera traps,
which can capture images in day-time color and night-time formats with different spatial resolution. The Ergaki dataset contains more than 50,000 images obtained in all seasons. Part of this dataset was annotated by experts. Currently, data from 10 camera traps have been loaded into the developed system. They include images captured in day-time (color) and night-time (grayscale) formats with different spatial resolution. Cameras 1–4 produce images with a resolution of 2592 × 1944 pixels, the resolution of cameras 5–7 is 3264 × 2448 pixels, the resolution of cameras 8 and 9 is 4000 × 3000 pixels, and the resolution of camera 10 is 1920 × 1080 pixels. Examples of images from some of these camera traps taken at day-time and night-time are presented in Fig. 7.1.
Fig. 7.1 Examples of images captured by camera traps a day-time images, b night-time images
The quality of images captured by camera traps depends on the time and weather conditions. Images can have uneven illumination and low contrast, which increases the complexity of the analysis, both by a person and by a computer system. Therefore, the system should implement image enhancement algorithms.
7.4 Proposed Software System

The workflow of a typical camera trap data management system includes three distinct stages: file management, data annotation, and data extraction. The analysis that follows data extraction is specific to a study and can be conducted by specialists. However, the software system can prepare data for the analysis in automated ways. The generalized structural scheme of the proposed system is shown in Fig. 7.2. Since uneven illumination has a great influence on background model formation and on the visual understanding of images, we decided to use image enhancement based on the modified Multi-Scale Retinex (MSR) algorithm, which utilizes the wavelet transform to speed up the calculations [21]. In order to automate annotation, modules for automatic image importing with metadata extraction, semi-automatic background
Fig. 7.2 The generalized structural scheme of proposed system
model formation module, and semantic builder, which utilizes CNN data, were introduced. A CNN training module, which makes it possible to extend the dataset by augmentations, was also included in the system. Hereinafter, the main processing modules providing data management, preliminary analysis, image enhancement, animal detection, CNN control, and semantic description are described in Sects. 7.4.1–7.4.6, respectively.
7.4.1 Module of Data Management

Generally, a software program dealing with camera trap data needs to manage two types of data: it needs to organize files and to manage the metadata associated with individual files. Both types of data can be stored in an external database. One way of organizing files in a camera trap study is to store them in a three-level hierarchical directory, depending on the project. Since our experimental software is designed for monitoring the ecosystems of Krasnoyarsk region, we decided to use the park name as the first level, the year of image acquisition as the second level, and the camera identifier as the third level. At present, the amount of obtained data is slightly less than 300 GB, so it was decided to use one HDD with replication on network attached storage. The proposed system supports logical partitioning of data with an option for a partition to be physically located anywhere in the operating system's filesystem. Relying on the operating system's filesystem interface provides the flexibility to separate the logical organization of files from their physical location. This means that the partitions can be stored on additional hard drives or networked storage systems, thereby providing the ability to work with an arbitrarily large number of files. Since the directory structure is used for image storage and the metadata are stored in the database, the same file may be loaded into the system several times (because it is placed in different folders). To prevent multiple processing of the same image stored in different places, it was decided to store additional information, such as MD5 hash and CRC32 codes. Data annotation for camera trap studies largely means identification of objects and features in individual images or image series. There are several ways to store annotation information and metadata: within the images or in an external database. Storing annotation information in an external database is efficient when fast retrieval of data is required. This is the preferred way for software developed to deal specifically with camera trap data. If metadata are saved within the images or as a separate file in the corresponding folder, the processing speed decreases. In the proposed system, we decided to use the MySQL database management system. Each image can then be annotated with user-defined metadata identified in the image by a human expert and/or by the algorithms of animal detection and classification, as well as with a semantic description of the observed scene.
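The duplicate check based on MD5 and CRC32 codes can be sketched as follows. This is a hypothetical illustration, not the system's actual code: the class `ImageRegistry`, the function `fingerprint`, and the in-memory dictionary standing in for the database table are all assumptions made for the example, as are the sample paths that follow the park/year/camera hierarchy.

```python
import hashlib
import zlib


def fingerprint(data: bytes):
    """Return the (MD5 hex digest, CRC32) pair used to recognise identical files."""
    return hashlib.md5(data).hexdigest(), zlib.crc32(data)


class ImageRegistry:
    """In-memory stand-in for the database table of already imported images."""

    def __init__(self):
        self._seen = {}  # fingerprint -> path of the first imported copy

    def register(self, path, data):
        """Record the file; return the earlier path if the content is a duplicate."""
        key = fingerprint(data)
        if key in self._seen:
            return self._seen[key]
        self._seen[key] = path
        return None


registry = ImageRegistry()
print(registry.register("Ergaki/2015/cam03/IMG_0001.JPG", b"raw image bytes"))
print(registry.register("Ergaki/2016/cam03/IMG_0412.JPG", b"raw image bytes"))
```

In a real import module the bytes would be read from disk in chunks and the pair would be stored in an indexed database column, so that the lookup stays fast as the archive grows.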
7.4.2 Module of Preliminary Analysis

Attributive data of images obtained by a camera trap provide important information for analysis by specialists. For example, the date and time of the recorded files indicate the distribution and duration of camera trap operation. After the motion sensor is triggered, a camera trap creates a sequence of images with a time interval of several seconds between images. Therefore, the volume of a series of images may indicate an animal's activity or a false triggering of the device (movement of vegetation under strong wind, heavy rainfall, etc.). To simplify the analysis, all images are uploaded to the system through a specialized module. The module not only obtains the technical characteristics of the image but also adds information on the shooting time and temperature parameters to the database. The interface of the import and preliminary analysis module is shown in Fig. 7.3. As practice shows, the number of non-informative images is quite large, from 25 to 75%. Therefore, it is important to provide the ability to automatically tag "empty" images. Later, taking this tag into account, such images can be excluded from subsequent processing or used to build a background model. The detection of informative images in a series of large-size images (5–12 megapixels), taking into account the constantly changing lighting and weather conditions, is performed using a rough motion estimation [22]. The algorithm used in the system is based on a modified block matching method with overlapping motion maps. Motion estimation uses a normalized representation of the block map. The normalization is based on the assumption that within a small region the illumination
Fig. 7.3 Screenshot of importing image data
Fig. 7.4 Example of normalized map formation
change affects all the pixels evenly. An example of a stage in the formation of the normalized map is presented in Fig. 7.4. A motion map generated from two images can contain an area (a so-called "ghost") corresponding to the position of the animal in the previous image. In order to obtain a rough motion map, the proposed module uses two motion maps (i.e. it processes three consecutive images). The general flowchart demonstrating the internal processing stages is shown in Fig. 7.5. In the software implementation, the calculations are performed using integral images, which allows up to 100 images per minute to be processed. According to experimental research, the accuracy of determining images that have informational value to specialists is up to 94%. The methods described in [23] were used as an additional verification factor for the informational value of an image. For recognition of textual information in the image, the text-recognition library Tesseract-OCR (libtesseract) is used [24]. To optimize the processing of a large number of single-type images, a mechanism for defining the recognition area is provided. Typically, the area containing valuable text information is located at the bottom of the image. However, it can also be at the top of the image. The text data in the image usually contain the date and time of shooting, temperature parameters (Celsius and/or Fahrenheit), and the phase of the Moon as separate information. Since errors can sometimes occur during recognition, the system implements a mechanism for checking the correctness of the temperature (for cameras in which two corresponding values are shown) and
Fig. 7.5 Flowchart of normalized map creation
date/time. The temperature is checked using the Fahrenheit-to-Celsius and Celsius-to-Fahrenheit conversion formulas. When a discrepancy is found, the information in the database is marked with an appropriate label. Thus, if necessary, users will be able to adjust the values.
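The temperature cross-check can be illustrated with a short sketch. It assumes, as a simplification introduced here, that the overlay shows whole degrees and that the camera derives one value from the other by rounding; the function names are hypothetical and the actual rule used in the system may differ.

```python
def c_to_f(celsius):
    """Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


def f_to_c(fahrenheit):
    """Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9


def temperatures_consistent(celsius, fahrenheit):
    """True if the two recognised whole-degree readings agree in either direction.

    The camera may have computed either value from the other, so both
    conversion formulas are tried before the record is flagged.
    """
    return round(c_to_f(celsius)) == fahrenheit or round(f_to_c(fahrenheit)) == celsius


print(temperatures_consistent(17, 63))  # plausible pair
print(temperatures_consistent(17, 68))  # likely an OCR error, e.g. "3" read as "8"
```

A record failing this check would be labelled for manual review rather than corrected automatically, since it is not known which of the two values was misread.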
7.4.3 Module of Image Enhancement

Uneven illumination has a great influence on image understanding. Uneven illumination and low contrast increase the complexity of the analysis, both by a person and by a computer system. The modified MSR algorithm increases the accuracy of animal localization [22] and gives clearer images for the specialist. Figure 7.6 demonstrates examples of uneven illumination, which obstructs analysis by specialists and by algorithms for automated annotation. The workflow of the MSR-based algorithm is shown in Fig. 7.7. The main processing is conducted on the brightness value of the HSV color model. The brightness correction is conducted only in the low-frequency area after the wavelet transform, which
Fig. 7.6 Examples of uneven illumination on camera traps images
Fig. 7.7 Workflow of MSR-based algorithm
A. Zotin and A. Pakhirka
Fig. 7.8 Example of illumination enhancement by modified MSR algorithm from Fig. 7.6
yields a performance gain. During brightness normalization, the average brightness level is set to a desired value. Figure 7.8 shows the illumination enhancement produced by the modified MSR-based algorithm for the images presented in Fig. 7.6.
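The brightness normalization step alone can be illustrated with a short sketch. It only rescales the V channel of HSV so that its mean reaches a target value; the Retinex and wavelet parts of the algorithm are not reproduced. The flat pixel list, the `target_mean` parameter, and the simple multiplicative gain are assumptions made for illustration.

```python
import colorsys


def normalize_brightness(pixels, target_mean=0.5):
    """Scale the HSV brightness (V) of RGB pixels towards a target mean.

    pixels: list of (r, g, b) tuples with components in [0, 1].
    Hue and saturation are left unchanged; V is clipped at 1.0, so the
    resulting mean can fall below the target when clipping occurs.
    """
    hsv = [colorsys.rgb_to_hsv(*p) for p in pixels]
    mean_v = sum(v for _, _, v in hsv) / len(hsv)
    if mean_v == 0:
        return list(pixels)  # an all-black image cannot be rescaled
    gain = target_mean / mean_v
    return [colorsys.hsv_to_rgb(h, s, min(1.0, v * gain)) for h, s, v in hsv]
```

Working on V only keeps the colors of the scene intact, which is the usual motivation for doing illumination correction in HSV rather than directly in RGB.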
7.4.4 Module of Animal Detection

The animal detection module is designed to prepare information for further processing: it enters the coordinates of the area containing an animal into the database. In subsequent processing, the presence of an animal is determined and its species is classified. To determine the estimated location of the animal in an image, we use a background model. A screenshot of the interface for setting up the background model creation process is shown in Fig. 7.9. The background model is based on information about brightness and color, and on such statistics as the Pixel-wise Standard Deviation (PSD), the Block-Based Standard Deviation (BBSD), and its variance [22]. Depending on the characteristics of the input images, i.e. whether they were captured at night-time or day-time, the mean value is calculated using a grayscale image or a color image, respectively. The background model is generated for a specific period of time and is used automatically during the animal detection process. If several models with overlapping time intervals exist, then animal detection is conducted using all of these models. After applying the animal detection algorithms, the averaged results of the motion maps are used to define the animal boundaries. Figure 7.10 demonstrates examples of generated background maps for night-time and day-time images. For better visual interpretation, the values of the maps containing the PSD and the mean value of the BBSD were multiplied by 5, and the values of the BBSD variance map were multiplied by 20. The stages of the animal detection algorithms are shown in Fig. 7.11. After a motion map is created, a bounding box for the region with a high probability of containing the animal is calculated (with margins of around 3–5% of the motion map blob).
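The last step, a bounding box with a small margin around the motion map blob, might look like the sketch below. It assumes a binary motion map stored as a list of rows and interprets the 3–5% margin as a fraction of the blob's own width and height; both are assumptions, as is the function name.

```python
def bounding_box_with_margin(mask, margin=0.05):
    """Return (left, top, right, bottom) around the non-zero pixels of a mask.

    mask: 2D list of 0/1 values (a thresholded motion map).
    margin: fraction of the blob width/height added on each side
            (assumed interpretation of the 3-5% margin).
    Returns None when the mask contains no foreground pixels.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    pad_y = round((bottom - top + 1) * margin)
    pad_x = round((right - left + 1) * margin)
    height, width = len(mask), len(mask[0])
    # Clamp the enlarged box to the image borders.
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width - 1, right + pad_x),
        min(height - 1, bottom + pad_y),
    )
```

The margin keeps thin parts of the animal (legs, ears, tail) inside the crop even when the motion map slightly underestimates the blob.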
Fig. 7.9 Screenshot of background model creation
Fig. 7.10 Examples of background maps: a mean value, b pixel wise standard deviation, c mean of block based standard deviation, d variance of block based standard deviation
7.4.5 Module of CNN Control

This module consists of a number of subsystems, including a training set preparation module, a training unit, and a verification module. Once the dataset on which CNN training will take place has been selected, it is divided into three sets: training, validation, and test. By default, the ratio of samples for each species of animal is represented by
Fig. 7.11 Scheme of animal detection algorithm
the following values: 70–20–10%, respectively. The training set is passed to the image augmentation module. This module expands the image set by applying the following arbitrary geometric transformations: rotation, flipping of the image with subsequent horizontal alignment, resizing within a defined range, scaling, and cropping. The augmentation module also makes it possible to change the color characteristics of images: the color component values in the RGB or HSV color models can be changed simultaneously or independently. A screenshot of the interface of the module for setting up CNN training is shown in Fig. 7.12. In the implemented system, the TensorFlow 1.5 framework is used to work with CNNs. As the CNN architecture, we decided to use VGG16 (Visual Geometry Group, Department of Engineering Science, University of Oxford), which was presented in [18]. A generalized workflow of the training module is shown in Fig. 7.13. The dataset used in preliminary studies contains images of 12 species. More than 11,000 images from the dataset were annotated by 2 experts. Samples in our experimental dataset were classified manually into the animal species listed in Table 7.1. The trained CNN shows better accuracy, achieving 80.4% Top-1 and 94.1% Top-5 accuracy on the Ergaki dataset. In the case of the unbalanced training dataset, we obtained 38.7% Top-1 and 64.8% Top-5 accuracy, respectively [25].
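The default per-species 70–20–10% division into training, validation, and test sets can be sketched as below. The `(image_id, species)` sample layout, the function name, and the fixed random seed are illustrative assumptions rather than details of the described system.

```python
import random
from collections import defaultdict


def split_per_species(samples, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split (image_id, species) pairs so that every species keeps the ratio.

    Returns a dict with 'train', 'val' and 'test' lists. Rounding leftovers
    go to the test part.
    """
    by_species = defaultdict(list)
    for image_id, species in samples:
        by_species[species].append(image_id)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    split = {"train": [], "val": [], "test": []}
    for species, ids in sorted(by_species.items()):
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_val = round(len(ids) * ratios[1])
        split["train"] += [(i, species) for i in ids[:n_train]]
        split["val"] += [(i, species) for i in ids[n_train:n_train + n_val]]
        split["test"] += [(i, species) for i in ids[n_train + n_val:]]
    return split
```

Splitting inside each species, instead of over the whole pool, keeps rare classes such as the squirrel represented in all three parts.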
7.4.6 Module of Semantic Description

The semantic description module makes it possible to create a descriptive characteristic of images using information obtained from artificial neural networks. The system uses two neural networks. The first network determines the type of animals and their number in the image. The second network is trained to describe auxiliary parameters related to their behavior. For these purposes, additional
Fig. 7.12 Screenshot of CNN training module
Fig. 7.13 Workflow of CNN training
Table 7.1 Samples in dataset obtained from Ergaki Nature Park
Species
Samples
Species
Red fox
3123
Moose
Brown bear
3732
Western capercaillie
848
Wild boar Badger White hare
Samples 959 4332 639
Lynx
1007
3824
Maral
3510
Squirrel
118
Roe deer
4228
Dog
512
Fig. 7.14 Scheme of semantic description
information is used about the objects in the images, as well as information about the change in the position of animals over a series of frames (the animal motion trajectory). Based on the obtained information, a semantic description of the behavior of animals is built using a number of production rules. The basic information for the rules includes the number and type of animals, the presence of objects related to activities, such as watering places or animal resting places, a description of the movement trajectory, and the interactions between animals. The scheme of semantic description formation is shown in Fig. 7.14. If the semantic description is not fully relevant to the observed scene, the user can easily change the automatically generated description and redefine the tags describing the image. A screenshot of the interface with semantic description details is shown in Fig. 7.15.
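How production rules could turn recognized facts into a textual description is sketched below. The fact keys (`species`, `count`, `objects`, `trajectory`) and the rules themselves are hypothetical examples; the actual rule base of the system is not given in the text.

```python
def describe_scene(facts):
    """Apply simple IF-THEN production rules to a dict of recognized facts.

    Each rule is a (condition, text) pair; the texts of all rules whose
    condition holds are joined into the description.
    """
    rules = [
        (lambda f: f["count"] == 1,
         lambda f: f"A single {f['species']} is present."),
        (lambda f: f["count"] > 1,
         lambda f: f"A group of {f['count']} {f['species']} individuals is present."),
        (lambda f: "watering place" in f["objects"] and f["trajectory"] == "stationary",
         lambda f: "The animal stays near a watering place."),
        (lambda f: f["trajectory"] == "crossing",
         lambda f: "Movement across the field of view is observed."),
    ]
    return " ".join(text(facts) for condition, text in rules if condition(facts))
```

Because the rules are plain data, a description that does not fit the scene can be corrected by the user without retraining either neural network.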
7.5 Conclusions

Software for animal detection and automated annotation can provide numerous benefits for animal monitoring. An automated system for the annotation of camera trap images can reduce costs and the lag time between data collection and data analysis, and can potentially reduce observer errors that occur with low-light images. Also, the designed software system increases the efficiency of data management for noninvasive techniques without significantly sacrificing analytical accuracy, which enables better monitoring of animal populations.
Fig. 7.15 Screenshot of semantic description and detailed results of tags marking
Acknowledgements The reported study was funded by Russian Foundation for Basic Research, Government of Krasnoyarsk Territory, Krasnoyarsk Regional Fund of Science, to the research project 18-47-240001.
References

1. Swanson, A., Kosmala, M., Lintott, C., Simpson, R., Smith, A., Packer, C.: Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Sci. Data 2, 150026.1–150026.14 (2015)
2. eMammal: Smithsonian Institute. Available at: https://emammal.si.edu. Accessed 7 May 2019
3. Bubnicki, J.W., Churski, M., Kuijper, D.P.: TRAPPER: an open source web-based application to manage camera trapping projects. Methods Ecol. Evol. 7, 1209–1216 (2016)
4. Hendry, H., Mann, C.: Camelot—intuitive software for camera-trap data management. Oryx 52(1), 15.1–15.11 (2018)
5. Camelot Documentation Release 1.5.2. Available at: https://camelot-project.readthedocs.io/en/latest/gettingstarted.html. Accessed 7 May 2019
6. Camera Base. Available at: http://www.atrium-biodiversity.org/tools/camerabase/. Accessed 7 May 2019
7. Aardwolf camera trap management software. Available at: https://github.com/yathin/aardwolf2. Accessed 7 May 2019
8. Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C., Clune, J.: Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proc. Natl. Acad. Sci. 115(25), 5716–5725 (2018)
9. Castelblanco, L.P., Narvaez, C.L., Pulido, A.D.: Methodology for mammal classification in camera trap images. In: Proceedings of SPIE Ninth International Conference Machine Vision, vol. 10341, pp. 103410I.1–103410I.7 (2017)
10. Bouwmans, T., Baf, F. El, Vachon, B.: Statistical background modeling for foreground detection: a survey. In: Handbook of Pattern Recognition and Computer Vision, vol. 4, pp. 181–199 (2010)
11. Bouwmans, T.: Traditional and recent approaches in background modeling for foreground detection: an overview. Comput. Sci. Rev. 11–12, 31–66 (2014)
12. Bouwmans, T., Garcia-Garcia, B.: Background subtraction in real applications: challenges, current models and future directions. arXiv preprint 1901.03577 (2019)
13. Yu, X., Wang, J., Kays, R., Jansen, P.A., Wang, T., Huang, T.: Automated identification of animal species in camera trap images. EURASIP J. Image Video Process. 1, 52 (2013). https://doi.org/10.1186/1687-5281-2013-52
14. Chen, G., Han, T.X., He, Z., Kays, R., Forrester, T.D.: Deep convolutional neural network based species recognition for wild animal monitoring. In: 2014 IEEE International Conference on Image Processing, pp. 858–862 (2014)
15. Villa, A.G., Salazar, A., Vargas, F.: Towards automatic wild animal monitoring: identification of animal species in camera-trap images using very deep convolutional neural networks. Ecol. Inf. 41, 24–32 (2017)
16. Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C., Clune, J.: Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. PNAS 115(25), E5723.1–E5723.10 (2018)
17. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: 25th International Conference on Neural Information Processing Systems, vol. 1, pp. 1097–1105 (2012)
18. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. Int. Conf. Learn. Representations 1–14 (2015)
19. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conference Computer Vision and Pattern Recognition, pp. 1–9 (2015)
20. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference Computer Vision and Pattern Recognition, pp. 770–778 (2016)
21. Zotin, A.: Fast algorithm of image enhancement based on multi-scale Retinex. Procedia Comput. Sci. 131, 6–14 (2018)
22. Zotin, A.G., Proskurin, A.V.: Animal detection using a series of images under complex shooting conditions. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. XLII-2(W12), 249–257 (2019)
23. Favorskaya, M., Buryachenko, V.: Selecting informative samples for animal recognition in the wildlife. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies SIST, vol. 143, pp. 65–75. Springer, Singapore (2019)
24. Tesseract-OCR. Available at: https://github.com/tesseract-ocr/. Accessed 7 May 2019
25. Favorskaya, M., Pakhirka, A.: Animal species recognition in the wildlife based on muzzle and shape features using joint CNN. Procedia Comput. Sci. 159, 933–942 (2019)
Chapter 8
Two-Stage Method for Polyps Segmentation in Endoscopic Images
Nataliia A. Obukhova, Alexander A. Motyko and Alexander A. Pozdeev
Abstract An important feature of medical images is high variability, which makes it difficult to use traditional machine learning models and tightens the requirements for the size of the database used to train Convolutional Neural Networks (CNNs). Neural network approaches to segmentation are relevant and effective, which has led to their widespread use. However, the limited amount of training data available, which is typical for medical image processing, requires a search for additional solutions. In this chapter, we propose a method to improve the quality of polyp segmentation in endoscopic images using a CNN by introducing a preliminary stage of binary classification based on global features extracted from the image. Our research shows that using binary classification as a preliminary stage before segmentation increases the Dice score by more than 10% when the CNN is trained on a small database.

Keywords Endoscopic image processing · Automatic polyps detection · Polyps segmentation · Global feature analysis · Convolutional neural networks
N. A. Obukhova (B) · A. A. Motyko · A. A. Pozdeev
Saint Petersburg Electrotechnical University “LETI”, ul. Professora Popova 5, 197376 St. Petersburg, Russian Federation
© Springer Nature Switzerland AG 2020
M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems—6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_8

8.1 Introduction

A Clinical Decision Support System (CDSS) can be used to solve various medical tasks, such as prevention or screening, treatment, diagnosis, and medication. A CDSS integrates the diagnostic results obtained by the physician with automatic analysis and thus provides high sensitivity and specificity of diagnosis. Due to the worldwide increase in life expectancy, the relative and absolute incidence of cancer is increasing. More than 700,000 people die of gastric cancer in the world every year [1]. Gastroscopy allows early detection of pathological changes in the stomach, which makes it possible to ensure timely treatment and avoid more serious consequences. The main tool for examining a person’s digestive tract is a video endoscope. The video sensor at the end of the endoscope converts the optical image into a digital signal that is transmitted by cable to the processor, where it is processed and then displayed on the monitor. During the examination,
the physician makes decisions about confirming or excluding the suspected diseases, as well as about conducting further procedures, based on the information available on the monitor screen. Therefore, the effectiveness of gastroscopy is largely determined by the quality of the image and video data presented to the physician, as well as by the qualifications of the doctor. One of the important tasks of gastroscopy is polyp detection. Polyps are a high cancer risk factor, and early detection and removal of polyps is important in medical practice. Polyps are highly variable in shape, size, and appearance, and the probability of correct polyp detection is strongly connected with the physician’s experience. There is an alternative procedure for examining the gastrointestinal tract, which is often used to detect bleeding and polyps. The patient swallows a small wireless capsule endoscopic camera, which travels through the gastrointestinal tract and records video data. Unlike a laborious and expensive screening procedure, capsule endoscopy is practically painless and does not cause discomfort in patients. After the procedure, the physician must view the recorded video, which can be very long. Automatic segmentation of polyps makes it possible to present for viewing only the important fragments that require the physician’s immediate attention and thereby save time. Taking into account the importance of detecting polyps in both traditional and wireless gastroscopy, we propose to develop a method of automatic analysis for the segmentation of polyps that provides high sensitivity and specificity. The chapter is organized as follows. Related works are presented in Sect. 8.2. In Sect. 8.3, the main machine learning approaches to image classification for polyp detection are considered. Section 8.3 contains several subsections, which describe in detail the idea of the proposed approach, the choice of image databases for the research, and the stages of binary classification and segmentation. Experimental studies are considered in Sect. 8.4. Concluding remarks are presented in Sect. 8.5.
8.2 Related Work

The problem of automatic detection of polyps is in high demand in gastroscopy, as confirmed by the significant number of studies in this area. The automatic approaches to polyp classification presented in scientific articles differ in the choice of classification strategy, the set of features (if any), the model for combining them into a feature space, and the methods for pre-processing images [2]. Solving the classification problem in medical diagnostics requires consideration of the following aspects:
• Performance requirement.
• Significantly small base for training of classification models.
• High degree of congruence between classes.
• Complex conditions of image acquisition.

When machine learning methods are applied, a training model must be chosen with regard to its accuracy and efficiency. This is especially important in medicine because of the requirement for high-precision decisions with minimal delay. In the case of a diagnostic examination, the image or video sequence processing procedures can be applied to an already recorded file and do not require high speed. However, during a medical operation and in the corresponding real-time applications, processing speed is critical. For this reason, classification techniques with low computational cost, such as Bayesian models and linear regression models, can potentially be useful. A common problem in the development of medical artificial intelligence systems is the absence of sufficient data for training. Thus, most of the solutions are associated with traditional machine learning techniques, such as Support Vector Machine (SVM) [3], Random Decision Forest (RDF) [4], logit regression, and the perceptron, since they are capable of solving a problem with a relatively small amount of available data. Li et al. [5] focused on the shape features of polyps and used the responses of the MPEG-7 region-based shape descriptor and Zernike moments as features and a multi-layer perceptron neural network as the classifier. Karargyris et al. [6] searched for texture features using log-Gabor filters and determined the boundaries of polyps using the Smallest Univalue Segment Assimilating Nucleus (SUSAN) edge detector [7]. Later, the authors added an SVM classifier for binary classification [8] to improve the proposed algorithm. However, the computational complexity of the edge detection does not allow the creation of real-time applications. Another type of texture evaluation filter was used by Nawarathna et al. [9], where the authors formed a dictionary of textural elements and then used a kNN classifier to analyze the distribution of textural elements in an image block.

Complex observation conditions and objects of interest create additional difficulties in solving the classification problem. Sharp camera shifts, the appearance of bubbles, and the movement of the mucous membrane cause noticeable artifacts in the image, such as reduced sharpness, specular highlights, and geometric distortion, which significantly affect the quality of the resulting images and lead to a decrease in classification accuracy. The high similarity of objects of interest to the background is an essential factor in the classification of polyps.

Recently, a fundamentally different approach to the problem of segmenting objects in an image has become popular. Deep learning methods, and especially CNNs, are a trend among artificial intelligence methods, so researchers are increasingly trying to use them in medical video systems and endoscopic systems. In [10], the authors use a CNN on three different image scales to increase accuracy, as well as a Gaussian filter for smoothing the result and reducing noise. In [11], three different CNNs are used to classify input data. In [12], the authors propose applying the Otsu threshold and choosing the largest connected region among all candidate regions, which reduces the number of false positives. It is important to emphasize that the implementation of neural-network-based segmentation for medical images is largely hampered by the need for a very large training base.
It is extremely difficult to provide a sufficiently large training set with polyps annotated by an experienced physician.
8.3 Proposed Two-Stage Approach for the Classification and Segmentation of Polyps

The idea of a two-stage approach is discussed in Sect. 8.3.1. Section 8.3.2 contains a description of the databases. Binary classification based on global features is proposed in Sect. 8.3.3. Section 8.3.4 describes segmentation based on CNN.
8.3.1 The Idea of a Two-Stage Approach

Algorithms for automatic detection and segmentation of polyps use various characteristics, such as geometric primitives, color spaces, and texture descriptors. Polyps have pronounced geometric features, among which are a distinct shape or characteristic “protrusions” on the surface of the gastric mucosa. The standard approach, which provides high sensitivity and specificity and, at the same time, does not require a substantially large training base, is the use of traditional machine learning methods, such as SVM and RDF, based on global image features. Their implementation requires forming a feature space that takes into account and most fully describes the specifics of the objects of interest. Using only color features as attributes for classification is not effective, since the color of these formations differs little from the surrounding healthy surface. Textural features, in turn, are very diverse. Endoscopic images suffer from various distortions and blurring caused by camera movement, so other features, for example, the geometric shape, have to be taken into account. A large number of features should be used for effective polyp detection, covering the color, texture, and geometry of the objects to be recognized. A feature that combines several simple features in one is called a global feature, for example, the JCD feature (a combination of the color and edge directivity descriptor) [13] or Tamura features [14]. Global features describe the whole image. They can be used for detecting polyps in an image but not for segmenting them. Segmentation of polyps requires the extraction of local features from image fragments bounded by a small spatial region. Color and brightness features averaged over the area of an image fragment can serve as local features for segmentation. The Rosenfeld-Troy measure [15] and the Histogram of Oriented Gradients (HOG) [16] are common local features describing texture. The characteristic features of polyps as objects of recognition in endoscopic images described above, as well as the high requirements for accuracy, sensitivity, and
specificity of classification in CDSS, make the application of local features ineffective for segmentation. Another approach to the recognition of objects in an image is the use of CNNs. CNNs are used in a large number of applications to perform a variety of tasks and are an effective approach to object recognition in images. A neural network can capture various features and can also be used for the segmentation of polyps. We propose an algorithm for the segmentation of polyps that takes into account the characteristics and specifics of the problem of polyp segmentation in endoscopic images. The main idea of the proposed algorithm is to combine the advantages of both of the above approaches under the conditions of a substantially limited training base. The implementation of the algorithm includes two consecutive steps (Fig. 8.1):
• Binary classification step. It provides a preliminary analysis of global image features using traditional machine learning technologies. The result of the preliminary classification is the decision about the presence of a polyp in the image.
• Segmentation step. It is based on the use of a CNN to segment one or several polyps if their presence in the image was confirmed at the previous stage.
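The control flow of the two steps above can be sketched as follows. The classifier and segmenter are passed in as plain callables; the toy stand-ins, the type aliases, and the 0.5 decision threshold are assumptions for illustration and do not represent the trained RDF or CNN models.

```python
from typing import Callable, List, Optional

Image = List[List[float]]  # grayscale image with values in [0, 1]
Mask = List[List[int]]     # 1 = polyp pixel, 0 = background


def two_stage_segmentation(
    image: Image,
    classify: Callable[[Image], float],
    segment: Callable[[Image], Mask],
    threshold: float = 0.5,
) -> Optional[Mask]:
    """Run the segmenter only when the classifier reports a polyp."""
    if classify(image) < threshold:
        return None        # stage 1: no polyp, segmentation is skipped
    return segment(image)  # stage 2: pixel-wise mask


# Hypothetical stand-ins for the trained models, for illustration only.
def toy_classifier(image: Image) -> float:
    return max(p for row in image for p in row)


def toy_segmenter(image: Image) -> Mask:
    return [[1 if p > 0.5 else 0 for p in row] for row in image]
```

Gating the CNN in this way means that frames rejected at the first stage can never produce false-positive masks, which is the intended benefit of the preliminary classification.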
Fig. 8.1 Block diagram of a two-stage segmentation algorithm
Table 8.1 Databases used to train CNN
Database | Resolution | Database size | Device
Cvc-Clinic DB [18] | 574 × 500 | 612 images from 29 video fragments | Olympus Q160AL and Q165L, Exera II videoprocessor
CVC-ColonDB [19] | From 720 × 576 to 384 × 288 | 300 images from 15 video fragments | –
ETIS-Larib [20] | 1225 × 966 | 196 images from 34 video fragments | Pentax 90i series, EPKi 7000 videoprocessor
8.3.2 Databases

The choice of databases used for training is determined by the requirements of each stage of the problem being solved. The database used to train the binary classifier at the first stage should cover a wide variety of polyp types in the “positive sample” and, at the same time, include normal tissues and pathologies not related to polyps in the “negative sample”. The expanded KVASIR database [17] contains 8000 endoscopic images corresponding to 8 different classes, 1000 images per class, including the “polyps” class. These images were obtained by different video sensors under different observation conditions. The images were classified by experienced endoscopists. The KVASIR database was chosen to form the training sample at the preliminary classification stage. The database for training the convolutional neural network should contain information about the polyp’s location in the image, usually given as a “ground truth” mask, which rules out the use of the KVASIR database at the segmentation stage. The number of databases with marked polyp areas is still very small, which makes high-quality training of the neural network difficult, since it depends strongly on the size of the training sample. Currently, three databases suitable for CNN training are available in the public domain (see Table 8.1); they were used in the study. All databases are represented in the RGB color space. Thus, the database for training the binary classifier contains 1000 images with polyps and about the same number of images selected from the negative sample formed by the other classes of the KVASIR dataset. The database for training the convolutional neural network contains about 1000 images.
8.3.3 Binary Classification Based on Global Features

The first stage is based on the automatic analysis of the endoscopic image, the purpose of which is to conclude whether polyps are present in the frame. The process of developing an algorithm based on a traditional classification model typically includes the following steps:
• Data acquisition and preprocessing.
• Feature set definition, extraction, and analysis.
• Choosing the most suitable machine learning algorithms for the current task and finding the best solution.
• Postprocessing of the results.

The first part of the development is the selection of suitable distinctive features for object recognition. In this study, we used the following global features, which are well proven in image classification problems: JCD [13], Tamura [14], ColorLayout [21], EdgeHistogram [21], AutoColorCorrelogram [22], and Pyramid HOG (PHOG) [23]. Not all features are equally useful for the classification of polyps. The high dimension of the feature space (in our case, 1185 features for 1000 positive examples in the training set) can reduce the quality of classification. For this reason, and also taking into account the possible speed limitation imposed on the final algorithm, it is necessary to select the most successful features while preserving the high quality of classification. The t-SNE method [24] proposed by van der Maaten and Hinton is well suited for this purpose. The method performs non-linear and non-parametric reduction of the dimension of the feature space for visualization. A set of points in a high-dimensional space is mapped to 2D or 3D space with the structure of the data preserved. The series of transformations used in the method converts the proximity of each pair of points in the original high-dimensional space into the probability that one data point is connected to another point as its neighbor. The t-SNE visualization of the database used to train the classifier at the first stage of the proposed algorithm is presented in Fig. 8.2. Each diagram corresponds to a 2D representation of the set of KVASIR database objects based on one of the six feature sets used in the study. Different colors on the scatter diagram represent the 8 classes in the marked-up KVASIR database. Thus, the red points correspond to objects from the “polyps” class. These diagrams allow conclusions to be drawn about the usefulness of various features for classification. The three diagrams in Fig. 8.2a, c, e, corresponding to the JCD, ColorLayout, and AutoColorCorrelogram features, respectively, show noticeable separability of classes compared to the other three sets of features. These features were used further for training a classifier. The next important step is to select the most suitable machine learning algorithms for the current task and find the best solution. In our study, we used the following machine learning methods:
• Linear discriminant analysis [25] as a baseline level of performance evaluation.
• Support vector machine with a radial basis function kernel, as the most suitable algorithm in the case of a relatively small data set.
• Random Decision Forest.
• AdaBoost [26].

The last two methods also perform well when the classes are not linearly separable, which was the reason for their inclusion in the investigation.
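The t-SNE step mentioned above, in which pairwise proximity in the high-dimensional feature space is converted into neighbor probabilities, can be illustrated with a simplified sketch. It uses one fixed Gaussian width `sigma` for all points, which is an assumption: the actual method tunes the width per point from a perplexity setting, and the later low-dimensional embedding step is not shown.

```python
import math


def conditional_probabilities(points, sigma=1.0):
    """Return p[i][j], the probability that point i picks point j as neighbor.

    p[i][j] is proportional to exp(-||x_i - x_j||^2 / (2 * sigma^2)) for
    j != i, and each row is normalized to sum to 1. A single fixed sigma
    is a simplification of t-SNE, which chooses it per point.
    """
    n = len(points)
    probabilities = []
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)  # a point is not its own neighbor
                continue
            squared_distance = sum(
                (a - b) ** 2 for a, b in zip(points[i], points[j])
            )
            weights.append(math.exp(-squared_distance / (2.0 * sigma ** 2)))
        total = sum(weights)
        probabilities.append([w / total for w in weights])
    return probabilities
```

Feature vectors of images that lie close together receive high mutual probabilities, and this is the structure that the 2D scatter diagrams try to preserve.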
Fig. 8.2 t-SNE visualization for a complete set of features plotted over all the objects in the training sample of KVASIR database (the red points in diagram correspond to objects of the “polyps” class): a JCD, b Tamura, c ColorLayout, d EdgeHistogram, e AutoColorCorrelogram, f PHOG
8.3.4 Segmentation Based on CNN

The result of the first stage of the proposed algorithm is a binary decision on the presence of polyps in the image. If the result is positive, segmentation of one or several polyps is performed at the second stage. Segmentation is based on a deep neural network with a UNet-like architecture, one of the most popular for the segmentation and analysis of medical images. The choice of this architecture is due to the following reasons:
• The network can be trained end-to-end on a relatively small number of images and is superior to the most popular and common method (a sliding-window convolutional network).
• The network I/O resolution (512 × 512 − 388 × 388) correlates well with the resolution of endoscopic images.
• The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.

An insufficient size of the training base is a common problem when training neural networks, because they require a huge amount of input data. To train our model, we formed a training sample based on several freely accessible polyp databases accompanied by ground-truth information. The data set used in the study is a combination of three databases and contains a total of 1000 images representing series of frames from 78 video fragments. A peculiarity of the generated database that requires attention is the strong cross-correlation between images. To significantly enlarge the training set during the learning process, an artificial expansion of the data was used, namely a sinusoidal transformation of the image. This approach is applicable because the image deformation produced by this transformation is “natural” for the soft tissues of the body (in contrast to bones). New artificially generated data arrived at the input of the neural network during the learning process, which partially solved the problem of the small training set size.
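One possible form of such a sinusoidal deformation is sketched below. The exact transformation used by the authors is not specified, so the row-wise horizontal shift, the `amplitude` and `period` parameters, and the nearest-neighbor sampling are all assumptions.

```python
import math


def sinusoidal_warp(image, amplitude=2.0, period=16.0):
    """Shift every row horizontally by a sine of its vertical position.

    image: 2D list (rows of pixel values). Pixels that would be sampled
    outside the image are taken from the nearest border column.
    """
    height, width = len(image), len(image[0])
    warped = []
    for y in range(height):
        shift = round(amplitude * math.sin(2.0 * math.pi * y / period))
        row = [
            image[y][min(width - 1, max(0, x + shift))] for x in range(width)
        ]
        warped.append(row)
    return warped
```

For segmentation training, the same call with identical parameters would have to be applied to the ground-truth mask so that the image and its annotation stay aligned.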
8.4 Experimental Studies
The implementation of the binary classification step involves the selection of features and the choice of a classification model. In accordance with Sect. 8.3.3, three of the six feature sets were selected (JCD, EdgeHistogram, AutoColorCorrelogram) as providing better separability of classes. All available combinations of the three feature sets were used to train the following machine learning classification models: linear discriminant analysis, SVM, RDF, and AdaBoost. Classification efficiency was evaluated using 10-fold cross-validation. The KVASIR dataset was split beforehand into training and test samples. The training sample included 900 images with polyps (positive examples) and 915 images without polyps (negative examples), the latter corresponding to other classes of the KVASIR dataset. The test sample contained 100 positive and 108 negative examples. The best classification results were obtained using the RDF classifier (see Table 8.2). The RDF classifier, which accepts the JCD, EdgeHistogram, and AutoColorCorrelogram descriptor responses combined into a single vector, was selected as the main solution at the binary classification stage. We used RDF as the classification algorithm for the first stage because it offers an additional capability compared with the other algorithms. RDF constructs a collection of individual decision tree classifiers, which use Classification And Regression
Table 8.2 Results of binary classification using RDF

| Feature sets | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| JCD, EdgeHistogram, AutoColorCorrelogram | 0.880 | 0.92 | 0.84 |
| JCD, AutoColorCorrelogram | 0.875 | 0.91 | 0.84 |
| JCD, ColorLayout | 0.875 | 0.92 | 0.83 |
Trees (CART) algorithms. The classification decision is obtained by voting among the individual classifiers in the ensemble. Thus, we can determine not only a binary decision (1 for polyp, 0 for norm) but also the degree of membership in each class, according to the number of trees voting for that class. We intend to use this capability to further extend the CDSS. For example, each image could be associated not only with the segmented polyps (after the second stage) but also with the degree of its correspondence to the given diagnosis.
The main task in investigating the polyp segmentation step was to estimate the effectiveness of a CNN trained on a small amount of data but supported by the preliminary results of binary classification. The convolutional neural network was trained on a dataset that combines three databases: Cvc-Clinic DB, CVC-ColonDB, and ETIS-Larib (see Table 8.1). Segmentation quality during training was assessed on a test dataset, the same one that was used for binary classification. For each image from the test dataset, a ground-truth mask was prepared in advance. The mask contains information about the localization of polyps in the image and is used to assess segmentation quality. We used the Dice score, defined as the ratio of twice the intersection of the ground truth and the network output to the sum of their sizes:

$$DCS = \frac{2|A \cap B|}{|A| + |B|},$$

where |A| and |B| are the cardinalities of the ground truth and CNN response sets, respectively. The learning curves are given in Fig. 8.3. The upper curve is the loss function, which shows that after 60,000 iterations the loss has reached saturation, and the generation of new data using a sinusoidal transformation ceases to have a positive effect. This suggests that the network has extracted all available information from the training dataset. The average value of the Dice score is 0.51 in the stagnation part of the Dice curve (lower curve in Fig. 8.3). The training curves show that even in the case of a small dataset (about 1000 correlated images from 78 videos), the use of a CNN with UNet architecture is promising and workable. After training, the CNN was tested on the KVASIR dataset. The neural network was not trained on KVASIR database images, which were obtained using various devices under different conditions, but the quality of segmentation is comparable with the results obtained for the training set. Examples of polyp segmentation on the KVASIR
Fig. 8.3 Learning curves: loss function (on top) and Dice (on bottom)
testing set are presented in Fig. 8.4. The two upper examples illustrate good-quality segmentation. The two lower examples show “below average” segmentation results. However, despite the low value of the Dice score, the polyp area is identified correctly. We also trained the neural network using only the Clinic_DB database. The sample size was about half the current one. The saturation stage was reached already at 15,000 iterations and corresponded to a lower value of the Dice score, which suggests that the quality of training improves as the database grows.
The final and most important stage in analyzing the effectiveness of the proposed algorithm is to assess the sequential use of the two stages: binary classification followed by segmentation. For this purpose, the Dice scores computed for the test sample images using only the segmentation step were compared with those obtained with preliminary classification using RDF and subsequent CNN segmentation. The results of binary classification on the testing set are listed below:
• True positives: 91.
• True negatives: 89.
• False positives: 19.
• False negatives: 9.
• Accuracy: 0.865.
• Sensitivity: 0.91.
• Specificity: 0.82.
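The three summary metrics follow directly from the four confusion-matrix counts listed above. A minimal sketch using the standard definitions (the function name `binary_metrics` is illustrative):

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all correct decisions
        "sensitivity": tp / (tp + fn),   # share of polyp images detected
        "specificity": tn / (tn + fp),   # share of normal images rejected
    }

metrics = binary_metrics(tp=91, tn=89, fp=19, fn=9)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.865, 'sensitivity': 0.91, 'specificity': 0.824}
```

The counts sum to 208 images, matching the test sample of 100 positive and 108 negative examples.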
The results of the study show that, owing to the preliminary binary classification stage, the average value of the Dice score increased by more than 10%, which indicates the effectiveness of the proposed approach.
Fig. 8.4 Results of polyp segmentation on KVASIR dataset with corresponding Dice scores. Each panel shows test data, ground truth, and prediction: a Dice equal to 0.85, b Dice equal to 0.65, c Dice equal to 0.41, d Dice equal to 0.49
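The Dice score defined in Sect. 8.4 can be computed for a pair of binary masks as in the minimal sketch below. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration.

```python
import numpy as np

def dice_score(ground_truth, prediction):
    """Dice score 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(ground_truth, dtype=bool)
    b = np.asarray(prediction, dtype=bool)
    denominator = a.sum() + b.sum()
    if denominator == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / denominator

gt = np.array([[1, 1], [0, 0]])
pred = np.array([[1, 0], [0, 0]])
print(round(dice_score(gt, pred), 3))  # 0.667
```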
8.5 Conclusions
In this chapter, we proposed a novel method of polyp segmentation based on a two-stage approach, including the stage of binary classification using traditional machine
learning models and the stage of segmentation using convolutional neural networks. For the stage of binary classification, we proposed to use the RDF classifier and a set of global features (JCD, EdgeHistogram, AutoColorCorrelogram) combined into a unified feature space. The following statistical efficiency results were obtained for the test samples at the binary classification stage: accuracy of 0.865, sensitivity of 0.91, and specificity of 0.82. For the stage of segmentation based on CNN, we used the KVASIR database as a test dataset. It is important to emphasize that the CNN was trained using other free-access databases (Cvc-Clinic DB, CVC-ColonDB, and ETIS-Larib) obtained with other devices and at a different resolution. To use the KVASIR database as a test dataset, we supplemented it with ground-truth information about the localization of polyps in the image. The test dataset included images independent of each other, without cross-correlation, which makes it possible to judge the reliability of the results obtained. The evaluation of the CNN-based segmentation stage shows that the network learned suitable features from the data. Using binary classification as a stage preceding segmentation, in order to reduce the proportion of false positives, increased the Dice score averaged over the test dataset by more than 10%. Thus, we can conclude that the developed approach is workable and can be a strong base for further development of an automated endoscopic system.
References
1. Hong, J.H., Rho, S.-Y., Hong, Y.S.: Trends in the aggressiveness of end-of-life care for advanced stomach cancer patients. Cancer Res Treat 45(4), 270–275 (2013)
2. Pogorelov, K., Riegler, M., Eskeland, S.L., de Lange, T., Johansen, D., Griwodz, C., Schmidt, P.T., Halvorsen, P.: Efficient disease detection in gastrointestinal videos–global features versus neural networks. J. Multimedia Tools Appl 76(21), 22493–22525 (2017)
3. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
4. Ho, T.K.: Random decision forests. In: 3rd International Conference Document Analysis and Recognition, pp. 278–282 (1995)
5. Li, B., Meng, M.Q., Xu, L.: A comparative study of shape features for polyp detection in wireless capsule endoscopy images. In: 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3731–3734 (2009)
6. Karargyris, A., Bourbakis, N.: Identification of polyps in wireless capsule endoscopy videos using log Gabor filters. In: 2009 IEEE/NIH Life Science Systems and Applications Workshop, pp. 143–147 (2009)
7. Smith, S.M., Brady, J.M.: SUSAN—a new approach to low level image processing. Int. J. Comput. Vis. 23(1), 45–78 (1997)
8. Karargyris, A., Bourbakis, N.: Detection of small bowel polyps and ulcers in wireless capsule endoscopy videos. IEEE Trans. Biomed. Eng. 58, 2777–2786 (2011)
9. Nawarathna, R.D., Oh, J., Yuan, X., Lee, J., Tang, S.J.: Abnormal image detection using texton method in wireless capsule endoscopy videos. In: Zhang, D., Sonka, M. (eds.) Medical Biometrics, LNCS, vol. 6165, pp. 153–162. Springer, Berlin, Heidelberg (2010)
10. Park, S., Lee, M., Kwak, N.: Polyp detection in colonoscopy videos using deeply-learned hierarchical features. Seoul Nat. Univ., pp. 1–4 (2015)
11. Tajbakhsh, N., Gurudu, S.R., Liang, J.: Automatic polyp detection in colonoscopy videos using an ensemble of convolutional neural networks. In: IEEE 12th International Symposium Biomedical Imaging, pp. 79–83 (2015)
12. Akbari, M., Mohrekesh, M., Esfahani, E.N., Soroushmehr, S.M.R., Karimi, N., Samavi, S., Najarian, K.: Polyp segmentation in colonoscopy images using fully convolutional network. In: 40th Annual International Conference on IEEE Engineering in Medicine and Biology Society, pp. 1–10 (2018)
13. Chatzichristofis, S.A., Boutalis, Y.S., Lux, M.: Selection of the proper compact composite descriptor for improving content-based image retrieval. In: 6th IASTED International Conference Signal Processing, Pattern Recognition and Applications, pp. 134–140 (2009)
14. Tamura, H., Mori, S., Yamawaki, T.: Textural features corresponding to visual perception. IEEE Trans. Syst., Man, and Cybern 8(6), 460–473 (1978)
15. Rosenfeld, A., Troy, E.: Visual Texture Analysis. Technical report, pp. 70–116 (1970)
16. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference Computer Vision and Pattern Recognition, vol. 1, pp. 886–893 (2005)
17. Pogorelov, K., Randel, K.R., Griwodz, C., Eskeland, S.L., de Lange, T., Johansen, D., Spampinato, C., Dang-Nguyen, D., Lux, M., Schmidt, P.T., Riegler, M., Halvorsen, P.: Kvasir: A multi-class image dataset for computer aided gastrointestinal disease detection. In: 8th ACM Conference Multimedia Systems, pp. 164–169 (2017)
18. Bernal, J., Sánchez, F. J., Fernández-Esparrach, G., Gil, D., Rodríguez, C., Vilariño, F.: WMDOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians. Comput. Med. Imaging Graph. 43, 99–111 (2015)
19. Bernal, J., Sanchez, J., Vilariño, F.: Towards automatic polyp detection with a polyp appearance model. Pattern Recogn. 45(9), 3166–3182 (2012)
20. Silva, J.S., Histace, A., Romain, O., Dray, X., Granado, B.: Towards embedded detection of polyps in WCE images for early diagnosis of colorectal cancer. Int. J. Comput. Assist. Radiol. Surg. 9(2), 283–293 (2014)
21. Manjunath, B.S., Salembier, P., Sikora, T.: Introduction to MPEG-7: Multimedia Content Description Interface. Wiley, N.Y. (2002)
22. Huang, J., Kumar, S.R., Mitra, M., Zhu, W.-J., Zabih, R.: Image indexing using color correlograms. IEEE Conf. Comput. Vis. Pattern Recognit. 762–768 (1997)
23. Bosch, A., Zisserman, A., Munoz, X.: Representing shape with a spatial pyramid kernel. In: 6th ACM International Conference Image and Video Retrieval, pp. 401–408 (2007)
24. van der Maaten, L.J.P., Hinton, G.E.: Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008)
25. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann Eugenics 7(2), 179–188 (1936)
26. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997)
Chapter 9
Algorithms for Markers Detection on Facies Images of Human Biological Fluids in Medical Diagnostics

Victor Krasheninnikov, Larisa Trubnikova, Anna Yashina, Marina Albutova and Olga Malenova

Abstract The precise diagnostics of different diseases is very important for their treatment. It is particularly important to differentiate the disease at the early stages, when the pathological alterations have not yet caused great harm to the whole organism, since this allows a greater number of therapies to be used and increases the recovery probability. One of the methods of early diagnostics is based on the examination of human biological liquids (blood, tears, cervical mucus, urine, etc.). A small drop of a liquid is placed on an object-plate and dried out slowly. Thus, a thin dry film (facies) remains. Characteristic patterns (markers) appear on the facies in the process of fluid crystallization. Each marker is a highly definite sign of some pathology even at an early stage of disease development. A large number of images must be analyzed when mass population health examination is carried out. For this reason, the development of algorithms and software for automated image processing is an urgent problem. The algorithms for detecting several markers on images of facies are presented in this chapter. First, the characteristic features of the markers (location, geometry, brightness, variation, spectrum, etc.) are revealed by means of visual analysis. Then, the methods of algorithmic detection of these features are developed. The decision about the presence of a marker is made if the set of its necessary characteristics is present. Tests of the algorithms have shown that 86–98% of images with different markers are identified correctly.

V. Krasheninnikov (B) · O. Malenova
Ulyanovsk State Technical University, Ulyanovsk, Russia

L. Trubnikova · M. Albutova
Ulyanovsk State University, Ulyanovsk, Russia

A. Yashina
Research-and-Production Association “Mars”, Ulyanovsk, Russia

© Springer Nature Switzerland AG 2020 M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems—6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_9
Keywords Medical diagnostic · Biological fluid · Facies · Marker · Detection · Recognition · Algorithm
9.1 Introduction
In recent decades, research on methods of medical image processing has been actively conducted. These images carry a great deal of information about the state of human health, which is used in the diagnosis and monitoring of the treatment of many diseases. Images of a different nature are used: optical, X-ray, tomographic, spectral, etc. The development of algorithms for the automated analysis of images of the facies of Biological Fluids (BF) is of interest for a number of reasons. On the one hand, the processes that occur during the drying of various BF (saliva, blood serum, cerebrospinal fluid, urine, etc.) are intensively studied. This can provide additional information in the diagnosis of diseases and makes it possible to carry out early diagnosis in the absence of visible symptoms and to begin treatment at an early stage. On the other hand, this method belongs to the group of non-invasive methods of investigation, which is especially important in the diagnosis of diseases in newborns and premature infants. In addition, the resulting acceleration and reduction in the cost of image analysis make it possible to conduct mass preventive surveys of the population, helping to improve the quality of health care. The method of studying BF by means of their dehydration and analysis of the crystallization of contained substances has a long history. The theoretical description of the process of evaporation of a drop was made by Maxwell [1]. Bohlen investigated the facies of capillary blood and noticed a connection between markers and gastrointestinal tumors [2]. The processes occurring during the dehydration of BF have been investigated in a large number of works, for example [3, 4]. In the series of works by Shabalin and Shatokhina (for example, [5, 6]) the features of the crystalline structures of BF and their connections with pathologies were analyzed.
Similar studies are conducted at the Department of Obstetrics and Gynecology of Ulyanovsk State University under the guidance of Trubnikova. The facies images used in this chapter were provided by this Department. A comprehensive review and bibliography on the study of BF are given in the book by Kraevoy and Coltovoy [7]. Currently, the dehydration method is used to diagnose diseases in oncology, gerontology, pediatrics, obstetrics, gynecology, and other fields of medicine. However, most of these works are aimed at identifying markers of pathologies and improving visual perception: modernization of microscopes, addition of chemical reagents to BF, etc. There are considerably fewer papers on computer processing of facies images. Even these works usually develop methods for finding image zones suspected of containing markers. In [8], algorithms were developed for detecting 7 markers in blood serum facies based on an analysis of the texture characteristics of images (contrast, inertia, energy, etc.), which also makes it possible to assess the severity of the pathology. Algorithms presented in [9–12] were elaborated
to detect several peculiar markers based on the structural analysis of images, with a high probability of correct detection and a low probability of false alarms. Note that research on the use of facies for medical diagnostics and the algorithmic detection of markers is carried out mainly in Russia. The markers on facies of BF are very diverse in shape, size, orientation, etc., which is the main difficulty in their algorithmic recognition. The high variability in the size and shape of the markers justifies detecting and recognizing them by selecting a system of attributes whose combination corresponds to a certain marker. Note that only medical personnel can make the final diagnosis for the patient. Computer analysis of the facies is only auxiliary. Its purpose is to identify images in which a specific set of markers is present. Therefore, it is not necessary to find all markers of this type in the image. It is enough to find at least one of them and report it. Then the operator will perform a more thorough analysis of this image. In fact, it is required to select images that contain at least one of the considered markers. This is an indicator of the effectiveness of algorithms for marker detection and recognition. This chapter is organized as follows. Examples of images of biological liquid facies are presented in Sect. 9.2. Section 9.3 describes the image preprocessing algorithms. The proposed algorithms for marker detection and recognition are presented in Sect. 9.4. Statistical tests of the algorithms are given in Sect. 9.5. Section 9.6 concludes the chapter.
9.2 The Examples of Images of Biological Liquids Facies
Images of facies are in color. However, the color depends strongly on the laboratory conditions of facies formation. Therefore, only the brightness (grey scale) is taken into account. Let us consider some examples of images of BF facies. In the normal state of the patient, the image of the whole drop has a radial-ring structure (physiological morphotype, shown in Fig. 9.1). In the presence of pathologies, the structure is disturbed (pathological morphotype, shown in Fig. 9.2) and pathological markers appear.
Fig. 9.1 Physiological morphotype
Fig. 9.2 Pathological morphotype
Some examples of markers are shown in Fig. 9.3. The images show that markers of even one species are very diverse in form. High variability in the size and shape of the markers is the main difficulty of their recognition.
9.3 The Image Preprocessing
Each marker has a number of features, among which are the following: the location on the image relative to the cracks (long dark lines) that make up the skeleton, the local brightness, uniformity, and so on. Therefore, preprocessing is performed first to extract these common features.
Skeleton and homogeneous areas. The general structure of facies is formed by large cracks. The skeleton is constructed on the basis of detecting relatively large (exceeding a threshold) values of the variance of brightness in a sliding window. Due to the heterogeneity of the image, the threshold should vary depending on the local texture. To determine it, the adaptive pseudo-gradient procedure is applied [13]:

$$\lambda_{n+1} = \lambda_n + \mu \begin{cases} q, & \text{if } G_n \ge \lambda_n, \\ -p, & \text{if } G_n < \lambda_n, \end{cases}$$

where $\lambda_{n+1}$ is the threshold estimate following $\lambda_n$, $G_n$ is the value of the local variance in the sliding window, q is the established order of the quantile (the probability that the variance does not exceed the threshold), p = 1 − q, and μ is a constant that affects the step size of the procedure. The threshold here must be such that only relatively large values of the variation exceed it, so a large quantile order was assigned (q = 0.85, that is, the threshold was exceeded in only 15% of cases). To detect radial cracks in the image, the following method is used. Dark areas with a large variation are marked in the image. If the boundary of a region is a long smooth line, then this region is considered a radial crack. To find homogeneous regions, areas of small variation were sought in the image, so a small quantile order q = 0.1 was assigned.
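The pseudo-gradient threshold recursion above can be sketched as follows. This is a minimal illustration that runs the update over a one-dimensional stream of local-variance values; the function name `pseudo_gradient_threshold`, the initial value `lam0`, and the synthetic uniform test data are assumptions, and a real implementation would scan the sliding-window variances of the facies image.

```python
import numpy as np

def pseudo_gradient_threshold(values, q=0.85, mu=0.005, lam0=0.0):
    """Track the q-quantile of a stream with the pseudo-gradient update.

    Returns the threshold that was in force at each step.
    """
    p = 1.0 - q
    lam = lam0
    thresholds = np.empty(len(values))
    for n, g in enumerate(values):
        thresholds[n] = lam
        # G_n >= lambda_n raises the threshold by mu*q,
        # G_n <  lambda_n lowers it by mu*p.
        lam += mu * (q if g >= lam else -p)
    return thresholds

# Synthetic stand-in for local variances: uniform on [0, 1],
# whose 0.85 quantile is 0.85.
rng = np.random.default_rng(0)
stream = rng.uniform(0.0, 1.0, size=20000)
thresholds = pseudo_gradient_threshold(stream, q=0.85, mu=0.005, lam0=0.5)
tail_mean = thresholds[-5000:].mean()
```

The update is in equilibrium when the threshold is exceeded with probability p = 1 − q, which is why the estimate settles near the q-quantile; a smaller μ gives a smoother but slower-adapting threshold.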
Fig. 9.3 The examples of markers: a comb structures, b crescent structures, c typical spherulites, d atypical spherulites, e tourniquets, f toxic plaques, g wrinkles, h leaf structure, i funnel structures, j cracks of silver, k torsion cracks, l three-ray cracks, m Y-shaped cracks, n tongue-like fields, o granularity, p block-like cracks, q plate structures, r fern-like structures
Segmentation of images by brightness. A distinctive feature of some markers is a brightness that is much lower or higher than the average brightness of the immediate surroundings. Such regions are selected by analyzing the brightness histogram.
Construction of the facies boundary and center. Each marker has a typical location and orientation on the facies. Therefore, it is required to find the boundary of the facies. First, we find a set of points at which there is a significant difference in brightness. After correcting this set, we obtain the set M, approximately forming the boundary of the facies. The facies is formed from a drop of liquid, which leads to its oval shape. Therefore, the boundary of the facies is approximated by an ellipse El given by Eq. 9.1, where the coefficients minimize the sum of squared residuals $\varepsilon(i) = x^2(i) + Bx(i)y(i) + Cy^2(i) + Dx(i) + Ey(i) + F$ over the points of M.

$$x^2 + Bxy + Cy^2 + Dx + Ey + F = 0 \qquad (9.1)$$
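Because the residual ε(i) is linear in the coefficients B, C, D, E, F, the fit of Eq. 9.1 reduces to an ordinary linear least-squares problem. The sketch below illustrates this with NumPy; the function names `fit_conic` and `conic_center` are illustrative, and the center is obtained from the standard condition that the gradient of the conic vanishes there.

```python
import numpy as np

def fit_conic(x, y):
    """Least-squares coefficients (B, C, D, E, F) of
    x^2 + B*x*y + C*y^2 + D*x + E*y + F = 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([x * y, y**2, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, -x**2, rcond=None)
    return coeffs

def conic_center(B, C, D, E):
    """Center of the conic: solve 2x + B*y + D = 0 and B*x + 2C*y + E = 0."""
    matrix = np.array([[2.0, B], [B, 2.0 * C]])
    return np.linalg.solve(matrix, np.array([-D, -E]))

# Synthetic boundary points on an ellipse centered at (3, -2)
# with semi-axes 5 and 2.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
bx = 3.0 + 5.0 * np.cos(t)
by = -2.0 + 2.0 * np.sin(t)
B, C, D, E, F = fit_conic(bx, by)
center = conic_center(B, C, D, E)
```

For noisy boundary points the same code applies unchanged; only the residuals become nonzero.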
The center of this ellipse is taken as the center of the facies. The approximating ellipse El describes the boundary of the facies approximately, so we make an additional refinement of the boundary as follows. We arrange points of M in the sequence and obtain their deviations z(i) from the ellipse. We will consider z(i) as noisy observations z(i) = d(i)+n(i) of the “true” deviations d(i) of the facies boundary from the ellipse. To estimate the deviations d(i) from their observations z(i), we
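apply the adaptive approximated Kalman pseudo-gradient filter [13]

$$\hat{d}(i) = a(i)\hat{d}(i-1) + b(i)\bigl(z(i) - a(i)\hat{d}(i-1)\bigr) = a(i)\hat{d}(i-1) + b(i)\Delta(i),$$

whose coefficients a(i) and b(i) are found using the pseudo-gradient procedure in Eq. 9.2, where $\Delta(i) = z(i) - a(i)\hat{d}(i-1)$ is the prediction residual and h is a parameter of the procedure (approximately 0.001–0.01).

$$\begin{aligned} a(1) &= 1, \quad b(1) = 1, \quad \Delta(0) = 0, \\ a(i+1) &= a(i) + h\,\mathrm{sgn}\{\Delta(i)[2a(i)\hat{d}(i-1) + b(i)\Delta(i-1)]\}, \\ b(i+1) &= b(i) + h\,\mathrm{sgn}\{\Delta(i)\,a(i)\,\Delta(i-1)\} \end{aligned} \qquad (9.2)$$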
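A minimal sketch of this recursive filter is given below, written as the recursion of Eq. 9.2 is read here, with the prediction residual taken as z(i) − a(i)d̂(i−1). The function name `adaptive_pseudo_gradient_filter` and the zero initial estimate d̂(0) = 0 are assumptions; the chapter does not state the initial estimate.

```python
import numpy as np

def adaptive_pseudo_gradient_filter(z, h=0.005):
    """Smooth the observed deviations z(i) with adaptively tuned a(i), b(i)."""
    a, b = 1.0, 1.0          # a(1) = b(1) = 1
    d_prev = 0.0             # assumed initial estimate d_hat(0)
    res_prev = 0.0           # residual at step 0 is zero
    estimates = np.empty(len(z))
    for i, zi in enumerate(z):
        res = zi - a * d_prev                 # prediction residual
        d_hat = a * d_prev + b * res          # filtered deviation
        estimates[i] = d_hat
        # Pseudo-gradient updates of the coefficients (Eq. 9.2).
        a_next = a + h * np.sign(res * (2.0 * a * d_prev + b * res_prev))
        b_next = b + h * np.sign(res * a * res_prev)
        a, b = a_next, b_next
        d_prev, res_prev = d_hat, res
    return estimates

# Usage on synthetic deviations: a slow wave plus noise.
rng = np.random.default_rng(1)
angle = np.linspace(0.0, 2.0 * np.pi, 400)
observed = 3.0 * np.sin(2.0 * angle) + rng.normal(0.0, 0.5, size=angle.size)
refined = adaptive_pseudo_gradient_filter(observed, h=0.005)
```

With h = 0 the coefficients stay at 1 and the filter returns the observations unchanged, which is a convenient sanity check.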
Definition of the morphological type. The general state of the patient can be determined by the morphotype of the facies (Figs. 9.1 and 9.2). In [8], textural features are used to determine the morphotype, among which entropy, homogeneity, and fractal dimensionality are the most informative. We use structure analysis for this purpose [10]. The sign of a physiological morphotype is the symmetrical arrangement of radial cracks, when the axial lines intersect near the center of the facies. To test this feature, the radial cracks are approximated by straight lines. The points of their intersection are found, and the position of the sliding window with the maximum number of intersection points is determined. If this position of the window is near the center of the facies, then the morphotype is judged to be physiological, otherwise pathological. In the tests of this algorithm, the morphotype of 91% of facies was correctly identified. To determine the morphotype, we also applied the autoregressive model of a circular image in Eq. 9.3, where k is the turn number, l is the node number (l = 0, …, T − 1) in the turn, $x_{k,l}$ is the image brightness at node (k, l), $x_{k,l} = x_{k+1,l-T}$ when l ≥ T, T is the period, i.e. the number of points in one turn, and $\xi_{k,l}$ are independent standard random variables [14].

$$x_{k,l} = a\,x_{k,l-1} + b\,x_{k-1,l} - ab\,x_{k-1,l-1} + c\,\xi_{k,l} \qquad (9.3)$$
The grids of nodes on a circle are shown in Fig. 9.4a. The parameters a and b of model in Eq. 9.3 set the degree of correlation in the radial and circular direction.
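The model in Eq. 9.3 can be simulated by unrolling the turns into one sequence with index n = kT + l, so that the neighbor on the previous turn lies T steps back. The sketch below makes several assumptions not stated in the chapter: the function name `simulate_spiral_ar`, zero values before the first node, and Gaussian noise for the standard random variables.

```python
import numpy as np

def simulate_spiral_ar(turns, period, a, b, c=1.0, seed=0):
    """Simulate Eq. 9.3 on a spiral grid; returns an array [turn k, node l]."""
    rng = np.random.default_rng(seed)
    n_total = turns * period
    xi = rng.standard_normal(n_total)
    x = np.zeros(n_total)
    for n in range(n_total):
        along = x[n - 1] if n >= 1 else 0.0                    # x[k, l-1]
        across = x[n - period] if n >= period else 0.0          # x[k-1, l]
        diag = x[n - period - 1] if n >= period + 1 else 0.0    # x[k-1, l-1]
        x[n] = a * along + b * across - a * b * diag + c * xi[n]
    return x.reshape(turns, period)

# Parameter pairs quoted in the text for the simulated images of Fig. 9.4.
radial_like = simulate_spiral_ar(turns=60, period=200, a=0.95, b=0.99)
ring_like = simulate_spiral_ar(turns=60, period=200, a=0.99, b=0.95)
```

Row k of the result holds one turn, so mapping the rows to circles of increasing radius produces a circular image of the kind shown in Fig. 9.4.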
Fig. 9.4 Circle images: a grids on a circle, b and c simulated images
When a < b, the image will have a higher correlation in the radial directions. In Fig. 9.4b, the simulated image is shown at a = 0.95 and b = 0.99. When a > b, the image will have a higher correlation in the circular direction. Figure 9.4c shows the simulated image with a = 0.99 and b = 0.95. The image in Fig. 9.1 shows the facies of a healthy person (physiological type of facies). It has pronounced radial cracks. Therefore, in the model of Eq. 9.3 corresponding to this image, the radial correlation coefficient must be large. The image in Fig. 9.2 shows the facies of a sick person. Here, the radial structure is particularly badly damaged. Therefore, it can be expected that the radial correlation coefficient will be small. Indeed, for images of the first kind, the estimated radial correlation coefficient a was about 0.7, and for images of the second kind it was about 0.3. Thus, estimates of the parameters of the facies model make it possible to draw a conclusion about the general condition of the patient.
9.4 Algorithms for Markers Detection and Recognition
Different methods of image processing (analysis of histograms, mean values and variances, anisotropy, spectrum, etc.) are applied in works on algorithmic investigation of facies. The above examples of images show that markers of even one species are very diverse in form. This is particularly evident in Figs. 9.1, 9.2 and 9.3. The high variability in the size and shape of the markers justifies the following approach to the development of recognition algorithms. First, a visual analysis of the markers is carried out to reveal their characteristic features. Then the methods of algorithmic detection of these features are developed. The decision on the presence of the marker is made if a combination of its necessary characteristics is found in the image section. Let us consider this method in detail using the recognition of two markers as an example.
Comb structures. Such structures are a sign of angiospasm and violation of microcirculation (Fig. 9.3a). The visual analysis shows that the comb structures are quite diverse in form. But they possess some common features:
• Feature 1. Comb structures are always found near the facies border.
• Feature 2. They are considerably less bright than the surrounding background.
• Feature 3. They represent several obtuse triangles.
• Feature 4. Long sides of the triangles are approximately parallel to each other.
• Feature 5. Triangles are a short distance from each other.
• Feature 6. The background between the triangles is rather homogeneous.
Let us consider the comb detection algorithm using the image in Fig. 9.5a.
Feature 1. Let us use the first feature. For this purpose, the facies boundary and its adjoining neighborhood are determined. The further search for comb structures is carried out in this neighborhood.
Fig. 9.5 Comb structures detection: a initial image, b areas with small brightness, c approximating lines, d found comb structures
Feature 2. To determine areas whose brightness is much lower than that of the surrounding background, a histogram of brightness values is constructed. Then the threshold equal to the 20% quantile of the distribution is determined. The obtained areas are shown in white in Fig. 9.5c, d. They form several connected areas.
Feature 3. Among the obtained areas, it is necessary to find obtuse triangles. First, the boundary lines of these areas are extracted by tracing the contour [15]. Then, each of the lines obtained is analyzed. From each point on the line, let us draw vectors a and b aimed at the fourth point along the line in each of the two opposite directions. The cosine of the angle between these vectors is $\cos\alpha = \mathbf{a}\cdot\mathbf{b}/(|\mathbf{a}|\,|\mathbf{b}|)$. Obviously, cos α ≈ 1 for very sharp angles and cos α ≈ −1 for very obtuse angles. These points are marked. From each point, approximately rectilinear boundary sections are searched for. For such sections, cos α ≈ −1. Using the least-squares method, approximating lines are drawn through these sections. The lines obtained in the course of approximation are shown in black in Fig. 9.5c. Out of the multitude of the lines obtained, three pairwise intersecting lines are chosen. At the intersection of these lines, a triangle
is formed. Let us find the angles of the triangle (cosine law) and consider only the obtuse triangles.
Feature 4. The straight lines forming the triangle sides opposite the obtuse angle should be approximately parallel. Let us find the tangent of the angle between these lines ($y_1 = m \cdot x + k$ and $y_2 = p \cdot x + q$): $\tan\beta = (m - p)/(1 + m \cdot p)$. The value of tan β must be close to 0.
Feature 5. The triangles of a comb structure are located in close groups. Therefore, the distance between them must be small. This distance is taken to be the length of the perpendicular dropped from the vertex of an acute angle onto the long side.
Feature 6. Finally, the homogeneity of the background between the triangles is checked. The indicators of homogeneity are small variations of brightness along the straight lines drawn between the pairs of triangles.
As a result, there remain only those triangles that satisfy all six criteria above. Figure 9.5d shows the final result of processing the initial image (Fig. 9.5a) taking these features into account. The comb structures have been successfully detected and are marked in white. When the described algorithm was tested on 161 images, a comb structure was missed only once. False detections occurred in 10% of all images.
Crescent structures. This marker in the images of blood serum and cervical mucus facies (Fig. 9.3b) indicates ischemic disease. This marker has the form of long hooped bands resembling a crescent. Rather often such bands overlap with short crosswise folds. Visual analysis of crescent structures allows us to distinguish their common peculiarities:
• Peculiarity 1. There is a jump in brightness at the crescent structure boundary, but it is not as strong as in the case of radial cracks.
• Peculiarity 2. They are long smooth lines.
• Peculiarity 3. They have high anisotropy: their brightness changes more rapidly crosswise than longwise.
Let us consider the crescent structures on blood serum facies in Fig.
9.6a as an example to demonstrate the detection algorithm.

Peculiarity 1. To distinguish a jump in brightness at the crescent structure boundary, let us apply wavelet transformations [16]: one low-frequency component LL and two high-frequency components LH and HL of the original image $I_{i,j}$:

$$LL_{i,j} = \frac{I_{2i-1,2j} + I_{2i,2j}}{2}, \qquad LH_{i,j} = I_{2i,2j-1} - I_{2i,2j}, \qquad HL_{i,j} = I_{2i-1,2j} - I_{2i,2j}.$$
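The three components above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the image is a list of rows of brightness values, the function name `haar_like_components` is hypothetical, and a trailing odd row or column is simply dropped. The chapter's 1-based indices are mapped to 0-based list indices.

```python
def haar_like_components(image):
    """Return (LL, LH, HL) for a 2-D list of brightness values.

    The chapter's 1-based indices (2i-1, 2i, 2j-1, 2j) correspond to the
    0-based indices (2i, 2i+1, 2j, 2j+1) used in this sketch.
    """
    rows = len(image) // 2      # assumption: drop a trailing odd row
    cols = len(image[0]) // 2   # assumption: drop a trailing odd column
    ll = [[0.0] * cols for _ in range(rows)]
    lh = [[0.0] * cols for _ in range(rows)]
    hl = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            upper = image[2 * i][2 * j + 1]      # I_{2i-1, 2j}
            lower = image[2 * i + 1][2 * j + 1]  # I_{2i, 2j}
            left = image[2 * i + 1][2 * j]       # I_{2i, 2j-1}
            ll[i][j] = (upper + lower) / 2       # low-frequency average
            lh[i][j] = left - lower              # difference along a row
            hl[i][j] = upper - lower             # difference along a column
    return ll, lh, hl


example = [[0, 1, 2, 3],
           [4, 5, 6, 7],
           [8, 9, 10, 11],
           [12, 13, 14, 15]]
LL, LH, HL = haar_like_components(example)
```

Each output has half as many rows and columns as the input, so a 4 × 4 image yields three 2 × 2 components.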
The low-frequency component (LL) is half the size of the original image. It is shown in Fig. 9.6a. The high-frequency components pick out vertical (LH) and horizontal (HL) jumps in brightness in the original image. Figure 9.6b, c shows the binarized images of the HL and LH transformations. Peculiarity 2. The task is to detect long smooth lines in the given images. Therefore, it is necessary to remove the points that correspond to short lines. The
V. Krasheninnikov et al.
Fig. 9.6 Crescent structures detection: a the initial image, b wavelet HL-transformation, c wavelet LH-transformation, d false point deletion and drawing lines Hj on HL, e false point deletion and drawing lines Gj on LH, f detected crescent structures
following method can be applied to solve the problem. Let us reduce the obtained LL component to binary form with threshold 0. Using a sliding window, we count the points in which the binary image value is equal to 1 (white). If this count exceeds a threshold equal to 1/3 of the number of points in the window, the point values in the window are set to 0 (black). Then, by tracing the outline [17], we extract the boundaries L_i of the white areas. If the boundary length is less than 40, we delete the points
at the boundary and inside it. It is also necessary to exclude radial cracks (very long dark lines in the facies image) from consideration, i.e. we exclude boundaries whose length exceeds a preset threshold. Here, radial cracks are taken to be pixels whose brightness is below the 10% quantile of the brightness distribution of the entire image. The points obtained are excluded from further consideration. The result of these operations is shown in Fig. 9.6d, e. Peculiarity 3. We consider the anisotropy index W = V/v, where V = var(L_g)/|L_g|, v = var(L_per)/|L_per|, L_g is a rectilinear interval of length |L_g| parallel to the image gradient, L_per is a rectilinear interval of length |L_per| perpendicular to L_g, and var(L) is the variation on the interval L, calculated as the sum of the absolute brightness differences between points of the interval located 5 pixels apart. The midpoints of these intervals are located at the intersection points of the lines G_j and L_i or H_j and L_i. Thus, the anisotropy index is the ratio of the maximum variation (in the direction of the gradient) to the minimum one (in the perpendicular direction). If the anisotropy index W exceeds a preset threshold, we conclude that there is high anisotropy at the intersection point; otherwise, the lines obtained are excluded from further consideration. The points that remain after this rejection satisfy all three peculiarities, so we decide that crescent structures pass through them. The detected crescent structures are marked with crosses in Fig. 9.6f. To determine the efficiency of the algorithm, 583 blood serum and cervical mucus facies images were processed. The probability of correct detection of images with crescent structures was 92.7% for blood serum facies and 100% for cervical mucus facies. 
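The anisotropy index W = V/v used for Peculiarity 3 can be illustrated with a short Python sketch. Several details here are assumptions rather than the authors' implementation: the two intervals are supplied as lists of brightness values already sampled along and across the gradient, the interval length is taken as the number of samples, "5 pixels apart" is read as consecutive samples taken every fifth pixel, and the handling of a zero denominator is a convention chosen for the example. The function names are hypothetical.

```python
def variation(brightness, step=5):
    """Sum of absolute brightness differences between samples `step` pixels apart."""
    samples = brightness[::step]
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


def anisotropy_index(along_gradient, across_gradient, step=5):
    """W = V / v, each variation being normalised by its interval length."""
    big_v = variation(along_gradient, step) / len(along_gradient)
    small_v = variation(across_gradient, step) / len(across_gradient)
    if small_v == 0:
        # Assumed convention: a flat perpendicular profile gives unbounded
        # anisotropy, unless both profiles are flat (treated as isotropic).
        return float("inf") if big_v > 0 else 1.0
    return big_v / small_v


# A strong brightness ramp along the gradient, a nearly flat profile across it.
along = [0] * 5 + [50] * 5 + [100]
across = [10] * 5 + [12] * 5 + [10]
W = anisotropy_index(along, across)
```

A point would then be kept when W exceeds the preset threshold, whose value the chapter does not state.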
False detection of images, which did not contain crescent structures, was 8.5% of blood serum facies and 9.8% of cervical mucus facies images. Since algorithms for detecting other markers have been developed in a similar way, only a brief overview is given below. Spherulite (Fig. 9.3c, d) is a complex round-shaped mineral aggregate consisting of thin needle-like crystals. At the facies, the spherulite normally looks like a rounded crystalline formation. In the presence of pathologies, it can significantly change its shape. Figure 9.3c shows the image of a spherulite in the facies of a healthy person. Spherulites for facies with pathology are depicted in Fig. 9.3d. The features are: (1) brightness below average, (2) low variation, and (3) a shape different from an ellipse. The tourniquets (Fig. 9.3e) are radial cracks with wavy lines at the sides. They are a sign of brain hypoxia. Hypoxia is a condition, in which the body or a region of the body is deprived of adequate oxygen supply at the tissue level. The presence of this marker in the central zone of the facies indicates chronic alcohol intoxication. The features are: (1) they are located on both sides along the cracks, (2) they consist of short slightly wavy lines perpendicular to the crack, and (3) they have a large anisotropy. Toxic plaques and wrinkles. Toxic plaques (Fig. 9.3f) are uniform rounded formations framed on one side by a set of short wavy lines. Wrinkles (Fig. 9.3g) are a local displacement of the facies relief with the formation of parallel folds on its surface. These markers are a sign of organism intoxication. Characteristic features
are: (1) large anisotropy, (2) the presence of a uniform patch (plaque), and (3) the presence of appendages in the form of short lines (in the case of plaques they are located at different angles, and in the case of wrinkles—at the same angle). Plaques and wrinkles differ from the combs in that they are not located along the cracks and the appendages are not located in two parallel bands. Wrinkles differ from plaques by the absence of a uniform patch and their appendages can be located at the same angles to each other. Leaf-like structures (Fig. 9.3h) are a sign of sclerotic processes occurring in the blood vessels. This is a disease, in which the inside of an artery narrows. Initially, as a rule, there are no symptoms. It can lead to coronary artery disease, stroke, peripheral artery disease, or kidney disease. Their features are: (1) they are relatively large structures reaching the periphery of facies, (2) they are located at a great distance from the facies center, (3) they have approximately symmetrical arrangement along radially directed cracks, (4) they are darker than the background, (5) there is a small jump in brightness near its border, and (6) the border is an archwise line. The funnel structures (Fig. 9.3i) are elongated homogeneous bright regions of approximately elliptical shape. They indicate a high tension of functional systems and protective mechanisms. This leads to a general decrease in immunity and the risk of developing serious diseases. The features are: (1) the boundary is close to an ellipse with a large eccentricity, (2) the inner region is uniform and lighter than the surrounding background, (3) there is a small jump in brightness at the boundary, and (4) they are located in the central zone between the cracks. Cracks of silver (Fig. 9.3j) represent a series of small parallel dark lines and indicate a violation of the elasticity of blood vessels. 
The features are: (1) they are much darker than the surrounding background and (2) they consist of a number of dark lines parallel to each other. Torsion cracks (Fig. 9.3k) are dark spiral lines. They are an indicator of the high tension of the adaptive mechanisms of homeostasis, for example the rate of sweating. This is one of the factors affecting homeostatic body temperature control. It depends on the heat load, which threatens to destabilize the temperature of the body, for which the brain has a sensor in the hypothalamus. The features are: (1) significantly lower brightness than the surrounding background, (2) the shape is close to a circle, and (3) the region inside is relatively homogeneous. Three-ray cracks (Fig. 9.3l) are an indicator of stagnant phenomena in the body. This causes self-poisoning of cells due to insufficient oxygen supply and poor outflow of metabolic products, as well as venous and lymphatic stagnation. This phenomenon leads to a strong decrease in immunity. Their features are: (1) they are three short segments, starting at one point and forming approximately equal angles, and (2) significantly lower brightness than the surrounding background. Y-shaped cracks (Fig. 9.3m) in the facies of cervical mucus are a sign of precancerous diseases of the cervix. The features are: (1) there are sharp differences in brightness at the crack boundary, (2) there is a dark closed region at the base of the crack, which is an obtuse triangle, and (3) the cracks are not located in groups; there may be from one to several cracks scattered throughout the facies.
Tongue-like fields (Fig. 9.3n) are markers of various inflammations. A larger marker indicates a more intense inflammatory process. Inflammation is a biological response of body tissues to harmful stimuli, such as pathogens, damaged cells, or irritants. The function of inflammation is to eliminate the initial cause of cell injury, clear out necrotic cells, and initiate tissue repair. This marker looks like an elongated tongue. The features are: (1) it contains a rounded bright homogeneous spot, (2) near one edge of the bright spot there are light motley formations, and (3) the brightness of the formations is approximately equal to the brightness of the spot itself. Granularity (Fig. 9.3o) is a sign of bladder diseases. Any bladder pathology requires attention, as it seriously impairs the quality of life. This marker is characterized by intensive pigmentation with a clear granularity near the concretions. Concretions are long light areas between long dark radial cracks. The features are: (1) they are located near the concretions, (2) they have intense pigmentation, hence their brightness is much lower than that of the concretions, and (3) they have a clear granularity, hence there is a large variation. Block-like cracks (Fig. 9.3p) are recognized by a long crack with one end enclosed in an oval. This marker indicates the presence of structural changes in tissues and of hypoxic and ischemic brain lesions. These diseases can lead to a stroke. Their features are: (1) a closed circular or oval crack or an open arcuate crack and (2) they appear at the ends of radial cracks. Plate structures (Fig. 9.3q) are a consequence of high levels of cholesterol in the blood. Cholesterol is released into the blood during massive cell death, which can be caused by ischemia, burns, and various injuries. The features are: (1) small dispersion of brightness over the entire area, (2) borders formed by straight lines intersecting at a right angle, and (3) a pronounced brightness difference at the contour. Fern-like structures (Fig. 9.3r) are markers of estrogen saturation of the female body in the middle of the menstrual cycle. The more pronounced the marker, the stronger the estrogenic background. The level of the estrogen background is estimated during hormone therapy. Their features are: (1) much lower brightness than the surrounding background, although there are also darker formations, and (2) a large number of almost straight dark lines located almost parallel to each other on both sides of the center line and perpendicular to it.
9.5 Statistical Tests of Algorithms

The described algorithms were tested on 2,608 images. The results are shown in Table 9.1. Note that the markers can occur in different combinations in the facies. Therefore, it is sufficient to flag images containing at least one of the markers. These images (and patients) will necessarily be re-analyzed by medical personnel. In testing, 96% of such images were detected, with 10% false detections.
Table 9.1 The efficiency of marker detection

| Marker | Images with detected and correctly recognized markers (%) | Images with errors in marker recognition (%) |
|---|---|---|
| Comb structures | 92 | 12 |
| Granularity | 93 | 8 |
| Tourniquets | 99 | 7 |
| Wrinkles | 92 | 7 |
| Toxic plaques | 93 | 7 |
| Leaf-like structures | 91 | 9 |
| Funnel structures | 97 | 13 |
| Cracks of silver | 92 | 3 |
| Torsion cracks | 90 | 3 |
| Three-ray cracks | 92 | 6 |
| Tongue-like fields | 95 | 10 |
| Y-shaped cracks | 99 | 7 |
| Crescent structures | 97 | 8 |
| Block-like cracks | 86 | 11 |
| Plate structures | 95 | 4 |
| Spherulite | 86 | 11 |
| Fern-like structures | 100 | 0 |
9.6 Conclusions

Algorithms for detecting and recognizing several markers in images of facies of human BF are proposed in this chapter. These markers are signs of various pathologies and are used for early medical diagnosis. Detection and recognition are performed using a set of characteristic features of the markers. Statistical tests showed that about 96% of images containing markers were detected at a false alarm rate of 10%. The processing speed is two images per minute, which is sufficient for using these algorithms in mass preventive screening of the population.

Acknowledgements The reported study was funded by the Russian Foundation for Basic Research, research project No. 20-01-00613.
References
1. Maxwell, J.C.: The Scientific Papers of James Clerk Maxwell. In: Niven, W.D. (ed.) University Press, Cambridge (1890)
2. Bolen, H.L.: The blood pattern as a clue to the diagnosis of malignant disease. J. Lab. Clin. Med. 27, 1522–1536 (1942)
3. Mollaret, R., Sefiane, K., Christy, J.R.E., Veyret, D.: Experimental and numerical investigation of the evaporation into a drop on a heated surface. Chem. Eng. Res. Des. 82(A4), 471–480 (2004)
4. Parisse, F., Allain, C.: Shape changes of colloidal suspension droplets during drying. J. Phys. II France 6, 1111–1119 (1996)
5. Shatokhina, S.N., Shabalin, V.N.: Morphology of human biological fluids. Chrysostom, Moscow (in Russian) (2001)
6. Shatokhina, S.N., Aleksandrin, V.V., Kubatiev, A.A., Shabalin, V.N., Shatokhina, L.S.: A marker of ischemia in solid state of blood serum. Bull. Exp. Biol. Med. 164(3), 366–370 (2018)
7. Kraevoy, S.A., Koltovoy, N.A.: Diagnosis using a single drop of blood. In: Biofluid crystallization. Moscow-Smolensk (in Russian) (2016)
8. Shabalin, V.V.: Biophysical mechanisms of formation of solid-phase structures of human biological fluids. Doctoral dissertation, St. Petersburg State University, Russia (in Russian) (2018)
9. Krasheninnikov, V.R., Kopylova, A.S.: Identification of pectinate structures in images of blood serum facia. Pattern Recogn. Image Anal. 21(3), 508–510 (2011)
10. Krasheninnikov, V.R., Kopylova, A.S.: Algorithms for automated processing images blood serum facies. Pattern Recogn. Image Anal. 22(4), 583–592 (2012)
11. Krasheninnikov, V.R., Yashina, A.S., Malenova, O.E.: Markers detection on facies of human biological fluids. Procedia Eng. 201, 312–321 (2017)
12. Krasheninnikov, V.R., Yashina, A.S., Malenova, O.E.: Algorithms for detection of markers on the facies of human biological fluids. In: III International Conference on Information Technologies and Nanotechnologies, New technology, Samara, Russia, pp. 655–662 (2017)
13. Krasheninnikov, V.R., Vasil’ev, K.K.: Multidimensional image models and processing. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems-3, ISRL, vol. 135, pp. 11–64. Springer International Publishing, Switzerland (2018)
14. Dement’ev, V.E., Krasheninnikov, V.R., Vasil’ev, K.K.: Representation and processing of spatially heterogeneous images and image sequences. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems-5, ISRL, vol. 175, pp. 53–99 (2020)
15. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8(6), 679–698 (1986)
16. Daubechies, I.: Wavelets. CBMS-NSF. Series in Appl. Math., SIAM Publ., Philadelphia (1992)
17. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 3rd edn. Prentice Hall (2008)
Chapter 10
An Investigation of Research Activities in Intelligent Data Processing Using Data Envelopment Analysis

Andrey V. Lychev, Aleksei V. Rozhnov and Igor A. Lobanov
Abstract This chapter presents a vision of the intelligent data processing task and elaborates its efficiency evaluation using data envelopment analysis, in the context of discussions on investigating advanced technology precursors. The data processing aspect of the collaboration of advanced processes is presented from the standpoint of pervasive informatics. The target setting corresponds to an initiative focused on a comprehensive discussion of geosocial networking formation issues, assessment of the quality of intelligent data processing based on conceptual models for integrating advanced computer vision technology and location-based social networks, and the innovative potential of distributed computer vision and collaborative innovation networks. Representations of hybrid optimization modeling and control of intelligent transport systems are of interest in the modern conditions of rapid development of artificial neural networks and of cognitive and other intelligent data processing technologies. In this regard, interdisciplinary research is directly aimed at implementing effective commons-based peer production in geosocial networking during the transition to intelligent production technologies and new materials, by creating original data envelopment analysis tools (free disposal hull) for the search, collection, storage, and processing of pertinent information resources, in particular together with object-based image analysis. The convergence of professional, scientific, and educational network communities and the prerequisites for its implementation in research activities are discussed. First, a short description of data envelopment analysis is presented, followed by an overview of integration components that use distributed computer and telecommunication networks. We then investigate the opportunities of intelligent data processing in object-based image analysis for location-based social networks. The presented hybrid optimization modeling framework is used in experimental studies.

Keywords Data envelopment analysis · Distributed computer vision · Geosocial networking · Hybrid optimization modeling · Intelligent data processing · Intelligent transportation systems · Location-based social networks · Object-based image analysis · System integration

A. V. Lychev (B) · A. V. Rozhnov
National University of Science and Technology “MISiS” (NUST MISiS), 4, Leninskiy Ave., Moscow 119049, Russian Federation
e-mail: [email protected]
A. V. Rozhnov
e-mail: [email protected]
A. V. Rozhnov · I. A. Lobanov
V. A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences (ICS RAS), 65, Profsoyuznaya Street, Moscow 117997, Russian Federation
e-mail: [email protected]
© Springer Nature Switzerland AG 2020 M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems—6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_10
10.1 Introduction

In this chapter, we discuss diversified approaches to designing the elements of stratified models of pervasive informatics for modifiable vehicles, from the standpoint of elaborating the intelligent data processing task and evaluating its efficiency by Data Envelopment Analysis (DEA). Pervasive informatics is the study of how information affects people's interactions with the built environments they occupy. The term and concept were introduced in [1]. Indeed, artificial environments are full of information that users can use to improve their work and life. This information is created, managed, distributed, and consumed more efficiently when advanced technologies are implemented, resulting in more complex interactions between the users and the environment. Social interactions in these spaces have added value. Information permeates all these social and technical systems, and pervasive informatics aims to study and assist in the development of pervasive information environments, or pervasive spaces.

One of the most rapidly developing industries is intelligent transport. In this regard, the formation of precursor technologies (advanced technology precursors) for Intelligent Transport Systems (ITS) and infrastructure technology solutions is of main research interest. Modern advances in vehicle electronics and electrical engineering have led to a move towards fewer, more capable computer processors in vehicle components. The current steady trend is toward fewer, more costly microprocessor modules with only hardware memory control and individual real-time operating systems. New embedded systems and the integration components of digital platforms allow more sophisticated and smarter software applications to be implemented, including model-based process control, artificial intelligence, and ubiquitous computing. Obviously, what most determines the efficiency of these ITS applications is intelligent data processing [2, 3]. 
A pervasive space is characterized by the physical and informational interaction between the users and the designed environment; e.g., the act of controlling a modifiable vehicle is a physical interaction, while the space responding to this regulation or user action is an informational interaction. The aim of this work is to present a vision of the intelligent data processing task and to elaborate its efficiency evaluation using data envelopment analysis, in the context of research discussions on investigating advanced technology precursors.
The core approach in this work is a nonparametric method of operations research for estimating production frontiers for intelligent data processing tasks. The target setting corresponds to an initiative focused on a comprehensive discussion of geosocial networking formation issues and on assessing the quality of intelligent data processing on the basis of conceptual models for integrating advanced computer vision technology and location-based social networks, together with the innovative potential of distributed computer vision and collaborative innovation networks. Of particular interest here are representations of hybrid optimization modeling and control of intelligent transport systems in the modern conditions of rapid development of artificial neural networks and of cognitive and other intelligent data processing technologies. The convergence of professional, scientific, and educational network communities and the prerequisites for its implementation in research activities involve a variety of methods. The chapter therefore gives a short description of data envelopment analysis and an overview of integration components that use distributed computer and telecommunication networks. Section 10.2 provides a preliminary foresight of inherently smarter infrastructure. A short description of DEA is given in Sect. 10.3, while Sect. 10.4 provides an overview of integration components for distributed computer and telecommunication networks using DEA. Section 10.5 concludes the chapter.
10.2 The Foresight of Impending Smart Infrastructure from the Position of Pervasive Informatics

This section introduces a conception of intelligent data processing by smart infrastructure and elaborates the efficiency evaluation under the convergence of professional, scientific, and educational network communities. This allows the collaboration of intelligent processes and the investigation of advanced technology precursors to be applied in implementing smart infrastructure [4, 5]. Intelligent pervasive spaces are those that display intelligent behavior in the form of adaptation to user requirements or to the environment itself. Such intelligent behavior can be implemented using artificial intelligence methods or other smart technologies [6]. Intelligent spaces aim to provide communication and computing services to their users in such a way that the experience is almost transparent, e.g. automated control of movement based on distributed computer vision and occupant preference profiles. Pervasive spaces appear in an IBM Research Report [7], but the notion was not properly defined or discussed there in general. An intelligent pervasive space is “an adaptable and dynamic area that optimizes user services and management processes using information systems and networked ubiquitous technologies” [8]. An alternative definition is “social and physical space with enhanced capability through Information and Communications Technology (ICT) for a human to interact with the built
environments” [9]. According to experts, a common point between these definitions is that pervasive computing technologies are the means by which intelligence and interactions are achieved in pervasive spaces, with the purpose of enhancing the user’s experience. What should be understood by “pervasive computing”? Pervasive informatics may initially be viewed as simply another branch of pervasive or ubiquitous computing. However, pervasive informatics places a greater emphasis on ICT-enhanced socio-technical pervasive spaces, as opposed to the technology-driven direction of pervasive computing. The distinction is similar to that between informatics and computing, where informatics focuses on the study of information, while the primary concern of computing is data processing. Accordingly, pervasive informatics aims to analyze the pervasive nature of information, examining its various representations and transformations in pervasive spaces, which are enabled by pervasive computing technologies, e.g. smart devices and intelligent control systems. The built environment is rich with information, which can be utilized by its consumers to enhance the quality of their work and life. By introducing intelligent data processing in ITS, this information can be created, managed, distributed, and consumed more effectively, leading to more advanced interactions between the users and the environment. Moreover, the social interactions in these spaces are of additional value, and informatics can effectively capture the complexities of such information-rich activities. We fully agree that information literally pervades, or spreads throughout, these socio-technical systems, and pervasive informatics aims to study and assist in the design of pervasive information environments or pervasive spaces for the benefit of their stakeholders and users [1–11]. 
The rapid development of computer vision technologies has made the efficiency evaluation of distributed intelligent data processing tasks much harder. This leads to new challenges in improving the collection of metadata for various lifecycle stages such as systems integration, modification of integration components, foresight, planning (strategic analysis, priority setting), networking (participatory, dialogic), and optimization modeling. This, in turn, motivates the joint strategic development of computer vision, machine vision, remote sensing, intelligent data processing, machine learning, artificial neural networks, distributed computer and telecommunication networks, the internet of things, and other technologies. A comprehensive investigation of integration components and advanced technology precursors is the main goal of hybrid optimization modeling. Our contribution deals with the development of integration components for intelligent data processing, represented in some scenarios of system integration of the advanced technologies of the International Charter “Space and Major Disasters” [12] and location-based social networks [13]. Usually, the efficiency evaluation of geosocial networking does not include object-based image analysis [14]. For example, our approach uses DEA to select the best regions for embedding smart city technologies, for the effective use of the regional information (innovation) landscape in terms of the evolution of geosocial networking. The invariance framework is provided by a feature-based
approach for system integration with original integration components of intelligent data processing [15]. Our objective corresponds to an initiative focused on a comprehensive discussion of geosocial networking formation issues and assessment of the quality of processes on the basis of conceptual models for integrating advanced computer vision technology and location-based social networks, and on the convergence of professional, scientific, and educational network communities (the innovation potential of content control in a collaborative innovation network). In this regard, cross-disciplinary research and development using hybrid optimization modeling is directly aimed at implementing effective management of geosocial networking (professional, scientific, and educational). It is carried out by creating original tools for the search, collection, storage, and processing of pertinent information resources in the modern conditions of rapid development of artificial neural networks and of cognitive and other intelligent data processing technologies, in particular DEA (specifically, the free disposal hull model) and object-based image analysis. Below, a brief review is given of the system integration of research activities in geosocial networking and of innovations in distributed computer and telecommunication networks using DEA. The proposed stratified approach is intended to be developed further and filled with more levels and elements. We also propose some possible cases of applying object-based image analysis in a DEA model. The integration components of the virtual semantic environment [16] complement the specific examples of a software implementation of this hybrid optimization modeling framework. 
As a result of considering the impending smart infrastructure from the position of pervasive informatics, it is especially interesting to note the difference in the methods used to understand the benefits potentially available from the collaboration of intelligent processes and the investigation of advanced technology precursors. Some of the advantages of DEA in this subject area are the following:
• There is no need to explicitly specify a mathematical form for the production function.
• It is useful for uncovering relationships that remain hidden from other methods.
• It is capable of handling multiple inputs and outputs.
• It can be used with any input-output measurement.
• Sources of inefficiency can be analyzed and quantified for every evaluated unit.
The DEA background concerning this matter is presented in the following section.
10.3 Data Envelopment Analysis Background

The intensive development of information technologies for control, computing, and communication tasks, together with the improvement of distributed computing and telecommunication networks, makes evaluating the efficiency of such technologies an urgent problem. One of the most effective approaches in this area is the DEA
approach [17]. The main purpose of this section is to review the best DEA practices in new patentable applications of distributed computing and telecommunication network technologies. The existing functional capabilities of distributed network technologies (control, computing, and communication) in combination with DEA are developed in the framework of the prototype [16, 18]. The approach for investigating the behavior of complex systems is based on DEA, which covers a wide range of concepts and capabilities for computing and analyzing the efficiency of complex objects. It has close multi-aspect links with theoretical economics, systems analysis, and multicriteria optimization. A production technology in DEA is constructed on the set of production units using postulates that are specified a priori. When the efficiency score of a production unit or another analyzed indicator is computed, the unit is projected onto the frontier, which consists of (weakly) Pareto efficient units. Therefore, in order to calculate the various characteristics of the units’ behavior, it is necessary to build and visualize a multidimensional frontier constructed on the basis of economic or other desired parameters. Such visualization for analyzing the behavior of production units can be accomplished by constructing sections of the frontier with two-, three-, and, conceivably, higher-dimensional affine subspaces. Such sections generalize well-known functions in economics (production function, isoquant, isocost, etc.). Visualization methods allow one to locate specific units and groups of units in the multidimensional space of indicators and to find the optimal tactical and strategic directions of complex systems development. These and many other arguments for applying DEA are considered in describing the next nested level of the proposed approaches and methods within the framework of a brief patent study of particular applications. 
Since DEA was first introduced in 1978, a large number of papers have been written on DEA or on applying DEA to various sets of problems. Emrouznejad and Yang [19] gave a survey and analysis of the first 40 years of scholarly literature in DEA up to the year 2016. According to this study, more than 10,000 DEA-related articles were published in the literature. About 2,200 articles published as working papers, book chapters, or conference proceedings were not included in the study. A number of international conferences devoted to this subject are held regularly, e.g. DEA2019, EWEPA2019, DEAIC2018, NAPW2018, DEA40, etc.

Next, we briefly describe the main DEA models and their underlying technologies. Suppose there is a set of $n$ Decision Making Units (DMUs) to be assessed. Each observed $\mathrm{DMU}_j$, $j = 1, \ldots, n$, is represented by the pair $(X_j, Y_j)$, where $X_j = (x_{1j}, \ldots, x_{mj}) \geq 0$ is the vector of inputs and $Y_j = (y_{1j}, \ldots, y_{rj}) \geq 0$ is the vector of outputs. All data are assumed to be nonnegative, but at least one component of every input and output vector is positive. The production technology $T$ is defined as $T = \{(X, Y) \mid \text{outputs } Y \text{ can be produced from inputs } X\}$, i.e., it contains the set of all feasible input-output vectors. The generalized formulation of convex and non-convex DEA technologies under different returns to scale assumptions can be written in the following form [20]:
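\[
T^{\Lambda,\Gamma} = \left\{ (X, Y) \;\middle|\; X \geq \sum_{j=1}^{n} X_j \delta \lambda_j,\ Y \leq \sum_{j=1}^{n} Y_j \delta \lambda_j,\ \lambda_j \in \Lambda,\ \delta \in \Gamma \right\},
\]

where $\lambda = (\lambda_1, \ldots, \lambda_n)^T$ is called the intensity vector, $\delta \in \mathbb{R}_+$ is the scaling factor, $\Lambda \in \{\mathrm{NC}, \mathrm{C}\}$, and NC and C represent non-convexity and convexity, respectively:

\[
\mathrm{NC} = \left\{ \sum_{j=1}^{n} \lambda_j = 1,\ \lambda_j \in \{0, 1\} \right\}
\quad \text{and} \quad
\mathrm{C} = \left\{ \sum_{j=1}^{n} \lambda_j = 1,\ \lambda_j \geq 0,\ \lambda_j \in \mathbb{R}_+ \right\},
\]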
The set $\Gamma$ contains specific assumptions regarding the returns to scale [21–24] of technology $T$ and is defined as $\Gamma \in \{\mathrm{VRS}, \mathrm{CRS}, \mathrm{NDRS}, \mathrm{NIRS}\}$, with $\mathrm{VRS} = \{\delta \mid \delta = 1\}$, $\mathrm{CRS} = \{\delta \mid \delta \geq 0\}$, $\mathrm{NIRS} = \{\delta \mid 0 \leq \delta \leq 1\}$, $\mathrm{NDRS} = \{\delta \mid \delta \geq 1\}$. Taking convexity and assuming variable returns to scale, we derive the technology $T^{\mathrm{C,VRS}}$ of the traditional Banker, Charnes, Cooper (BCC) model; $T^{\mathrm{C,CRS}}$ represents the technology of the classical Charnes, Cooper and Rhodes (CCR) model [17]; the non-convex Free Disposal Hull (FDH) technology [25] is generated using $T^{\mathrm{NC,VRS}}$. The remaining combinations of $\Lambda$ and $\Gamma$ produce other well-known convex and non-convex DEA reference technologies.

In order to visualize the frontier in a multidimensional space of inputs and outputs, we can construct two- and three-dimensional sections of the frontier. Parametric optimization algorithms for the construction of sections for convex technologies are described in [26]. For non-convex models, algorithms of frontier visualization using enumeration and optimization methods are developed in [22, 23]. Visual representation in this form is more convenient for perception and analysis; it strengthens performance analysis and intuitive decision making.

The main problem considered in this section involves a complex and interrelated assessment of the applicability of DEA from the perspective of multi-aspect research and analysis of the efficiency of complex objects. Although DEA was originally developed for the analysis of production units, there are many applications of DEA in the areas of control, distributed computing, and telecommunication networks. In the next section, we consider several examples of DEA applications including a brief overview of patented technologies in the area of distributed computer and telecommunication networks.
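To make the non-convex case concrete, the following minimal Python sketch computes input-oriented radial efficiency scores under the FDH technology (NC combined with VRS) by plain enumeration. With each intensity variable restricted to 0 or 1, their sum equal to 1, and the scaling factor fixed at 1, every reference point is a single observed unit, so the score of unit o is the smallest factor θ for which some observed unit j with outputs no smaller than those of unit o uses inputs no larger than θ times the inputs of unit o. The function name `fdh_input_efficiency`, the data, and the assumption of strictly positive inputs are illustrative only and are not taken from [22, 23, 25].

```python
from typing import Sequence

Vector = Sequence[float]


def fdh_input_efficiency(inputs: Sequence[Vector],
                         outputs: Sequence[Vector],
                         o: int) -> float:
    """Input-oriented radial FDH efficiency of unit `o` by enumeration.

    Assumption of this sketch: all inputs are strictly positive.
    """
    x_o, y_o = inputs[o], outputs[o]
    best = float("inf")
    for x_j, y_j in zip(inputs, outputs):
        # Unit j is a candidate only if it produces at least the outputs of unit o.
        if all(a >= b for a, b in zip(y_j, y_o)):
            # Smallest theta with x_j <= theta * x_o in every component.
            theta_j = max(a / b for a, b in zip(x_j, x_o))
            best = min(best, theta_j)
    # Unit o is always a candidate for itself, so the result never exceeds 1.
    return best


# Hypothetical data: four units, two inputs, one output.
X = [(2.0, 4.0), (4.0, 2.0), (5.0, 5.0), (6.0, 6.0)]
Y = [(1.0,), (1.0,), (1.0,), (2.0,)]

scores = [fdh_input_efficiency(X, Y, o) for o in range(len(X))]
for o, score in enumerate(scores):
    print(f"DMU {o}: theta = {score:.2f}")
```

In this toy data set the third unit is dominated by each of the first two and receives a score of 0.8, while the other three units lie on the FDH frontier. Convex technologies such as BCC and CCR require a linear programming solver instead of enumeration and are not covered by this sketch.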
10.4 System Integration of Research Activities in Geosocial Networking Using Data Envelopment Analysis

System integration of research activities in geosocial networking and innovations in distributed computer and telecommunication networks using DEA is an important part of our investigations. Next, we consider several examples of DEA applications including a brief overview of patented technologies.

The evaluation of the relative operational efficiency of large-scale computer networks is proposed in [27]. The methodology is implemented in two stages. In the first stage, a typical network simulation is conducted using a queuing model, and the main performance indicators are obtained. In the second stage, these indicators are used in a DEA procedure to evaluate network operational efficiency. Suggestions are made for improving the efficiency level of relatively inefficient nodes. Finally, possible routes for achieving a higher level of overall network efficiency are discussed in the context of reducing bottlenecks.

An attempt to assess the topology of a network considering multiple criteria simultaneously (cost, reliability, throughput, traffic pattern, etc.) was made in [28]. The author considered two input variables: total node count and total link length. They are related to the network equipment cost and the network operating cost. The outputs are based on two indicators: the sum of path lengths weighted by path traffic, and the amount of traffic on the maximally loaded link. To obtain the outputs, these indicators are subtracted from their maximum values among all DMUs. DEA enables us to evaluate each network topology based on multiple criteria measured in different units. After evaluation, we can focus on a small number of desirable topologies among a huge number of possible candidates.

Zhou and Ai [29] evaluated broadband real-time communication models for high-speed trains using DEA. The main bottleneck of this high-mobility communication network is handover.
The authors proposed a DEA-based procedure for the evaluation of six typical handover system models (satellite communication, leaky coaxial cable, radio over fiber, relay station, single frequency network, and dual-soft handover model). They used cost, transmission power, and handover as inputs, and bandwidth, handover rate, and success probability as outputs. The evaluation results show that the radio over fiber model is the most appropriate system to support high-mobility communications.

In [30], the authors combined qualitative and quantitative analysis in order to evaluate different wireless communication means. The proposed preferable DEA method for efficiency evaluation takes into account a certain ordering of the weights of input and output indicators. The evaluation is based on economic, technical, and reliability parameters. Finally, the proposed method is applied to planning one city's communication network.

Soja et al. [31] carried out a comparison of multicast algorithms in terms of performance efficiency in order to encourage cost-effective group communication over the internet. DEA was applied to the results obtained from the Improved Network Coding (INC) algorithm with two and three parameters. Simulations for the INC algorithm
used two parameters: packet delay and cost of bandwidth. The cost of bandwidth is further improved using an additional parameter (packet loss). In [32], a DEA framework was introduced for evaluating the efficiency of coded packet-level wireless network protocols, and its performance was then compared with the existing IEEE 802.11 protocol. The input-oriented and slacks models were implemented to show how routing loads with overheads are reduced in order to put the IEEE 802.11 and packet-level network coding based protocols on their efficiency frontier.

What follows is an overview of patents using DEA in distributed computing and telecommunications networks. The invention [33] provided a driver circuit for driving a line terminated by a load, where the driver circuit was configurable for dynamically selecting a suitable energy/delay working point given the circumstances in which the driver circuit had to operate. When selecting optimal points, which have minimum energy consumption for a certain delay, a convex boundary envelope was constructed using the DEA approach. Relevant parameters in this invention were access time, e.g. delay in accessing a functional electronic unit such as a memory, delay along a bus or another communication line, or energy consumption, e.g. of a line driver.

The invention [34] disclosed an evaluation method and device for network planning. The method divides a planning service area into n classes of sub-planning service areas, collects basic data of each sub-planning service area, and quantifies the basic data of each area into input and output indicators. Then DEA was applied, and the relative efficiency score of each sub-planning service area and the average relative efficiency rate of the n classes of sub-planning areas were obtained, so that the degree of deviation of the service area was determined. If the deviation is greater than the first threshold, it is possible to determine that the planning service area is invalid.
The method used DEA to effectively analyze the resources allocated by the network planning service area; then, according to the evaluation results, it can be concluded whether the resource allocation of the planning service area is reasonable.

The invention [35] disclosed a cooperative game and DEA-based method for sharing the fixed cost of a power transmission system. According to the method, a coalitional game model in the DEA framework was established. Then a method for sharing the fixed cost of the power transmission system was proposed from the perspective of multi-attribute decisions, the DEA approach, and coalitional game theory. The fixed cost of the power transmission system was calculated under the condition that area constraints are ensured. Optimal and reasonable weights were calculated within a limited weight range so that each user obtained a satisfactory sharing result maximizing the benefit of each user.

An investment planning method and device based on power grid resources was disclosed in [36]. In this method, DEA was used for the comprehensive evaluation of power grid communication resource construction projects at the initial screening stage.

The sources presented in this brief review outline only a number of cases from the variety of possible applications in the subject area under consideration. In the extended version of the progress report, detailed studies of the
virtual semantic environment were also presented in the following sequence of its system integration [16, 18, 37]: integrated collaboration environment, integrated development environment, social peer-to-peer processes, commons-based peer production, computer-supported cooperative work, collaborative information seeking, virtual research environment, and so on [38–41].
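The output construction described above for the topology study [28] can be sketched in a few lines of Python. Both indicators (the traffic-weighted sum of path lengths and the traffic on the maximally loaded link) are better when smaller, whereas DEA outputs are better when larger, so each indicator is subtracted from its maximum over all DMUs. The function name `to_desirable_outputs` and all numbers below are hypothetical and serve only as an illustration; they are not data from [28].

```python
from typing import List, Sequence


def to_desirable_outputs(indicator: Sequence[float]) -> List[float]:
    """Turn a 'smaller is better' indicator into a 'larger is better' output.

    Each value is subtracted from the maximum over all DMUs, so the worst
    DMU receives 0 and better DMUs receive larger values.
    """
    top = max(indicator)
    return [top - value for value in indicator]


# Hypothetical indicators for three candidate topologies (DMUs).
weighted_path_length = [120.0, 95.0, 150.0]  # sum of path lengths weighted by traffic
max_link_load = [40.0, 55.0, 35.0]           # traffic on the maximally loaded link

y1 = to_desirable_outputs(weighted_path_length)
y2 = to_desirable_outputs(max_link_load)
print(y1)  # [30.0, 55.0, 0.0]
print(y2)  # [15.0, 0.0, 20.0]
```

The transformed columns can then be used as outputs next to the two inputs (total node count and total link length) in any of the DEA models of Sect. 10.3.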
10.5 Conclusions

This chapter presents a vision of the intelligent data processing task and an elaboration of efficiency evaluation using data envelopment analysis in discussions of the problem investigations of advanced technology precursors. The data processing aspect of a collaboration of intelligent processes from the standpoint of pervasive informatics is presented. The approach presented in this chapter adds to the analysis of prospective scenarios for the system integration of ITS. The framework is presented to show how the DEA methodology can be used to compare different integration components of ITS [42]. The DEA analysis conducted in this research allowed the determination of the most efficient solutions for the smart city. Here, the DEA analysis had a limited application to intelligent data processing (for location-based social networks) instead of the traditional combination of geosocial networking as part of transport project appraisal. This research utilized DEA for this purpose, as the literature suggests that it provides numerous benefits over other commonly used efficiency evaluation techniques. The results presented also show how DEA can be used as a powerful decision-making tool for similar collaborative innovation network options.

Development of advanced technologies on such a methodological basis and improvement of economic and mathematical models for analyzing the innovative potential of a smart city (region) are particularly relevant for the formation of a scientifically based regional innovation policy and the creation of regional development programs that take into account the effective use of the regional information (collaborative innovation network) landscape in terms of the evolution of geosocial networking.

Acknowledgements This work was partially supported by the Russian Science Foundation, project No. 17-11-01353 (implementation of DEA approach). Partial financial support from the RFBR under research projects No. 17-06-00237 (investigation of innovative potential) and No. 18-311-00267 (investigation of research activities and knowledge extraction for the optimization modeling system) is also gratefully acknowledged. This research was partially supported by the Presidium of the Russian Academy of Sciences, Program No. 30 "Theory and Technologies of Multi-level Decentralized Group Control under Confrontation and Cooperation" (investigation of smart infrastructure from ITS).
References

1. Liu, K.: Pervasive informatics in intelligent spaces for living and working. In: 2008 IEEE International Conference on Service Operations and Logistics, and Informatics, vol. 2, pp. XVIII–XIX (2008)
2. Co-operative Intelligent Transport Systems (ITS)—Local Dynamic Map, Intelligent Transport Systems, ISO 18750:2018. Available at: https://www.iso.org/standard/69433.html. Accessed 26 Aug 2019
3. Dedicated Short Range Communication (DSRC)—DSRC Application Layer, Intelligent Transport Systems, ISO 15628:2013. Available at: https://www.iso.org/standard/59288.html. Accessed 26 Aug 2019
4. Ryvkin, S., Rozhnov, A., Lobanov, I.: Convergence of technologies of the evolving prototype of an energy efficient large-scale system. In: 2018 20th International Symposium on Electrical Apparatus and Technologies, pp. 1–4 (2018)
5. Pham, M.C., Klamma, R., Jarke, M.: Development of computer science disciplines: a social network analysis approach. Soc. Netw. Anal. Min. 1(4), 321–340 (2011)
6. Rozhnov, A.V., Melikhov, A.A.: Vectorizing textual data sources to decrease attribute space dimension. In: 2017 10th International Conference Management of Large-Scale System Development, pp. 1–4 (2017)
7. McFaddin, S., Coffman, D., Han, J.H., Jang, H.K., Kim, J.H., Lee, J.K., Moon, Y.S., Narayanaswami, C., Paik, Y.S., Park, J.W., Soroker, D.: Celadon: delivering business services to mobile users in public spaces. IBM Research Report, RC24381 (2007)
8. Moran, S., Nakata, K.: Ubiquitous monitoring and behavioural change: a semiotic perspective. In: 11th International Conference on Informatics and Semiotics in Organisations, Beijing, China, pp. 449–456 (2009)
9. Liu, K., Nakata, K., Harty, C.: Pervasive informatics: theory, practice and future directions. Intell. Build. Int. 2(1), 5–19 (2010)
10. Favorskaya, M., Buryachenko, V.: Fast salient object detection in non-stationary video sequences based on spatial saliency maps. In: De Pietro, G., Gallo, L., Howlett, R.J., Jain, L.C. (eds.) Intelligent Interactive Multimedia Systems and Services. SIST, vol. 55, pp. 121–132. Springer International Publishing, Switzerland (2016)
11. Rozhnov, A.V., Lobanov, I.A.: Investigation of the joint semantic environment for heterogeneous robotics. In: 2017 10th International Conference Management of Large-Scale System Development, pp. 1–5 (2017)
12. The International Charter Space and Major Disasters. Available at: https://disasterscharter.org/web/guest/home. Accessed 26 Aug 2019
13. Location-Based Social Networks. Available at: https://www.microsoft.com/en-us/research/project/location-based-social-networks/. Accessed 26 Aug 2019
14. Blaschke, T., Lang, S., Hay, G.J. (eds.): Object-Based Image Analysis: Spatial Concepts for Knowledge-Driven Remote Sensing Applications. Springer Science & Business Media (2008)
15. Rozhnov, A., Zhuravleva, N., et al.: Technology and software complex environment analysis of complex systems. In: Seoul International Invention Fair (SIIF 2012), Seoul, Korea (2012)
16. Nechaev, V., Goncharenko, V., Rozhnov, A., Lytchev, A., Lobanov, I.: Integration of virtual semantic environments components and generalized data envelopment analysis (DEA) model. In: CEUR Workshop Proceedings: Selected Papers of the XI International Scientific-Practical Conference Modern Information Technologies and IT-Education, vol. 1761, pp. 339–347 (2016)
17. Cooper, W.W., Seiford, L.M., Tone, K.: Data Envelopment Analysis. A Comprehensive Text with Models, Applications, References and DEA-Solver Software, 2nd edn. Springer Science and Business Media, New York (2007)
18. Krivonozhko, V., Rozhnov, A., Lychev, A.: Construction a hybrid intelligent information framework and components of expert systems using the generalized DEA model. Neurocomputers (6), 3–12 (in Russian) (2013)
19. Emrouznejad, A., Yang, G.: A survey and analysis of the first 40 years of scholarly literature in DEA: 1978–2016. Socio-Econ. Plan. Sci. 61, 4–8 (2018)
20. Briec, W., Kerstens, K., Eeckaut, P.V.: Non-convex technologies and cost functions: definitions, duality and nonparametric tests of convexity. J. Econ. 81(2), 155–192 (2004)
21. Krivonozhko, V.E., Førsund, F.R., Lychev, A.V.: Measurement of returns to scale using a non-radial DEA model. Eur. J. Oper. Res. 232(3), 664–670 (2014)
22. Krivonozhko, V.E., Lychev, A.V.: Algorithms for construction of efficient frontier for nonconvex models on the basis of optimization methods. Dokl. Math. 96(2), 541–544 (2017)
23. Krivonozhko, V.E., Lychev, A.V.: Frontier visualization for nonconvex models with the use of purposeful enumeration methods. Dokl. Math. 96(3), 650–653 (2017)
24. Podinovski, V.V.: Returns to scale in convex production technologies. Eur. J. Oper. Res. 258(3), 970–982 (2017)
25. Deprins, D., Simar, L., Tulkens, H.: Measuring labor efficiency in post offices. In: Marchand, M., Pestieau, P., Tulkens, H. (eds.) The Performance of Public Enterprises: Concepts and Measurements, pp. 243–268. Springer, Boston, MA (1984)
26. Volodin, A.V., Krivonozhko, V.E., Ryzhikh, D.A., Utkin, O.B.: Construction of three-dimensional sections in DEA by using parametric optimization algorithms. Comput. Math. Math. Phys. 44(4), 589–603 (2004)
27. Giokas, D.I., Pentzaropoulos, G.C.: Evaluating the relative operational efficiency of large-scale computer networks: an approach via data envelopment analysis. Appl. Math. Model. 19(6), 363–370 (1995)
28. Kamiyama, N.: Network topology design using data envelopment analysis. In: IEEE Global Telecommunications Conference, pp. 508–513 (2007)
29. Zhou, Y., Ai, B.: Evaluation of high-speed train communication handover models based on DEA. In: 2014 IEEE 79th Vehicular Technology Conference, pp. 1–5 (2014)
30. Wang, Z., Zhang, L., Liu, X., Fan, Y.F.: Evaluation of distribution communication network various access means basing on preferable DEA. In: 2009 Asia-Pacific Power and Energy Engineering Conference, pp. 1–4 (2009)
31. Soja, J.S., Luka, M.K., Thuku, I.T., Girei, S.H.: Comparison of performance efficiency of improved network coding multicast algorithms using data envelopment analysis. Commun. Appl. Electron. 5(2), 6–10 (2016)
32. Ajibesin, A.A., Ventura, N., Chan, H.A., Murgu, A.: Service productivity in IT: a network efficiency measure with application to communication systems. In: Emrouznejad, A., Cabanda, E. (eds.) Managing Service Productivity: Using Frontier Efficiency Methodologies and Multicriteria Decision Making for Improving Service Performance, pp. 241–261. Springer Berlin Heidelberg, Berlin, Heidelberg (2014)
33. Papanikolaou, A., Wang, H., Miranda, M., Catthoor, F.: Power-aware configurable driver circuits for lines terminated by a load. Patent USA US20050280443A1, Priority date: 2004-06-18
34. Evaluation method and device for network planning. CN106454857A, Priority date: 2015-08-13
35. Cooperative game and DEA (Data Envelopment Analysis) based method for sharing fixed cost of power transmission system. CN105160490A, Priority date: 2015-09-30
36. Investment planning method and system based on power grid resources. CN106991516A, Priority date: 2017-01-13
37. Krivonozhko, V.E., Førsund, F.R., Lychev, A.V.: Measurement of returns to scale in radial DEA models. Comput. Math. Math. Phys. 57(1), 83–93 (2017)
38. Abrosimov, V., Ryvkin, S., Goncharenko, V., Rozhnov, A., Lobanov, I.: Identikit of modifiable vehicles at virtual semantic environment. In: International Conference on Optimization of Electrical and Electronic Equipment and International Aegean Conference on Electrical Machines and Power Electronics (ACEMP), pp. 905–910 (2017)
39. Rozhnov, A., Lychev, A.: System integration of research activities and innovations in distributed computer and telecommunication networks using data envelopment analysis. In: 21st International Conference on Distributed Computer and Communication Networks: Control, Computation, Communications, vol. 1, pp. 273–280 (2018)
40. Ryvkin, S., Rozhnov, A., Lobanov, I., Chernyshov, L.: Investigation of the stratified model of virtual semantic environment for modifiable vehicles. In: 20th International Symposium on Electrical Apparatus and Technologies, pp. 1–4 (2018)
41. Ryvkin, S., Rozhnov, A., Lychev, A., Lobanov, I., Fateeva, Y.: Multiaspect modeling of infrastructure solutions at energy landscape as virtual semantic environment. In: International Conference on Optimization of Electrical and Electronic Equipment and International Aegean Conference on Electrical Machines and Power Electronics, pp. 935–940 (2017)
42. Caulfield, B., Bailey, D., Mullarkey, S.: Using data envelopment analysis as a public transport project appraisal tool. Transp. Policy 29, 74–85 (2013)
Chapter 11
Hybrid Optimization Modeling Framework for Research Activities in Intelligent Data Processing

Aleksei V. Rozhnov, Andrey V. Lychev and Igor A. Lobanov
Abstract The chapter continues our study of the problem investigations of advanced technology precursors from the point of view of the formation of intelligent transportation systems. The problem statement corresponds to an initiative focused on a comprehensive discussion of geosocial networking formation issues and on assessing the quality of intelligent data processing based on conceptual models that integrate advanced computer vision technology and location-based social networks. In this regard, interdisciplinary research and development of modifiable vehicles include the need to solve particular tasks of system integration, optimization modeling, and control. Based on the investigation of geosocial networking using data envelopment analysis, our discussion is directly aimed at the implementation of effective commons-based peer production of geosocial networking in the progressive movement of pervasive informatics. This is provided by creating original data envelopment analysis tools for the search, collection, storage, and processing of pertinent information resources under the modern conditions of rapid development of artificial neural networks and of cognitive and other intelligent data processing technologies, in particular together with object-based image analysis. The chapter presents the opportunities of intelligent data processing in object-based image analysis for location-based social networks. The proposed hybrid optimization modeling framework and experimental study scenarios are discussed.

Keywords Data envelopment analysis · Distributed computer vision · Geosocial networking · Hybrid optimization modeling · Intelligent data processing · Intelligent transportation systems · Location-based social networks · Object-based image analysis · System integration

A. V. Rozhnov (corresponding author) · I. A. Lobanov
V. A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences (ICS RAS), 65, Profsoyuznaya Street, Moscow 117997, Russian Federation

A. V. Rozhnov · A. V. Lychev
National University of Science and Technology "MISiS" (NUST MISiS), 4, Leninskiy Ave., Moscow 119049, Russian Federation

© Springer Nature Switzerland AG 2020. M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems—6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_11
11.1 Introduction

In this chapter, from the point of view of the formation of Intelligent Transportation Systems (ITS), we continue to discuss the problem investigations of advanced technology precursors and research activities in intelligent data processing. The convergence of network communities and the prerequisites for its implementation lie in a collaboration of intelligent processes within the framework of system integration, through the intelligent data processing task and the elaboration of efficiency evaluation using Data Envelopment Analysis (DEA) [1–3]. Chapter 10 presents a brief description of DEA and provides an illustrative overview of integration components for distributed computer and telecommunications networks using DEA [4–6].

The rapid development of computer vision technologies has made the efficiency evaluation of the distributed intelligent data processing task much harder. This, in turn, motivates an intertwined development of computer vision, machine vision, remote sensing, intelligent data processing, machine learning, artificial neural networks, distributed computer and telecommunication networks, internet of things, and other technologies. A comprehensive investigation of integration components and advanced technology precursors is the main goal of hybrid optimization modeling. Thus, multi-aspect modeling of infrastructure solutions at the energy landscape and an identikit of modifiable vehicles were previously proposed in [7–9]. The proposed stratified approach is aimed at further developing and filling in more levels and elements. We also offer some possible cases of applying object-based image analysis with a DEA model. The integration components of the virtual semantic environment are complemented by specific examples of a software implementation of this hybrid optimization modeling framework.

It should be noted that the scientific literature presents a full range of experience in solving, e.g., remote sensing problems. Thus, to make the arguments about data processing concrete, several processing levels were defined by NASA as part of its Earth Observing System and have been steadily adopted since then, both internally at NASA and elsewhere [10–12]. The key levels of data processing are the following:

• Level 0: reconstructed, unprocessed instrument and payload data at full resolution, with any and all communications artifacts (e.g. synchronization frames, communications headers, duplicate data) removed.
• Level 1a: reconstructed, unprocessed instrument data at full resolution, time-referenced, and annotated with ancillary information, including radiometric and geometric calibration coefficients and georeferencing parameters (e.g. platform ephemeris) computed and appended but not applied to the Level 0 data (or if applied, in a manner that Level 0 is fully recoverable from Level 1a data).
• Level 1b: Level 1a data that have been processed to sensor units (e.g. radar backscatter cross section, brightness temperature, etc.); not all instruments have Level 1b data; Level 0 data is not recoverable from Level 1b data.
• Level 2: derived geophysical variables (e.g. ocean wave height, soil moisture, ice concentration) at the same resolution and location as Level 1 source data.
• Level 3: variables mapped on uniform space-time grid scales, usually with some completeness and consistency (e.g. missing points interpolated, complete regions mosaicked together from multiple orbits, etc.).
• Level 4: model output or results from analyses of lower level data (i.e. variables that were not measured by the instruments but instead are derived from these measurements).

While these processing levels are suitable for satellite data processing pipelines, other data level vocabularies have been defined and may be appropriate for more heterogeneous workflows. The regular spatial and temporal organization of Level 3 datasets makes it feasible to combine data from different sources.

Our contribution deals with the development of integration components for intelligent data processing, represented in some scenarios of system integration of advanced technologies of the International Charter "Space and Major Disasters" [13] and location-based social networks [14]. Usually, a geosocial networking efficiency evaluation does not include object-based image analysis [15]. For example, our approach utilizes DEA and the selection of the best regions for embedding smart city technologies for the effective use of the regional information (innovation) landscape in terms of the evolution of geosocial networking [16–18]. The invariance of the framework is provided by a feature-based approach to system integration with original integration components of intelligent data processing. In this regard, cross-disciplinary research and development of modifiable vehicles include the need to solve particular tasks of system integration, optimization modeling, and control.

The remainder of the chapter is organized as follows. Section 11.2 provides opportunities for intelligent data processing in object-based image analysis for location-based social networks. The proposed hybrid optimization modeling framework and experimental studies are discussed in Sect. 11.3. Section 11.4 concludes the chapter.
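The processing levels listed in this section form an ordered vocabulary, which a pipeline can encode explicitly. The following minimal Python sketch is one possible encoding; the class name `ProcessingLevel` and the helper `level0_recoverable` are hypothetical and do not come from NASA software or from the framework described in this chapter. The text states only that Level 0 is recoverable from Level 1a and not from Level 1b; treating Levels 2 to 4 as non-recoverable as well is an assumption of the sketch.

```python
from enum import IntEnum


class ProcessingLevel(IntEnum):
    """Processing levels ordered from raw to most derived (hypothetical encoding)."""
    L0 = 0   # reconstructed, unprocessed instrument and payload data
    L1A = 1  # time-referenced, annotated; calibration appended but not applied
    L1B = 2  # Level 1a data processed to sensor units
    L2 = 3   # derived geophysical variables at source resolution
    L3 = 4   # variables mapped on uniform space-time grids
    L4 = 5   # model output or analysis results from lower-level data


def level0_recoverable(level: ProcessingLevel) -> bool:
    """Return True if Level 0 data can be reconstructed from this level.

    Stated in the text for L1A (recoverable) and L1B (not recoverable);
    extending 'not recoverable' to L2-L4 is an assumption of this sketch.
    """
    return level <= ProcessingLevel.L1A


recoverable = [lvl.name for lvl in ProcessingLevel if level0_recoverable(lvl)]
print(recoverable)  # ['L0', 'L1A']
```

An ordered enumeration makes checks such as "at least Level 3 before merging sources" a simple comparison, which suits the heterogeneous workflows mentioned above.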
11.2 Intelligent Data Processing and Object-Based Image Analysis

Object-Based Image Analysis (OBIA) scenarios suitable for more diverse workflows may be of the greatest interest. At the same time, OBIA is an independent area of research and is quite often used in spatial planning and many related geospatial applications, in ecology, etc. Many concepts and algorithms have been developed for various applications. Unlike traditional methods of image analysis, OBIA allows one to explore the "space-time" and "inter-scale" relations between
different objects. These methods provide an opportunity to analyze the relationships among discrete objects. In addition, they enable us to analyze comparable properties of landscapes and complex systems [19].

The development of distributed computer vision algorithms [20] promises to significantly advance the state of the art in computer vision systems by improving their efficiency and scalability (through the efficient integration of local information with global optimality guarantees), as well as their robustness to outliers and node failures (because of the use of redundant information). In particular, we have selected egocentric vision as one of the most visible integration components that provide a significant improvement in the quality of data analysis. To make it easier for users to control robots, we investigate a wearable hand posture control system based on egocentric vision that imitates the way users interact through sign language. Considering the characteristics of egocentric vision in this scenario, such as complicated backgrounds, large ego-motions, and extreme transitions in lighting, a hand detector based on Binary Edge HOG Block (BEHB) features [21] is used to extract articulated postures from the egocentric view. Unlike many other methods that use skin color cues, this hand detector adopts contour cues and a part-based voting idea. Its algorithm can be used in dark environments because infrared cameras can be used to obtain contour images rather than skin color images. To improve intelligent data processing quality, a hybrid classifier is proposed for distributed computer vision. Finally, to reduce DEA underestimation, OBIA is applied to the sequence of egocentric vision data processing results.

Egocentric vision, or first-person vision, is a subfield of computer vision that analyzes images and videos captured by a wearable camera within the visual field of the camera user.
The visual information stream captures the part of the scene on which the user focuses to carry out the task at hand, and offers a valuable perspective for understanding the user’s activities and their context in a usual setting. The basic camera is often supplemented with another camera that is aimed at the user’s eye and can estimate the user’s eye gaze, which is important for revealing attention and for better understanding the user’s activity and intentions. Scenarios for location-based social networks are implemented using location awareness, location-based services, location-activity recommendations, and associated metadata (for constructing optimal routes from user check-in data). Thus, location-activity recommendations use GPS-based location data and user comments made at various places to detect useful places and activities. The study addresses the following location-related questions of daily life: “If we want to do something, such as sightseeing or searching for a certain service in the smart city, where should we go?” and “If we have already visited some places, what else can we do there?” [22, 23]. A knowledge-based smart infrastructure is provided for building popular routes from uncertain trajectories (i.e. user check-in sequences, geo-tagged photos, or other digital traces). Given a sequence of locations and a time span, it is possible to construct the top k routes that pass through the locations in order within the specified time span, by aggregating such uncertain trajectories through mutual reinforcement
(that is, uncertain trajectories reinforce one another into a more certain route). This can help with trip planning, traffic management, the study of people’s movements, etc. [24, 25]. Obviously, the particular tasks presented in this section illustrate only some of the many possibilities in the new subject domain. In the extended version of the progress report, detailed studies of the virtual semantic environment were also presented in the following sequence of its system integration [6]: virtual network computing, transreality gaming, ambient intelligence, ubiquitous computing, sentient computing, context awareness, context-aware pervasive systems, and so on. In the next sections, we describe the opportunities of intelligent data processing in object-based image analysis for location-based social networks and the optimization modeling tool. Within this set of problematic issues of system integration and collaboration of intelligent processes, let us highlight two characteristic and connected examples, which can be regarded with a certain degree of confidence as precursors of advanced technology. (1) Project Tango. This project was an augmented reality computing platform developed by the Advanced Technology and Projects (ATAP) division of Google. It used computer vision to enable mobile devices, such as smartphones and tablets, to detect their position relative to the world around them without using GPS or other external signals. This allowed application developers to create user experiences that include indoor navigation, 3D mapping, physical space measurement, environmental recognition, augmented reality, and windows into a virtual world. In 2014, two Peanut phones were delivered to the International Space Station as part of a NASA project to develop autonomous robots that navigate in a variety of environments, including outer space.
These 18-sided polyhedral SPHERES robots (with Tango technology) were developed at NASA Ames Research Center [26]. (2) Identikit. The experimental studies with the modeling framework continue to improve the original identikit approach to justifying requirements for design characteristics and evaluating the flight capabilities of next-generation vehicles. It is demonstrated that one can effectively sort through the options for vehicle configurations, dimensional and aerodynamic characteristics, onboard equipment, controls, communications, and other elements of a vehicle in order to obtain the necessary input data for the subsequent flight simulation, design optimization, evaluation of the ability to perform various tasks, and maximum use of flight and service performance. Using the identikit principle enables the determination of justified requirements for the functions of modifiable vehicles, increases the efficiency of design processes, and substantially enhances the efficiency of simulators for training pilots and operators of remotely piloted aircraft systems and unmanned aerial vehicles [7]. Obviously, this is just one illustrative example of an OBIA application among many other projects. Thus, the combination of advantages in the tasks of intelligent data processing in object-based image analysis for the specified integration components can be
extended subsequently to form other components of ITS. In this regard, cross-disciplinary research and development of modifiable vehicles requires solving particular tasks of system integration, optimization modeling, and control.
11.3 Hybrid Optimization Modeling Framework
Based on the analysis of a wide range of areas of system integration, both theoretical methods and already implemented integration components of the hybrid optimization modeling framework will be formed in the interests of ITS development. Vehicular communications are in most cases built as an integral part of ITS when integration components of modifiable vehicles are developed. Vehicular communication systems are distributed computer networks in which vehicles and roadside units are the communicating nodes, providing each other with the necessary information, such as safety threat warnings or other important traffic data. They can be indispensable in avoiding accidents and traffic congestion. Both types of nodes are Dedicated Short Range Communications (DSRC) devices [27]. Vehicle-to-Everything (V2X) communication is the passing of information from a vehicle to any entity that may affect the vehicle, and vice versa. It is a vehicular communication system that incorporates other, more specific types of communication, such as V2I (vehicle-to-infrastructure), V2N (vehicle-to-network), V2V (vehicle-to-vehicle), V2P (vehicle-to-pedestrian), V2D (vehicle-to-device), and V2G (vehicle-to-grid). The main motivations for V2X are road safety, energy savings, and traffic efficiency. In this section, we first discuss the key questions of system integration by the hybrid optimization modeling framework, i.e. Usage-Based Insurance (UBI) and application of eCall (Sect. 11.3.1), and then various scenarios (search and rescue, emergency service, combat search and rescue) (Sect. 11.3.2).
11.3.1 Functionality of Hybrid Optimization Modeling Framework
In order to apply DEA models to various ITS units, the hybrid optimization modeling system is developed. It implements a number of algorithms for efficiency analysis and multidimensional frontier visualization by constructing two- and three-dimensional sections. The functional diagram of the modeling system is shown in Fig. 11.1. The external data source is connected to the system and contains raw information about the units under investigation. Depending on the research task (level of detail, model parameters, size of the entire task), the data are aggregated and converted into the internal data format of the modeling system and then recorded in the internal database.
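To make the efficiency analysis step more concrete, the following minimal Python sketch computes an input-oriented efficiency score under the free disposal hull (FDH) model, which the conclusions of this chapter name as the DEA variant in use. It is an illustration under stated assumptions and not the modeling system's actual implementation: the function name `fdh_input_efficiency`, the list-of-lists data layout, and the sample numbers are all hypothetical, and inputs are assumed to be strictly positive.

```python
from typing import Sequence


def fdh_input_efficiency(
    inputs: Sequence[Sequence[float]],
    outputs: Sequence[Sequence[float]],
    unit: int,
) -> float:
    """Input-oriented FDH efficiency of one unit, a value in (0, 1].

    inputs[j] and outputs[j] are the input and output vectors of unit j.
    All inputs are assumed to be strictly positive (illustrative assumption).
    """
    x0, y0 = inputs[unit], outputs[unit]
    best = 1.0  # the unit itself is always a feasible reference with ratio 1
    for xj, yj in zip(inputs, outputs):
        # Unit j is a valid reference only if it produces at least as much
        # of every output as the evaluated unit.
        if all(yjr >= y0r for yjr, y0r in zip(yj, y0)):
            # Smallest proportional contraction of x0 that still covers xj.
            theta = max(xji / x0i for xji, x0i in zip(xj, x0))
            best = min(best, theta)
    return best


if __name__ == "__main__":
    # Hypothetical data: three units, two inputs, one output.
    sample_inputs = [[2.0, 4.0], [4.0, 4.0], [3.0, 2.0]]
    sample_outputs = [[1.0], [1.0], [1.0]]
    for j in range(len(sample_inputs)):
        print(j, fdh_input_efficiency(sample_inputs, sample_outputs, j))
```

A score of 1 means that no observed unit produces at least the same outputs with proportionally smaller inputs; a score below 1 gives the factor by which the inputs could be contracted. Because FDH compares a unit only with observed units, a plain enumeration is enough and no linear programming solver is needed, which is why it was chosen for this sketch.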
Fig. 11.1 The functional diagram of the hybrid optimization modeling framework
The external data source focuses on the use of OBIA in the interests of the International Charter “Space and Major Disasters”, a single security space in the future ERA-GLONASS and eCall services, their respective vehicle integration components (including ITS), and information infrastructure. ERA-GLONASS, a geographically distributed automated information system for emergency response to accidents, ensures the prompt receipt of information on road traffic accidents and other emergency situations on the roads of the Russian Federation; the processing, storage, and transmission of this information to emergency operational services; and access to this information for interested state bodies, local self-government bodies, officials, and legal and natural persons (using the signals of the global navigation satellite system GLONASS together with another operating GNSS). The counterpart of ERA-GLONASS is the pan-European eCall system, with which ERA-GLONASS is harmonized in its main functional properties (a unified composition and format of the mandatory data transmitted as part of the minimum set of data on a traffic accident, uniform rules for establishing and completing a two-way voice connection with persons in a vehicle cabin, etc.). The minimum data set transmitted by the automotive emergency call system in a traffic accident includes the coordinates and motion parameters of the vehicle involved in the accident, the time of the accident, and other information necessary for emergency response. At the same time, the primary integration of UBI components is proposed. The framework is supplemented to deploy a smartphone-based measurement system for road vehicle traffic monitoring and UBI. To modularize the description through system integration, the functionality is described as spanning
from sensor-level functionality and technical specification to the topmost business model [28].
11.3.2 Experimental Studies
Based on the approach described above, a prototype of the optimization modeling system was developed for modeling and visualizing the behavior of complex systems. It is designed to support strategic decision making and data analysis, to evaluate various scenarios, and to forecast the development of complex systems. It can be used both in daily analytical work and in the operational mode for the analysis of critical situations. To enhance the effect of geosocial networking analysis, a projection system was applied in which 3D sections of the frontier are generated using virtual reality. To create a visual stereo effect, two projectors with polarizing filters are used. Two images, for the left and right eyes, are formed simultaneously on a special screen. The screen has a metallic surface that preserves the polarization of the images presented to each eye, see Fig. 11.2. Intelligent data and metadata processing in the tasks of remote sensing and geosocial networking has a growing relevance in the modern information society. It is one of the key technologies of the aerospace industry and is of increasing economic relevance: new materials and sensors are developed constantly, and the demand for skilled labor is increasing steadily. Furthermore, remote sensing strongly influences everyday life, ranging from weather forecasts to reports on climate
Fig. 11.2 Example of the three-dimensional section of six-dimensional DEA frontier visualization
change or natural disasters. Studies have shown that only a small number of users know enough about the opportunities for analyzing the data they are working with. There is a huge knowledge gap between the application and the understanding of the images analyzed. To integrate remote sensing in a sustainable manner, organizations such as the European Geosciences Union and Digital Earth encourage the development of learning modules and learning portals. Scenarios and conditions for the implementation of intelligent data processing (search and rescue, emergency service, combat search and rescue) are of great importance for the further improvement of methods and tools based on the hybrid framework. Moreover, they are applicable to the creation and updating of interactive atlases of emergency risks for various areas [29, 30]. The framework being developed is one of the most interesting and high-priority research and development directions. The presented invariance framework is developed by a feature-based approach for system integration with original integration components of intelligent data processing. In our opinion, further development of integration components for intelligent data processing has significant potential for the system integration of advanced technologies of the International Charter “Space and Major Disasters” and location-based social networks. Furthermore, the selection of the best regions for embedding a smart city can be accomplished using DEA, taking into account the regional information (innovation) landscape in terms of the evolution of geosocial networking.
11.4 Conclusions
The approach presented contributes to the analysis of prospective scenarios in the system integration of ITS. The framework presented shows how DEA methodology can be used to compare different integration components of ITS. The DEA analysis conducted in this research allowed the determination of the most efficient solutions for the smart city. In this analysis, DEA was applied in a limited way to intelligent data processing (for location-based social networks) instead of the traditional combination of geosocial networking as a part of transport project appraisal. This research utilized DEA for this purpose, as the literature suggests that it provides numerous benefits over other commonly used appraisal techniques. The results presented also show that DEA can be used as a powerful decision-making tool for similar collaborative innovation network options. The research is in line with the hybrid optimization modeling framework, which in this chapter focuses on a comprehensive discussion of the problematic issues of creating location-based social networks and object-based image analysis based on conceptual models of integration, DEA (free disposal hull model), and the convergence of intelligent production technologies through the creation of original tools. These tools can be used for the search, collection, storage, and processing of pertinent information resources in modern conditions of rapid development of artificial neural networks, cognitive technologies, and other intelligent data processing technologies. The conducted research has definite novelty in issues such as the system integration of virtual semantic environment components and the generalized DEA model [6].
The practical importance of the work also lies in the timely search for and expansion of possible related applications of DEA, e.g. advanced analytics software for performance analysis and visualization of financial institutions [31]. Among the priority proposals on the use of the results of system integration in research activities and in the innovations of distributed computer and telecommunication networks based on DEA technology, we outlined a generalization in the field of advanced computer vision for cyber-physical systems in general [32]. The development of advanced technologies on such a methodological basis and the improvement of economic and mathematical models for analyzing the innovative potential of a smart city (region) are particularly relevant when developing a scientifically based regional innovation policy and creating regional development programs that take into account the effective use of the regional information (collaborative innovation network) landscape in terms of the evolution of geosocial networking. Experimental results of the next stage (mobile location analytics) are planned to be presented in future cross-disciplinary research.
Acknowledgements This work was partially supported by the Russian Science Foundation, project No. 17-11-01353 (DEA-based optimization modeling framework). Partial financial support from the RFBR under research project No. 16-29-04326 (integration and investigation of pertinent information resources) is also gratefully acknowledged. This research was partially supported by the Presidium of the RAS, Program No. 30 “Theory and Technologies of Multi-level Decentralized Group Control under Confrontation and Cooperation” (experimental studies of intelligent data and metadata processing in geosocial networking for ITS).
References 1. Pham, M.C., Klamma, R., Jarke, M.: Development of computer science disciplines: a social network analysis approach. Soc. Netw. Anal. Min. 1(4), 321–340 (2011) 2. Rozhnov, A.V., Lobanov, I.A.: Investigation of the joint semantic environment for heterogeneous robotics. In: 2017 10th International Conference Management of Large-Scale System Development, pp. 1–5 (2017) 3. Rozhnov, A., Lychev, A.: System integration of research activities and innovations in distributed computer and telecommunication networks using data envelopment analysis. In: 21st International Conference on Distributed Computer and Communication Networks: Control, Computation, Communications, vol. 1, pp. 273–280 (2018) 4. Krivonozhko, V., Rozhnov, A., Lychev, A.: Construction a hybrid intelligent information framework and components of expert systems using the generalized DEA model. Neurocomputers (6), 3–12 (in Russian) (2013) 5. Krivonozhko, V.E., Førsund, F.R., Lychev, A.V.: Measurement of returns to scale using a non-radial DEA model. Eur. J. Oper. Res. 232(3), 664–670 (2014) 6. Nechaev, V., Goncharenko, V., Rozhnov, A., Lytchev, A., Lobanov, I.: Integration of virtual semantic environments components and generalized data envelopment analysis (DEA) model. In: XI International Scientific-Practical Conference Modern Information Technologies and IT-Education, vol. 1761, pp. 339–347 (2016)
7. Abrosimov, V., Ryvkin, S., Goncharenko, V., Rozhnov, A., Lobanov, I.: Identikit of modifiable vehicles at virtual semantic environment. In: International Conference on Optimization of Electrical and Electronic Equipment and International Aegean Conference on Electrical Machines and Power Electronics, pp. 905–910 (2017) 8. Ryvkin, S., Rozhnov, A., Lychev, A., Lobanov, I., Fateeva, Y.: Multiaspect modeling of infrastructure solutions at energy landscape as virtual semantic environment. In: International Conference on Optimization of Electrical and Electronic Equipment and International Aegean Conference on Electrical Machines and Power Electronics, pp. 935–940 (2017) 9. Ryvkin, S., Rozhnov, A., Lobanov, I., Chernyshov, L.: Investigation of the stratified model of virtual semantic environment for modifiable vehicles. In: 20th International Symposium on Electrical Apparatus and Technologies, pp. 1–4 (2018) 10. Report of the EOS Data Panel, Earth Observing System, Data and Information System, Data Panel Report, vol. IIa. NASA Technical Memorandum 87777 (1986) 11. Parkinson, C.L., Ward, A., King, M.D. (eds.): Earth Science Reference Handbook: A Guide to NASA’s Earth Science Program and Earth Observing Satellite Missions. NASA, Washington, D.C. (2006) 12. Product User Manual, GRAS Satellite Application Facility, Version 1.2.1. Available at: http://www.grassaf.org/general-documents/products/grassaf_pum_v121.pdf (Mar 2009). Accessed 26 Aug 2019 13. The International Charter Space and Major Disasters. Available at: https://disasterscharter.org/web/guest/home. Accessed 25 Aug 2019 14. Location-Based Social Networks. Available at: https://www.microsoft.com/en-us/research/project/location-based-social-networks/. Accessed 26 Aug 2019 15. Blaschke, T., Lang, S., Hay, G.J. (eds.): Object-Based Image Analysis: Spatial Concepts for Knowledge-Driven Remote Sensing Applications. Springer, Heidelberg, Berlin, New York (2008) 16. 
Ryvkin, S., Rozhnov, A., Lobanov, I.: Convergence of technologies of the evolving prototype of an energy efficient large-scale system. In: 20th International Symposium on Electrical Apparatus and Technologies, pp. 1–4 (2018) 17. Rozhnov, A.V., Melikhov, A.A.: Vectorizing textual data sources to decrease attribute space dimension. In: 10th International Conference Management of Large-Scale System Development, pp. 1–4 (2017) 18. Favorskaya, M., Buryachenko, V.: Fast salient object detection in non-stationary video sequences based on spatial saliency maps. In: De Pietro, G., Gallo, L., Howlett, R.J., Jain, L.C. (eds.) Intelligent Interactive Multimedia Systems and Services. SIST, vol. 55, pp. 121–132. Springer International Publishing, Switzerland (2016) 19. Tiede, D., Lang, S., Fureder, P., Holbling, D., Hoffmann, C., Zeil, P.: Automated damage indication for rapid geospatial reporting. An operational object-based approach to damage density mapping following the 2010 Haiti earthquake. Photogramm. Eng. Remote Sens. 77(9), 933–942 (2011) 20. Tron, R., Vidal, R.: Distributed computer vision algorithms. IEEE Signal Process. Mag. 28(3), 32–45 (2011) 21. Ji, P., Song, A., Xiong, P., Yi, P., Xu, X., Li, H.: Egocentric-vision based hand posture control system for reconnaissance robots. J. Intell. Robot. Syst. 87(3–4), 583–599 (2017) 22. Zheng, V.W., Zheng, Y., Xie, X., Yang, Q.: Collaborative location and activity recommendations with GPS history data. In: International Conference on World Wide Web, pp. 1029–1038. ACM (2010) 23. Zheng, V.W., Cao, B., Zheng, Y., Xie, X., Yang, Q.: Collaborative filtering meets mobile recommendation: a user-centered approach. In: AAAI Conference on Artificial Intelligence, pp. 236–241. ACM (2010) 24. Wei, L.-Y., Zheng, Y., Peng, W.-C.: Constructing popular routes from uncertain trajectories. In: 18th SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 195–203 (2012) 25. 
Liu, H., Wei, L.-Y., Zheng, Y., Schneider, M., Peng, W.-C.: Route discovery from mining uncertain trajectories. In: IEEE International Conference on Data Mining Workshop, pp. 1239–1242 (2011)
26. Williams, M.: Google’s project Tango headed to international space station. PCWorld. Available at: https://www.pcworld.com/article/2110660/googles-project-tango-headed-to-internationalspace-station.html. Accessed 26 Aug 2019 27. Dedicated Short Range Communication (DSRC)—DSRC Application Layer, Intelligent Transport Systems, ISO 15628:2013. Available at: https://www.iso.org/standard/59288.html. Accessed 26 Aug 2019 28. Händel, P., Ohlsson, J., Ohlsson, M., Skog, I., Nygren, E.: Smartphone-based measurement systems for road vehicle traffic monitoring and usage-based insurance. IEEE Syst. J. 8(4), 1238–1248 (2014) 29. Caulfield, B., Bailey, D., Mullarkey, S.: Using data envelopment analysis as a public transport project appraisal tool. Transp. Policy 29, 74–85 (2013) 30. EMERCOM of Russia to Create an Interactive Atlas of Emergency Risks for the North Caucasus. Available at: https://www.mchs.gov.ru/dop/info/smi/news/item/34002458/. Accessed 26 Aug 2019 (in Russian) 31. Lychev, A.V., Rozhnov, A.V.: Advanced analytics software for performance analysis and visualization of financial institutions. In: 11th IEEE International Conference on Application of Information and Communication Technologies, vol. 2, pp. 133–137 (2017) 32. Proceedings of the Permanent Scientific Seminar (Trapeznikov Institute of Control Sciences of Russian Academy of Sciences, ISC RAS): Control Science of Autonomous Systems [item 1–35]. Moscow, Russia (in Russian) (2017–2018)
Chapter 12
Non-local Means Denoising Algorithm Based on Local Binary Patterns
S. K. Kartsov, D. Yu. Kupriyanov, Yu. A. Polyakov and A. N. Zykov
Abstract Previously, document flow relied mainly on documents in paper form. This created a series of problems in archiving and searching for the necessary documents. Archived paper documents accumulate many noise points, so searching for documents in the archive is error-prone and takes a long time. With the development of information technology, it became possible to use scanners to convert documents from paper to electronic form. Because of the scanning process itself and because the documents are not always in good condition, the output images contain various defects in the form of noise. Various noise reduction algorithms are used to improve the image quality and remove the noise from scanned documents. This chapter discusses the possibility of using the Local Binary Patterns (LBP) operator to modify the operation of the Non-Local Means (NLM) noise reduction algorithm. As a result, it was possible to improve the quality of scanned images processed by the proposed modified algorithm.
Keywords Digital image processing · Noise reduction · Image denoising algorithm · Non-local means · Local binary patterns operator
S. K. Kartsov (B) · A. N. Zykov Moscow Automobile and Road Construction Technical University (MADI), 64, Leningradskiy Ave., Moscow 125319, Russian Federation D. Yu. Kupriyanov National Research Nuclear University «MEPhI», 31, Kashirskoye Highway, Moscow 115409, Russian Federation e-mail: [email protected] Yu. A. Polyakov National University of Science and Technology «MISiS», 4, Leninskiy Ave., Moscow 119049, Russian Federation Moscow State University of Technology «STANKIN», 3a, Vadkovskiy lane, Moscow 127055, Russian Federation © Springer Nature Switzerland AG 2020 M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems—6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_12
12.1 Introduction
Converting old paper documents to digital format greatly simplifies their archiving and searching. Such conversion requires scanning, and noise is an inevitable result of this process. Preliminary processing of scanned document images using noise reduction algorithms is a key step in solving this problem. As a rule, the algorithms are aimed at reducing the specific background noise, and their purpose is to restore the original image. An important feature of a good noise removal process is that it must completely remove the noise while preserving the contrast, sharpness, and clarity of the image. Currently, this is not yet fully achievable. The problem of noise suppression is one of the most urgent and common problems in the field of image processing. Over the years, various algorithms for noise reduction in digital images have been proposed. Most of these algorithms try to divide the image into a smooth part (true image) and an oscillatory part (noise) by separating the higher frequencies from the lower frequencies and removing the former. However, not all images are smooth. Images may contain small details and structures with high frequencies. When the high frequencies are removed, the high-frequency content of the original image is removed along with the high-frequency noise, because the algorithms cannot distinguish the noise from the true image. This leads to the loss of small details in the denoised image. There is also low-frequency noise, which these algorithms do nothing to remove, so it remains in the image even after noise reduction. To address these problems, a non-local noise reduction algorithm known as NLM was developed and proposed [1–3]. This algorithm is a filter that divides an input image into fragments and then processes each fragment separately using a block-based method. Each image fragment contains many blocks.
The blocks are processed separately to provide an estimate of the true pixel value at the center of each block. The similarity of the blocks inside the fragment is measured by two distances: the Euclidean distance between the centers of the blocks and the brightness distance between the blocks. Blocks are compared within a fragment window rather than only between adjacent pixels. In this comparison, blocks with similar brightness levels receive more weight when a pixel value is averaged. That is why this algorithm is called a non-local method. The chapter has the following structure. Section 12.2 provides a short literature review. The NLM algorithm is described in Sect. 12.3, while the modified NLM algorithm is discussed in Sect. 12.4. Section 12.5 presents the proposed NLM algorithm based on local binary patterns. Experimental studies are presented in Sect. 12.6. Section 12.7 concludes the chapter.
12.2 Related Work
The NLM algorithm underlies many methods based on non-local image processing with the use of blocks. These methods appear to achieve better results in noise reduction for digital images, as they create fewer artifacts than other noise removal methods. To date, about 1500 works on non-local image processing have been published. These papers describe various modifications of NLM that address some drawbacks of the algorithm. One of the drawbacks is that NLM is computationally expensive due to the large number of weight calculations between similar blocks. Optimizing the method of assigning weights between blocks improves the performance of this method. In [4], the weighting adjustment process was improved by means of Markov clustering. A statistical perspective on compression in assigning weights, using a compression estimate according to [5], is given in [6]. A strategy of preliminary classification of neighborhoods for optimizing the weights of filter kernels was described in [7]. In [8], the NLM scheme was changed by adding a thresholding step that reduces the number of blocks used before the block weights are averaged. In several papers, fast implementations of the algorithm were proposed that pre-select blocks [9, 10] or use Gaussian KD-trees for classifying image blocks [11], singular value decomposition (SVD) [12], statistical arguments [13], and approximate search [14]. Another disadvantage is that NLM is a filter in the spatial domain, although convolution can be easily implemented in the frequency domain. Converting this filter into the frequency domain makes it possible to suppress more noise in the image. In [15], the blocks were transformed into the frequency domain, and a discrete cosine transform with a threshold was used to estimate the initial blocks.
The use of the fast Fourier transform for calculating the correlation between blocks was described in [16]. In [17], NLM was combined with a filter [18] to improve synthetic-aperture radar images. The inclusion of median filtering in NLM for noise reduction in images with a low Signal-to-Noise Ratio (SNR) was described in [19]. Adaptation of NLM for noise reduction in X-ray images and the application of additional multiscale contrast enhancement in the frequency domain were presented in [20]. In addition, adaptations of NLM can be used to improve applications in other areas of image processing, for example, segmentation, recognition, and noise reduction in video. An extension of the algorithm for reducing the ultrasound spectrum was presented in [21]. For this purpose, the weights of similar blocks in a lower-dimensional subspace were assigned iteratively using principal component analysis. An adaptation of NLM using a frequency transform to improve cell images taken with a microscope was described in [22]. In [23], a modified version of the algorithm was used to detect small objects by suppressing the background. The background pixels were estimated by a weighted average value that depends on the similarity between adjacent pixels. In [24], the algorithm was adapted for noise reduction and video quality improvement at extremely
S. K. Kartsov et al.
low light intensity by applying an adaptive temporal filter that uses gamma correction with adaptive thresholds before NLM is applied. This brief overview shows that the NLM algorithm and its modifications are widely used in various areas of digital image processing, as well as in many commercial applications.
12.3 Description of Non-local Means Algorithm

The NLM algorithm, unlike other algorithms, assumes that an image contains a large amount of self-similarity. An example of self-similarity is shown in Fig. 12.1, which shows three pixels p, q1, and q2 together with their neighborhoods. The neighborhoods of pixels p and q1 are similar, whereas the neighborhoods of pixels p and q2 are not. The principle of self-similarity suggests that neighboring pixels have similar neighborhoods, and that non-neighboring pixels also have similar neighborhoods if these neighborhoods contain a similar structure [25]. For example, in Fig. 12.1, most of the pixels in the same column as p have neighborhoods similar to the neighborhood of p. Since the neighborhoods of p and q1 are similar and the neighborhoods of p and q2 are not, the pixel q1 has a larger weight w(p, q1) and a stronger impact on the final value of p than the pixel q2, which has a weight w(p, q2). Pixels with similar neighborhoods can be used to determine the value of a noisy pixel. In the NLM algorithm, each output pixel is a weighted average of all pixels in a neighborhood of the given pixel in the image.
Fig. 12.1 Example of self-similarity in the image
12 Non-local Means Denoising Algorithm Based on Local Binary …
The weighted value of each pixel p [1] is calculated by the formula:

$$NL(V)(p) = \sum_{q \in V} w(p, q)\, V(q), \tag{12.1}$$

where V is the noisy image, and the weights w(p, q) satisfy the conditions $0 \le w(p, q) \le 1$ and $\sum_{q} w(p, q) = 1$. These weighting coefficients are based on the similarity between the neighborhoods of pixels p and q. For example, Fig. 12.1 shows that the neighborhoods of pixels p and q1 are more similar than the neighborhoods of pixels p and q2. Therefore, the weight w(p, q1) is much larger than the weight w(p, q2), and the value of pixel q1 contributes more to the final weighted value of pixel p. To calculate the similarity, a neighborhood around a pixel must be defined. Let $N_i$ be a square neighborhood of radius r centered at pixel i. The similarity between two neighborhoods is measured by a weighted sum of the squared differences between them:

$$d(p, q) = \left(V(N_p) - V(N_q)\right)^2 * F, \tag{12.2}$$

where F is the filter applied to the squared difference between the neighborhoods of the two pixels:

$$F = \frac{1}{r} \sum_{i=m}^{R} \frac{1}{(2i + 1)^2}, \tag{12.3}$$

where m is the distance from the center of the filter. The filter F (Eq. 12.3), whose graph is shown in Fig. 12.2, gives more weight to the pixels located closer to the center of the neighborhood and less weight to the pixels located closer to its edge. After that, the weights are calculated by the following formula:

$$w(p, q) = \frac{1}{Z(p)}\, e^{-\frac{d(p, q)}{h}}, \tag{12.4}$$

where Z(p) is the normalizing constant and h is the parameter that controls the weight values:

$$Z(p) = \sum_{q} e^{-\frac{d(p, q)}{h}}. \tag{12.5}$$
Fig. 12.2 The graph of the filter
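The weighting scheme of Eqs. 12.1, 12.4 and 12.5 can be sketched for a single pixel as follows. This is a minimal illustration rather than the implementation used in the chapter: the function names are hypothetical, the image is a plain list of rows, the search area is given by its half-size `search`, and a uniform kernel is assumed in place of the filter F of Eq. 12.3 to keep the example short.

```python
import math


def uniform_kernel(r):
    """Assumed stand-in for the filter F: equal weights that sum to one."""
    side = 2 * r + 1
    return [[1.0 / (side * side)] * side for _ in range(side)]


def patch_distance(img, p, q, r, kernel):
    """Kernel-weighted sum of squared differences between two patches (Eq. 12.2)."""
    (py, px), (qy, qx) = p, q
    total = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            diff = img[py + dy][px + dx] - img[qy + dy][qx + dx]
            total += kernel[dy + r][dx + r] * diff * diff
    return total


def nlm_pixel(img, p, r, search, h):
    """Weighted average for pixel p over its search window (Eqs. 12.1, 12.4, 12.5).

    p must lie at least r pixels away from the image border.
    """
    height, width = len(img), len(img[0])
    kernel = uniform_kernel(r)
    py, px = p
    weighted_sum = 0.0
    z = 0.0  # normalizing constant Z(p)
    for qy in range(max(r, py - search), min(height - r, py + search + 1)):
        for qx in range(max(r, px - search), min(width - r, px + search + 1)):
            w = math.exp(-patch_distance(img, p, (qy, qx), r, kernel) / h)
            z += w
            weighted_sum += w * img[qy][qx]
    return weighted_sum / z
```

Dividing by the accumulated `z` at the end is equivalent to normalizing every weight by Z(p), so the weights sum to one as Eq. 12.1 requires. A smaller h makes dissimilar patches contribute almost nothing.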
12.4 Modified Non-local Means Algorithm

In this section, we describe the NLM algorithm that uses the LBP operator. The standard NLM algorithm shows good noise reduction results on scanned documents, which contain vast areas of the same tone and many self-similar areas. To calculate the weighted average value of each pixel p with this algorithm, two areas of sizes R and r are selected and centered on the pixel p (Fig. 12.3).
Fig. 12.3 Example of areas with sizes R and r
Then, the area of size r around the pixel p is compared with the areas around all the other pixels. In each case, the values of d(p, q) and w(p, q) are determined (Eqs. 12.2–12.4). When all values w(p, q) in the window R are found, the weighted value NL(V)(p) for the given pixel p is obtained from Eq. 12.1. This is repeated for all pixels of the image. In the original version of the NLM algorithm, the area R was equal to the entire image, for example, R = 256 × 256. The weighted value of the next pixel p was calculated by comparing the area r around the selected pixel p with the areas of size r around all the other pixels q in the area R. During these calculations, a weighting coefficient w was found for each pixel q, which determined the contribution of the value of each pixel q to the final weighted value of pixel p. This approach gave good results, but the algorithm took quite a long time to run. To reduce the execution time, a smaller area R, for example R = 25 × 25, was proposed. With this approach, the running time of the algorithm was reduced to an acceptable value, but the quality of the result changed slightly. To find a compromise between speed and quality, we propose not to limit the calculation of the weighted value of pixel p to a comparison with all pixels q ∈ R, but to apply a slightly modified approach:
• Select an arbitrary pixel p.
• Select pixels whose surrounding texture, within a neighborhood of size r around them, is similar to the texture around pixel p.
• Perform the calculations (Eq. 12.1) within the area of size R around each selected pixel, but use the area around pixel p as the central area of size r.
• Obtain several weighted values of pixel p from the calculations in the different areas R.
• Calculate the average of the obtained values and assign it to pixel p.
• Repeat this procedure for each pixel of the image.
As a result of these operations, the value obtained for pixel p is averaged over more data and depends less exclusively on the values of the neighboring pixels in the area R around the given pixel. The area that forms the value of the selected pixel expands by exploiting the self-similarity between the areas around the image pixels. The second approach to using the similarity between areas around pixels is as follows:
• Calculate the value of each pixel by Eq. 12.1.
• Select pixels whose surrounding texture, within the neighborhood of size R around them, is similar to the texture around pixel p.
• Take the values of the selected pixels, calculated by Eq. 12.1, and calculate their average.
• Assign this average value to all selected pixels with similar areas of size R.
In this case, a side effect is a reduction of the running time of the algorithm because previously performed calculations are reused.
This approach improves the quality of the noise reduction algorithm. At the same time, the speed of the algorithm remains almost unchanged owing to the preliminary calculation of values for pixels with the same texture characteristics.
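The second approach can be sketched as a grouping step applied after the ordinary NLM pass. The sketch below rests on assumptions that the chapter does not state: each pixel already carries a texture label (for instance an LBP code), "similar texture" is simplified to "identical label", and the function name `average_by_texture` is hypothetical.

```python
from collections import defaultdict


def average_by_texture(values, labels):
    """Replace each NLM value with the mean of all values that share its texture label.

    values: 2-D list of per-pixel values already computed by Eq. 12.1.
    labels: 2-D list of texture labels with the same shape (assumed given).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate the sum and the count of values per texture label.
    for value_row, label_row in zip(values, labels):
        for value, label in zip(value_row, label_row):
            sums[label] += value
            counts[label] += 1
    # Second pass: assign the class average to every pixel of that class.
    return [[sums[label] / counts[label] for label in label_row]
            for label_row in labels]
```

Because the per-pixel values are computed once and only averaged afterwards, this step adds two linear passes over the image, which is in line with the remark that the running time remains almost unchanged.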
12.5 Non-local Means Based on Local Binary Patterns

We propose to use the LBP operator [26] to find areas $r_i$ with the same texture characteristics around pixels $p_i$. The LBP operator makes it possible to classify image areas. It converts an image into an array of binary codes that describe the neighborhoods of the image elements. This is a theoretically simple and, at the same time, effective texture analysis method that shows good results in many empirical studies. It can be regarded as a method that unifies statistical and structural texture analysis. The LBP operator works as follows. Consider a 3 × 3 neighborhood around each pixel. Adjacent pixels whose value is greater than or equal to the value of the center pixel are assigned the value “1”, and those whose value is less than the value of the center pixel are assigned the value “0”. Then, going clockwise, for example from the upper left corner, the obtained values are multiplied by powers of two and summed to obtain the binary code of the central pixel (Fig. 12.4). Since the neighborhood of the center pixel consists of 8 adjacent pixels, a total of $2^8 = 256$ different binary codes can be obtained, depending on the relative tone values of the center and adjacent pixels. These binary numbers or their decimal equivalents can be associated with the central pixel and used as characteristics of the local structure of the image texture around this pixel. The decimal form of the resulting 8-bit binary number can be expressed as follows:

$$LBP(x_c, y_c) = \sum_{n=0}^{7} s(i_n - i_c)\, 2^n, \tag{12.6}$$

where $i_c$ is the value of the center pixel $(x_c, y_c)$, $i_n$ are the values of the eight adjacent pixels, and the function s(x) is defined by Eq. 12.7:

$$s(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{12.7}$$

Fig. 12.4 An example of LBP operator
The distribution of LBP codes is used as a classification or segmentation for further texture analysis.
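Equations 12.6 and 12.7 can be illustrated with a short sketch for one interior pixel. The function name `lbp_code` is hypothetical, and the bit order is an assumption: the neighbors are visited clockwise starting from the upper left corner, as the text suggests by way of example, and the first neighbor receives the weight 2^0.

```python
def lbp_code(img, y, x):
    """Return the 8-bit LBP code of the interior pixel (y, x) of a 2-D list image."""
    center = img[y][x]
    # Neighbor offsets (dy, dx), clockwise from the upper left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for n, (dy, dx) in enumerate(offsets):
        # s(i_n - i_c) = 1 when the neighbor is not darker than the center (Eq. 12.7).
        if img[y + dy][x + dx] >= center:
            code += 2 ** n
    return code
```

For the 3 × 3 block with rows (6, 5, 2), (7, 6, 1), (9, 8, 7), the neighbors that are greater than or equal to the center value 6 sit at positions 0, 4, 5, 6 and 7 of the clockwise order, so the code is 1 + 16 + 32 + 64 + 128 = 241. A different starting neighbor or direction permutes the bits but yields an equally valid texture label, as long as the same convention is used for the whole image.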
12.6 Experimental Studies

Let us consider a fragment of a scanned document (Fig. 12.5). We perform noise reduction on this document using the original NLM algorithm and the method based on the proposed approach, which uses the LBP operator. All measurements were carried out in MatLab 7.5.0 on a computer with an Intel(R) i3-3220 3.30 GHz processor and 4 GB of installed memory. The results of the standard method were obtained with a MatLab implementation of the standard method [26]. The results are presented in Fig. 12.6. Because the source images are scanned old documents that already contain traces of noise, various quality metrics cannot be applied to assess the selected algorithms. However, as seen from Fig. 12.6, the image obtained by the proposed approach is visually better than the image obtained by the original algorithm.

Fig. 12.5 Original noisy image
Fig. 12.6 Images processed by: a original NLM algorithm, b NLM algorithm based on LBP

The running time of the two algorithms in this study is essentially the same and depends on the sizes of the selected image areas R and r and on the image size. Both algorithms calculate the difference between the area of size r around pixel p and the areas around all pixels q within the region R. As a result, calculating new values for all pixels of an image of size M × N requires $M \cdot N \cdot R^2 \cdot r^2$ comparisons. This suggests a comparable speed of operation for both algorithms, with the proposed method improving the noise reduction. Calculating the LBP operator for the area around each pixel p in the proposed method does not significantly affect the final execution time. At the same time, as mentioned above, the execution time can even be reduced. For a significant acceleration of the proposed method, optimized ways of assigning weights between blocks can be used, for example [4–14].
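The comparison count above can be made concrete with a short calculation. The image size 256 × 256 and the search sizes 256 and 25 come from Sect. 12.4; the patch side of 7 pixels is an assumed illustrative value that the chapter does not specify, and the function name is hypothetical.

```python
def nlm_comparisons(m, n, search_side, patch_side):
    """Number of pixel comparisons M * N * R^2 * r^2 for an M x N image."""
    return m * n * search_side ** 2 * patch_side ** 2


# Search area equal to the whole 256 x 256 image versus a 25 x 25 window.
full = nlm_comparisons(256, 256, 256, 7)
reduced = nlm_comparisons(256, 256, 25, 7)
print(full, reduced, full / reduced)
```

Since only the R² factor changes between the two cases, the ratio equals (256 / 25)², roughly a hundredfold reduction, regardless of the patch size chosen.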
12.7 Conclusions

The objective of this study is to improve the quality of the image produced by noise reduction. The proposed approach forms an averaged final value of the image pixels based on calculations not only within the region R, but also on the values of all pixels p across the entire image that have a similar structure within the region r around them. It can be concluded that applying this approach to the noise reduction of images obtained by scanning documents solves a well-known task at a qualitatively new level and opens new possibilities for the methods used in the algorithm.

Acknowledgements The reported study was conducted in connection with work on old paper documents, namely their conversion into electronic form by scanning, and the need to improve the quality of the scanned documents.
References 1. Buades, A., Coll, B., Morel, J.: A non-local algorithm for image denoising. In: IEEE International Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 60–65 (2005) 2. Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. Multiscale Model. Simul. 2, 490–530 (2005) 3. Buades, A., Coll, B., Morel, J.M.: Non-local image and movie denoising. Int. J. Comput. Vis. 2, 123–139 (2008) 4. Hedjam, R., Moghaddam, R.F., Cheriet, M.: Markovian clustering for the non-local means image denoising. In: 16th IEEE International Conference on Image Processing, pp. 3877–3880 (2009) 5. James, W., Stein, C.: Contributions to the theory of statistics. Estimation with quadratic loss. In: 4th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 361–379 (1961) 6. Wu, Y., Tracey, B., Natarajan, P., Noonan, J.P.: James–Stein type center pixel weights for non-local means image denoising. IEEE Signal Process. Lett. 20(4), 411–414 (2013) 7. Lai, R., Dou, X.: Improved non-local means filtering. In: 3rd International Congress on Image and Signal Processing, vol. 2, pp. 720–722 (2010) 8. Khan, A., El-Sakka, M.R.: Non-local means using adaptive weight thresholding. In: 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, pp. 67–76 (2016) 9. Mahmoudi, M., Sapiro, G.: Fast image and video denoising via nonlocal means of similar neighborhoods. IEEE Signal Process. Lett. 12(12), 839–842 (2005) 10. Bilcu, R.C., Vehvilainen, M.: Combined non-local averaging and intersection of confidence intervals for image denoising. In: 15th IEEE International Conference on Image Processing, pp. 1736–1739 (2008) 11. Adams, A., Gelfand, N., Dolson, J., Levoy, M.: Gaussian KD-trees for fast high-dimensional filtering. ACM Trans. Graph. 28, 21.1–21.12 (2009) 12. Orchard, J., Ebrahimi, M., Wong, A.: Efficient non-local-means denoising using the SVD. 
In: Proceedings of IEEE International Conference on Image Processing, pp. 1732–1735 (2008) 13. Coupe, P., Yger, P., Barillot, C.: Fast non-local means denoising for 3D MRI images. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 33–40 (2006) 14. Barnes, C., Shechtman, E., Finkelstein, A., Goldman, D.B.: PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Trans. Graph. 28, 24.1–24.8 (2009) 15. Enríquez, A.E.P., Ponomaryov, V.: Image denoising using block matching and discrete cosine transform with edge restoring. In: International Conference on Electronics, Communications and Computers, pp. 140–147 (2016) 16. Wang, J., Guo, Y., Ying, Y., Liu, Y., Peng, Q.: Fast non-local algorithm for image denoising. In: IEEE International Conference on Image Processing, pp. 1429–1432 (2006) 17. Zhong, H., Zhang, J., Liu, G.: Robust polarimetric SAR despeckling based on nonlocal means and distributed Lee filter. IEEE Trans. Geosci. Remote Sens. 52(7), 4198–4210 (2013) 18. Lee, J.S.: Digital image smoothing and the sigma filter. Comput. Vis. Graph. Image Process. 24(2), 255–269 (1983) 19. Chan, C., Fulton, R., Feng, D.D., Meikle, S.: Median non-local means filtering for low SNR image denoising: application to pet with anatomical knowledge. In: IEEE Nuclear Science Symposium & Medical Imaging Conference, pp. 3613–3618 (2010) 20. Irrera, P., Bloch, I., Delplanque, M.: A flexible patch based approach for combined denoising and contrast enhancement of digital X-ray images. Med. Image Anal. 28, 33–45 (2016) 21. Zhan, Y., Ding, M., Wu, L., Zhang, X.: Nonlocal means method using weight refining for despeckling of ultrasound images. Signal Process. 103, 201–213 (2014) 22. Xu, J., Hu, J., Jia, X.: A multistaged automatic restoration of noisy microscopy cell images. IEEE J. Biomed. Health Inform. 19(1), 367–376 (2015)
23. Genin, L., Champagnat, F., Besnerais, G.L., Coret, L.: Point object detection using a NL-means type filter. In: 18th IEEE International Conference on Image Processing, pp. 3533–3536 (2011) 24. Kim, M., Park, D., Han, D.K., Ko, H.: A novel approach for denoising and enhancement of extremely low-light video. IEEE Trans. Consum. Electron. 61(1), 72–80 (2015) 25. Barnsley, M., Hurd, L.: Fractal Image Compression. A. K. Peters Ltd., Wellesley, MA (1993) 26. Ojala, T., Pietikainen, M., Maenpaa, T.: Multi resolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 971–987 (2002)
Chapter 13
The Object-Oriented Simultaneous Localization and Mapping on the Spherobot Platform

Vladimir A. Antipov, Vasilii P. Kirnos, Vera A. Kokovkina and Andrey L. Priorov

Abstract The chapter describes an unusual robot that looks like a ball and is controlled over Wi-Fi. It carries a microcomputer, a camera with a special lens, and a microcontroller that serves as a slot for additional sensors. An algorithm of simultaneous localization and mapping runs on this platform. The algorithm builds the path of the robot and a map of the environment. The estimated path of the robot is noisy. To overcome the influence of the noise, we add encoders, which describe the distance traveled by the robot.

Keywords SLAM · Landmarks detectors · Landmarks descriptors · Wheel encoders · Robotic kinematic model
13.1 Introduction

Mobile robotic systems are widely applied in many areas of modern life: in industry, in military and rescue applications, in medicine, etc. Each of these applications places different requirements on the characteristics of the robots. There are a number of tasks that require the mobile robot to be rather small. The robots described in this chapter are intended for exploring rooms and areas that people cannot enter because of danger to life. Such a robot also solves the problem of access to places that a person cannot physically reach, for example narrow horizontal mines. In addition, the robots are proposed as an auxiliary tool for diagnosing the housings of manufactured products in hard-to-reach compartments when faults occur in airplanes or river transport. The unusual construction of this robot follows from its size.

A map is a rather convenient tool for determining a position in space, for graphically representing a terrain plan, and for navigation. However, a map has disadvantages, so it is rarely used in the navigation of mobile robots without additional instrumentation, namely environment sensors. These two tools are complementary: the contribution of the map to the estimate of the current location in space increases as the accuracy and quality of the sensors that perceive the space decrease, and vice versa.

The chapter focuses on the comparison of different approaches to the solutions of these subproblems. Section 13.2 describes a mobile robotic platform named spherobot and its applications. Section 13.3 describes the proposed algorithm based on the extended Kalman filter, because this information is required more than once below [1, 2]. It also describes how to reconstruct a 3D scene using a fisheye camera [3–10]. Results are presented in Sect. 13.4, while Sect. 13.5 concludes the chapter.

V. A. Antipov · V. P. Kirnos · V. A. Kokovkina · A. L. Priorov (B)
P.G. Demidov Yaroslavl State University, 14, Sovetskaya St., Yaroslavl 150003, Russian Federation
V. P. Kirnos e-mail: [email protected]
© Springer Nature Switzerland AG 2020
M. N. Favorskaya and L. C. Jain (eds.), Computer Vision in Control Systems—6, Intelligent Systems Reference Library 182, https://doi.org/10.1007/978-3-030-39177-5_13
13.2 Robot Construction

A mobile robot needs locomotion mechanisms that allow it to move without restrictions throughout the environment. There are many possible ways of moving, so the choice of a locomotion method is an important aspect of mobile robot design. Most locomotion mechanisms are inspired by their biological analogs, although there are exceptions. The wheel is a purely human invention that provides extremely high performance on a flat surface. This mechanism is not completely alien to biological systems (for example, a bipedal walking system can be approximated by a polygon whose sides are equal to the step length). However, nature did not create a fully rotating, actively driven joint, which is necessary for wheeled movement. In practice, difficulties arise from the complexity of biological locomotion mechanisms, from the mechanisms that supply them with energy and consumables, and from the individual production of each component. Mobile robots usually move either on wheels or on a small number of articulated legs, which is the simplest of the biological approaches to locomotion. Human-made environments often consist of engineered smooth surfaces. Thus, wheeled robots are the most effective in such conditions, although this approach has its own features and restrictions.

The robot described in this chapter moves according to the mechanics of two-wheeled robots whose center of gravity lies below the wheel axis. This solution frees the robot from the balancing problem, but requires special software. It is also necessary to add additional sensors and, optionally, a more powerful power source or a more powerful signal transmitter (if the data processing for balancing the robot is performed by an external server). This, first, makes the robot heavier and, second, occupies the already small internal volume of the robot. Priorities included the small size and a certain distribution of the robot mass, so these parameters had to be taken into account. The mobile robot consists of two hemispheres, which serve as wheels and provide movement, and a central part, which remains motionless during movement because its center of gravity is displaced downward. The central part also houses the robot hardware: control board, communication module, sensors, camera, etc.
Fig. 13.1 Construction of the spherobot
The design of the robot allows it to always return to the correct position, even if it lies on its side. The construction of the robot is presented in Fig. 13.1. Communication is a basic part of this robot. Because the battery capacity is limited, the computations are offloaded from the robot to a server. Wi-Fi is used for communication between the server and the robot. Over this wireless link, the robot streams the video data and the encoder data to the server. The sensor data broker on the robot is a Raspberry Pi Zero W.
13.3 Proposed Algorithm

The robot uses a camera with a 260° fisheye lens to obtain as much information as possible from a single camera. The camera axis is directed vertically upward, which gives an image of the ceiling and of the surroundings of the robot. The camera (resolution 640 × 480) is used to construct a three-dimensional point cloud. It is fixed in such a way that panoramic images can be obtained. The algorithm consists of three main stages. Data acquisition and synchronization are described in Sect. 13.3.1. Determining the location of the mobile platform is described in Sect. 13.3.2. Construction of the 3D map is discussed in Sect. 13.3.3.
13.3.1 Data Acquisition and Synchronization

The mobile platform is controlled remotely by a human operator. The robot software records the sensor data to a “rosbag” file. The obtained file is then processed by an algorithm implemented in ROS. Since the sensors have different sampling rates, the received data must be synchronized in time. A short time interval is taken, and if all the sensors have produced data within this interval, the data are processed.
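The synchronization rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the ROS code used by the authors: the streams are plain, non-empty lists of (timestamp, value) pairs sorted by time, the interval length `window` is a free parameter, the latest reading inside an interval is taken as its sample, and the function name `synchronize` is hypothetical.

```python
def synchronize(streams, window):
    """Return (interval_start, {sensor: value}) for every interval in which
    all sensors produced at least one reading.

    streams: dict mapping a sensor name to a list of (timestamp, value) pairs.
    window:  length of the synchronization interval, in timestamp units.
    """
    start = min(readings[0][0] for readings in streams.values())
    end = max(readings[-1][0] for readings in streams.values())
    synced = []
    index = 0
    while start + index * window <= end:
        t0 = start + index * window
        t1 = t0 + window
        sample = {}
        for name, readings in streams.items():
            hits = [value for stamp, value in readings if t0 <= stamp < t1]
            if hits:
                sample[name] = hits[-1]  # keep the latest reading in the interval
        # Process the interval only if every sensor reported within it.
        if len(sample) == len(streams):
            synced.append((t0, sample))
        index += 1
    return synced
```

With a camera stream at timestamps 0, 100, 200 and an encoder stream at 10, 20, 210, a window of 50 keeps the intervals starting at 0 and 200 and discards the interval starting at 100, where only the camera reported.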
13.3.2 Determining the Location of the Mobile Platform

A Simultaneous Localization And Mapping (SLAM) algorithm is used to determine the location of the mobile platform. This chapter uses the Extended Kalman Filter SLAM (EKF-SLAM) algorithm, which treats the state of the system as a Gaussian distribution and continuously estimates its mathematical expectation and covariance matrix. The system state estimate is updated in two stages: prediction and correction [3]. In the equations below, an overbar marks a predicted quantity.

Prediction stage:

(1) The system state is predicted by Eq. 13.1:

$$\bar{\mu}_t = g(\mu_{t-1}, u_t). \tag{13.1}$$

(2) The error covariance is predicted by Eq. 13.2:

$$\bar{\Sigma}_t = G_t \Sigma_{t-1} G_t^T + R_t, \tag{13.2}$$

where $\bar{\mu}_t$ is the prediction of the system state at the current moment of time, $g(\mu_{t-1}, u_t)$ is the prediction function of the system state, $u_t$ is the control action at the current moment of time, $\bar{\Sigma}_t$ is the prediction of the system state error at the current moment of time, $G_t$ is the state transition matrix, and $R_t$ is the system noise. Equation 13.3 is used to define $R_t$:

$$R_t = V_t \Sigma_c V_t^T, \tag{13.3}$$

where $V_t$ is the matrix of state transition of the control action and $\Sigma_c$ is the covariance of the control action.

Correction stage:

(1) The Kalman gain is calculated by Eq. 13.4:

$$K_t = \bar{\Sigma}_t H_t^T \left(H_t \bar{\Sigma}_t H_t^T + Q\right)^{-1}. \tag{13.4}$$

(2) The estimate is updated with the measurement $z_t$ by Eq. 13.5:

$$\mu_t = \bar{\mu}_t + K_t \left(z_t - h(\bar{\mu}_t)\right). \tag{13.5}$$

(3) The error covariance is updated by Eq. 13.6:

$$\Sigma_t = (I - K_t H_t)\, \bar{\Sigma}_t, \tag{13.6}$$

where $K_t$ is the Kalman gain, $H_t$ is the measurement matrix showing the ratio of measurements and states, Q is the covariance of the measurement noise, $z_t$ is the measurement at the current moment of time, and I is the identity matrix.
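The prediction and correction stages of Eqs. 13.1–13.6 can be traced in a scalar sketch, in which every matrix reduces to a single number. This is a simplified illustration and not the EKF-SLAM implementation of the chapter: the state is one-dimensional, the models g and h and their derivatives are passed in by the caller, the system noise R is supplied directly instead of being built from Eq. 13.3, and the function name `ekf_step_1d` is hypothetical.

```python
def ekf_step_1d(mu, sigma, u, z, g, g_jac, h, h_jac, r_noise, q_noise):
    """One prediction and correction cycle of a scalar extended Kalman filter.

    mu, sigma:        previous state estimate and its variance.
    u, z:             control action and measurement at the current time.
    g, g_jac:         motion model g(mu, u) and its derivative with respect to mu.
    h, h_jac:         measurement model h(mu) and its derivative.
    r_noise, q_noise: system and measurement noise variances.
    """
    # Prediction stage (Eqs. 13.1 and 13.2).
    mu_pred = g(mu, u)
    g_t = g_jac(mu, u)
    sigma_pred = g_t * sigma * g_t + r_noise

    # Correction stage (Eqs. 13.4, 13.5 and 13.6).
    h_t = h_jac(mu_pred)
    gain = sigma_pred * h_t / (h_t * sigma_pred * h_t + q_noise)
    mu_new = mu_pred + gain * (z - h(mu_pred))
    sigma_new = (1.0 - gain * h_t) * sigma_pred
    return mu_new, sigma_new
```

For example, with g(μ, u) = μ + u, h(μ) = μ, a prior of (0, 1), control 1, measurement 1.5, system noise 1 and measurement noise 2, the prediction is (1, 2), the gain is 0.5, and the corrected estimate is (1.25, 1). In the full SLAM state the same steps are applied with vectors and matrices, so the division becomes a matrix inverse.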
After filtering, the local maximum of the modulus of the normalized filter output signal is selected in each received fragment. Data association does not rely on the criterion of proximity between the observed landmark and its predicted location (the geometric method); instead, it is aided by computer vision. Many perceptual techniques, such as vision, provide a lot of information about shape, color, and texture, and all of this can be used to find a correspondence between two sets of landmarks. The main steps of the data association algorithm are the following:
Step 1. Convert the fisheye image into a panoramic image.
Step 2. Search for the image area in which the found landmark is located.
Step 3. Search for the key points and their descriptors in the found image area using the Scale-Invariant Feature Transform (SIFT) method (Fig. 13.2).
Step 4. Compare the landmarks by their sets of coinciding key points.
The main steps of the algorithm for constructing a panoramic image are the following:
Step 1. Determine the center and the inner and outer radii. Determination of the center can be automated using the Hough transform for circle detection.
Step 2. Build a map for converting the fisheye image into a panoramic image. This map specifies the correspondence between the locations of individual pixels of the fisheye image and of the panoramic image (Eqs. 13.7–13.10):
$$x_f = x_c + r \sin(\theta), \tag{13.7}$$

$$y_f = y_c + r \cos(\theta), \tag{13.8}$$

$$r = \frac{y_p}{height}\left(R_{outer} - R_{inner}\right) + R_{inner}, \tag{13.9}$$

$$\theta = 2\pi \frac{x_p}{width}. \tag{13.10}$$

Fig. 13.2 Description of the landmark using SIFT-descriptors
Fig. 13.3 Converting a fisheye image to a panoramic image
Step 3. Use the map of conversion with the application of interpolation (Fig. 13.3).
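The conversion map of Step 2 can be sketched directly from Eqs. 13.7–13.10. The symbol meanings are assumptions, since the chapter does not define them: (x_p, y_p) is taken as a pixel of the panoramic image of size width × height, (x_c, y_c) as the center of the fisheye circle, and (x_f, y_f) as the corresponding position in the fisheye image. The function names are hypothetical.

```python
import math


def panorama_to_fisheye(xp, yp, width, height, xc, yc, r_inner, r_outer):
    """Map a panoramic pixel (xp, yp) to fisheye coordinates (Eqs. 13.7-13.10)."""
    r = yp / height * (r_outer - r_inner) + r_inner  # Eq. 13.9
    theta = 2.0 * math.pi * xp / width               # Eq. 13.10
    xf = xc + r * math.sin(theta)                    # Eq. 13.7
    yf = yc + r * math.cos(theta)                    # Eq. 13.8
    return xf, yf


def build_fisheye_map(width, height, xc, yc, r_inner, r_outer):
    """Precompute the fisheye position of every panoramic pixel as rows of (xf, yf)."""
    return [[panorama_to_fisheye(xp, yp, width, height, xc, yc, r_inner, r_outer)
             for xp in range(width)]
            for yp in range(height)]
```

The positions returned by the map are generally fractional, which is why Step 3 applies interpolation when the panoramic image is filled from the fisheye image. The map depends only on the camera geometry, so it can be built once and reused for every frame.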
13.3.3 Construction of Three-Dimensional Map

The displacement map is obtained not from two cameras but from one camera, using two consecutive images taken from different viewpoints. Each viewpoint is determined by the EKF-SLAM algorithm. In this chapter, a local stereo matching algorithm is used, in which the displacement map is determined by comparing pixel windows along the epipolar line using the sum of absolute differences (Eq. 13.11):

$$E_{SAD}(p, d) = \sum_{i=-k}^{k} \sum_{j=-k}^{k} \left[B(x + i, y + j) - M(x + d + i, y + j)\right]^2, \tag{13.11}$$

where B is the first image and M is the second one [5, 6]. The accuracy of the estimated displacement map often suffers in extreme scenarios, such as textureless areas, overexposure, repeated structures, etc. Therefore, post-processing is necessary to improve the accuracy of the displacement map. At the post-processing stage, Weighted Least Squares (WLS) filtering is used because it provides good smoothing that preserves edges [4]. The purpose of filtering the stereo correspondence can be expressed as minimizing Eq. 13.12:
$$\sum_{p} \left( \left(\tilde{D}_p - D_p\right)^2 + \lambda \left( a_{x,p}(I) \left(\frac{\partial \tilde{D}}{\partial x}\right)_p^2 + a_{y,p}(I) \left(\frac{\partial \tilde{D}}{\partial y}\right)_p^2 \right) \right), \tag{13.12}$$

where D is the original image, $\tilde{D}$ is the filtered image, p is the index that determines the pixel position, and $a_{x,p}(I)$ and $a_{y,p}(I)$ are the weighting coefficients defined by Eqs. 13.13–13.14:

$$a_{x,p}(I) = \left( \left| \frac{\partial l}{\partial x}(p) \right|^{\alpha} + \varepsilon \right)^{-1}, \tag{13.13}$$

$$a_{y,p}(I) = \left( \left| \frac{\partial l}{\partial y}(p) \right|^{\alpha} + \varepsilon \right)^{-1}, \tag{13.14}$$

where l is the brightness channel of I on a logarithmic scale, the parameter α determines the sharpness of the border, and ε is a constant with a small value (Fig. 13.4).

Fig. 13.4 Example of WLS-filtering: a raw image, b image after filtration

The fisheye camera model is based on a spherical projection. Suppose there is a sphere of unit radius and a point P in space, as shown in Fig. 13.5. The point P′ is the projection of the point P onto the sphere, i.e. P′ is the intersection of the surface of the sphere with the line drawn from the center of the sphere O to the point P [5]. Thus, a mapping between points in space and points on the surface of the sphere is defined. These points are then projected vertically onto the image plane; the resulting circular image is shown in Fig. 13.3. The position of the point P′ is defined by Eq. 13.15:

$$P' = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} r \sin\theta \cos\varphi \\ r \sin\theta \sin\varphi \\ r \cos\theta \end{bmatrix}. \tag{13.15}$$
Fig. 13.5 Fisheye camera model
Knowing the coordinates of the point P′ on the sphere and the displacement map, it is possible to determine the coordinates of the point P (Eqs. 13.16–13.17) [6]:

$$P = \lambda(\theta, \varphi)\, P', \tag{13.16}$$

$$\lambda(\theta, \varphi) = \frac{b f}{d(\theta, \varphi)}, \tag{13.17}$$

where b is the distance between camera viewpoints, d is the displacement map, and f is the focal length.
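The reconstruction of a space point from its spherical projection (Eqs. 13.15–13.17) can be sketched as follows. The sketch assumes the standard spherical convention, with θ as the polar angle measured from the camera axis and φ as the azimuth, and treats the displacement value for the given direction as a single number. The function names are hypothetical.

```python
import math


def sphere_point(theta, phi, r=1.0):
    """Point on a sphere of radius r for polar angle theta and azimuth phi (Eq. 13.15)."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


def reconstruct_point(theta, phi, displacement, baseline, focal):
    """Scale the unit-sphere point by lambda = b * f / d (Eqs. 13.16 and 13.17)."""
    scale = baseline * focal / displacement  # undefined for zero displacement
    x, y, z = sphere_point(theta, phi)
    return (scale * x, scale * y, scale * z)
```

A direction along the camera axis (θ = 0) with baseline 0.1, focal length 100 and displacement 2 gives a scale factor of 5, so the reconstructed point lies 5 units along the axis. Pixels with zero displacement carry no depth information and would need to be skipped before this step.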
13.4 Results

To evaluate the operation of the EKF-SLAM algorithm, the “gazebo” simulation environment is used. Two runs were performed on the “willow garage” map. In each run, the standard deviation was calculated. The results are shown in Figs. 13.6, 13.7 and Table 13.1. Table 13.1 shows that the developed algorithm of simultaneous localization and mapping performs approximately like other SLAM algorithms. The studied algorithm has a root-mean-square error in the range from 0.01 to 0.03 m, whereas Large-Scale Direct monocular (LSD)-SLAM has an error from 0.02 to 0.38 m and Red Green Blue Depth (RGBD)-SLAM has an error from 0.01 to 0.9 m. In addition, the implemented algorithm differs qualitatively in that it produces both two-dimensional and three-dimensional maps.
Fig. 13.6 Two runs in “willow garage”: a, c obtained maps, b, d the corresponding trajectories of the mobile platform on the map
13.5 Conclusions

We presented the mobile platform and some technical problems that we encountered at the implementation stage. The main goal of the chapter is to implement a SLAM algorithm based on a fisheye camera. The key idea of this solution is to reconstruct a 3D scene from only one camera with a special lens. The resulting algorithm is not worse than LSD-SLAM and RGBD-SLAM.
Fig. 13.7 Result of SLAM: a resulting map, b point cloud, c trajectory of the mobile platform in the room

Table 13.1 Standard deviations of coordinates for each trajectory
                      Trajectory 1    Trajectory 2
SD of coordinate X    0.281           0.014
SD of coordinate Y    0.096           0.04
SD of coordinate Y    0.032           0.02
References

1. Newman, P.M.: EKF Based Navigation and SLAM. Background Material, Notes and Example Code. SLAM Summer School, Oxford (2006)
2. Bailey, P., Beckler, M., Hoglund, R., Saxton, J.: 2D simultaneous localization and mapping. Available at: https://pdfs.semanticscholar.org/82c6/f386767d992b9edd2f56490245e30c80d6da.pdf. Accessed 12 Sept 2019
3. Furman, Ya.A., Krevetskii, A.V., Peredereev, A.K., Rozhentsov, A.A., Khafizov, R.G., Egoshina, I.L., Leukhin, A.N.: Introduction to Contour Analysis. Fizmatlit, Moscow (in Russian) (2003)
4. Farbman, Z., Fattal, R., Lischinski, D., Szeliski, R.: Edge-preserving decompositions for multiscale tone and detail manipulation. ACM Trans. Graph. 27(3), 67.1–67.10 (2008)
5. Song, M., Watanabe, H.: Robust 3D reconstruction with omni-directional camera based on structure from motion. In: International Workshop on Advanced Image Technology, pp. 1–4 (2018)
6. Aliaga, D., Yanovsky, D., Carlbom, I.: A dense sampling approach for rendering large indoor environments. Comput. Graph. Appl. 22–30 (2003). Special Issue on 3D Reconstruction and Visualization
7. Fleck, S., Busch, F., Biber, P., Straßer, W.: Omnidirectional 3D modeling on a mobile robot using graph cuts. In: 2005 IEEE International Conference on Robotics and Automation, pp. 1748–1754 (2005)
8. Igbinedion, I., Han, H.: 3D stereo reconstruction using multiple spherical views. Available at: https://web.stanford.edu/class/ee368/Project_Autumn_1516/Reports/Igbinedion_Han.pdf. Accessed 10 Sept 2019
9. Hähnel, D., Schulz, D., Burgard, W.: Mobile robot mapping in populated environments. Adv. Robot. 17(7), 579–597 (2003)
10. Negenborn, R.: Robot localization and Kalman filters. On finding your position in a noisy world. Thesis number (for the degree of Master of Science): INF/SCR-03-09, Utrecht University (2003)