English Pages 605 [581] Year 2021
Lecture Notes on Data Engineering and Communications Technologies 61
Harish Sharma, Mukesh Saraswat, Sandeep Kumar, Jagdish Chand Bansal (Editors)
Intelligent Learning for Computer Vision Proceedings of Congress on Intelligent Systems 2020
Lecture Notes on Data Engineering and Communications Technologies Volume 61
Series Editor Fatos Xhafa, Technical University of Catalonia, Barcelona, Spain
The aim of the book series is to present cutting-edge engineering approaches to data technologies and communications. It will publish the latest advances on the engineering task of building and deploying distributed, scalable and reliable data infrastructures and communication systems. The series will have a prominent applied focus on data technologies and communications, with the aim of promoting the bridging from fundamental research on data science and networking to data engineering and communications that lead to industry products, business knowledge and standardisation. Indexed by SCOPUS, INSPEC, EI Compendex. All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/15362
Editors
Harish Sharma, Department of Computer Science and Engineering, Rajasthan Technical University, Kota, Rajasthan, India
Mukesh Saraswat, Department of Computer Science & Engineering and Information Technology, Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India
Sandeep Kumar, Department of Computer Science and Engineering, CHRIST (Deemed to be University), Bangalore, Karnataka, India
Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India
ISSN 2367-4512 ISSN 2367-4520 (electronic) Lecture Notes on Data Engineering and Communications Technologies ISBN 978-981-33-4581-2 ISBN 978-981-33-4582-9 (eBook) https://doi.org/10.1007/978-981-33-4582-9 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This volume contains the papers presented at the first Congress on Intelligent Systems (CIS-2020), a world conference held in virtual format and organized by the Soft Computing Research Society on September 05–06, 2020. CIS-2020 invited ideas, developments, applications, experiences, and evaluations in the field of intelligent systems from academicians, research scholars, and scientists. The conference deliberations covered the topics specified within its scope. The conference offered a platform for presenting extensive research and literature across the arena of intelligent systems and provided an overview of upcoming technologies. CIS-2020 also gave leading experts a platform to share their perspectives, provide guidance, and address participants' questions and concerns. CIS-2020 received 687 research submissions from 38 different countries, viz. Argentina, Australia, Bangladesh, Botswana, Canada, China, Colombia, Egypt, Ethiopia, France, India, Iran, Iraq, Italy, Japan, Latvia, Malaysia, Mauritius, Morocco, New Zealand, Oman, Pakistan, Palestine, Portugal, Russia, Saudi Arabia, Singapore, South Africa, South Korea, Spain, Sri Lanka, Taiwan, Tunisia, Turkey, Ukraine, United Arab Emirates, UK, USA, and Vietnam. The papers covered varied contemporary areas of technology, such as artificial intelligence, machine learning, and blockchain. After a rigorous peer review with the help of program committee members and more than one hundred external reviewers, 170 papers were selected for presentation and 45 papers were selected for publication in these proceedings. CIS-2020 is a flagship event of the Soft Computing Research Society, India. The conference was inaugurated by Prof. Kusum Deep and Prof. Atulya Nagar along with the general chair Prof. Joong Hoon Kim, Prof. Jagdish Chand Bansal, Dr. Harish Sharma, and Dr. Mukesh Saraswat. The conference featured keynote addresses from eminent speakers, namely Prof. Maurice Clerc, Prof. Swagatam Das, Prof. Meng-Hiot Lim, Prof. Jonathan H. Chan, Prof. Mohammad Shorif Uddin, Prof. Andries Engelbrecht, Prof. Amir H Gandomi, Mr. Aninda Bose, Prof. Nishchal K. Verma, and Prof. Akhil Ranjan Garg. The organizers wish to thank the editors from Springer Nature for their support and guidance.
Kota, India
Noida, India
Bangalore, India
New Delhi, India
Harish Sharma Mukesh Saraswat Sandeep Kumar Jagdish Chand Bansal
Contents
Development of Inter-ethnic Harmony Search Algorithm Based on Inter-ethnic Reconciliation. Hyun Woo Jung, Young Hwan Choi, Donghwi Jung, and Joong Hoon Kim
A Low-Cost Embedded Computer Vision System for the Classification of Recyclable Objects. Karl Myers and Emanuele Lindo Secco
An Optimal Feature Selection Approach Using IBBO for Histopathological Image Classification. Mukesh Saraswat, Raju Pal, Roop Singh, Himanshu Mittal, Avinash Pandey, and Jagdish Chand Bansal
Accuracy Evaluation of Plant Leaf Disease Detection and Classification Using GLCM and Multiclass SVM Classifier. K. Rajiv, N. Rajasekhar, K. Prasanna Lakshmi, D. Srinivasa Rao, and P. Sabitha Reddy
A Deep Learning Technique for Automatic Teeth Recognition in Dental Panoramic X-Ray Images Using Modified Palmer Notation System. Fahad Parvez Mahdi and Syoji Kobashi
Detection of Parkinson's Disease from Hand-Drawn Images Using Deep Transfer Learning. Akalpita Das, Himanish Shekhar Das, Arijeet Choudhury, Anupal Neog, and Sourav Mazumdar
An Empirical Analysis of Hierarchical and Partition-Based Clustering Techniques in Optic Disc Segmentation. J. Prakash and B. Vinoth Kumar
Multi-class Support Vector Machine-Based Household Object Recognition System Using Features Supported by Point Cloud Library. Smita Gour, Pushpa B. Patil, and Basavaraj S. Malapur
Initialization of MLP Parameters Using Deep Belief Networks for Cancer Classification. Barış Dinç, Yasin Kaya, and Serdar Yıldırım
An Improved Inception Layer-Based Convolutional Neural Network for Identifying Rice Leaf Diseases. B. Baranidharan, C. N. S. Vinoth Kumar, and M. Vasim Babu
Design and Implementation of Traffic Sign Classifier Using Machine Learning Model. Samarth Patel, Pankaj Agarwal, Vijander Singh, and Linesh Raja
Designing Controller Parameter of Wind Turbine Emulator Using Artificial Bee Colony Algorithm. Ajay Sharma, Harish Sharma, Ashish Khandelwal, and Nirmala Sharma
Text Document Orientation Detection Using Convolutional Neural Networks. Shivam Aggarwal, Safal Singh Gaur, and Manju
A Deep Learning-Based Segregation of Housing Image Data for Real Estate Application. Annu Kumari, Vinod Maan, and Dhiraj
Improved Image Super-resolution Using Enhanced Generative Adversarial Network a Comparative Study. B. V. Balaji Prabhu and Omkar Subburao Jois Narasipura
Comparative Study of Supervised Machine Learning Algorithms for Healthcare Dataset Using Orange. Vaibhav Bhatnagar and Ramesh C. Poonia
Maximum Power Point Tracking of Photovoltaic System Using Artificial Neural Network. Kusum Lata Agarwal and Shubham Sharma
IoT Security: A Survey of Issues, Attacks and Defences. Vinesh Kumar Jain and Jyoti Gajrani
Detecting the Nuclei in Different Pictures Using Region Convolutional Neural Networks. Naiswita Parmar
Chaotic Henry Gas Solubility Optimization Algorithm. Nand Kishor Yadav and Mukesh Saraswat
Electric Load Forecasting Using Fuzzy Knowledge Base System with Improved Accuracy. Bhavesh Kumar Chauhan and Praveen Kumar Shukla
Deep Learning-Based Framework for Retinal Vasculature Segmentation. Shambhavi Shikha Tiwari, Akash Dholaria, Rajat Pandey, Gauri Nigam, Ranjana Agrawal, Rahee Walambe, and Ketan Kotecha
EcDEALS: Adaptive Local Search Strategies in Differential Evolution for Escalating Convergence. Harish Sharma, Prashant Sharma, Kavita Sharma, and Rajani Kumari
Optimization of Regularization and Early Stopping to Reduce Overfitting in Recognition of Handwritten Characters. Naman Jain
Employing Data Augmentation for Recognition of Hand Gestures Using Deep Learning. Deepak Kumar, Abdul Aleem, and Manoj Madhava Gore
Comparative Design Analysis of Optimized Learning Rate for Convolutional Neural Network. Rashmi, Udayan Ghose, and Manushree Gupta
Vision-Based Vehicle Detection and Tracking System. N. Kavitha and D. N. Chandrappa
An Optimal Feature Based Automatic Leaf Recognition Model Using Deep Neural Network. Aditi Ghosh and Parthajit Roy
Super-Resolution of Level-17 Images Using Generative Adversarial Networks. B. V. Balaji Prabhu, Nikith P. Salian, B. M. Nikhil, and Omkar Subbaram Jois Narasipura
Hand Gesture Recognition System Using IoT and Machine Learning. B. N. Ajay, S. Aditya, A. S. Adarsha, N. Deekshitha, and K. Harshitha
Improved Video Compression Using Variable Emission Step ConvGRU Based Architecture. Sangeeta and Preeti Gulia
Video Surveillance System with Auto Informing Feature. Ekta Tyagi, Deeksha, Vikas Sahu, Chandranshu Malhotra, Shubham Poddar, Jayash Verma, and Lokesh Chouhan
Bio-propulsion Techniques for Bio-micro/nano-Robots. Deepa Mathur and Deepak Bhatia
A Comparative Analysis of Edge-Preserving Approaches for Image Filtering. Niveditta Thakur, Nafis Uddin Khan, and Sunil Datt Sharma
Impact of Quasi-Variable Nodes on Numerical Integration of Parameter-Dependent Functions: A Maple Suite. Navnit Jha
A Novel Unified Scheme for Missing Image Data Suggestion Based on Collaborative Generative Adversarial Network. R. Angeline and R. Vani
ComVisMD—Compact 2D Visualization of Multidimensional Data: Experimenting with Two Different Datasets. Shridhar B. Dandin and Mireille Ducassé
Text Recognition Using Convolutional Neural Network for Visually Impaired People. Sunanda Dixit, Anuja Velaskar, Nidhi Munavalli, and Apurva Waingankar
Design of Compact Size Tri-Band Stacked Patch Antenna for GPS and IRNSS Applications. Nitin Kumar Suyan, Fateh Lal Lohar, Yogesh Solunke, and Chandresh Dhote
Smart Lady E-wearable Security System for Women Working in the Field. Shuchi Dave, S. D. Purohit, Ritu Agarwal, Aman Jain, Deepak Sajnani, and Saksham Soni
A Review of Nature-Inspired Algorithm-Based Multi-objective Routing Protocols. Ruchi Kaushik, Vijander Singh, and Rajani Kumari
Residual Vibration Suppression of Non-deformable Object for Robot-Assisted Assembly Operation Using Vision Sensor. Chetan Jalendra, B. K. Rout, and A. M. Marathe
Adaption of Smart Devices and Virtual Reality (VR) in Secondary Education. R. K. A. R. Kariapper, P. Pirapuraj, M. S. Suhail Razeeth, A. C. M. Nafrees, and M. Fathima Roshan
Impacts of Environmental Pollution on the Growth and Conception of Biological Populations Involving Incomplete I-Function. D. L. Suthar, S. D. Purohit, A. M. Khan, and S. Dave
Artificial Intelligence-Based Power Quality Improvement Techniques in WECS. K. G. Sharma, N. K. Gupta, D. K. Palwalia, and M. Bhadu
Author Index
About the Editors
Harish Sharma is Associate Professor in the Department of Computer Science & Engineering at Rajasthan Technical University, Kota. He has worked at Vardhaman Mahaveer Open University, Kota, and Government Engineering College, Jhalawar. He received his B.Tech. and M.Tech. degrees in Computer Engineering from Government Engineering College, Kota, and Rajasthan Technical University, Kota, in 2003 and 2009, respectively. He obtained his Ph.D. from ABV-Indian Institute of Information Technology and Management, Gwalior, India. He is Secretary and one of the founding members of the Soft Computing Research Society of India. He is a lifetime member of the Cryptology Research Society of India, ISI, Kolkata. He is Associate Editor of the “International Journal of Swarm Intelligence (IJSI)” published by Inderscience. He has also edited special issues of many reputed journals such as “Memetic Computing”, “Journal of Experimental and Theoretical Artificial Intelligence”, and “Evolutionary Intelligence”. His primary area of interest is nature-inspired optimization techniques. He has contributed to more than 65 papers published in various international journals and conferences.
Dr. Mukesh Saraswat is Associate Professor at Jaypee Institute of Information Technology, Noida, India. Dr. Saraswat obtained his Ph.D. in Computer Science & Engineering from ABV-IIITM Gwalior, India. He has more than 18 years of teaching and research experience. He has guided two Ph.D. students and more than 50 M.Tech. and B.Tech. dissertations, and is presently guiding five Ph.D. students. He has published more than 40 journal and conference papers in the areas of image processing, pattern recognition, data mining, and soft computing. He was part of a successfully completed DRDE-funded project on image analysis and is currently running two projects, one funded by SERB-DST (New Delhi) on Histopathological Image Analysis and one under the Collaborative Research Scheme (CRS) of TEQIP III (RTU-ATU) on Smile. He has been an active member of many organizing committees of various conferences and workshops. He was also Guest Editor of the Journal of Swarm Intelligence. He is an active member of the IEEE, ACM, and CSI professional bodies. His research areas include image processing, pattern recognition, mining, and soft computing.
Dr. Sandeep Kumar is currently Associate Professor of Computer Science and Engineering at CHRIST (Deemed to be University), Bangalore, India. Dr. Kumar holds a Ph.D. degree in Computer Science & Engineering, an M.Tech. degree from RTU, Kota, and a B.E. degree from Engineering College, Kota. Dr. Kumar was Assistant Professor of Computer Science & Engineering at ACEIT, Jaipur, 2008–2011, and Assistant Professor of Computer Science, Faculty of Engineering & Technology, Jagannath University, Jaipur, 2011–2017. He was the head of computer science at Jagannath University, 2013–2017. He is also working as Guest Editor for many journals, including the International Journal of Intelligent Information and Database Systems (IJIIDS), the International Journal of Agricultural Resources, Governance and Ecology (IJARGE), and the International Journal of Environment and Sustainable Development (IJESD), published by Inderscience, and Recent Patents on Computer Science, published by Bentham Science. He is a member of the editorial boards of many international journals and of the technical program committees of many conferences. He has organized eight conferences as Conference Chair, Organizing Chair, and Technical Program Chair. Dr. Kumar has over fifty publications in well-known SCI/SCOPUS-indexed international journals and conferences. He has authored/edited four books in the area of computer science and edited two conference proceedings. His research interests include nature-inspired algorithms, swarm intelligence, soft computing, and computational intelligence.
Dr. Jagdish Chand Bansal is Associate Professor at South Asian University, New Delhi, and Visiting Faculty in Maths and Computer Science, Liverpool Hope University, UK. Dr. Bansal obtained his Ph.D. in Mathematics from IIT Roorkee.
Before joining SAU New Delhi, he worked as Assistant Professor at ABV-Indian Institute of Information Technology and Management, Gwalior, and BITS Pilani. He is Series Editor of the book series Algorithms for Intelligent Systems (AIS) published by Springer. He is Editor-in-Chief of the International Journal of Swarm Intelligence (IJSI) published by Inderscience. He is also Associate Editor of IEEE Access published by IEEE. He is a steering committee member and General Chair of the annual conference series SocProS. He is General Secretary of the Soft Computing Research Society (SCRS). His primary area of interest is swarm intelligence and nature-inspired optimization techniques. Recently, he proposed a fission–fusion social structure-based optimization algorithm, Spider Monkey Optimization (SMO), which is being applied to various problems in the engineering domain. He has published more than 70 research papers in various international journals/conferences. He has supervised Ph.D. theses at ABV-IIITM Gwalior and SAU New Delhi. He has also received Gold Medals at the UG and PG levels.
Development of Inter-ethnic Harmony Search Algorithm Based on Inter-ethnic Reconciliation
Hyun Woo Jung, Young Hwan Choi, Donghwi Jung, and Joong Hoon Kim
Abstract Harmony search (HS) has been applied in various fields and has produced good results. However, HS has several drawbacks (e.g., parameter settings and solution stagnation), and various improved versions of HS have been developed to overcome them. In this study, a new improved HS called inter-ethnic harmony search (IeHS) is proposed. IeHS mimics the reconciliation of various ethnicities in Turkish history. Using historical concepts (i.e., Millet and Jannissary), it self-adapts its parameters and decision variable ranges, balances local and global search, and overcomes the solution stagnation problem. To verify the performance of IeHS, it is applied to mathematical benchmark functions from CEC 2014 and compared with representative improved versions of HS using performance indices.
Keywords Harmony search · Turkish history · Millet · Jannissary · CEC 2014
H. W. Jung
Department of Civil, Environmental and Architectural Engineering, Korea University, Seoul 02841, South Korea
Y. H. Choi
Department of Civil Engineering, Gyeongnam National University of Science and Technology, 33 Dongjin-ro, Jinjum Gyeongnam 52725, South Korea
D. Jung · J. H. Kim (corresponding author)
School of Civil, Environmental and Architectural Engineering, Korea University, Seoul 02841, South Korea
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_1

1 Introduction
Harmony search (HS) [1] is a meta-heuristic algorithm that mimics musicians' improvisation to produce a harmony through repeated practice. HS has been successfully applied in various fields such as engineering [2–4], water distribution [5, 6], energy [7], structural optimization [8], and computer science [9, 10]. Like other meta-heuristic algorithms, HS has the drawback of requiring appropriate parameter settings. Also, as the number of function evaluations (NFE) increases, solution stagnation, in which the best solutions remain unchanged, occurs owing to an increase in local optima [11]. To overcome the drawback of parameter settings, various versions of HS with modified parameter-setting methods have been proposed [12–16]. However, an improved version of HS that simultaneously overcomes both the parameter-setting and the solution-stagnation problems has not been proposed. To cope with solution stagnation, it is necessary to secure the diversity of the solutions in the harmony memory (HM) [11]. The Ottoman Empire, a historical empire centered in present-day Turkey, embraced various ethnic groups through the Millet, a political system under which each ethnic group formed its own communities and governed itself. The empire's various ethnic groups also adopted Ottoman culture through the Jannissary, a military organization composed of members of those groups. The concepts behind these Ottoman systems can be applied to address the solution stagnation of meta-heuristic algorithms. In this paper, inter-ethnic harmony search (IeHS) is proposed. IeHS imitates the Ottoman system of respecting each ethnic group in order to reconcile the various groups. In this way, IeHS maintains the diversity of the solutions and expands the search space. In addition, as the iterations proceed, IeHS automatically sets its parameters by taking the HM into consideration. To verify the performance of the proposed algorithm, it was applied to the benchmark functions of CEC 2014 and compared with other improved versions of HS.
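For readers unfamiliar with basic HS [1], the sketch below shows one common formulation of the improvisation loop, in which the harmony memory considering rate (HMCR), the pitch adjusting rate (PAR), and the bandwidth (BW) are the parameters whose settings the text describes as a drawback. The function names, bounds, and parameter values are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np

def improvise(hm, lower, upper, hmcr, par, bw, rng):
    """Build one new harmony from the harmony memory `hm` (shape HMS x D)."""
    hms, dim = hm.shape
    new = np.empty(dim)
    for j in range(dim):
        if rng.random() < hmcr:
            # Memory consideration: copy variable j from a random stored harmony.
            new[j] = hm[rng.integers(hms), j]
            if rng.random() < par:
                # Pitch adjustment: shift by a random amount within +/- bw.
                new[j] += bw * rng.uniform(-1.0, 1.0)
        else:
            # Random selection anywhere in the allowed range.
            new[j] = rng.uniform(lower[j], upper[j])
    return np.clip(new, lower, upper)

def harmony_search(f, lower, upper, hms=10, hmcr=0.9, par=0.3, bw=0.01,
                   iters=2000, seed=0):
    """Minimize f with basic HS; returns (best harmony, best fitness)."""
    rng = np.random.default_rng(seed)
    hm = rng.uniform(lower, upper, size=(hms, len(lower)))
    fit = np.array([f(x) for x in hm])
    for _ in range(iters):
        x = improvise(hm, lower, upper, hmcr, par, bw, rng)
        fx = f(x)
        worst = np.argmax(fit)
        if fx < fit[worst]:  # replace the worst harmony only if the new one is better
            hm[worst] = x
            fit[worst] = fx
    best = np.argmin(fit)
    return hm[best].copy(), fit[best]

lower, upper = np.full(2, -5.0), np.full(2, 5.0)
x_best, f_best = harmony_search(lambda x: float(np.sum(x**2)), lower, upper)
```

Here a single fixed `bw` is shared by every variable for the whole run; choosing it well for an unknown problem is the kind of parameter-setting difficulty the introduction refers to.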
2 Inter-ethnic Harmony Search (IeHS)
IeHS mimics the Ottoman Empire's system for uniting various ethnic groups. The Ottoman Empire recognized the autonomy of each ethnic group through the Millet. Similarly, IeHS divides the initial HM according to fitness value, and each sub-HM, called a Millet Memory (MM), searches for the optimal solution independently using a different set of parameters. This preserves the diversity of the solutions, so that the solutions in the HM do not all become trapped in local optima [11].
2.1 Millet Memory
Standard HS draws decision variables from the HM, so each variable has as many candidate values as the harmony memory size (HMS). Dividing the HM into several MMs allows more diverse decision variables to be considered. In Fig. 1, the fitness values in MM 1 are close to the global optimum; because MM 1 is more likely to contain information about good solutions, it performs intensive local search. Operations in Millet 3 target global search based on its diverse decision variables, and operations in Millet 2 act as an intermediate between Millet 1 and Millet 3.
Fig. 1 Generating Millet memory
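The division of the HM into Millet memories shown in Fig. 1 can be sketched as follows, assuming a minimization problem, three Millets, and a near-equal split of the fitness-sorted HM. The text does not state how Millet sizes are chosen, so the equal split and the helper name `split_into_millets` are hypothetical.

```python
import numpy as np

def split_into_millets(hm, fitness, n_millets=3):
    """Sort harmonies by fitness (minimization) and split them into Millet memories.

    Millet 1 receives the best harmonies and the last Millet the worst ones.
    """
    order = np.argsort(fitness)                 # indices from best to worst
    groups = np.array_split(order, n_millets)   # near-equal consecutive groups
    return [hm[idx] for idx in groups], [fitness[idx] for idx in groups]

rng = np.random.default_rng(0)
hm = rng.uniform(-5.0, 5.0, size=(9, 2))        # HMS = 9 harmonies, D = 2
fitness = np.sum(hm**2, axis=1)                 # sphere function as a stand-in objective
millets, millet_fitness = split_into_millets(hm, fitness)
```

Because each Millet is a contiguous slice of the sorted memory, Millet 1 holds the harmonies nearest the optimum (suited to local search) and the last Millet holds the most scattered ones (suited to global search).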
2.2 Jannissary
IeHS generates new solutions from each Millet through the Jannissary. To resolve the ambiguity in setting the bandwidth (BW), a new decision variable is created using Eq. (1):
$$ x_j^{\text{new}} = x_j^{\text{Millet } i} \pm \left[\max\left(x_j^{\text{Millet } i}\right) - \min\left(x_j^{\text{Millet } i}\right)\right] \tag{1} $$
where $x_j^{\text{Millet } i}$ is the jth decision variable in the ith Millet, $\max\left(x_j^{\text{Millet } i}\right)$ and $\min\left(x_j^{\text{Millet } i}\right)$ are the maximum and minimum values of that variable within the ith Millet, and $x_j^{\text{new}}$ is the newly generated value.
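A minimal sketch of Eq. (1), assuming that the base value $x_j^{\text{Millet } i}$ is taken from a randomly chosen member of the Millet and that the "+" sign is used with probability 0.5; the text fixes neither detail, and the name `jannissary_value` is hypothetical.

```python
import numpy as np

def jannissary_value(millet, j, rng, p_plus=0.5):
    """Eq. (1): x_new = x_j +/- (max_j - min_j), with max/min taken over the Millet.

    Assumptions: x_j comes from a random member of the Millet, and the '+'
    sign is chosen with probability p_plus.
    """
    x_j = millet[rng.integers(len(millet)), j]
    spread = millet[:, j].max() - millet[:, j].min()  # plays the role of BW
    sign = 1.0 if rng.random() < p_plus else -1.0
    return x_j + sign * spread

millet = np.array([[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]])
value = jannissary_value(millet, 0, np.random.default_rng(1))
```

The step size is the spread of the jth variable inside the Millet, so it shrinks automatically as the Millet converges instead of depending on a user-chosen BW.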
As the number of iterations increases, the HM becomes dominated by local optima, which leads to long-term stagnation (Fig. 2).
Fig. 2 Stagnation of HS
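To solve this problem, the exploration of the search space must be improved. For this purpose, if the fitness value remains the same for a certain number of iterations, the decision variable is generated by Eq. (2):
$$ x_j^{\text{new}} = x_j^{\text{Best}} \pm \left(x_j^{\text{Best}} - x_j^{\text{Millet } i}\right) \tag{2} $$
where $x_j^{\text{Best}}$ is the jth decision variable of the best solution, $x_j^{\text{Millet } i}$ is the jth decision variable in the ith Millet, and $x_j^{\text{new}}$ is the newly generated value.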
In Eqs. (1) and (2), the sign (±) is chosen probabilistically. If a new decision variable is generated by Eq. (2), it is then regenerated within the self-adaptive decision variable range described in Sect. 2.3, based on a comparison with the mean of that decision variable in the HM.
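The stagnation trigger and Eq. (2) might be sketched as below. The patience threshold ("a certain number of iterations" in the text), the exact-equality test on the best fitness, and the 0.5 sign probability are assumptions; `StagnationMonitor` and `stagnation_value` are hypothetical names.

```python
import numpy as np

class StagnationMonitor:
    """Reports stagnation once the best fitness has stayed the same for
    `patience` consecutive iterations (the threshold value is an assumption)."""

    def __init__(self, patience):
        self.patience = patience
        self.best = None
        self.count = 0

    def update(self, best_fitness):
        if self.best is not None and best_fitness == self.best:
            self.count += 1
        else:
            self.best, self.count = best_fitness, 0
        return self.count >= self.patience

def stagnation_value(x_best_j, x_millet_j, rng, p_plus=0.5):
    """Eq. (2): x_new = x_best +/- (x_best - x_millet); '+' with probability p_plus."""
    sign = 1.0 if rng.random() < p_plus else -1.0
    return x_best_j + sign * (x_best_j - x_millet_j)

monitor = StagnationMonitor(patience=3)
flags = [monitor.update(f) for f in [5.0, 5.0, 5.0, 5.0, 4.0]]
```

With the minus sign, Eq. (2) returns $x_j^{\text{Millet } i}$ itself; with the plus sign, it reflects $x_j^{\text{Millet } i}$ through $x_j^{\text{Best}}$, pushing the search to the opposite side of the best solution.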
2.3 Self-adaptive Parameter and Decision Variable Range
Setting the parameters appropriately for the problem directly affects optimization performance. Because the decision variable range defines the search space, updating that range during the search can also improve performance [14]. Geem [16] generated the parameters using Eqs. (3) and (4):
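$$ \text{HMCR}_j = \frac{n\left(y_i^j = \text{HMCR} \cup y_i^j = \text{PAR}\right)}{|\text{Millet}|} \tag{3} $$
$$ \text{PAR}_j = \frac{n\left(y_i^j = \text{PAR}\right)}{n\left(y_i^j = \text{HMCR} \cup y_i^j = \text{PAR}\right)} \tag{4} $$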
Here, $\text{HMCR}_j$ and $\text{PAR}_j$ are the HMCR and PAR of the jth decision variable, and $n\left(y_i^j = \text{HMCR}\right)$ and $n\left(y_i^j = \text{PAR}\right)$ are the numbers of decision variables generated by HMCR and PAR, respectively. New decision variables are generated within a search range determined from the decision variables stored in the HM. To reflect this, the search space (decision variable range) is set using Eq. (5):
$$ \text{Boundary range}(j) = \text{Average}\left(x_j^{\text{HM}}\right) \pm \text{SD}\left(x_j^{\text{HM}}\right) \tag{5} $$
where Boundary range(j) is the range of the jth decision variable and determines the search space for the new solution, and $\text{Average}\left(x_j^{\text{HM}}\right)$ and $\text{SD}\left(x_j^{\text{HM}}\right)$ are the mean and standard deviation of $x_j$ in the HM. The sign (±) in Eq. (5) is set by comparing the decision variable generated by Eq. (2) with the mean of the jth decision variable in the HM. The new decision variable is then generated within this boundary range, which accounts for the mean and standard deviation of the HM.
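Eqs. (3)–(5) can be sketched as follows. The sketch assumes an integer operation-type memory that records, for each stored harmony and variable, whether the value came from memory consideration (HMCR), pitch adjustment (PAR), or random selection; it reads the denominator of Eq. (3) as the number of harmonies in the Millet, and it reads Eq. (5) as sampling uniformly between the HM mean and the mean plus or minus one (population) standard deviation. All of these readings, and the names used, are assumptions rather than details given in the text.

```python
import numpy as np

# Hypothetical encoding of the operation that produced each stored variable.
HMCR_OP, PAR_OP, RANDOM_OP = 0, 1, 2

def self_adaptive_rates(op_memory):
    """Eqs. (3)-(4) for every decision variable j.

    op_memory[i, j] records how variable j of harmony i in a Millet was made.
    """
    size = op_memory.shape[0]                                   # |Millet|
    n_par = np.sum(op_memory == PAR_OP, axis=0)
    n_mem = np.sum((op_memory == HMCR_OP) | (op_memory == PAR_OP), axis=0)
    hmcr = n_mem / size
    # Avoid 0/0 when no value of variable j came from memory consideration.
    par = np.divide(n_par, n_mem, out=np.zeros(n_par.shape), where=n_mem > 0)
    return hmcr, par

def boundary_resample(x_new_j, hm_j, rng):
    """Eq. (5): redraw x_j inside Average(x_j^HM) +/- SD(x_j^HM).

    Assumed reading: if x_new_j is at or above the HM mean, sample uniformly
    in [mean, mean + SD]; otherwise in [mean - SD, mean]. SD uses ddof=0.
    """
    mean, sd = hm_j.mean(), hm_j.std()
    if x_new_j >= mean:
        return rng.uniform(mean, mean + sd)
    return rng.uniform(mean - sd, mean)

ops = np.array([[0, 1], [1, 2], [2, 2], [0, 0]])
hmcr, par = self_adaptive_rates(ops)
```

In this reading, a variable whose stored values were mostly drawn from memory receives a high HMCR, and PAR becomes the share of those memory draws that were pitch-adjusted, so both rates track how the Millet has actually been built.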
3 Application and Results
To verify the performance of IeHS, its results were compared with those of other improved versions of HS. The compared algorithms are HS, Improved Harmony Search (IHS) [12], Global-best Harmony Search (GHS) [13], Self-adaptive Harmony Search (SaHS) [14], Local-best Harmony Search (DLHS) [15], and Copycat Harmony Search (CcHS) [11]. Benchmark problems from CEC 2014 were used to verify the enhanced performance of IeHS. The dimension of all problems is 30, and the global optimum has a value of 0. In Table 1, D is the number of decision variables and $x_i$ is the ith decision variable. Each run was carried out for 300,000 iterations. For a fair comparison, the same HM was used and all conditions except the parameters were set identically. Parameters other than HMS were taken from Jun et al. [11], and the HMS of each algorithm was set to the value that gave its best results. Each algorithm was run for 50 trials to compare performance accurately. The results are listed in Table 2.
Table 1 Definition of mathematical benchmark problems
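High conditioned elliptic function: $f_1(x) = \sum_{i=1}^{D} \left(10^{6}\right)^{\frac{i-1}{D-1}} x_i^2$
Bent Cigar function: $f_2(x) = x_1^2 + 10^{6} \sum_{i=2}^{D} x_i^2$
Discus function: $f_3(x) = 10^{6} x_1^2 + \sum_{i=2}^{D} x_i^2$
Rosenbrock's function: $f_4(x) = \sum_{i=1}^{D-1} \left[100\left(x_{i+1} - x_i^2\right)^2 + \left(x_i - 1\right)^2\right]$
Ackley's function: $f_5(x) = -20 \exp\left(-0.2\sqrt{\frac{1}{D}\sum_{i=1}^{D} x_i^2}\right) - \exp\left(\frac{1}{D}\sum_{i=1}^{D} \cos(2\pi x_i)\right) + 20 + e$
Weierstrass function: $f_6(x) = \sum_{i=1}^{D} \sum_{k=0}^{20} \left[0.5^k \cos\left(2\pi \cdot 3^k (x_i + 0.5)\right)\right] - D \sum_{k=0}^{20} \left[0.5^k \cos\left(\pi \cdot 3^k\right)\right]$
Griewank's function: $f_7(x) = \sum_{i=1}^{D} \frac{x_i^2}{4000} - \prod_{i=1}^{D} \cos\left(\frac{x_i}{\sqrt{i}}\right) + 1$
Rastrigin function: $f_8(x) = 10D + \sum_{i=1}^{D} \left[x_i^2 - 10\cos(2\pi x_i)\right]$
Modified Schwefel's function: $f_9(x) = 418.9829 \times D - \sum_{i=1}^{D} g(z_i)$, with $z_i = x_i + 4.209687462275036 \times 10^{2}$ and
$$ g(z_i) = \begin{cases} z_i \sin\left(|z_i|^{0.5}\right), & |z_i| \le 500 \\ \left(500 - \bmod(z_i, 500)\right) \sin\left(\sqrt{\left|500 - \bmod(z_i, 500)\right|}\right) - \dfrac{(z_i - 500)^2}{10000D}, & z_i > 500 \\ \left(\bmod(z_i, 500) - 500\right) \sin\left(\sqrt{\left|\bmod(z_i, 500) - 500\right|}\right) - \dfrac{(z_i + 500)^2}{10000D}, & z_i < -500 \end{cases} $$
Happy Cat function: $f_{10}(x) = \left|\sum_{i=1}^{D} x_i^2 - D\right|^{0.25} + \left(0.5\sum_{i=1}^{D} x_i^2 + \sum_{i=1}^{D} x_i\right) \Big/ D + 0.5$
Katsuura function: $f_{11}(x) = \frac{10}{D^2} \prod_{i=1}^{D} \left(1 + i \sum_{j=1}^{32} \frac{\left|2^j x_i - \mathrm{round}\left(2^j x_i\right)\right|}{2^j}\right)^{\frac{10}{D^{1.2}}} - \frac{10}{D^2}$
HGBat function: $f_{12}(x) = \left|\left(\sum_{i=1}^{D} x_i^2\right)^2 - \left(\sum_{i=1}^{D} x_i\right)^2\right|^{0.25} + \left(0.5\sum_{i=1}^{D} x_i^2 + \sum_{i=1}^{D} x_i\right) \Big/ D + 0.5$
Expanded Griewank's plus Rosenbrock's function: $f_{13}(x) = f_7\left(f_4(x_1, x_2)\right) + f_7\left(f_4(x_2, x_3)\right) + \cdots + f_7\left(f_4(x_{D-1}, x_D)\right) + f_7\left(f_4(x_D, x_1)\right)$
Expanded Scaffer's F6 function: $f_{14}(x) = g(x_1, x_2) + g(x_2, x_3) + \cdots + g(x_{D-1}, x_D) + g(x_D, x_1)$, where Scaffer's F6 function is $g(x, y) = 0.5 + \dfrac{\sin^2\left(\sqrt{x^2 + y^2}\right) - 0.5}{\left(1 + 0.001\left(x^2 + y^2\right)\right)^2}$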
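Four of the Table 1 formulas are sketched below in their basic form as written in the table. The official CEC 2014 suite additionally applies shift vectors, rotation matrices, and bias values, which are omitted here; these functions are therefore illustrations of the formulas, not a replacement for the official benchmark code.

```python
import numpy as np

def ackley(x):
    """f5 from Table 1; minimum 0 at x = 0."""
    d = x.size
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / d))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / d) + 20.0 + np.e)

def griewank(x):
    """f7 from Table 1; minimum 0 at x = 0."""
    i = np.arange(1, x.size + 1)
    return np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))) + 1.0

def rastrigin(x):
    """f8 from Table 1; minimum 0 at x = 0."""
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def happy_cat(x):
    """f10 from Table 1; minimum 0 at x_i = -1 for all i."""
    d = x.size
    s2 = np.sum(x**2)
    return abs(s2 - d) ** 0.25 + (0.5 * s2 + np.sum(x)) / d + 0.5

D = 30  # dimension used in the experiments
values = [ackley(np.zeros(D)), griewank(np.zeros(D)),
          rastrigin(np.zeros(D)), happy_cat(-np.ones(D))]
```

Each function evaluates to (numerically) 0 at its optimum, matching the statement that the global optimum of every problem is 0. Note that the Happy Cat optimum lies at $x_i = -1$ rather than at the origin.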
In these results, IeHS outperformed the other algorithms on six of the benchmark functions. On Griewank's function and the Rastrigin function, CcHS achieved competitive Best values, but on average it performed worse than IeHS.
Table 2 Results for mathematical benchmark problems

| Problem | Statistic | HS | IHS | GHS | SaHS | CcHS | IeHS |
|---|---|---|---|---|---|---|---|
| High conditioned elliptic function | Mean | 6.55E+00 | 5.71E+00 | 1.00E+02 | 7.50E+03 | 0.00E+00 | 0.00E+00 |
| | Best | 6.14E+00 | 2.97E+00 | 8.03E+01 | 1.77E+03 | 0.00E+00 | 0.00E+00 |
| | Worst | 6.96E+00 | 8.45E+00 | 1.20E+02 | 1.20E+02 | 0.00E+00 | 0.00E+00 |
| | SD | 4.10E−01 | 2.74E+00 | 1.99E+01 | 5.73E+03 | 0.00E+00 | 0.00E+00 |
| Bent cigar function | Mean | 8.38E+01 | 1.76E−01 | 2.47E+02 | 5.24E+04 | 0.00E+00 | 0.00E+00 |
| | Best | 8.01E+01 | 1.73E−01 | 1.30E+02 | 4.73E+04 | 0.00E+00 | 0.00E+00 |
| | Worst | 8.75E+01 | 1.78E−01 | 3.63E+02 | 3.63E+02 | 0.00E+00 | 0.00E+00 |
| | SD | 3.72E+00 | 2.31E−03 | 1.16E+02 | 5.11E+03 | 0.00E+00 | 0.00E+00 |
| Discus function | Mean | 1.21E−04 | 3.90E−06 | 2.08E+01 | 3.74E+03 | 2.58E+00 | 0.00E+00 |
| | Best | 1.12E−04 | 3.60E−06 | 4.24E−02 | 3.39E+03 | 2.92E−04 | 0.00E+00 |
| | Worst | 1.31E−04 | 4.21E−06 | 4.15E+01 | 4.15E+01 | 5.16E+00 | 0.00E+00 |
| | SD | 9.67E−06 | 3.04E−07 | 2.07E+01 | 3.52E+02 | 2.58E+00 | 0.00E+00 |
| Rosenbrock's function | Mean | 7.96E−02 | 8.82E−02 | 1.16E+00 | 1.10E+00 | 1.10E−04 | 0.00E+00 |
| | Best | 3.00E−15 | 9.04E−04 | 1.08E+00 | 1.06E+00 | 0.00E+00 | 0.00E+00 |
| | Worst | 2.35E−01 | 4.12E−01 | 1.29E+00 | 1.18E+00 | 7.58E−04 | 0.00E+00 |
| | SD | 8.10E−02 | 1.28E−01 | 6.38E−02 | 4.11E−02 | 2.29E−04 | 0.00E+00 |
| Ackley's function | Mean | 7.95E−03 | 2.13E+00 | 0.00E+00 | 3.57E−02 | 2.29E−10 | 0.00E+00 |
| | Best | 5.86E−03 | 9.99E−01 | 0.00E+00 | 2.85E−02 | 4.12E−22 | 0.00E+00 |
| | Worst | 1.04E−02 | 2.53E+00 | 0.00E+00 | 3.10E−02 | 4.58E−10 | 0.00E+00 |
| | SD | 1.59E−03 | 5.74E−01 | 0.00E+00 | 7.18E−03 | 2.29E−10 | 0.00E+00 |
| Weierstrass function | Mean | 1.43E+00 | 3.48E−01 | 6.83E−02 | 5.66E−01 | 2.49E−01 | 3.21E−04 |
| | Best | 1.38E+00 | 3.34E−01 | 3.49E−02 | 5.58E−01 | 1.98E−01 | 2.35E−04 |
| | Worst | 1.48E+00 | 3.40E−01 | 1.02E−01 | 4.08E−04 | 3.01E−01 | 4.08E−04 |
| | SD | 4.83E−02 | 2.82E−03 | 3.34E−02 | 8.05E−03 | 5.15E−02 | 8.64E−05 |
| Griewank's function | Mean | 9.84E−03 | 2.74E−05 | 1.10E−08 | 1.04E+00 | 1.72E−09 | 2.29E−10 |
| | Best | 2.46E−06 | 8.51E−08 | 9.49E−09 | 1.02E+00 | 0.00E+00 | 0.00E+00 |
| | Worst | 4.18E−02 | 7.43E−05 | 1.31E−08 | 4.12E−01 | 4.90E−08 | 4.58E−09 |
| | SD | 1.62E−02 | 2.83E−05 | 1.37E−09 | 1.54E−02 | 1.79E−09 | 1.18E−09 |
| Rastrigin function | Mean | 9.68E−14 | 1.27E−05 | 6.89E−05 | 1.37E−07 | 3.32E−07 | 2.04E−14 |
| | Best | 0.00E+00 | 2.71E−06 | 6.53E−05 | 1.06E−07 | 0.00E+00 | 0.00E+00 |
| | Worst | 1.94E−13 | 2.75E−05 | 7.25E−05 | 1.74E−07 | 9.91E−07 | 4.09E−14 |
| | SD | 8.93E−14 | 1.07E−05 | 2.94E−06 | 2.85E−08 | 4.65E−07 | 2.89E−14 |
| Modified Schwefel's function | Mean | 2.42E−05 | 1.64E−04 | 3.02E−05 | 1.37E−01 | 2.41E−08 | 3.49E−11 |
| | Best | 1.99E−05 | 1.19E−04 | 6.25E−06 | 1.31E−01 | 2.23E−08 | 3.13E−11 |
| | Worst | 2.85E−05 | 2.10E−04 | 5.41E−05 | 5.41E−05 | 2.58E−08 | 3.85E−11 |
| | SD | 4.28E−06 | 4.53E−05 | 2.39E−05 | 2.48E−05 | 1.73E−09 | 1.22E−11 |

(continued)
H. W. Jung et al.
Table 2 (continued)

| Problem | Statistic | HS | IHS | GHS | SaHS | CcHS | IeHS |
|---|---|---|---|---|---|---|---|
| Happy Cat function | Mean | 9.79E−05 | 1.47E−07 | 1.22E−02 | 4.33E−02 | 1.16E−12 | 6.67E−39 |
| | Best | 8.93E−05 | 1.29E−07 | 2.47E−03 | 4.05E−02 | 3.54E−14 | 4.21E−83 |
| | Worst | 1.07E−04 | 1.66E−07 | 2.19E−02 | 2.19E−02 | 4.55E−12 | 1.33E−38 |
| | SD | 8.64E−06 | 1.86E−08 | 9.70E−03 | 2.76E−03 | 1.59E−12 | 9.43E−39 |
| Katsuura function | Mean | 2.08E−03 | 3.89E−04 | 0.00E+00 | 8.36E−05 | 4.72E−05 | 0.00E+00 |
| | Best | 2.02E−03 | 3.76E−04 | 0.00E+00 | 6.95E−05 | 9.87E−06 | 0.00E+00 |
| | Worst | 2.13E−03 | 4.01E−04 | 0.00E+00 | 8.25E−05 | 8.46E−05 | 0.00E+00 |
| | SD | 1.40E−05 | 3.74E−05 | 0.00E+00 | 5.48E−05 | 1.24E−05 | 0.00E+00 |
| HGBat function | Mean | 3.17E−01 | 3.86E−01 | 6.14E−01 | 6.18E−01 | 3.76E−01 | 3.28E−01 |
| | Best | 2.52E−01 | 3.83E−01 | 2.20E−01 | 4.36E−01 | 3.53E−01 | 3.13E−01 |
| | Worst | 7.88E−01 | 3.89E−01 | 1.01E+00 | 1.01E+00 | 3.98E−01 | 3.44E−01 |
| | SD | 2.36E−01 | 3.14E−03 | 3.93E−01 | 1.81E−01 | 2.24E−02 | 1.53E−02 |

Expanded Griewank’s plus Rosenbrock’s function
Mean 6.89.E−01 7.77.E−01 1.25.E+00
1.44.E+00
1.03.E+00
Best
1.40.E+00
8.20.E−01 4.30.E−03
1.10.E+00
1.23.E+00
Expanded Scaffer’s F6 function
Mean 6.87.E−01 1.29.E+00 6.77.E−01 1.73.E+00
6.61.E−01 3.66.E−01
Best
4.65.E−01 1.18.E−05
5.83.E−01 6.91.E−01 1.09.E+00
Worst 1.37.E+00 SD
7.27.E−01
7.20.E−02 8.61.E−02 6.84.E−01 4.44.E−02 2.07.E−01 4.07.E−03 3.35.E−01 9.96.E−01 2.91.E−01 1.70.E+00
Worst 1.04.E+00 SD
8.63.E−01 1.84.E+00
6.55.E−01
1.59.E+00 1.35.E+00
1.35.E+00
8.56.E−01 4.40.E−01
3.53.E−01 2.98.E−01 6.77.E−01 7.14.E−02 1.95.E−01 3.17.E−02
Bold values mean the best results
Acknowledgements This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2B5B03069810).
References
1. Geem ZW, Kim JH, Loganathan GV (2001) A new heuristic optimization algorithm: harmony search. Simulation 76(2):60–68
2. Talebpour MH, Kaveh A, Kalatjari VR (2014) Optimization of skeletal structures using a hybridized ant colony-harmony search-genetic algorithm. Iran J Sci Technol Trans Civ Eng 38(C1):1
3. De Paola F et al (2017) Optimal solving of the pump scheduling problem by using a Harmony Search optimization algorithm. J Hydroinform 19(6):879–889
4. Nazari-Heris M et al (2019) Harmony search algorithm for energy system applications: an updated review and analysis. J Exp Theor Artif Intell 31(5):723–749
5. Choi YH et al (2017) Self-adaptive multi-objective harmony search for optimal design of water distribution networks. Eng Optim 49(11):1957–1977
6. Jung D, Kang D, Kim JH (2018) Development of a hybrid harmony search for water distribution system design. KSCE J Civ Eng 22(4):1506–1514
7. Damodaran SK, Sunil Kumar TK (2017) Economic and emission generation scheduling of thermal power plant incorporating wind energy. In: TENCON 2017–2017 IEEE Region 10 conference. IEEE
8. Lee KS, Geem ZW (2004) A new structural optimization method based on the harmony search algorithm. Comput Struct 82(9–10):781–798
9. Shehab M, Khader AT, Al-Betar MA (2017) A survey on applications and variants of the cuckoo search algorithm. Appl Soft Comput 61:1041–1059
10. Sadollah A et al (2018) Mine blast harmony search: a new hybrid optimization method for improving exploration and exploitation capabilities. Appl Soft Comput 68:548–564
11. Mahdavi M, Fesanghary M, Damangir E (2007) An improved harmony search algorithm for solving optimization problems. Appl Math Comput 188(2):1567–1579
12. Omran MGH, Mahdavi M (2008) Global-best harmony search. Appl Math Comput 198(2):643–656
13. Wang C-M, Huang Y-F (2010) Self-adaptive harmony search algorithm for optimization. Expert Syst Appl 37(4):2826–2837
14. Pan Q-K et al (2010) A local-best harmony search algorithm with dynamic subpopulations. Eng Optim 42(2):101–117
15. Jun SH et al (2019) Copycat harmony search: considering poor music player's followship toward good player. In: Harmony search and nature inspired optimization algorithms. Springer, Singapore, pp 113–118
16. Geem ZW (2011) Parameter estimation of the nonlinear Muskingum model using parameter-setting-free harmony search. J Hydrol Eng 16(8):684–688
A Low-Cost Embedded Computer Vision System for the Classification of Recyclable Objects
Karl Myers and Emanuele Lindo Secco
Abstract Due to rapid urbanization, population growth, and industrialization, there has been a sharp rise in solid waste pollution across the globe. Here, we present a novel approach to making waste sorting more efficient by using embedded computer vision (CV) in material recovery facilities (MRFs). The proposed architecture uses software (TensorFlow and OpenCV) and hardware (a Raspberry Pi) as an embedded platform to classify everyday objects according to their visual appearance. The CV system is trained with modules from the TensorFlow API on two datasets: TrashNet alone, and TrashNet combined with a set of Web images. The system achieves a baseline accuracy of 90%. This drops to 70% when it is deployed on the embedded platform, because of the lower quality of the images taken by the integrated camera for real-time classification. The speed results are also promising: the baseline speed is 10 FPS in simulation and drops to 1.4 FPS on the platform. At a cost of less than £100, the system is cheap and adequate for identifying recyclables for sorting in the MRF.
Keywords Computer vision · TensorFlow · Low-cost prototype
K. Myers · E. L. Secco (✉)
Robotics Lab, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_2
1 Introduction
Rapid urbanization, a growing human population, and industrialization have increased environmental pollution across the globe. As activities such as production and marketing have grown, so has the use of natural resources. This growth has been accompanied by an inevitable and significant increase in solid waste. Owing to these consumption trends, the rising levels of waste are, and will continue to be, detrimental to the environment and ultimately to the humans and animals that live in it. This has driven research into retrieving waste from the environment. One example is the SeaVax solution, a roaming, satellite-controlled aluminum platform powered by the sun and wind. It operates like a giant vacuum cleaner, shredding and compressing waste into its hopper. Such retrieval methods have increased the removal of waste from the environment by many thousands of tons. However, these systems do not discriminate between the types of waste they retrieve, which raises the question: "Do we have the capacity to handle such an increase in waste?"
According to the Department for Environment, Food and Rural Affairs (DEFRA), the UK alone generated 222.9 million tons of solid waste in 2017, of which 47% was recycled. The amount of waste generated is expected to rise by at least 20% by 2020, and the UK aims to increase the recycling rate to 50% in the same year (DEFRA, 2018). Recycling methods differ from country to country. The UK, for example, uses single-stream recycling, in which consumers put all kinds of recyclables, such as paper, metals, and glass, into a single receptacle. The recyclable waste is then collected and sent to a material recovery facility (MRF) for sorting and processing. At the MRF, a combination of mechanical and manual methods is used to sort the materials into their respective classes. In general, the sorting process is as follows:
1. All materials are placed on a conveyor, and operatives manually remove the non-recyclables.
2. The remaining waste moves to a triple-deck screen, where cardboard, containers, and paper are removed.
3. The remaining materials pass under a powerful magnet, which removes tin and steel cans.
4. A reverse magnet, known as an eddy current separator, causes aluminum cans to fly off the conveyor into a storage container.
5. Finally, the remaining plastics, such as bottles, are removed, crushed, and baled.
The main benefit of single-stream recycling is higher recycling rates. Consumers do not have to sort the waste themselves and are therefore more likely to participate in curbside recycling programs. From the collection point of view, hauling costs are lower than with separate pickups for different recycling streams, or with a hauler placing different materials into separate truck compartments. However, single-stream recycling has two notable disadvantages. It has lowered the quality of recovered materials, and it makes recycling inherently more expensive because of the labor involved. For these reasons, the industry is moving toward state-of-the-art MRFs and away from legacy or "dirty" MRFs, which are much more labor intensive. The answer to the question above is therefore no. At present, we do not have the capacity to handle an increase in waste, nor even the waste we are retrieving now. Efforts must therefore be made both to streamline the waste-sorting process and to develop intelligent waste retrieval that further eases the pressure on MRFs.
The aim of this study is to develop a multi-purpose embedded object-detection CV system. Integrated with, for example, robot arms, it could support sorting at the MRF; integrated into a retrieval method, it could support intelligent retrieval and sorting. To do this, the system must:
• Accurately identify the five classes of recyclables: glass, plastic, metal, paper, and cardboard.
• Be fast enough to be used for real-time retrieval.
• Be easily integrated with other equipment, with a view to fully automating the sorting process and facilitating intelligent retrieval of waste, further easing the pressure on the MRF.
• Be as cheap as possible, since the cheaper the system, the more likely it is to be adopted in the future.
To meet these requirements, the following objectives are defined:
• Train a computer vision (CV) model that is accurate and fast enough to be used for both sorting and retrieval.
• Develop programs for two modes of operation: sorting in the MRF and intelligent retrieval.
• Implement the system on an embedded platform that can be easily integrated into either system.
2 Materials and Methods
2.1 Hardware
The Raspberry Pi is a low-cost, single-board computer. Like any other PC, it can be connected to a monitor via HDMI and operated with a standard keyboard and mouse. It can also be controlled remotely from a PC or laptop using software such as VNC Viewer. It is a capable device that enables people of all backgrounds to explore computing and to learn to program in languages such as Python. It can do everything expected of a low-end desktop computer, from browsing the Internet and playing HD video to general computing tasks such as creating spreadsheets and word processing.
Furthermore, the Raspberry Pi can interact with the outside world through a wide range of sensors and modules. As a result, both researchers and hobbyists have adopted it for Internet of Things (IoT) and digital-maker projects. In this project, the Raspberry Pi 3 Model B+, shown in Fig. 1, is the platform on which the system is deployed. It was chosen for three reasons: the success Phadnis et al. [1] reported with its predecessor, its ease of use, and its very low cost.
The optical sensors capture the image data, which is then processed and used for inference. Two sensors are used and compared. The first is the PiCam module, shown in Fig. 2a, the official camera board released by the Raspberry Pi Foundation. The module contains a medium-quality 5-megapixel Sony IMX219 image sensor designed as a custom add-on board for the Raspberry Pi. It features a fixed-focus lens, captures still images at 2592 × 1944 pixels, and supports 1080p30, 720p60, and 640 × 480p90 video. It plugs into one of the small sockets on the board's upper surface and uses the dedicated camera serial interface (CSI), designed specifically for cameras. The second sensor is a standard mid-range 12 MP Web camera, shown in Fig. 2b. It features a multi-focus lens, captures still images at 3280 × 2464 pixels, supports 1080p and 720p video, and connects to the Raspberry Pi via USB. The two sensors are compared to determine which performs best in terms of image quality and speed, for both single capture and inference and real-time capture and inference. Both were chosen because of the successful use of the PiCam by Phadnis et al. [1] and of a Web camera by Guennouni et al. [2].
Fig. 1 Embedded PC
Fig. 2 Optical sensors
2.2 Software
2.2.1 TensorFlow
Designing and implementing machine learning (ML) models has traditionally made ML a complex discipline. This task is now far less daunting than it used to be, thanks to ML frameworks such as Google's TensorFlow, which ease model training, the serving of predictions, and the refinement of results. TensorFlow is an open-source library for numerical computation and large-scale ML created by the Google Brain team. It bundles a wide range of machine learning and deep learning models and algorithms that can be adapted and re-trained for many purposes, from object detection and recognition to optical character recognition (OCR). It provides an easy-to-use Python front-end API for building applications and executes those applications in high-performance C++. Developers use it to create dataflow graphs. These graphs describe how data moves through a series of nodes: each node represents a mathematical operation, and each edge between nodes is a tensor, that is, a multidimensional data array. In this project, TensorFlow and the TensorFlow API are installed on the Windows machine and used to train the model and to develop the two programs: single capture and inference, and real-time capture and inference. They are also installed on the Raspberry Pi and used to run those programs. TensorFlow and the TensorFlow API were chosen for back-end and front-end development for two reasons. They are easy to use and offer a wide range of pre-built and pre-trained models. More importantly, Swain et al. [3], Phadnis et al. [1], and Guennouni et al. [2] used them successfully in the research reviewed earlier.
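The dataflow-graph idea described above can be illustrated with a small, pure-Python sketch. It does not use TensorFlow: `Node`, `placeholder` and `evaluate` are hypothetical names. Each node holds an operation, and the values passed along the edges play the role of tensors (here just 1-D lists). TensorFlow builds and executes graphs of this kind in optimized C++.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Tensor = List[float]  # a 1-D "tensor" stored as a plain list, for illustration only


@dataclass(eq=False)
class Node:
    """A graph node: an operation plus the nodes whose outputs (edges) feed it."""
    name: str
    op: Callable[..., Tensor]
    inputs: Sequence["Node"] = ()


def placeholder(name: str) -> Node:
    """A node whose value must be supplied (fed) at evaluation time."""
    def missing() -> Tensor:
        raise ValueError(f"no value fed for placeholder {name!r}")
    return Node(name, missing)


def evaluate(node: Node, feeds: Dict[str, Tensor],
             cache: Optional[Dict[str, Tensor]] = None) -> Tensor:
    """Evaluate a node after recursively evaluating its inputs (memoized by name)."""
    if cache is None:
        cache = {}
    if node.name in feeds:  # fed values override the node's own operation
        return feeds[node.name]
    if node.name not in cache:
        args = [evaluate(inp, feeds, cache) for inp in node.inputs]
        cache[node.name] = node.op(*args)
    return cache[node.name]


# Build y = relu(w * x + b), element-wise.
x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
mul = Node("mul", lambda a, c: [p * q for p, q in zip(a, c)], (x, w))
add = Node("add", lambda a, c: [p + q for p, q in zip(a, c)], (mul, b))
relu = Node("relu", lambda a: [max(0.0, v) for v in a], (add,))

y = evaluate(relu, {"x": [1.0, -2.0], "w": [3.0, 3.0], "b": [0.5, 0.5]})
```

Nothing is computed until `evaluate` is called on an output node. That separation between building the graph and running it is the core of the dataflow model.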
2.2.2 OpenCV
OpenCV is an open-source computer vision and machine learning software library containing more than 2500 optimized algorithms. It was developed primarily to provide a common infrastructure for CV applications and to accelerate the use of machine perception, and it is aimed mainly at real-time computer vision applications. Its functions and algorithms cover reading and writing images, detecting faces and their features, detecting shapes such as circles and rectangles, recognizing text in images (for example, number plates), and modifying image quality and colors, among many others. In this project, OpenCV is used to capture frames, to prepare the images so that TensorFlow can run inference on them, and to output the resulting images with the inference results. It was chosen because of the success and ease of use shown by Phadnis et al. [1], described earlier.
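Before OpenCV can draw the results, the detections returned by the graph must be mapped back onto the image. The sketch below is minimal and assumes the usual output of models exported with the TensorFlow Object Detection API: boxes as normalized `[ymin, xmin, ymax, xmax]`, plus scores and class IDs. The label map, the 0.5 score threshold and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical label map for the five recyclable classes; the real ID-to-name
# mapping is whatever the project's .pbtxt label file defines.
LABELS: Dict[int, str] = {1: "cardboard", 2: "glass", 3: "metal", 4: "paper", 5: "plastic"}

PixelBox = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def to_pixel_box(box: Sequence[float], width: int, height: int) -> PixelBox:
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel corner coordinates."""
    ymin, xmin, ymax, xmax = box
    return (round(xmin * width), round(ymin * height),
            round(xmax * width), round(ymax * height))


def select_detections(boxes: Sequence[Sequence[float]], scores: Sequence[float],
                      classes: Sequence[float], width: int, height: int,
                      min_score: float = 0.5) -> List[Tuple[str, float, PixelBox]]:
    """Keep detections scoring at least min_score, with class names and pixel boxes."""
    kept = []
    for box, score, cls in zip(boxes, scores, classes):
        if score < min_score:
            continue
        kept.append((LABELS.get(int(cls), "unknown"), float(score),
                     to_pixel_box(box, width, height)))
    return kept
```

Each kept pixel box can then be drawn on the frame with, for example, `cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)`.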
2.2.3 VNC Viewer
Virtual network computing (VNC) viewer is software for controlling another computer over a network connection. Key presses and mouse clicks are transmitted from one computer to another, so a desktop, server, or other networked device can be controlled remotely from a different location. VNC works on a client/server model. A VNC viewer (the client) is installed on the local computer, and a VNC server on the remote computer. The server sends a copy of its display to the viewer. It also receives commands such as keystrokes, mouse clicks, and mouse movements from the client and carries them out on the remote computer. VNC Viewer is the main user interface for communicating with and working on the Raspberry Pi. It was chosen because it is easy to use and removes the need to connect the Raspberry Pi to peripherals such as a monitor, keyboard, and mouse. It will also be useful once the system is in service, since users will need to perform regular maintenance such as updates. It also simplifies changing the program the system runs, for example, switching from single image capture and inference to real-time capture and inference.
2.3 Model
The models used in this project belong to a class of efficient models called MobileNets. MobileNets are CV models designed specifically for hardware with tight memory and processing constraints, such as mobile devices and embedded platforms. Developed by Howard et al. [4], they use depth-wise separable convolutions to build streamlined deep neural networks. They introduce two simple hyper-parameters that efficiently trade off speed against accuracy. With these, the model developer can choose the right-sized model for the application, given constraints such as memory and processing power. Howard et al. run extensive experiments on resource and accuracy trade-offs and show that MobileNet performs strongly against other popular models on ImageNet classification. They then show that MobileNets are effective across a range of applications, including object detection, fine-grained classification, and face attributes. Every layer of the MobileNet structure is a depth-wise separable convolution except the first, which is a full convolution. Because the network is defined in such simple terms, different network topologies can easily be explored to find a good one. In this study, MobileNet V1 and MobileNet V2 are trained on the same dataset and compared using the COCO metrics described earlier, specifically loss and mAP. Both models are then deployed on the embedded platform and compared using qualitative and quantitative tests to determine which performs best.
MobileNet was chosen as the base model mainly because it is designed specifically for embedded platforms. Phadnis et al. [1], Guennouni et al. [2], and Boyko et al. [5] also used it successfully, as mentioned earlier.
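The speed advantage of depth-wise separable convolutions can be checked against the cost expressions given by Howard et al. [4]. The sketch below counts multiply-adds for one layer, where DK is the kernel size, M and N are the input and output channels, and DF is the feature-map size. It includes the width multiplier alpha and the resolution multiplier rho, the two hyper-parameters mentioned above. The layer dimensions are illustrative only, not taken from the models trained here.

```python
def standard_conv_cost(dk: int, m: int, n: int, df: int) -> float:
    """Mult-adds of a standard DK x DK convolution (M -> N channels, DF x DF output)."""
    return dk * dk * m * n * df * df


def separable_conv_cost(dk: int, m: int, n: int, df: int,
                        alpha: float = 1.0, rho: float = 1.0) -> float:
    """Mult-adds of a depth-wise (DK x DK per channel) plus point-wise (1 x 1) convolution.

    alpha is the width multiplier (thins every layer's channels) and rho the
    resolution multiplier (shrinks the feature map).
    """
    m_a, n_a, df_r = alpha * m, alpha * n, rho * df
    depthwise = dk * dk * m_a * df_r * df_r
    pointwise = m_a * n_a * df_r * df_r
    return depthwise + pointwise


# Illustrative layer: 3 x 3 kernel, 32 -> 64 channels, 112 x 112 feature map.
std = standard_conv_cost(3, 32, 64, 112)
sep = separable_conv_cost(3, 32, 64, 112)
ratio = sep / std  # equals 1/N + 1/DK^2 = 1/64 + 1/9
```

For this layer, the separable version needs about 12.7% of the multiply-adds of a standard convolution. Halving rho cuts the cost to a quarter, since both terms scale with rho squared.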
2.4 Datasets and Data Preparation
2.4.1 Datasets
The data used to train the above models combines two datasets: TrashNet, and images scraped from Google Images. TrashNet contains 2500 images of five classes of recyclable waste: paper, glass, plastic, metal, and cardboard. The images are photographs of rubbish taken on a white background, and the exposure and lighting chosen for each photograph provide the dataset's variation. The dataset is nearly 3.5 GB in size, and each image is resized to 512 × 384. It contains 604 images of paper, 601 of glass, 410 of metal, 482 of plastic, and 403 of cardboard. A sample of the data is shown in Fig. 3. TrashNet was used because it was the largest repository of such data available, but it has limitations. As explained earlier, Phadnis et al. [1] found that the best accuracy requires training data that is as varied as possible. Examples include images of different sizes that contain multiple objects, both objects to be classified and objects of no interest. TrashNet, however, contains images of single objects, is in sequential order, and has images all of the same size. A further 1000 images, 200 per class, were therefore scraped from Google Images; a sample is shown in Fig. 4. To assess the effectiveness of TrashNet, MobileNet V1 was first trained on TrashNet alone, and MobileNet V1 and V2 were then trained on the two datasets combined.
Fig. 3 Samples of TrashNet data
Fig. 4 Google image dataset sample
2.4.2 Data Preparation
ANNs are trained with a method called supervised learning. In supervised learning, the system is given input and output variables and learns how they are related, or mapped. The goal is an accurate mapping function that lets the algorithm predict the output for a new input. Training is iterative: each prediction the algorithm makes is corrected, or given feedback, until performance reaches an acceptable level. The training data consist of examples, each pairing an input with its desired output, also called the supervisory signal. In image processing, for example, an AI system is given labeled pictures of objects in categories such as car or person. After enough iterations, the system should be able to discriminate between and categorize unlabeled images, at which point training is complete. Hence, every image in the datasets described earlier must be labeled with a description of each object in it, along with the object's location and size in pixel coordinates. This was done with LabelImg, a graphical image annotation tool written in Python that uses Qt for its graphical interface. The image data were prepared and labeled as follows:
1. The TrashNet and Google image datasets are combined and then randomized using a .bat script. This step follows the findings of [1]: non-sequential data gives better accuracy because it reduces overfitting of the model to the data.
2. The combined dataset is opened in LabelImg. A box is drawn around each object in each image, and the object is annotated with the name of its class. An example is shown in Fig. 5.
3. An XML document in PASCAL VOC format is produced for each image, containing the annotations and the size and position of the objects.
4. The data is split into training and testing sets in a ratio of 80:20.
5. The XML documents in the training and testing sets are converted to CSV format using a conversion script.
6. Finally, the CSV files are converted to TFRecord format, the format TensorFlow requires, using a further script.
Fig. 5 Example of labeling images in LabelImg
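The XML-to-CSV conversion and the 80:20 split (steps 4 and 5 above) can be sketched with the Python standard library. This is a hypothetical illustration, not the authors' scripts. The function names and the CSV column layout (filename, width, height, class, xmin, ymin, xmax, ymax) are assumptions, and a seeded `random.Random` shuffle stands in for the .bat randomization.

```python
import csv
import io
import random
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple

CSV_HEADER = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def voc_to_rows(xml_text: str) -> List[list]:
    """Turn one PASCAL VOC annotation into one CSV row per labeled object."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # int(float(...)) accepts both "10" and "10.0" style coordinates.
        coords = [int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        rows.append([filename, width, height, obj.findtext("name"), *coords])
    return rows


def split_80_20(items: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle a copy of the items and split it 80:20 into training and testing sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10
    return shuffled[:cut], shuffled[cut:]


def rows_to_csv(rows: List[list]) -> str:
    """Serialize rows, preceded by a header line, as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(CSV_HEADER)
    writer.writerows(rows)
    return buffer.getvalue()
```

Combining the 2500 TrashNet images with the 1000 scraped images gives 3500 in total, so an 80:20 split yields 2800 training and 700 testing images.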
3 Results and Discussion
This section summarizes the training procedure, the implementation of the code, and the results of the testing.
3.1 Training
As mentioned previously, the training method employed in this study is transfer learning. Pan and Yang [6] define transfer learning as leveraging knowledge gained in previously learned domains and applying it to new domains and tasks. Humans have an inherent ability to transfer knowledge across tasks: what we learn on one task helps us solve related tasks, and the more closely related the tasks, the easier the knowledge is to reuse. For example, knowing how to ride a bicycle makes it easier to learn to ride a motorcycle. We do not learn everything from scratch when approaching new topics; we transfer and build on what we have already learned. Conventional ML and DL algorithms, by contrast, have traditionally been designed to work in isolation, each trained to solve a specific task. The models therefore have to be rebuilt from scratch whenever the feature-space distribution changes [6]. Transfer learning aims to overcome this isolated learning paradigm by using knowledge acquired for one task to solve related ones. The motivation behind transfer learning is the growing research into true artificial general intelligence (AGI), where researchers and data scientists consider transfer learning essential for further progress (Ling et al., 2015). Andrew Ng, a renowned data scientist, was also quoted in an article by Malisiewicz [7] as saying that "after supervised learning, transfer learning will be the next driver of machine learning commercial success."
As previously mentioned, this project uses modules from the TensorFlow API to perform transfer learning on MobileNet. The procedure is as follows:
1. Download the pre-trained model(s) and their associated config files.
2. Create a .pbtxt file containing the class labels and the integer values associated with them, as shown in Fig. 6.
3. Modify the config files by:
(a) changing the number of classes,
(b) adding paths to the directory where the .pbtxt file is stored,
(c) adding paths to the directory where the config file is stored.
4. Start the training by running the TensorFlow API's model_main module. Its command-line arguments specify where the model and config files are stored and where to store the output graph and checkpoint files.
5. Monitor the training via TensorBoard, specifically the loss.
6. Stop the training when the loss metric stabilizes as close to one as possible, as shown in Fig. 7.
Fig. 6 .pbtxt file containing class labels and integer IDs
Fig. 7 Loss metric from Tensorboard
As mentioned earlier, this process is repeated three times. MobileNet V1 is first trained on the TrashNet dataset only; V1 and V2 are then both trained on the combined TrashNet and Google image datasets. This serves two purposes. Comparing MobileNet V1 trained on TrashNet with V1 trained on the combined datasets shows the effect of data quality. Comparing MobileNet V1 with V2 shows which performs best in terms of accuracy and speed.
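Steps 2 and 3 of the procedure above lend themselves to a short script. The sketch below is hypothetical: the function names, class order and paths are assumptions, and the real IDs are those in Fig. 6. It writes a label map in the Object Detection API's `item { id, name }` text format and uses regular expressions to patch `num_classes` and `label_map_path` in a pipeline config.

```python
import re
from typing import Sequence


def make_label_map(class_names: Sequence[str]) -> str:
    """Return label-map text with one item per class; IDs start at 1 (0 is reserved for background)."""
    items = [f"item {{\n  id: {idx}\n  name: '{name}'\n}}\n"
             for idx, name in enumerate(class_names, start=1)]
    return "\n".join(items)


def patch_config(config_text: str, num_classes: int, label_map_path: str) -> str:
    """Set num_classes and every label_map_path entry in a pipeline config (steps 3a and 3b)."""
    text = re.sub(r"num_classes:\s*\d+", f"num_classes: {num_classes}", config_text)
    # A function replacement stops re.sub from treating backslashes in Windows
    # paths as escape sequences.
    return re.sub(r'label_map_path:\s*"[^"]*"',
                  lambda _match: f'label_map_path: "{label_map_path}"', text)
```

The training and testing `input_path` entries for the TFRecord files could be patched the same way. A dedicated config parser would be more robust than regular expressions.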
3.2 Implementation
3.2.1 Deployment
To deploy the trained models, the top (classification) layer described above must first be frozen and the whole inference graph exported for use. This is again done with a module included in the TensorFlow API, named export_inference_graph. Its command-line arguments specify the input type (in this case, an image tensor), the paths to the saved checkpoints and the config file described earlier, and the directory to which the new graph will be exported. As mentioned earlier, the models are first deployed on the Windows machine used for training, where quantitative tests are performed. These tests determine whether using the TrashNet dataset affects the accuracy of the model. They also provide a baseline for the deployment tests on the Raspberry Pi.
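The export step can be scripted. The sketch below only builds the argument list for `export_inference_graph.py`, the export script of the TensorFlow 1 Object Detection API. The flag names are that script's; the function name, file paths and checkpoint step in the test are hypothetical.

```python
from typing import List


def export_command(pipeline_config: str, checkpoint_prefix: str,
                   output_dir: str) -> List[str]:
    """Build the argument list for exporting a frozen inference graph."""
    return [
        "python", "export_inference_graph.py",
        "--input_type=image_tensor",  # the exported graph takes a uint8 image tensor
        f"--pipeline_config_path={pipeline_config}",
        f"--trained_checkpoint_prefix={checkpoint_prefix}",
        f"--output_directory={output_dir}",
    ]
```

The list could then be run with `subprocess.run(cmd, check=True)` from the API's `object_detection` directory. In the TensorFlow 1 API, the frozen graph is written as `frozen_inference_graph.pb` in the output directory.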
3.2.2 Program Design
Two programs were designed for this project: one for single image capture and inference, and one for real-time capture and inference.
Single Image Capture and Inference
The single capture and inference program, shown in Fig. 8, works with both the PiCam and the Web camera; the user selects the Web camera with the command-line argument --USB. On loading, the program imports the necessary libraries, loads the frozen inference graph, and initializes its box and class tensors. Next, depending on the user's choice, it initializes either the Web camera or the PiCam. It then enters a continuous loop that displays the frames captured by the camera. When the capture key is pressed, the program captures a frame, runs inference on it, and draws and displays the results. When the exit key is pressed, the program leaves the loop and closes the display windows.
Real-Time Capture and Inference
The real-time capture and inference program, shown in Fig. 9, works in almost exactly the same way. The difference is that there is no user interaction: frames are captured and inferred continuously, and the results are drawn and displayed in real time. Again, pressing the exit key ends the program.
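The control flow of the two programs can be sketched without camera hardware. In this hypothetical sketch, frames and key presses are passed in as iterables and inference as a callable, so the loop logic can be tested without a camera. The key bindings `c` and `q` are assumptions, since the text does not name the keys.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")

CAPTURE_KEY = "c"  # hypothetical key bindings
EXIT_KEY = "q"


def single_capture_loop(frames: Iterable[Frame], keys: Iterable[Optional[str]],
                        infer: Callable[[Frame], Result]) -> List[Result]:
    """Show frames continuously; run inference only on frames captured by a key press."""
    results: List[Result] = []
    for frame, key in zip(frames, keys):
        # A real program would display `frame` here (for example with cv2.imshow).
        if key == CAPTURE_KEY:
            results.append(infer(frame))  # results would then be drawn and displayed
        elif key == EXIT_KEY:
            break
    return results


def realtime_loop(frames: Iterable[Frame], keys: Iterable[Optional[str]],
                  infer: Callable[[Frame], Result]) -> List[Result]:
    """Run inference on every frame until the exit key is pressed."""
    results: List[Result] = []
    for frame, key in zip(frames, keys):
        results.append(infer(frame))  # drawn and displayed in real time
        if key == EXIT_KEY:
            break
    return results
```

On the Raspberry Pi, frames could come from `cv2.VideoCapture(0).read()` for the USB camera, and keys from `chr(cv2.waitKey(1) & 0xFF)`.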
Fig. 8 Single capture and inference program design
3.3 Testing
After deployment, the models were tested for both accuracy and speed. As mentioned earlier, a baseline accuracy and speed were obtained on the Windows machine for comparison with the results of testing on the Raspberry Pi. For deployment testing on the Raspberry Pi, accuracy and speed were measured with both the PiCam and the Web camera, again for comparison.
3.3.1 Accuracy
To determine a baseline accuracy on the Windows machine, 50 images (10 of each class) were scraped from Google Images using the procedure explained earlier. They were then run through a dedicated accuracy-test program. This program works in nearly the same way as the single capture and inference program, except that there is no user interaction. A for loop iterates through each of the fifty images and outputs each image with its inference and confidence score, as shown in Fig. 10. The program also writes the inference results and confidence scores to a .txt file for processing.
Fig. 9 Real-time capture and inference program design
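The logging part of the accuracy-test program could look like the sketch below, which rests on two assumptions. First, `detect` is a hypothetical wrapper that returns a (class, confidence) pair for one image. Second, the tab-separated layout of the .txt file is our own choice, since the paper does not specify the file format.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def format_results(
    image_paths: Iterable[str],
    detect: Callable[[str], Tuple[str, float]],
) -> List[str]:
    """One tab-separated line per image: file name, predicted class, confidence."""
    lines = []
    for path in image_paths:
        label, score = detect(path)  # hypothetical wrapper around the model
        lines.append(f"{Path(path).name}\t{label}\t{score:.2f}")
    return lines


def write_results(lines: List[str], txt_path: str) -> None:
    """Write the result lines to a plain-text file for later processing."""
    Path(txt_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Keeping formatting separate from file writing makes it easy to check the output before it is saved.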
Fig. 10 Sample of accuracy image outputs
For deployment accuracy, 50 images (again ten of each class) were taken with both the PiCam and the Web camera. They were run through the same program as in the baseline testing, and the results were output in the same way. Sample results for the PiCam and Web camera are shown in Fig. 10b, c, respectively. The accuracy of each class was calculated with the following expression, where the accepted value is the total number of possible correct results and the error is the total number of incorrect results:

$$\%\,\text{accuracy} = \frac{\text{accepted\_value} - \text{error}}{\text{accepted\_value}} \times 100\%$$

The average accuracy across the classes was then calculated with the following expression, where $\bar{x}$ is the average accuracy, $n$ is the number of terms, and $x_i$ is the value of each term in the list being averaged:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
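The two expressions above translate directly into code. The per-class error counts below are hypothetical and only show the arithmetic for five classes of ten images each. They are not results from this study.

```python
from typing import List, Sequence


def class_accuracy(accepted_value: int, error: int) -> float:
    """% accuracy = (accepted_value - error) / accepted_value * 100."""
    # Multiplying before dividing gives the same value and avoids rounding noise.
    return 100.0 * (accepted_value - error) / accepted_value


def mean_accuracy(per_class: Sequence[float]) -> float:
    """x_bar = (1/n) * sum of the n per-class accuracies."""
    return sum(per_class) / len(per_class)


# Hypothetical error counts for five classes of ten test images each.
errors = [1, 3, 0, 2, 4]
per_class: List[float] = [class_accuracy(10, e) for e in errors]
print(per_class)                 # [90.0, 70.0, 100.0, 80.0, 60.0]
print(mean_accuracy(per_class))  # 80.0
```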
3.3.2 Speed
To determine a baseline speed for the models on the Windows machine, an item from one of the classes was placed in view of the Windows machine's camera and the baseline speed script was run. The script runs for 500 ticks, i.e., 500 iterations in which inference, drawing the results and displaying the results take place. At the end of each tick, the calculated frames per second (FPS) value is written to a text file for processing. For deployment speed, the same process was repeated on the Raspberry Pi using both the PiCam and the Web camera for comparison. The FPS was calculated with the following expression, where $n$ is the number of frames considered (1 in our case), $t_1$ is the time at the beginning of the iteration and $t_2$ is the time at the end of the iteration:

$$\text{FPS} = \frac{n}{t_2 - t_1}$$
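The sketch below shows the per-tick FPS logging. It assumes `process_frame` is a hypothetical callable that performs one tick (inference, drawing and display). The clock can be swapped out, so the arithmetic can be checked without a camera. On real hardware `time.perf_counter` is a reasonable default. Note that FPS = n/(t2 − t1) is undefined if a tick takes no measurable time, but real inference workloads always take measurable time.

```python
import time
from typing import Callable, List


def measure_fps(
    process_frame: Callable[[], None],
    ticks: int = 500,
    n: int = 1,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Return FPS = n / (t2 - t1) for each of `ticks` iterations.

    process_frame stands in for one tick: inference, drawing and display.
    The returned list corresponds to the values written to the text file.
    """
    fps_log = []
    for _ in range(ticks):
        t1 = clock()  # time at the beginning of the iteration
        process_frame()
        t2 = clock()  # time at the end of the iteration
        fps_log.append(n / (t2 - t1))
    return fps_log
```

With a fake clock that advances 0.5 s on every call, each tick takes 0.5 s, so every logged value is 2.0 FPS.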
3.4 Results
3.4.1 Baseline Accuracy
The baseline accuracy results, shown in Fig. 11, are 54% for MobileNet V1 trained on TrashNet only, 92% for MobileNet V1 trained on a combination of both datasets, and 90% for MobileNet V2. Comparing the two V1 results shows that using TrashNet alone has a significant impact on accuracy, with a drop of nearly 40 percentage points (from 92% to 54%). This supports the finding of Phadnis et al. [1] that the data must be nonsequential and varied in size. For this reason, MobileNet V1 and V2 were both trained on the combination of the two datasets to increase accuracy. However, the comparison of the two MobileNet models does not show any significant difference in accuracy. Therefore, both models were deployed on the Raspberry Pi to determine whether deployment with the PiCam and the Web camera affects their accuracy.
Fig. 11 Baseline accuracy results
Fig. 12 Deployment accuracy results
3.4.2 Deployment Accuracy with PiCam and Web Camera
The deployment accuracy results, shown in Fig. 12, give the accuracy of both V1 and V2 with the PiCam and the Web camera. V1 achieves 70% with the PiCam and 74% with the Web camera. V2 achieves 68% with the PiCam and 76% with the Web camera. Again, the differences between the models are minimal, so the models perform equally well and either would be suitable for the system. Compared with the baseline, however, accuracy drops by 14 to 22 percentage points (around 20 on average) when V1 and V2 are deployed on the Raspberry Pi. This drop is unlikely to be caused by the Raspberry Pi itself. It is more likely due to reduced image quality, caused by a combination of varying light intensity and the focusing of the lens.
3.4.3 Baseline Speed
As above, a baseline speed was measured for both V1 and V2 on the Windows machine on which they were trained. The baseline results, shown in Fig. 13, show that both models achieved an average speed of between nine and ten frames per second. Again, the speed test shows no significant difference between the models. Therefore, both models were deployed on the Raspberry Pi and tested again. The aims were to determine whether deployment on the Pi slows one model more than the other, and whether speed differs between the PiCam and the Web camera. A speed of around ten frames per second would be adequate for the intelligent retrieval of recyclables.
Fig. 13 Baseline speed results
3.4.4 Deployment Speed
The deployment speed results for V1 and V2, shown in Fig. 14, show an average speed of around 1.4 frames per second with both the PiCam and the Web camera. Thus there is again no difference in speed between the two models and, surprisingly, no difference between the two cameras. A significant difference between the cameras had been expected because of their different resolutions. With the PiCam at 5 MP and the Web camera at 12 MP, the Web camera images were expected to take significantly longer to process. Deployment on the Raspberry Pi does, however, cause a significant drop in speed, from around 10 frames per second to around 1.4 FPS. Again, this is not believed to be caused by the models themselves but by the limited processing power of the Raspberry Pi. This drop in speed would make the system virtually unusable for the intelligent retrieval of recyclable objects.
Fig. 14 Speed results for MobileNet V1 and V2 (top and bottom panels, respectively)
Table 1 Cost breakdown and total

Hardware                   Cost (£)    Software          Cost (£)
Raspberry Pi 3 Model B+    37.85       TensorFlow/API    Free and open source
PiCam module               11.99       VNC Viewer        Free and open source
Web camera                  6.99       LabelImg          Free and open source
RPi case and fan            7.99
RPi heat sinks              2.99
Total cost                 67.81
3.4.5 Cost
The other consideration for the system was its cost: the lower the cost, the more likely the system is to be put to use. The cost breakdown and total are shown in Table 1. With a total cost of £67.81 (≈$85), the system is cheap, which should increase the likelihood of it being adopted [8–10]. The accuracy and cost of the system are adequate for both single capture and real-time capture. However, optical sensors must be investigated further, with a focus on improving the quality of the captured images to increase inference accuracy. The speed of the system, on the other hand, is a problem, and the system would not be suitable for the intelligent retrieval of recyclable objects. More research is therefore needed, and more powerful platforms should be investigated in an attempt to increase the speed of the system.
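The total in Table 1 can be checked with a few lines of Python. `Decimal` is used so that the currency arithmetic is exact. Only the hardware items are summed, because all the software is free.

```python
from decimal import Decimal

# Hardware prices from Table 1, in GBP; all software items are free and open source.
HARDWARE_GBP = {
    "Raspberry Pi 3 Model B+": Decimal("37.85"),
    "PiCam module": Decimal("11.99"),
    "Web camera": Decimal("6.99"),
    "RPi case and fan": Decimal("7.99"),
    "RPi heat sinks": Decimal("2.99"),
}

total_gbp = sum(HARDWARE_GBP.values(), Decimal("0"))
print(f"Total: £{total_gbp}")  # Total: £67.81
```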
4 Conclusion
The world we live in is becoming increasingly polluted by man-made waste. Great strides have recently been made in the retrieval of this waste. However, this will increase the pressure on the already inefficient material recovery facilities (MRFs). More effort must therefore be made to increase the efficiency and lower the cost of MRFs. Research into CV, and more specifically embedded CV, could be the answer to this and many related problems. This is especially true now that advances in technology and software have made CV more accessible, easier to use and more robust. The main aim of this project was to build a cheap, universal embedded computer vision system that could be used both in MRFs for sorting and in the field for the intelligent retrieval of waste. As a whole, this aim has been achieved. However, the speed of the system would make it virtually impossible to use for the intelligent retrieval of waste. More research focusing on the embedded platform must therefore be undertaken.
Acknowledgements This work was presented in dissertation form in fulfillment of the requirements for the M.Sc. in Robotics Engineering for the student Karl Myers under the supervision of E.L. Secco from the Robotics Laboratory, School of Mathematics, Computer Science and Engineering, Liverpool Hope University.
References
1. Phadnis R, Mishra J, Bendale S (2018) Objects talk-object detection and pattern tracking. IEEE, Coimbatore, India
2. Guennouni S, Ahaitouf A, Mansouri A (2014) Multiple object detection using OpenCV on an embedded platform. IEEE, Tetouan, Morocco
3. Swain M, Dhariwal S, Kumar G (2018) A python (openCV) based automatic tool for parasitemia calculation in peripheral blood smear. In: International conference on intelligent circuits and systems, pp 445–448
4. Howard AG et al (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv
5. Boyko N, Basystiuk O, Shakhovska N (2018) Performance evaluation and comparison of software for face recognition, based on Dlib and OpenCV Library. IEEE, Lviv
6. Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
7. Malisiewicz T (2017) Nuts and bolts of building deep learning applications: Ng @ NIPS2016 (Online). Available at: https://www.datasciencecentral.com/profiles/blogs/nuts-and-bolts-ofbuilding-deep-learning-applications-ng-nips2016. Accessed 4 Oct 2019
8. McHugh D, Buckley N, Secco EL (2020) A low cost visual sensor for gesture recognition via AI CNNS. Intelligent systems conference (IntelliSys), Amsterdam, The Netherlands, accepted
9. Secco EL, Abdulrahman R, Felmeri I, Nagar AK, Development of cost-effective endurance test rig with integrated algorithm for safety. In: Soft computing for problem solving. Advances in intelligent systems and computing, vol 1139(14). Springer, Berlin
10. Secco EL, Moutschen C (2018) A soft anthropomorphic & tactile fingertip for low-cost prosthetic & robotic applications. EAI Trans Pervasive Health Technol 4(14)
An Optimal Feature Selection Approach Using IBBO for Histopathological Image Classification Mukesh Saraswat, Raju Pal, Roop Singh, Himanshu Mittal, Avinash Pandey, and Jagdish Chand Bansal
Abstract Automated methods for categorizing histopathological images are very useful in disease diagnosis and prognosis. However, because of complex image backgrounds and morphological variations, these images generate very large feature vectors, which makes automated classification difficult. This paper therefore proposes a new feature selection method, based on the improved biogeography-based optimization (IBBO) algorithm, to select the prominent features. These features are then used for classification. The proposed method eliminates 59% and 62% of the extracted features on the BreaKHis and BACH histopathological image datasets, respectively. With an SVM classifier it achieves classification accuracies of 71.28% and 74.25% on these datasets. A comparative analysis using different classifiers on the selected features shows that the IBBO-based feature selection and classification method outperforms the other considered methods.
Keywords Feature selection · Improved Biogeography-based optimization · Histology images · Image classification
M. Saraswat · R. Pal (B) · H. Mittal · A. Pandey
Jaypee Institute of Information Technology, Noida, India
R. Singh
Uttarakhand Technical University, Dehradun, Uttarakhand, India
J. Chand Bansal
South Asian University, New Delhi, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_3

1 Introduction
Automated classification of histopathological images assigns predefined labels to the images and can help pathologists in disease diagnosis. Pathologists generally analyze histopathological images on computer monitors, which is time-consuming, costly and potentially biased [35]. A method that automatically categorizes histology images is therefore required. In the literature, image categorization methods are divided into two types: traditional classification methods and deep learning based methods. In recent years, deep learning based image classification methods have shown good results. Mittal et al. [22] used an intelligent gravitational search algorithm to optimize the weights of a neural network that classifies histopathological images. Xie et al. [36] used Inception_ResNet_V2 with transfer learning for the automated classification of histology images. Sudharshan et al. [34] used multiple instance learning to train the classifier. Pal et al. [24] modified the bag-of-features method by using AlexNet for feature extraction and improved biogeography-based optimization for codebook construction. However, these deep learning models have limitations: they need large annotated medical datasets to achieve good accuracy, and they are sensitive to background noise [27]. These methods may therefore fail when the datasets contain few images. In traditional classification methods, on the other hand, the classification accuracy depends on the features extracted from the images. Various feature extraction methods are available, such as SURF (speeded-up robust features) [2] and SIFT (scale-invariant feature transform) [14]. However, because of the complex backgrounds and morphological variations in histopathological images, these methods generate a large number of feature vectors, which increases the computational cost of classification [25]. Moreover, such a large number of features may include redundant and irrelevant ones, which decrease the classification accuracy [10]. Feature selection methods are therefore required to select the relevant features for the classification task [3]. The literature describes three main types of feature selection method: filters, wrappers and embedded methods [5].
Filter-based methods have low computational complexity but can perform poorly with a given classifier [39], while wrapper [37] and embedded methods have high computational complexity. The feature selection problem can be mapped to a combinatorial optimization problem [8]. A combination of relevant features has to be selected from a pool of n features, which gives $2^n$ possible combinations and requires an exhaustive search for large values of n. Meta-heuristic optimization methods have been used to find an optimal solution in such scenarios [26, 28]. In the literature, meta-heuristic methods are widely used to solve combinatorial optimization problems [17, 23]. Examples include grey wolf optimization (GWO) [19], spider monkey optimization (SMO) [31], the whale optimization algorithm (WOA) [18] and biogeography-based optimization (BBO) [28]. These optimization approaches have also been applied successfully to the feature selection problem. Chen and Hsiao [4] used a genetic algorithm for feature selection and a support vector machine for classification. Kumar et al. deployed variants of SMO for plant leaf classification [12] and soil classification [11]. Derrac et al. [6] used a genetic algorithm for feature selection, for instance selection, and for both together. Lane et al. [13] presented a feature selection method that combines statistical clustering with PSO. An adaptive DE based feature selection method has also been introduced in which the parameters are dynamically adapted to the problem [7]. Mafarja et al. [16] presented a hybrid feature selection approach combining WOA and SA (simulated annealing).
BBO is a widely used meta-heuristic method that has been applied in various engineering and medical applications. It has two main phases: migration and mutation. The migration phase searches for the optimal solution in neighboring areas, which is known as exploitation, while the mutation phase explores the search space randomly. However, BBO may become trapped in local optima because of low population diversity and slow migration, since it migrates only a single feature at a time. A new variant, IBBO (improved biogeography-based optimization) [29], was developed to mitigate these problems by modifying both operators. In this paper, IBBO is used to find the optimal set of features from the feature vectors extracted in the bag-of-features (BOF) framework, and these optimal feature vectors are used for classification. The rest of the paper is organized as follows. Section 2 describes the SURF and IBBO methods. The proposed method is discussed in Sect. 3, followed by the experimental analysis in Sect. 4. Finally, Sect. 5 presents the conclusion and future scope.
2 Preliminaries
The proposed method uses SURF for feature extraction and the IBBO algorithm to select relevant features. This section briefly explains SURF and the IBBO algorithm.
2.1 SURF
Bay et al. [2] developed a local feature detector and descriptor called SURF (Speeded-Up Robust Features). It is used in object classification, segmentation, recognition and many other applications. It consists of three main parts: detection of interest points, description of their neighborhood, and keypoint matching. For interest point detection, SURF uses integral images to compute the determinant of the Hessian matrix. An integer approximation of this determinant is used, and its local maxima are taken as interest points. The descriptor of an interest point is built from the sums of Haar wavelet responses in its neighborhood. Finally, descriptors computed from different variations of an image are compared and matching pairs are detected. As a result, the features are invariant to scale and rotation and robust to illumination changes and noise.
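Much of SURF's speed comes from integral images. The box filters that approximate the Hessian can be evaluated with four look-ups, whatever the filter size. The pure-Python sketch below shows only that building block and is not SURF itself. For the full detector, OpenCV provides `cv2.xfeatures2d.SURF_create` in the opencv-contrib package, but only in builds with the non-free module enabled.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """ii[y][x] = sum of img over rows < y and columns < x (zero first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[int]], top: int, left: int, bottom: int, right: int) -> int:
    """Sum of img rows [top, bottom) and columns [left, right) in constant time."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

For the 3 × 3 image [[1, 2, 3], [4, 5, 6], [7, 8, 9]], `box_sum(ii, 1, 1, 3, 3)` returns 5 + 6 + 8 + 9 = 28.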
2.2 IBBO
BBO [32] is an efficient and popular evolutionary algorithm that has been employed in several application areas. It is based on the concept of island biogeography [15]. Each individual in the population is treated as an island, and the features of an island are represented by suitability index variables (SIVs). BBO has two important shortcomings [29]: (i) a slow migration rate, and (ii) poor population diversity. To mitigate these drawbacks, IBBO [29] was proposed as a new variant of BBO. It changes the migration and mutation operators, as explained below.
Improved Migration Operator
The migration operator is changed in two ways: (i) all the features of the immigrating island are modified, and (ii) the best island is also used as an emigrating island. The emigration ($\mu$) and immigration ($\lambda$) rates are the same as in BBO [32]. Consider a population of n islands, each having d SIVs. The fitness of the ith island is computed according to Eq. (1).

$$s_i = f(x_1, x_2, \ldots, x_d), \quad i = 1, 2, \ldots, n \qquad (1)$$
The modified migration operator is given in Algorithm 1.

Algorithm 1 Improved Migration Operator
Let P_modify represent the island changing probability
for i = 1 to n do
    if rand < P_modify then
        continue
    end if
    for j = 1 to d do
        if rand < λ_i then
            s_k is selected as the emigrating island based on μ_i using roulette wheel selection
            s_i(x_j) ← s_k(x_j)
        else
            s_i(x_j) ← s_best(x_j)
        end if
    end for
end for
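Algorithm 1 can be sketched in Python as shown below. This is an illustration under stated assumptions, not the reference implementation from [29]:
- Islands are lists of floats, and higher fitness is treated as better, matching the SVM accuracy used as fitness later in this paper.
- λ and μ are supplied by the caller.
- The roulette wheel picks the emigrating island with probability proportional to its μ.
- The skip condition `rand < P_modify` is reproduced as printed in the listing.
- Values are copied from a snapshot of the population, so an island updated earlier in the pass does not act as a source for later ones.

```python
import random
from typing import List, Sequence


def improved_migration(
    pop: List[List[float]],
    fitness: Sequence[float],
    lam: Sequence[float],
    mu: Sequence[float],
    p_modify: float,
    rng: random.Random,
) -> List[List[float]]:
    """Return a new population after one pass of the improved migration."""
    n, d = len(pop), len(pop[0])
    snapshot = [island[:] for island in pop]  # source values are read from here
    best = snapshot[max(range(n), key=lambda k: fitness[k])]
    new_pop = [island[:] for island in pop]
    for i in range(n):
        if rng.random() < p_modify:  # skip rule as printed in Algorithm 1
            continue
        for j in range(d):  # every SIV of island i is replaced
            if rng.random() < lam[i]:
                # Roulette-wheel choice of the emigrating island s_k, weighted by mu.
                k = rng.choices(range(n), weights=mu)[0]
                new_pop[i][j] = snapshot[k][j]
            else:
                new_pop[i][j] = best[j]  # otherwise copy the SIV of the best island
    return new_pop
```

With all λ set to zero, every SIV is copied from the best island. This setting isolates the second change to the operator.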
Improved Mutation Operator
To strengthen the exploration of the mutation operator, the islands are modified according to a random walk with a step size. Let the population P contain n islands, as given in Eq. (2). Equation (3) gives the mutated island at the next iteration (t + 1).

$$P = \{s_1, s_2, \ldots, s_{n-1}, s_n\} \qquad (2)$$

$$s_i(t+1) = s_i(t) + step \cdot r(t), \quad i = 1, 2, \ldots, n \qquad (3)$$

where $r(t) \in (0, 1)$ and step is the step size of the ith island, given in Eq. (4).
Fig. 1 The IBBO-based feature selection method
$$step = P_1^{n}(i) - P_2^{n}(i) \qquad (4)$$
P1 and P2 are two random populations generated within the search bounds of the problem space. Their islands are used to calculate the step size for the island to be mutated, and the island then changes its position according to Eq. (3). This enhances the exploration capability of IBBO. The procedure of the improved mutation operator is shown in Algorithm 2.

Algorithm 2 Improved Mutation Operator
for t = 1 to MaxIt do
    for i = 1 to n do
        mutation rate m_i = m_max ∗ (1 − P_i / P_max)
        if m_i > rand then
            Two permutations (P1^n(i) and P2^n(i)) of the ith island are randomly generated
            step = P1^n(i) − P2^n(i)
            s_i(t + 1) = s_i(t) + step ∗ r(t)
            Check the feasibility of s_i(t + 1)
        end if
    end for
end for
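One pass of the inner loop of Algorithm 2 is sketched below. The sketch makes three assumptions:
- P_i is the species-count probability of island i, supplied by the caller, and the probabilities must not all be zero.
- The vectors standing in for P1^n(i) and P2^n(i) are drawn uniformly within scalar bounds [lower, upper].
- The feasibility check clips values back into the bounds. The paper does not say how infeasible islands are repaired, so this rule is our assumption.

```python
import random
from typing import List, Sequence


def improved_mutation(
    pop: List[List[float]],
    prob: Sequence[float],
    m_max: float,
    lower: float,
    upper: float,
    rng: random.Random,
) -> List[List[float]]:
    """One pass over the population (the inner loop of Algorithm 2)."""
    p_max = max(prob)  # assumed to be > 0
    d = len(pop[0])
    new_pop = []
    for island, p_i in zip(pop, prob):
        m_i = m_max * (1 - p_i / p_max)  # mutation rate of island i
        if m_i > rng.random():
            # Random vectors inside the bounds play the role of P1^n(i) and P2^n(i).
            p1 = [rng.uniform(lower, upper) for _ in range(d)]
            p2 = [rng.uniform(lower, upper) for _ in range(d)]
            step = [a - b for a, b in zip(p1, p2)]
            r = rng.random()  # r(t), drawn uniformly from [0, 1)
            island = [s + st * r for s, st in zip(island, step)]
            # Feasibility check, implemented here as clipping to [lower, upper].
            island = [min(max(s, lower), upper) for s in island]
        new_pop.append(list(island))
    return new_pop
```

Calling this function MaxIt times, with the other IBBO steps in between, corresponds to the outer loop of the listing.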
3 Proposed Method
In the proposed method, the optimal set of features is selected using IBBO and then used to classify the histopathological images. The overall method is depicted in Fig. 1. First, SURF extracts features from the histopathological images, and the complexity of these images results in a large number of feature vectors. The extracted features are then passed to the IBBO-based feature selection method, which selects the optimal features for the classification task. The method works in the following steps.
1. Initialize a population of n solutions randomly within the bounds zero and one. Each solution is a d-dimensional vector, where d is the number of features extracted by SURF. The ith solution $s_i$ in the population P can therefore be represented as in Eq. (5).

$$s_i = \{s_{i1}, s_{i2}, \ldots, s_{ij}, \ldots, s_{id}\}, \quad i = 1, \ldots, n \qquad (5)$$

2. To make the population suitable for combinatorial optimization, the solutions are converted to binary values. A threshold value (th) of 0.5, chosen empirically, is used for this.
3. Each value of a solution ($s_{ij}$) is compared with th. If it is greater, $s_{ij}$ is set to one; otherwise it is set to zero, as shown in Eq. (6).

$$s_{ij} = \begin{cases} 1, & s_{ij} > th \\ 0, & s_{ij} \le th \end{cases} \qquad (6)$$

4. Calculate the fitness value ($f_i$) of each solution ($s_i$) using only the features whose $s_{ij}$ value is one.
5. For the fitness calculation, an SVM with tenfold cross-validation is used as the objective function, and the returned accuracy is assigned as the fitness of the solution.
6. After the fitness values of all solutions have been calculated, IBBO updates the solutions with its migration and mutation operators.
7. Repeat steps 2 to 6 until the termination condition is satisfied.
8. Once the termination criterion is met, the algorithm returns the solution $s_i$ with the best fitness value, and the features selected in this solution are used for classification.
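Steps 1 to 4 can be sketched as shown below. The names are hypothetical, and the fitness function is passed in so that the sketch has no dependencies. In the paper, fitness is the tenfold cross-validated SVM accuracy. With scikit-learn, one assumed setup (not the authors' code) would compute it as `cross_val_score(SVC(), X[:, idx], y, cv=10).mean()`. A solution with no value above th selects no features, and a real fitness function must handle that case.

```python
import random
from typing import Callable, List, Sequence

TH = 0.5  # threshold th used in Eq. (6)


def init_population(n: int, d: int, rng: random.Random) -> List[List[float]]:
    """Step 1: n solutions whose d values are drawn uniformly from [0, 1)."""
    return [[rng.random() for _ in range(d)] for _ in range(n)]


def binarize(solution: Sequence[float], th: float = TH) -> List[int]:
    """Steps 2-3 / Eq. (6): s_ij -> 1 if s_ij > th, else 0."""
    return [1 if s > th else 0 for s in solution]


def selected_features(mask: Sequence[int]) -> List[int]:
    """Indices of the SURF features kept by a binary solution."""
    return [j for j, bit in enumerate(mask) if bit == 1]


def evaluate(
    pop: Sequence[Sequence[float]],
    fitness_fn: Callable[[List[int]], float],
) -> List[float]:
    """Step 4: fitness of every solution, computed on its selected features only."""
    return [fitness_fn(selected_features(binarize(s))) for s in pop]
```

For example, the solution [0.9, 0.1, 0.7] binarizes to [1, 0, 1] and keeps features 0 and 2.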
4 Experimental Results
The simulations were run with the OpenCV library in Python on an Intel Core i5-6500U CPU with 4 GB RAM. The new feature selection method is compared with other meta-heuristic based feature selection methods, namely DE [9], GWO [21], WOA [20], BA [38], WOASA [16] and GOA [30]. The comparison uses two standard histopathological image datasets: BreaKHis and BACH (BreAst Cancer Histology). The features selected by each method are fed to different classifiers so that their accuracies can be compared. The classifiers are SVM, Gaussian naive Bayes (GNN), logistic regression (LR) and random forest (RF). The first dataset was provided by Spanhol et al. [33] at the P&D Laboratory, Brazil. It consists of 2480 benign and 5429 malignant histopathological images from the tissues of 79 patients. All the images are H&E stained, captured at four magnification levels and stored in PNG format. Experts labeled the images manually into two classes: benign and malignant. Figure 2 shows representative images of both classes.

Fig. 2 Representative images of (a) benign and (b) malignant classes from the BreaKHis dataset [33]

The BACH histopathological image dataset [1] has 400 H&E-stained images of breast cancer. All 400 images have the same dimensions (2048 × 1536). Two experts divided them into four classes according to the level of cancer: Normal, Benign, In-situ carcinoma and Invasive carcinoma. Figure 3 shows one representative image from each class.

Fig. 3 Representative images of (a) Normal, (b) Benign, (c) In-situ carcinoma, and (d) Invasive carcinoma classes from the BACH dataset [1]

For both datasets, 70% of the images of each class were randomly picked for training and the remaining 30% were used for validation. Table 1 shows the number of features selected by each feature selection method, together with its reduction rate. On average, SURF extracts 147,077 features from the BreaKHis images and 156,096 from the BACH images. The table shows that the IBBO-based method eliminates 59% of the features on BreaKHis and 62% on BACH, which is more than any other considered method. The table also reports the accuracy returned by each classifier, and IBBO gives better accuracy than the other methods for every classifier. This indicates that the proposed method not only has a better feature elimination rate but also selects relevant features. Among the classifiers, the proposed method gives its best results with SVM, reaching 71.28% accuracy on BreaKHis and 74.25% on BACH. Reducing irrelevant features should also make the classifiers more computationally efficient, and Table 2 compares the computation times to analyze this. Because IBBO removes more features, it reduces the time taken by every considered classifier on both datasets. The IBBO-based feature selection method therefore improves classification accuracy and also achieves better computational efficiency.

Table 1 Selected features and classification accuracy (%) returned by the considered methods

Method              None          DE            GWO           WOA           BA            WOASA         GOA           BBO           IBBO
BreaKHis images dataset
Selected features   147077 (0%)   75009 (49%)   69126 (53%)   72067 (52%)   67655 (55%)   73541 (50%)   73539 (50%)   64713 (57%)   60302 (59%)
SVM                 51.48         60.39         65.34         65.04         67.32         64.55         63.36         68.61         71.28
GNN                 40.59         46.53         49.5          49.5          50.49         49.3          47.52         50.49         53.46
LR                  39.5          49.6          62.27         50.69         65.04         50.39         49.7          65.54         67.32
RF                  43.66         50.29         54.15         53.06         56.03         52.67         51.38         57.32         59.40
BACH dataset
Selected features   156096 (0%)   76487 (52%)   70244 (55%)   73365 (54%)   68682 (57%)   74930 (52%)   74926 (53%)   63999 (60%)   60877 (62%)
SVM                 51.48         62.37         69            68.31         70.29         66.33         65.34         72.47         74.25
GNN                 36.33         41.68         49.4          47.62         49.9          46.43         46.04         50.49         51.48
LR                  48.71         55.14         60.79         60.09         61.48         57.72         56.73         64.05         65.34
RF                  41.68         49.40         54.45         53.06         55.14         51.88         51.58         56.43         58.41

Table 2 The computational time (in seconds) taken by the classifiers before and after feature selection by the proposed and considered methods

Method   None     DE     GWO    WOA    BA     WOASA   GOA    BBO    IBBO
BreaKHis images dataset
SVM      145440   5000   4747   4787   4697   4848    4848   4545   4530
GNN      133320   5252   4959   5010   4747   5050    5050   4596   4454
LR       125240   5040   4686   4747   4596   4858    4858   4535   4424
RF       147460   5454   5101   5252   4949   5303    5303   4747   4641
BACH dataset
SVM      159580   5757   5505   5555   5454   5707    5707   5323   5171
GNN      144430   5838   5454   5565   5393   5757    5757   5242   5101
LR       137360   5808   5373   5636   5313   5656    5656   5151   4949
RF       162610   6060   5848   5929   5757   5999    5999   5595   5454
5 Conclusion
This paper presents a new feature selection method that finds relevant and unique features, which are then used to classify a histopathological image into its category. The optimal set of features is selected by the IBBO-based feature selection method. Two histopathological image datasets, BreaKHis and BACH, were used to validate the efficacy of the proposed method. Simulations were performed with different classifiers, and IBBO gave its best results with SVM. In the future, other datasets may be used to evaluate the proposed approach. A hybrid of a deep learning model and the proposed method may also be developed to improve accuracy.
References
1. Aresta G, Araújo T, Kwok S, Chennamsetty SS, Safwan M, Alex V, Marami B, Prastawa M, Chan M, Donovan M et al (2019) Bach: grand challenge on breast cancer histology images. Med Image Analys 56:122–139
2. Bay H, Ess A, Tuytelaars T, Van Gool L (2008) Speeded-up robust features (surf). Comput Vis Image Underst 110(3):346–359
3. Bhattacharyya S, Sengupta A, Chakraborti T, Konar A, Tibarewala D (2014) Automatic feature selection of motor imagery eeg signals using differential evolution and learning automata. Med Biolog Eng Comput 52(2):131–139
4. Chen C, Shi YQ (2008) Jpeg image steganalysis utilizing both intrablock and interblock correlations. In: Proceeding of IEEE international symposium on circuits and systems, pp 3029–3032
5. Deng H, Runger G (2012) Feature selection via regularized trees. In: The 2012 international joint conference on neural networks (IJCNN), IEEE, pp 1–8
6. Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evol Comput 1:3–18
7. Ghosh M, Das D, Mandal S, Chakraborty C, Pala M, Maity AK, Pal SK, Ray AK (2010) Statistical pattern analysis of white blood cell nuclei morphometry. Students' Technology Symposium (TechSym). IEEE, pp 59–66
8. Gupta R, Pal R (2018) Biogeography-based optimization with Lévy-flight exploration for combinatorial optimization. In: 2018 8th International conference on cloud computing, data science & engineering (Confluence), IEEE. https://doi.org/10.1109/confluence.2018.8442942
9. Gupta V, Singh A, Sharma K, Mittal H (2018) A novel differential evolution test case optimisation (detco) technique for branch coverage fault detection. In: Smart computing and informatics, Springer, pp 245–254
10. Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1–3):389–422
11. Kumar S, Sharma B, Sharma VK, Poonia RC (2018a) Automated soil prediction using bag-of-features and chaotic spider monkey optimization algorithm. Evol Intell 1–12. https://doi.org/10.1007/s12065-018-0186-9
12. Kumar S, Sharma B, Sharma VK, Sharma H, Bansal JC (2018b) Plant leaf disease identification using exponential spider monkey optimization. Sustain Comput Inf Syst. https://doi.org/10.1016/j.suscom.2018.10.004
13. Lane MC, Xue B, Liu I, Zhang M (2013) Particle swarm optimisation and statistical clustering for feature selection. In: Australasian joint conference on artificial intelligence, Springer, pp 214–220
14. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
15. MacArthur RH, Wilson EO (2016) The theory of island biogeography. Princeton University Press
16. Mafarja MM, Mirjalili S (2017) Hybrid whale optimization algorithm with simulated annealing for feature selection. Neurocomputing 260:302–312
M. Saraswat et al.
17. Mehta K, Pal R (2017) Biogeography based optimization protocol for energy efficient evolutionary algorithm: (BBO: EEEA). In: 2017 International conference on computing and communication technologies for smart nation (IC3TSN), IEEE. https://doi.org/10.1109/ic3tsn.2017.8284492
18. Mirjalili S (2016) SCA: a sine cosine algorithm for solving optimization problems. Knowl-Based Syst 96:120–133
19. Mirjalili S, Lewis A (2014) Adaptive gbest-guided gravitational search algorithm. Neural Comput Appl 25:1569–1584
20. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67
21. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61
22. Mittal H, Saraswat M, Pal R (2020) Histopathological image classification by optimized neural network using IGSA. In: International conference on distributed computing and internet technology, Springer, pp 429–436
23. Pal R, Saraswat M (2017) Data clustering using enhanced biogeography-based optimization. In: Proceedings of IEEE international conference on contemporary computing, IEEE, pp 1–6
24. Pal R, Saraswat M (2018a) Enhanced bag of features using AlexNet and improved biogeography-based optimization for histopathological image analysis. In: 2018 Eleventh international conference on contemporary computing (IC3), IEEE, pp 1–6
25. Pal R, Saraswat M (2018b) Grey relational analysis based keypoint selection in bag-of-features for histopathological image classification. Recent Patents Comput Sci
26. Pal R, Saraswat M (2018c) A new bag-of-features method using biogeography-based optimization for categorization of histology images. Int J Inf Syst Manage Sci 1(2):
27. Pal R, Saraswat M (2019) Histopathological image classification using enhanced bag-of-feature with spiral biogeography-based optimization. Appl Intell 49(9):3406–3424
28. Pal R, Mittal H, Saraswat M (2019) Optimal fuzzy clustering by improved biogeography-based optimization for leukocytes segmentation. In: 2019 Fifth international conference on image information processing (ICIIP), IEEE, pp 74–79
29. Saraswat M, Pal R (2018) Improved biogeography-based optimization. Int J Adv Intell Paradigms 10(1):1. https://doi.org/10.1504/ijaip.2018.10022960
30. Saremi S, Mirjalili S, Lewis A (2017) Grasshopper optimisation algorithm: theory and application. Adv Eng Softw 105:30–47
31. Sharma H, Hazrati G, Bansal JC (2019) Spider monkey optimization algorithm. In: Evolutionary and swarm intelligence algorithms, Springer, pp 43–59
32. Simon D (2008) Biogeography-based optimization. IEEE Trans Evol Comput 12(6):702–713
33. Spanhol FA, Oliveira LS, Petitjean C, Heutte L (2016) A dataset for breast cancer histopathological image classification. IEEE Trans Biomed Eng 63(7):1455–1462. https://doi.org/10.1109/tbme.2015.2496264
34. Sudharshan P, Petitjean C, Spanhol F, Oliveira LE, Heutte L, Honeine P (2019) Multiple instance learning for histopathological breast cancer image classification. Expert Syst Appl 117:103–111
35. Vishnoi S, Jain AK, Sharma PK (2019) An efficient nuclei segmentation method based on roulette wheel whale optimization and fuzzy clustering. Evol Intell, pp 1–12
36. Xie J, Liu R, Luttrell J IV, Zhang C (2019) Deep learning based analysis of histopathological images of breast cancer. Front Genet 10:80
37. Xu J, Luo X, Wang G, Gilmore H, Madabhushi A (2016) A deep convolutional neural network for segmenting and classifying epithelial and stromal regions in histopathological images. Neurocomputing 191:214–223
38. Yang XS (2010) A new metaheuristic bat-inspired algorithm. In: Nature inspired cooperative strategies for optimization (NICSO 2010), Springer, pp 65–74
39. Zhao X, Yu Y, Huang Y, Huang K, Tan T (2012) Feature coding via vector difference for image classification. In: Proceedings of IEEE international conference on image processing, Austria, Vienna, pp 3121–3124
Accuracy Evaluation of Plant Leaf Disease Detection and Classification Using GLCM and Multiclass SVM Classifier
K. Rajiv, N. Rajasekhar, K. Prasanna Lakshmi, D. Srinivasa Rao, and P. Sabitha Reddy

Abstract Plants are highly susceptible to diseases that affect their growth, which in turn affects the natural balance on which the farmer depends. Crop yield drops because of infections caused by various types of disease on different parts of the plant. Leaf diseases are mainly caused by fungi, bacteria, viruses, etc. Diseases must be diagnosed accurately, and appropriate action must be taken at the right time. Image processing techniques, together with machine learning models, principal component analysis, probabilistic neural networks, fuzzy logic, etc., are important for the detection and classification of plant leaf diseases. This work describes how to detect plant leaf diseases. The proposed scheme provides a fast, automatic, accurate, and inexpensive technique for identifying and classifying plant leaf diseases using a multiclass support vector machine (SVM) classifier. First, input images are acquired; second, the images are preprocessed; third, the affected region of each leaf image is segmented using the K-means clustering algorithm; then color, shape, and gray-level co-occurrence matrix (GLCM) texture features are extracted for classification. Finally, a classification technique is used to identify the category of plant leaf disease. The proposed method is compared with the K-nearest neighbors (KNN) algorithm and successfully identifies and classifies plant leaf diseases with 96.65% accuracy.

Keywords SVM · GLCM · K-means · Accuracy · Deep learning · KNN

K. Rajiv · N. Rajasekhar (corresponding author) · K. Prasanna Lakshmi
Gokaraju Rangaraju Institute of Engineering & Technology, Hyderabad, India
D. Srinivasa Rao
VNR Vignana Jyothi Institute of Engineering & Technology, Hyderabad, India
P. Sabitha Reddy
St. Martin's Engineering College, Hyderabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_4
1 Introduction
Agriculture is one of the most significant sectors of the Indian economy: the Indian agricultural sector provides employment to almost 50% of the country's labor force. India is known to be the world's largest producer of pulses, rice, wheat, spices, and spice crops. A farmer's economic growth depends on the quality of the produce harvested, which in turn depends on the growth of the plants and the yield obtained. Consequently, in the field of agriculture, the detection of disease in plants plays an instrumental role. Detecting plant disease with an automatic technique is beneficial because it reduces the large effort of monitoring big farms of crops and identifies the symptoms of disease at an early stage, i.e., when they appear on plant leaves.
The work in [1] presents an image segmentation algorithm for the automatic detection and classification of plant leaf diseases, and also surveys various disease classification methods that can be applied to plant leaf disease detection. Image segmentation, an important aspect of plant leaf disease detection, is performed there using a genetic algorithm [1]. An image recognition system based on multiple linear regression is proposed in [2]. For image segmentation, an improved histogram segmentation method that computes the threshold automatically and accurately is proposed. In addition, the region growing method and true color image processing are combined with this system to improve its accuracy and intelligence. Multiple linear regression and image feature extraction are used to build the recognition system. Compared with existing techniques, the method is shown to have effective image recognition ability, high accuracy, and reliability [2].
The aim of the system in [3] is to identify and classify disease accurately from leaf images using image processing techniques. The steps involved are preprocessing, training, and identification. The diseases considered are powdery mildew and downy mildew, which can cause substantial damage to grapes. To recognize the disease, leaf features such as the major axis and minor axis are extracted and passed to the classifier for classification [3]. Identifying plant diseases is the key to preventing losses in the yield and quantity of agricultural produce. The study of plant diseases means the study of visually observable patterns seen on the plant. Health monitoring and disease detection on plants are very critical for sustainable agriculture. It is very difficult to monitor plant diseases manually: doing so requires a tremendous amount of work and expertise in plant diseases, as well as excessive processing time. Therefore, image processing is used for the detection of plant diseases.
Disease detection involves the steps of image acquisition, image preprocessing, image segmentation, feature extraction, and classification. The work in [4] discusses the various approaches used for recognizing plant diseases from leaf images, as well as some segmentation and feature extraction algorithms applied in plant disease detection [4]. Early and accurate detection and diagnosis of plant diseases are key factors in plant production and in reducing both qualitative and quantitative losses in crop yield. Optical techniques have demonstrated their potential in automated, objective, and reproducible detection systems for the identification and quantification of plant diseases at early time points in epidemics. Recently, 3D scanning has also been added as an optical analysis that supplies additional information on crop plant vitality. Different platforms, from proximal to remote sensing, are available for multiscale monitoring of single crop organs or entire fields. Accurate and reliable detection of diseases is facilitated by highly sophisticated and innovative methods of data analysis that lead to new insights derived from sensor data for complex plant–pathogen systems. Non-destructive, sensor-based methods support and expand upon visual and/or molecular approaches to plant disease assessment. The most relevant areas of application of sensor-based analyses are precision agriculture and plant phenotyping [5]. Fuzzy and neuro-fuzzy iterative image fusion has been implemented and shown to improve image quality [6]. Fuzzy logic-based image fusion has been implemented on satellite and medical images and shown to improve image content [7]. Satellite and medical image fusion have been performed through iterative neuro-fuzzy image fusion, confirming that the fusion process improves image quality [8].
Disease-causing fungi take their energy from the plants on which they live. They are responsible for a great deal of damage and are characterized by wilting, scabs, moldy coatings, rusts, blotches, and rotted tissue. A disease is an abnormal condition that changes the appearance or function of a plant; it is a physiological process that affects some or all plant functions. Diseases harm the crop, reduce the quantity and quality of the harvest, and increase the cost of production. Identifying a disease correctly when it first appears is a critical step for effective disease management. This need, together with the availability of image processing and learning methods, motivated us to propose a method for leaf disease detection and classification using GLCM and SVM approaches.
2 Related Works
Most deep learning models for the automatic detection of plant diseases suffer from the fatal flaw that, once tested on independent data, their performance drops significantly. The work in [9] investigates a possible solution to this problem by using segmented image data to train convolutional neural network (CNN) models. Compared with the F-CNN model trained on full images, the S-CNN model trained on segmented images more than doubles in performance, reaching 98.6% accuracy when tested on independent data previously unseen by the models, even with 10 disease classes. Using tomato plants and the target spot disease class as an example, it is shown that the confidence of self-classification for the S-CNN model improves significantly compared with the F-CNN model [9]. The method proposed in [10] helps detect plant disease and provides remedies that can be used as a defense mechanism against the disease. The database obtained from the Internet is properly segregated, and the different plant species are identified and renamed to form a proper database; a test database containing numerous plant diseases is then obtained and used to check the accuracy and confidence level of the approach. A convolutional neural network (CNN) comprising different layers is used for prediction. With the proposed work and training method, the authors attained an accuracy of 78% [10]. Deep learning with convolutional neural networks (CNNs) has achieved great success in the classification of various plant diseases. However, only a limited number of studies have elucidated the inference process, leaving it as an untouchable black box. Revealing the features a CNN has learned in an interpretable form not only confirms its reliability but also enables human validation of the model's authenticity and of the training dataset. In [11], a variety of neuron-wise and layer-wise visualization methods were applied to a CNN trained with a publicly available plant disease image dataset. By interpreting the generated attention maps, several layers were found not to contribute to inference and were removed from the network, reducing the number of parameters by 75% without affecting classification accuracy.
The results provide an impetus for CNN black-box users in the field of plant science to better understand the diagnosis process and lead to more efficient use of deep learning for plant disease diagnosis [11]. Deep learning methods, and in particular convolutional neural networks (CNNs), have led to significant progress in image processing applications. Since 2016, many applications for the automatic identification of crop diseases have been developed. These applications could serve as a basis for the development of expertise assistance or automatic screening tools, and such tools could contribute to more sustainable agricultural practices and greater food production security. To assess the potential of these networks for such applications, the work in [12] surveys 19 studies that rely on CNNs to automatically identify crop diseases; their main implementation steps and their performance make it possible to identify the main issues and shortcomings of work in this research area [12]. Plant diseases affect the growth of their respective species; therefore, their early identification is very important. Many machine learning (ML) models have been employed for the detection and classification of plant diseases, but after the advancements in a subset of ML, namely deep learning (DL), this area of research appears to have great potential in terms of increased accuracy. Many developed or modified DL architectures are implemented along with several visualization techniques to detect and classify the symptoms of plant diseases, and several performance metrics are used to evaluate these architectures and techniques. The review in [13] provides a comprehensive explanation of DL models used to visualize various plant diseases. In addition, it identifies some research gaps whose resolution would bring greater transparency to detecting diseases in plants, even before their symptoms appear clearly [13]. Agriculture is a major source of livelihood, and the Indian economy depends on agricultural production. It is important to detect plant leaf diseases at an early stage to increase crop yield and profit. Image processing is used to detect leaf diseases accurately, since naked-eye observation of diseases does not give accurate results all the time, especially in the early stage. SVM-based classification techniques are surveyed thoroughly in [14]. Disease classification on plants is very critical for sustainable agriculture. It is very difficult to monitor or treat plant diseases manually, as this requires a huge amount of work and excessive processing time; therefore, image processing is used for the detection of plant diseases. Plant disease classification involves the steps of input acquisition, preprocessing, segmentation, feature extraction, and SVM classification, as discussed in [15]. Leaf disease classification using an advanced SVM technique is proposed in [16], in which images captured by cameras are analyzed to determine whether the plants are diseased or not. Various approaches and techniques, such as color transformation, segmentation, K-means, and KNN, are used to classify such input images. The study concentrates on image interpretation for early-stage problem detection so that the crop can be protected from damage [16].
An evaluation of plant disease identification using GLCM and KNN algorithms is presented in [17]. It covers input acquisition; preprocessing of images to prepare them for experimentation; segmentation, which simplifies an image and converts it into something more meaningful and easier to analyze; GLCM-based feature extraction; and, finally, KNN-based plant disease identification and classification [17]. Automatic detection of plant diseases is important for detecting the symptoms of diseases as early as they appear in the growing stage. A methodology for the analysis and detection of plant leaf diseases using digital image processing techniques is proposed in [18], and the experimental results demonstrate that it can successfully detect and classify major plant leaf diseases [18]. Fuzzy and neuro-fuzzy medical image fusion has been shown to improve classification accuracy after fusion compared with regular classification [19]. In [20], a novel method, exponential spider monkey optimization (SMO), is employed to select important features from the high-dimensional feature set produced by SPAM; the selected features are fed to a support vector machine that classifies plants as diseased or healthy using significant features of the leaves. The experimental results show that the features selected by exponential SMO effectively increase the classification accuracy of the classifier compared with the other feature selection approaches considered [20]. The same author also applied SMO to soil classification [21]. In [22], a segmentation method for the automatic detection and classification of plant leaf diseases using image processing is developed, and various methods for plant disease identification and classification are compared, i.e., K-means, C4.5, Naïve Bayes, multilayer perceptron (MLP), and SVM, with the data divided into 80% for training and 20% for testing; the overall accuracy varies between 85.3% and 95.87% [22].
The following review covers various disease classification methods that can be used for plant leaf disease detection. The K-nearest neighbor algorithm for disease classification calculates the minimum distance between points. It is easy to implement and gives fairly good results, but it is slow and not robust to noisy data in large training sets. Fuzzy logic-based disease classification uses a membership function to translate an original data value into a membership grade. It computes quickly and is suitable where limited precision suffices, but it performs poorly in high-dimensional settings. Among artificial neural networks (ANN), the multilayer perceptron is the basic form; it updates its weights through backpropagation. It has good potential for detecting plant leaf diseases but requires more time. The above-mentioned methods give overall accuracies from 71% to 89%.
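To make the KNN description above concrete (KNN is also the baseline the proposed SVM is compared against), here is a minimal Python sketch of a K-nearest neighbor classifier using Euclidean distance and majority voting. It is illustrative only, not the implementation used in any cited work or in this chapter (which uses MATLAB); the function name `knn_predict`, the two-dimensional feature vectors, and the labels "healthy" and "rust" are hypothetical toy values.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every stored feature vector.
    dists = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # Keep the k closest neighbours (sorted by distance only).
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors (e.g. two texture features per leaf).
train_x = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
train_y = ["healthy", "healthy", "rust", "rust", "rust"]
print(knn_predict(train_x, train_y, (0.15, 0.15), k=3))  # healthy
```

Because every query is compared against all stored samples, the cost of each prediction grows with the size of the training set.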
3 Proposed Methodology
The proposed procedure has two phases:
1. Training phase: all the collected images are given to the model, and all their features are extracted and stored in the database for future use.
2. Disease classification: once the training phase is completed, the multiclass SVM classifies a given new input image according to the kind of disease affecting it.
The proposed scheme mainly comprises the following five steps (Fig. 1):
1. Dataset collection
2. Image preprocessing
3. Image segmentation
4. Feature extraction
5. Training and classification
3.1 Dataset Collection
The dataset for the experimentation is taken from https://www.kaggle.com/vipoooool/new-plant-diseases-dataset. This dataset is recreated using offline augmentation from the original dataset, which can be found in its GitHub repository. It comprises about 87K RGB images of healthy and diseased crop leaves, categorized into 38 different classes. The whole dataset is divided into training and validation sets in an 80/20 ratio, preserving the directory structure. The number of images of each type sampled for experimentation is given in Table 1. The leaf images include leaves diseased by fungal, bacterial, and viral infections, as well as healthy leaves.

Fig. 1 Workflow of the proposed methodology (blocks: Image Databank, Image Acquisition, Image Preprocessing, Image Segmentation (K-Means), Feature Extraction (GLCM), Images Features Database, Disease Detection, Disease Classification)

Table 1 Disease detection testing results from SVM
S. No. | Image used | No. of images | No. of iterations | KNN accuracy (%) (K = 3, 5, 7, 9) | Proposed SVM accuracy (%) | Kernel used
1 | 4(a) | 156 | 300 | 89.34 | 95.77 | Linear
  |      |     | 300 | 88.2  | 94.51 | RBF
  |      |     | 300 | 89.13 | 96.65 | Polynomial
  |      |     | 300 | 87.42 | 95.23 | Sigmoid
2 | 4(b) | 174 | 300 | 85.43 | 94.68 | Linear
  |      |     | 300 | 84.75 | 93.67 | RBF
  |      |     | 300 | 86.53 | 95.42 | Polynomial
  |      |     | 300 | 84.76 | 94.51 | Sigmoid
3 | 4(c) | 124 | 300 | 85.42 | 93.62 | Linear
  |      |     | 300 | 82.71 | 92.44 | RBF
  |      |     | 300 | 83.45 | 95.42 | Polynomial
  |      |     | 300 | 82.64 | 94.83 | Sigmoid
4 | 4(d) | 166 | 300 | 85.75 | 94.82 | Linear
  |      |     | 300 | 84.62 | 93.65 | RBF
  |      |     | 300 | 82.1  | 95.83 | Polynomial
  |      |     | 300 | 81.63 | 93.66 | Sigmoid
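The 80/20 training/validation split that preserves the directory structure (Sect. 3.1) can be sketched as follows, assuming each class corresponds to one sub-directory whose file names have already been listed in a dictionary. The function name `split_per_class`, the fixed seed, and the example class and file names are hypothetical; the chapter does not say how the split was drawn.

```python
import random


def split_per_class(files_by_class, train_ratio=0.8, seed=0):
    """Split each class's file list into training and validation parts.

    `files_by_class` maps a class (sub-directory) name to its image file names.
    Splitting class by class keeps the same per-class directory layout in both
    the training and the validation set.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, valid = {}, {}
    for cls, files in files_by_class.items():
        shuffled = sorted(files)  # sort first so the result ignores input order
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        train[cls], valid[cls] = shuffled[:cut], shuffled[cut:]
    return train, valid


# Hypothetical class folders and file names, for illustration only.
files = {
    "healthy": [f"h{i}.jpg" for i in range(10)],
    "rust": [f"r{i}.jpg" for i in range(5)],
}
train, valid = split_per_class(files)
print({c: (len(train[c]), len(valid[c])) for c in files})  # {'healthy': (8, 2), 'rust': (4, 1)}
```

Splitting per class (a stratified split) keeps every class represented in roughly the same proportion in both sets.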
3.2 Image Preprocessing
Preprocessing aims to enhance the image and prepare it for subsequent operations by removing noise and unwanted objects and improving its visual quality. It also has a positive effect on both segmentation and feature extraction, and consequently on the final results and accuracy of the process. Image processing starts with acquiring the image from the source environment with a digital camera; the image is stored on the computer's hard disk and then loaded into the system for the remaining operations. Preprocessing is important for raw data, which are often noisy and irregular. A transformation is used to convert the image into another image of improved quality that is better suited for analysis. This step is critical in image processing applications, since the effectiveness of subsequent tasks (e.g., feature extraction, segmentation, classification) depends heavily on image quality. It also significantly improves the efficiency of data mining and image analysis procedures. Properties such as edges and boundaries are enhanced and observed in black-and-white images; numerical properties related to intensities are detected in grayscale images; and color information is perceived best in the primary colors and other color formats of the input images. In the proposed model, the image resolution is adjusted to 256 × 256, the RGB image is converted to grayscale format, and Otsu's method is used to convert the intensity image to a binary image.
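A dependency-free Python sketch of these preprocessing steps follows (the chapter's own implementation is in MATLAB). Images are plain lists of rows; nearest-neighbour resizing is an assumption, since the chapter does not say how images are resized, and the grayscale conversion uses the ITU-R BT.601 luma weights. All function names are hypothetical.

```python
def resize_nearest(img, size=256):
    """Nearest-neighbour resize of a 2-D grid (list of rows) to size x size."""
    h, w = len(img), len(img[0])
    return [[img[i * h // size][j * w // size] for j in range(size)] for i in range(size)]


def rgb_to_gray(img):
    """Convert rows of (r, g, b) tuples (0-255) to integer gray levels (0-255)."""
    # ITU-R BT.601 luma weights.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in img]


def otsu_threshold(gray):
    """Return the gray level t that maximises the between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = sum0 = 0  # pixel count and intensity sum of the class "<= t"
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this threshold separates nothing
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(gray, t):
    """Pixels brighter than t become 1, the rest 0."""
    return [[1 if v > t else 0 for v in row] for row in gray]


def preprocess(rgb_img, size=256):
    """Resize to size x size, convert to grayscale, then binarise with Otsu's threshold."""
    gray = rgb_to_gray(resize_nearest(rgb_img, size))
    return gray, binarize(gray, otsu_threshold(gray))


# A 2 x 2 toy "image" (white, black / black, white), upsampled to 4 x 4.
gray, mask = preprocess([[(255, 255, 255), (0, 0, 0)], [(0, 0, 0), (255, 255, 255)]], size=4)
```

Otsu's method needs no tuning parameter: it tries every threshold and keeps the one that best separates the histogram into two classes.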
3.3 Image Segmentation
Image segmentation is an important step in image processing, and it is needed whenever we want to analyze what an image contains. Segmentation separates the objects in an image so that each one can be analyzed individually to determine what it is. It typically serves as a preprocessing step before pattern recognition, feature extraction, and compression of the image. Segmentation can be viewed as grouping an image into clusters, and a great deal of research has been done on image segmentation by clustering. Image segmentation is the process of partitioning a digital image into multiple distinct regions, each containing pixels with similar characteristics. In the proposed method, the K-means clustering procedure is used to segment the given image into three clusters, one of which contains the diseased portion of the leaf. Since all of the colors must be considered for segmentation, intensities are set aside and only color information is taken into consideration. The K-means procedure is as follows:
1. The given dataset is divided into K clusters, and data points are assigned to each of these clusters randomly.
2. For every data point, the distance from the data point to each cluster (i.e., to its centre) is calculated using the Euclidean distance.
3. A data point that is closest to the cluster to which it belongs is left as it is.
4. A data point that is not closest to the cluster to which it belongs is moved to the nearest cluster.
5. The cluster centres are recomputed, and all the above steps are repeated for all data points in the dataset.
6. Once the clusters are stable, the clustering procedure stops.
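The K-means steps above can be sketched in Python as follows. Two assumptions differ from the listed steps: the initial centres are chosen with a deterministic farthest-point heuristic instead of a random assignment (so the example is reproducible), and each pixel is represented by a plain colour tuple. Deciding which of the three clusters holds the diseased region is left to the caller. The function name and the pixel values are hypothetical.

```python
import math


def kmeans(points, k=3, max_iter=100):
    """Lloyd's K-means on equal-length tuples; returns (labels, centroids)."""
    # Initialisation (differs from step 1): deterministic farthest-point choice
    # of k starting centres instead of a random assignment.
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(max_iter):
        # Steps 2-4: move every point to the cluster with the nearest centre
        # (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Step 6: stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 5: recompute each centre as the mean of its members, then repeat.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical pixel colours (R, G, B): two greenish, two brownish, two dark.
pixels = [(10, 200, 10), (12, 198, 11), (150, 90, 20), (148, 92, 22), (60, 40, 30), (62, 41, 29)]
labels, centres = kmeans(pixels, k=3)
print(labels)  # [0, 0, 1, 1, 2, 2]
```

With random initialisation, as in step 1, K-means can settle in a poor local optimum, which is why practical implementations often restart it several times.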
3.4 Feature Extraction and Feature Selection
GLCM is a texture descriptor, where texture refers to the feel of a surface, e.g., smooth, silky, or rough. Texture statistics are classified by order. First-order texture measures are statistics calculated from the original image values, such as variance, and do not consider relationships between neighboring pixels. Second-order measures consider the relationship between groups of two (usually neighboring) pixels in the original image. Third- and higher-order measures (considering the relationships among three or more pixels) are theoretically possible but are not commonly implemented because of the computation time and the difficulty of interpretation.
To meet the objectives of the proposed method, features must be extracted from the inputs. Rather than using the complete set of pixels, only those that are necessary and sufficient to describe the whole segment are selected. The segmented image is first selected by manual intervention. The affected area of the image is found by computing the area of the connected components: first, the connected components with neighborhood pixels are found; then, the basic properties of the input binary image are computed. Only the affected region is of interest, and it is extracted. The percentage of area covered by this segment indicates the quality of the result.
The inputs are typically characterized by color, texture, and shape features, with GLCM supplying the texture features. Color is typically described by color moments and color intensity measures. Properties such as variance, symmetry, difference, and entropy can be associated with texture. Similarly, for shape, features such as curviness, area, eccentricity, and concavity are used. GLCM measures the variation in intensity at the pixel of interest: GLCM texture considers the relationship between two pixels at a time, called the reference pixel and the neighbor pixel.
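A GLCM for one pixel offset can be built as in this minimal sketch. The chapter does not state the offset, the number of gray levels, or whether the matrix is made symmetric, so the horizontal offset of one pixel, the 8 gray levels, and the `symmetric` flag here are assumptions; the function names are hypothetical.

```python
def quantize(gray, levels=8):
    """Map 0-255 gray values onto `levels` equally wide bins (0 .. levels-1)."""
    return [[v * levels // 256 for v in row] for row in gray]


def glcm(image, dx=1, dy=0, levels=8, symmetric=False, normed=True):
    """Gray-level co-occurrence matrix for the offset (dy, dx).

    `image` is a list of rows of integer gray levels in [0, levels).
    Entry [i][j] counts reference pixels with level i whose neighbour at
    (row + dy, col + dx) has level j.
    """
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:  # skip pairs leaving the image
                i, j = image[r][c], image[r2][c2]
                m[i][j] += 1
                if symmetric:  # also count the pair in the opposite direction
                    m[j][i] += 1
    if normed:  # turn counts into co-occurrence probabilities
        total = sum(map(sum, m))
        m = [[v / total for v in row] for row in m]
    return m


# A small 4-level test image (a common textbook example).
img = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
counts = glcm(img, levels=4, normed=False)
```

Each entry pairs a reference pixel with its neighbour at the chosen offset, which is exactly the second-order relationship described above.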
3.5 GLCM Algorithm for Feature Extraction
The features considered for the experimentation are contrast, correlation, energy, homogeneity, mean, standard deviation, entropy, RMS, variance, smoothness, kurtosis, and skewness, covering mainly shape, color, and texture features. The GLCM algorithm proceeds as follows:
• Count the number of pixels in the image matrix.
• Save the counted pixels in the image matrix.
• Check the similarity between pixels in the matrix using a histogram procedure.
• Calculate the dissimilarity factor from the matrix.
• Normalize the matrix elements by dividing them by the total count.
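The features listed above can be computed as in the following sketch. The GLCM properties use the usual definitions (contrast, correlation, energy as the sum of squared entries, homogeneity), plus the entropy of the GLCM; the first-order statistics are computed from the gray values of the segmented region. The smoothness formula R = 1 - 1/(1 + σ²), with σ² normalised by (L - 1)², is an assumed textbook definition, because the chapter does not define it. The function names are hypothetical.

```python
import math


def glcm_features(p):
    """Texture features of a normalised GLCM `p` (a square list of lists summing to 1)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Correlation is undefined when a marginal has zero spread; report 0 then.
        "correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 0.0,
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }


def first_order_features(pixels, levels=256):
    """Statistics of the gray values of the segmented region (a flat list)."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((x - mean) ** 2 for x in pixels) / n  # population variance
    sd = math.sqrt(var)
    return {
        "mean": mean,
        "std": sd,
        "variance": var,
        "rms": math.sqrt(sum(x * x for x in pixels) / n),
        "skewness": sum((x - mean) ** 3 for x in pixels) / n / sd ** 3 if sd else 0.0,
        "kurtosis": sum((x - mean) ** 4 for x in pixels) / n / sd ** 4 if sd else 0.0,
        # Assumed definition: R = 1 - 1 / (1 + var / (levels - 1)^2).
        "smoothness": 1 - 1 / (1 + var / (levels - 1) ** 2),
    }


texture = glcm_features([[0.5, 0.0], [0.0, 0.5]])  # perfectly diagonal toy GLCM
stats = first_order_features([2, 4, 4, 4, 5, 5, 7, 9])
```

Concatenating both dictionaries' values gives one feature vector per image, which is the input the SVM is trained on.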
3.6 Training and Classification
A support vector machine (SVM) is a linear model for classification problems. It can solve both linear and nonlinear problems and works well for many real-world problems. The idea of SVM is simple: the algorithm creates a line or hyperplane that separates the data into classes. LIBSVM is a library for SVMs. A typical use of LIBSVM, as here, involves two steps: first, training on a dataset to obtain a model, and second, using the model to predict the labels of a testing dataset. According to the SVM algorithm, we find the data points closest to the line from both classes; these data points are called support vectors. We then compute the distance between the line and the support vectors; this distance is called the margin. Our goal is to maximize the margin: the hyperplane for which the margin is maximum is the optimal hyperplane (Fig. 2). Two separate sets are created, one for training and one for testing. The steps for training and testing are the same, except that testing is performed after training.
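The margin idea described above can be checked numerically. The sketch below does not train an SVM: it takes a given hyperplane w·x + b = 0 and computes each labelled point's distance to it, the closest points (the support vectors), and the margin. Following the text, the distance from the hyperplane to the support vectors is reported as the margin; the full width of the band between the two classes is twice that. The function name, weights, bias, and points are hypothetical toy values.

```python
import math


def margin_report(w, b, points, labels):
    """Distances of labelled points (labels +1/-1) to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    # y * (w.x + b) is positive for correctly classified points.
    functional = [y * (sum(wi * xi for wi, xi in zip(w, x)) + b) for x, y in zip(points, labels)]
    distances = [f / norm for f in functional]  # geometric (Euclidean) distances
    closest = min(distances)
    # Support vectors: the points lying closest to the hyperplane.
    support = [x for x, d in zip(points, distances) if math.isclose(d, closest)]
    return {"distances": distances, "support_vectors": support,
            "margin": closest, "band_width": 2 * closest}


# Toy example: the line x1 + x2 = 3 separating two classes.
report = margin_report(w=(1, 1), b=-3,
                       points=[(1, 1), (0, 0), (2, 2), (3, 3)],
                       labels=[-1, -1, +1, +1])
print(report["support_vectors"])  # [(1, 1), (2, 2)]
```

For these points, x1 + x2 = 3 is the perpendicular bisector of the two closest opposite-class points, so it is also the maximum-margin line; an SVM solver such as LIBSVM searches for the w and b that maximise this distance.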
3.7 Steps in Training and Testing Phases in Proposed System
1. Start with input images whose groups/classes are known for certain.
2. Find the characteristics, or feature set, of every input and then label it appropriately.
3. Take the next image as input and find its features as another input.
4. Extend the binary SVM to a multiclass SVM technique.
Accuracy Evaluation of Plant Leaf Disease Detection …
Fig. 2 Linear SVM model
5. Train the SVM using an optimal kernel function. The output comprises the SVM structure and information on the support vectors, the bias value, etc.
6. Obtain the label of the input image.
7. Depending on the resulting classes, assign the label of the next image, and add its feature set to the image database.
8. Repeat steps 3–7 for all images that are to be used as the database.
9. The testing process consists of steps 3–6 of the training process. The resulting class is the class of the input image.
10. To compute the accuracy of the system (here, the SVM), a random set of inputs is selected from the image database for training and testing.

Multiclass classification is a common problem in supervised machine learning. Consider an image dataset of m training examples, each of which comprises information in the form of several features and a label. Each label corresponds to the class to which the training example belongs. In multiclass classification, there is a finite set of classes, and each training example has a set of features. We train a classifier using the training data and then use this classifier to categorize new instances.
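Steps 4 and 10, extending the binary SVM to several classes and then measuring accuracy on a random split, can be sketched as follows. The paper does not say which multiclass strategy it used (LIBSVM itself uses one-against-one internally). This sketch instead uses a simpler one-vs-rest scheme built on a small NumPy subgradient-trained linear SVM in place of LIBSVM, and it runs on synthetic 2-D clusters rather than leaf features. All function names, hyperparameters, and the 70/30 split are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Binary soft-margin linear SVM (labels -1/+1), full-batch subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1  # hinge-loss violators
        w = w - lr * (lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n)
        b = b - lr * (-y[viol].sum() / n)
    return w, b


def train_one_vs_rest(X, labels):
    """One binary SVM per class: that class (+1) against all other classes (-1)."""
    classes = np.unique(labels)
    models = [train_linear_svm(X, np.where(labels == c, 1.0, -1.0)) for c in classes]
    return classes, models


def predict_one_vs_rest(X, classes, models):
    # Pick the class whose binary SVM gives the largest decision value.
    scores = np.column_stack([X @ w + b for w, b in models])
    return classes[np.argmax(scores, axis=1)]


def random_split(n, train_fraction, rng):
    """Random train/test index split, as in step 10."""
    idx = rng.permutation(n)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]


rng = np.random.default_rng(1)
centers = np.array([[0.0, 4.0], [4.0, -2.0], [-4.0, -2.0]])  # three synthetic classes
X = np.vstack([rng.normal(c, 0.5, (30, 2)) for c in centers])
labels = np.repeat(np.arange(3), 30)
train, test = random_split(len(X), 0.7, rng)
classes, models = train_one_vs_rest(X[train], labels[train])
accuracy = np.mean(predict_one_vs_rest(X[test], classes, models) == labels[test])
print("test accuracy:", accuracy)
```

One-vs-rest needs only one binary model per class (three here), whereas one-against-one trains a model for every pair of classes; both reduce the multiclass problem to the binary SVM described above.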
K. Rajiv et al.
Fig. 3 Leaf images: a fungal diseases, b bacterial diseases, and c viral diseases (panel labels: rust, gray mold, downy mildew, root rot, canker, spots, wilt light)
Fig. 4 Test images
4 Results and Discussions

Input images are taken from leaves with various diseases, i.e., fungal, bacterial, and viral. MATLAB is used to implement the proposed algorithm (Figs. 3, 4 and 5). According to Table 1, for test image 4(a) after 300 iterations, the accuracy is 95.77% with the linear kernel, 94.51% with the radial basis function (RBF) kernel, 96.65% with the polynomial kernel, and 95.23% with the sigmoid kernel. Results for the other test images are also given in Table 1.
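The LIBSVM documentation defines the four kernels compared here as follows: linear uᵀv, polynomial (γuᵀv + coef0)^degree, RBF exp(−γ‖u − v‖²), and sigmoid tanh(γuᵀv + coef0). The NumPy sketch below computes the corresponding Gram matrices. The parameter values are LIBSVM's defaults (degree 3, coef0 0, γ = 1/number of features) and are assumptions, because the paper does not report the values it used.

```python
import numpy as np


def linear_kernel(X, Z):
    return X @ Z.T


def polynomial_kernel(X, Z, gamma=1.0, coef0=0.0, degree=3):
    return (gamma * (X @ Z.T) + coef0) ** degree


def rbf_kernel(X, Z, gamma=1.0):
    # Squared Euclidean distances between every row of X and every row of Z.
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * (X @ Z.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negative rounding errors


def sigmoid_kernel(X, Z, gamma=1.0, coef0=0.0):
    return np.tanh(gamma * (X @ Z.T) + coef0)


X = np.array([[1.0, 1.0], [0.0, 2.0]])
gamma = 1.0 / X.shape[1]  # LIBSVM's default gamma is 1 / (number of features)
for name, k in [("linear", linear_kernel(X, X)),
                ("polynomial", polynomial_kernel(X, X, gamma)),
                ("rbf", rbf_kernel(X, X, gamma)),
                ("sigmoid", sigmoid_kernel(X, X, gamma))]:
    print(name, k.round(3).tolist())
```

Because each kernel induces a different feature-space geometry, the same training data can yield different accuracies per kernel, which is consistent with the spread reported in Table 1.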
5 Conclusions and Future Work

In machine learning, the support vector machine (SVM) is a supervised learning method. In this work, the SVM is applied to the classification of leaf diseases. The accuracy obtained with each kernel depends on the distribution of the points in the feature space. The accuracies obtained range from 92.44 to 96.65%, which is higher than the accuracy obtained with the KNN algorithm used for disease detection. Accuracy can be improved in several ways: by using autonomous learning, deep learning techniques, or hybrid approaches that combine two or more popular methods; by increasing the database size so that the training and testing sets become larger; and by acquiring source images with high clarity. In addition, the dataset could be taken from a real environment instead of prepared leaf images, and CNN capabilities could be used to improve plant leaf disease detection and classification in real environments. The standard SVM is a non-probabilistic binary linear classifier, but it can be applied to multiclass classification. LIBSVM made it easy to implement the SVM with different kernels. The method is currently semi-automatic. Complex backgrounds that cannot easily be separated from the region of interest, symptom borders that are often not well defined, and uncontrolled capture conditions may all present features that make image analysis more challenging. With an appropriate database, this technique can be applied to additional diseases.

Fig. 5 K-means clustering images of test images (Clusters 1–3 shown for each test image)
A Deep Learning Technique for Automatic Teeth Recognition in Dental Panoramic X-Ray Images Using Modified Palmer Notation System

Fahad Parvez Mahdi and Syoji Kobashi
Abstract Dental healthcare providers need to examine a large number of panoramic X-ray images every day. This is a time-consuming, tedious, and error-prone task, and the quality of the examination depends directly on the experience and on personal factors, such as stress and fatigue, of the dental care providers. To help them with this problem, a residual network-based deep learning technique, namely faster R-CNN, is proposed in this study. Two kinds of residual networks, ResNet-50 and ResNet-101, are used separately as the base network of faster R-CNN. A modified version of the Palmer notation (PN) system is proposed in this research for numbering the teeth. Unlike the PN system, the modified Palmer notation (MPN) system does not use any notation. The MPN system is proposed for three main reasons: (i) teeth are divided into eight categories in total, and the new numbering system keeps the same number of categories; (ii) the 8-category MPN system is less complex to implement than the 32-category universal tooth numbering (UTN) system, and with some post-processing steps, the MPN system can be converted into the 32-category UTN system; and (iii) it is more convenient for the dentist to use the 8-category MPN system than the 32-category UTN system. A total of 900 dental X-ray images were used as training data, and 100 images were used as test data. The method achieved a mean average precision (mAP) of 0.963 and 0.965 for ResNet-50 and ResNet-101, respectively. The obtained results demonstrate the effectiveness of the proposed method and meet the requirements for clinical implementation. Therefore, the method can be considered a useful and reliable tool to assist dental care providers in dentistry.

Keywords Tooth recognition · Deep learning · Faster R-CNN · Residual network · Modified Palmer notation system · Panoramic X-ray

F. P. Mahdi (B) · S. Kobashi Graduate School of Engineering, University of Hyogo, 2167 Shosha, Himeji, Hyogo 671-2280, Japan

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_5
F. P. Mahdi and S. Kobashi
1 Introduction

Examining dental panoramic X-ray images is the most common practice in dentistry for diagnosing dental diseases and anomalies, as visual examination is often not clear enough because the teeth are located within mineralized tissues. In addition, radiographic images provide a detailed and clear visualization of the whole dentition, i.e., the teeth in both jaws and their surroundings, in a single image with a minimal amount of radiation exposure. Every day, dental care providers have to examine a large number of dental panoramic X-ray images, which puts a huge burden on them and therefore limits the number of patients they can serve each day. Furthermore, the quality of the examination depends on various factors such as the stress, fatigue, and experience of the dental care providers. Thus, computer-aided dental examination is highly desirable to reduce both the workload and human error. Automatic recognition of teeth in dental panoramic X-ray images can be an effective way to counter this problem, since it can not only reduce the workload but also help minimize human error. Conventional algorithms typically included segmentation and feature extraction steps before classification methods could be applied for teeth recognition. Various image processing techniques were used for teeth segmentation, including thresholding [1], morphological operations [2], the level-set method [3], and active contours [4], while Miki et al. [5] used a manual approach for tooth segmentation. For feature extraction, textures [3], contours [1], and wavelet-Fourier descriptors [2] were considered. Finally, classification was done using Bayesian techniques [6], a sequence alignment algorithm [1], support vector machines [7], and feed-forward neural networks [2]. However, those methods had a number of disadvantages, including the need for preprocessing steps before segmentation and feature extraction, and the absence of an automatic feature extraction process.
As a result, they imposed a heavy workload. Furthermore, experts were needed to extract the features effectively, as recognition performance depended largely on the quality of feature extraction. To overcome those difficulties, deep learning-based algorithms were proposed. The distinctive feature of deep learning-based techniques is that they do not require any preprocessing or manual feature extraction: they can be fed raw input, and the features are learned automatically [8]. Several deep learning-based approaches for automatic tooth recognition have been reported in the literature [9, 10]. However, [9] used periapical dental X-ray images for teeth recognition, while [10] used deep learning only for teeth detection and then classified the teeth with a heuristic method. In this paper, a new tooth numbering system is proposed based on the Palmer notation (PN) system, in which teeth are divided into eight categories in total, unlike the 32-category universal tooth numbering (UTN) system. The authors call it the modified Palmer notation (MPN) tooth numbering system. The study also proposes a residual network-based faster R-CNN technique for automatic teeth recognition in the MPN system. Faster R-CNN [11] is a modified and upgraded version of fast R-CNN [12], in which a region
A Deep Learning Technique for Automatic Teeth Recognition …
proposal network (RPN) replaces the region proposal algorithm to avoid a bottleneck during region proposal. Two versions of the residual network, ResNet-50 and ResNet-101, are used separately as the base network of faster R-CNN. The residual network [13] is known for its ability to mitigate the vanishing gradient problem, so deeper networks can perform better than their shallower counterparts, at the cost of additional computation. The rest of the paper is organized as follows. Section 2 describes the data, i.e., the dental panoramic X-ray images used in the experiment, and the training and test datasets. Section 3 presents the methodology of this research, and Sect. 4 provides the experimental results with a brief analysis. Finally, Sect. 5 concludes the paper with a summary of its main points and a short note on future research directions.
2 Datasets

This study considered a total of 1000 dental panoramic X-ray images with dimensions of (1400–3100) × (800–1536) pixels, stored in JPEG format. The images were collected anonymously from multiple clinics with permission and in accordance with ethical guidelines. Figure 1a shows an example of a collected dental panoramic X-ray image. For training and testing, all X-ray images were labeled manually by drawing a rectangular bounding box around each tooth. Severely broken teeth were, however, excluded from this experiment. Figure 1b shows an example of a dental panoramic X-ray image with a bounding box around each tooth. The annotated X-ray images are divided into two datasets: (i) a training dataset and (ii) a test dataset. A total of 900 images are selected as the training dataset to train faster R-CNN, while the remaining 100 images are used as the test dataset to validate the performance of faster R-CNN for tooth recognition.
Fig. 1 a An example of a collected dental panoramic X-ray image and b rectangular bounding box around each tooth following MPN system
3 Proposed Method

A new tooth numbering system, the MPN system, is proposed and used in this research. Figure 2 shows a schematic diagram of the proposed MPN system. Unlike the PN system, the MPN system does not use notation. The new numbering system is proposed for three main reasons: (i) teeth are divided into eight categories in total (central incisor, lateral incisor, cuspid/canine, first bicuspid/premolar, second bicuspid/premolar, first molar, second molar, and third molar), so by definition they can be identified by these eight categories; (ii) the 8-category MPN system is less complex than the 32-category UTN system, and the 32-category numbering can be derived from the 8-category numbering with some post-processing steps; and (iii) it is more convenient for dentists to work with eight distinct categories than with 32 categories when diagnosing dental problems. A recently developed two-stage CNN-based detector, faster R-CNN [11], is used for teeth recognition in dental panoramic X-ray images. In two-stage CNN detectors, feature maps are first used to generate region proposals from anchor boxes, which are a set of predefined bounding boxes with different aspect ratios. Unlike its predecessors, R-CNN [14] and fast R-CNN [12], faster R-CNN has its own CNN-based RPN to generate region proposals. The RPN generates region proposals by predicting whether or not an anchor box contains an object. Region proposals with high confidence scores are forwarded to the next stage for further classification and regression. This two-stage filtering of region proposals yields higher recognition accuracy at the cost of additional computation [15].
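Anchor boxes can be illustrated with a short NumPy sketch that tiles a fixed set of boxes over every feature-map cell. The RPN would then score each of these boxes for objectness. The scales (128, 256, 512 pixels), aspect ratios (0.5, 1, 2), and stride of 16 follow the original Faster R-CNN paper [11] and are assumptions here, because this paper does not report its anchor configuration. The function names are hypothetical.

```python
import numpy as np


def base_anchors(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """(width, height) of each anchor: area scale**2, aspect ratio height/width = ratio."""
    return np.array([(s / np.sqrt(r), s * np.sqrt(r)) for s in scales for r in ratios])


def generate_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """All anchors as (x1, y1, x2, y2) in image pixels, one set per feature-map cell."""
    wh = base_anchors(scales, ratios)                     # (A, 2)
    ys = (np.arange(feat_h) + 0.5) * stride               # cell centers in image space
    xs = (np.arange(feat_w) + 0.5) * stride
    cy, cx = np.meshgrid(ys, xs, indexing="ij")
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)  # (K, 2) as (x, y)
    half = wh[None, :, :] / 2.0                           # (1, A, 2)
    c = centers[:, None, :]                               # (K, 1, 2)
    boxes = np.concatenate([c - half, c + half], axis=2)  # (K, A, 4)
    return boxes.reshape(-1, 4)


anchors = generate_anchors(feat_h=2, feat_w=3)
print(anchors.shape)  # (54, 4): 2 * 3 cells * 9 anchors
```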
Fig. 2 Proposed modified Palmer notation (MPN) system
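Reason (ii) states that the 32-category UTN numbers can be recovered from MPN categories through post-processing. Below is a minimal sketch of one such conversion. It rests on three assumptions: MPN categories 1–8 run from the central incisor to the third molar, as in Palmer notation; the quadrant of each tooth is inferred from a detected box's position relative to a vertical image midline and a horizontal jaw-separation line; and the patient's right appears on the image's left. The functions `utn_to_mpn`, `mpn_to_utn`, and `quadrant_from_box` and their parameters are hypothetical, not part of the paper.

```python
def utn_to_mpn(utn):
    """UTN number (1-32) -> MPN category (1 = central incisor ... 8 = third molar, assumed)."""
    if not 1 <= utn <= 32:
        raise ValueError("UTN numbers run from 1 to 32")
    if utn <= 8:    # upper right: 1 (third molar) ... 8 (central incisor)
        return 9 - utn
    if utn <= 16:   # upper left: 9 (central incisor) ... 16 (third molar)
        return utn - 8
    if utn <= 24:   # lower left: 17 (third molar) ... 24 (central incisor)
        return 25 - utn
    return utn - 24  # lower right: 25 (central incisor) ... 32 (third molar)


def mpn_to_utn(category, upper, patient_right):
    """MPN category (1-8) plus quadrant -> UTN number (inverse of utn_to_mpn)."""
    if not 1 <= category <= 8:
        raise ValueError("MPN categories run from 1 to 8")
    if upper:
        return 9 - category if patient_right else 8 + category
    return 24 + category if patient_right else 25 - category


def quadrant_from_box(box, image_midline_x, jaw_split_y):
    """Hypothetical post-processing: (upper, patient_right) from a box (x1, y1, x2, y2).

    Assumes the patient's right is on the image's left and that both split lines are known.
    """
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return cy < jaw_split_y, cx < image_midline_x


upper, right = quadrant_from_box((100, 50, 140, 120), image_midline_x=500, jaw_split_y=300)
print(mpn_to_utn(8, upper, right))  # upper-right third molar -> UTN 1
```

In practice the split lines would have to be estimated per image, for example from the arrangement of the detected boxes, which is the kind of post-processing step the paper alludes to.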
Fig. 3 Illustration of proposed network
Transfer learning is employed in this research to improve recognition performance. Two versions of the residual network, ResNet-50 and ResNet-101, are used separately as the base network of faster R-CNN. The residual network was originally proposed by He et al. [13] to overcome the difficulty of training deeper networks to increase accuracy. A deep residual learning framework is implemented to address the degradation problem. Residual learning successfully overcomes the degradation problem, making it possible to gain accuracy by increasing the depth of the network. ResNet-50 consists of four stages with 50 layers in total, whereas ResNet-101, the deeper version of ResNet-50, has 17 additional three-layer blocks in the third stage, for 101 layers in total. Figure 3 illustrates the proposed method used in this study. Average precision (AP) is calculated for each tooth to evaluate the recognition performance of the proposed model. First, the detected boxes are compared with the ground truth boxes by calculating the intersection over union (IOU), defined as

$$\mathrm{IOU} = \frac{\mathrm{Area}_{\mathrm{DetectedBox}} \cap \mathrm{Area}_{\mathrm{GroundTruthBox}}}{\mathrm{Area}_{\mathrm{DetectedBox}} \cup \mathrm{Area}_{\mathrm{GroundTruthBox}}} \tag{1}$$
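For axis-aligned boxes, Eq. (1) reduces to a few lines of Python. The (x1, y1, x2, y2) corner format and the names `iou` and `is_true_positive` are assumptions for illustration. The 0.5 threshold is the one used in this study.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(detected, ground_truth, threshold=0.5):
    """A detection counts as a true positive when IOU >= threshold."""
    return iou(detected, ground_truth) >= threshold


print(iou((0, 0, 2, 2), (1, 0, 3, 2)))  # overlap area 2, union area 6
```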
The IOU threshold is set to 0.5, i.e., if the IOU is greater than or equal to 0.5, the detected box is considered a true positive. To calculate the evaluation index AP, precision and recall are calculated as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$
where TP (true positives) is the number of detected boxes with IOU ≥ 0.5; FP (false positives) is the number of detected boxes with IOU < 0.5; and FN (false negatives) is the number of teeth that are not detected or are detected with IOU < 0.5. AP can be defined as the arithmetic mean of the precision at 11 equally spaced recall levels, r ∈ {0.0, 0.1, 0.2, …, 1.0}. Calculating AP usually involves interpolating the precision: at a given recall level $r$, the interpolated precision $P_{\mathrm{int}}$ is the highest precision found at any recall level $\tilde{r} \ge r$ [16]:

$$P_{\mathrm{int}}(r) = \max_{\tilde{r} \ge r} P(\tilde{r}) \tag{4}$$

AP for each tooth can thus be written as

$$\mathrm{AP} = \frac{1}{11} \sum_{r \in \{0.0, 0.1, \ldots, 1.0\}} P_{\mathrm{int}}(r) \tag{5}$$

Mean average precision (mAP) is the arithmetic mean of AP over all teeth categories and is calculated to evaluate the overall recognition performance of the model. If the total number of categories is $N$, mAP can be formulated as

$$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i \tag{6}$$
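Equations (1)–(6) can be combined into a small evaluation routine. The sketch below assumes PASCAL VOC-style greedy matching: detections are processed in descending score order, and each ground-truth tooth can be matched at most once, so duplicate detections count as false positives. The paper does not spell out its matching rule, and MATLAB's evaluation functions may differ in detail. All function names are hypothetical.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of boxes (x1, y1, x2, y2), Eq. (1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, ground_truths, threshold=0.5):
    """Label each detection TP (True) or FP (False), in descending score order.

    detections: list of (score, box); ground_truths: list of boxes for one category.
    Each ground-truth box may be matched at most once (VOC-style, assumed).
    """
    used = [False] * len(ground_truths)
    flags = []
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        overlaps = [iou(box, g) for g in ground_truths]
        j = int(np.argmax(overlaps)) if overlaps else -1
        if j >= 0 and overlaps[j] >= threshold and not used[j]:
            used[j] = True
            flags.append(True)
        else:
            flags.append(False)
    return flags


def eleven_point_ap(tp_flags, n_ground_truth):
    """Eqs. (2)-(5): precision/recall along the ranked list, then 11-point interpolated AP."""
    if n_ground_truth == 0 or len(tp_flags) == 0:
        return 0.0
    tp = np.cumsum(np.asarray(tp_flags, dtype=float))
    fp = np.cumsum(1.0 - np.asarray(tp_flags, dtype=float))
    precision = tp / (tp + fp)        # Eq. (2); tp + fp >= 1 at every rank
    recall = tp / n_ground_truth      # Eq. (3); FN = n_ground_truth - TP
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        reached = recall >= r
        # Eq. (4): best precision at any recall level >= r (0 if r is never reached).
        ap += precision[reached].max() if reached.any() else 0.0
    return ap / 11.0                  # Eq. (5)


def mean_ap(ap_per_category):
    """Eq. (6): arithmetic mean of AP over the N categories."""
    return float(np.mean(ap_per_category))


gts = [(0, 0, 10, 10), (20, 0, 30, 10)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 0, 31, 10))]
flags = match_detections(dets, gts)
print(flags, eleven_point_ap(flags, len(gts)))
```

In the example the second-ranked detection misses every tooth, so precision dips to 0.5 at recall 0.5. Interpolation with Eq. (4) hides that dip for recall levels up to 0.5, which gives AP = (6 × 1 + 5 × 2/3)/11 ≈ 0.85.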
4 Results and Discussions

This section presents the simulation results obtained by applying the residual network-based faster R-CNN technique to dental panoramic X-ray images for teeth recognition in the proposed MPN system. The simulation experiment was implemented in MATLAB 2019a and executed on a Ryzen 7 2700 eight-core processor (16 logical CPUs) with a clock speed of ~3.2 GHz. A TITAN RTX GPU with 24 GB of video memory (VRAM) was used for training and testing. The network was trained for 10 epochs. The initial learning rate was set to 0.001, and the mini-batch size was 1. The other hyperparameters, i.e., the number of regions to sample from each training image, the number of strongest regions used for generating training samples, the negative overlap range, and the positive overlap range, were set to 256, 2000, [0–0.3], and [0.6–1], respectively. Table 1 summarizes the parameter settings for this experiment. Figures 4 and 5 show the recall–precision curves of the different teeth categories for ResNet-50 and ResNet-101, respectively, in the MPN system. The curves obtained by ResNet-50 appear well balanced between precision and recall. The curves obtained by ResNet-101 contain some fluctuations, but they converge well at the end. Fluctuations in recall–precision curves occur when
Table 1 Parameter settings of faster R-CNN

Parameter                      Value
Epoch                          10
Mini-batch size                1
Initial learning rate          0.001
Number of regions to sample    256
Number of strongest regions    2000
Negative overlap ratio         [0–0.3]
Positive overlap ratio         [0.6–1]
Fig. 4 Recall–precision curves with ResNet-50-based faster R-CNN in MPN system
the system fails to classify a detected box correctly (a false positive) or fails to identify an object that is present (a false negative). If this happens when recall is small, the fluctuations are large and clearly visible; if it happens when recall is large, the fluctuations are minor and not clearly visible. For details, interested readers may check [1]. Figure 6 presents the average precision of the different teeth categories in the MPN system. Of the eight teeth categories, ResNet-101 performed better in five and ResNet-50 in two, while in the remaining category, T5, both performed equally. As described earlier, 900 images were used for training and 100 images for testing, and MPN, which divides the teeth into eight categories, was used for tooth numbering. The results, presented in Table 2, show that the method achieved very encouraging recognition results. The table also shows that the deeper network, ResNet-101, was computationally more expensive. Japanese dentists follow a tooth numbering system similar to MPN. All teeth categories achieved an AP above 0.95 for both ResNet-50 and ResNet-101, and ResNet-101 performed slightly better than ResNet-50 overall. Although the recognition results were very satisfactory and clinically implementable, they are slightly lower than those the method achieved in the UTN system [17]. This small variation might be caused by the randomness of the datasets, i.e., the training and test datasets.

Fig. 5 Recall–precision curves with ResNet-101-based faster R-CNN in MPN system

Fig. 6 Average precision for different category of teeth in MPN system (average precision versus tooth number 1–8, for ResNet-50 and ResNet-101)
Table 2 Recognition results (in mAP) of teeth using residual network-based faster R-CNN

Tooth number                  ResNet-50    ResNet-101
T1                            0.951        0.968
T2                            0.956        0.953
T3                            0.972        0.963
T4                            0.968        0.969
T5                            0.962        0.962
T6                            0.981        0.983
T7                            0.963        0.967
T8                            0.955        0.957
Training time (in seconds)    60,879       139,556
Testing time (in seconds)     62           71
mAP                           0.963        0.965
Table 3 Comparison of mAP between the two residual networks for UTN and MPN system

Tooth numbering system    ResNet-50    ResNet-101
UTN [17]                  0.974        0.981
MPN                       0.963        0.965
The faster R-CNN technique successfully performed the tooth recognition task in dental panoramic X-ray images following the MPN system, with an mAP above 0.96 in both cases. Table 3 compares the results of the UTN and MPN systems. Both results were obtained using similar training and test datasets as well as the same kind of networks and parameters. The comparison shows that the method performed slightly better with the UTN system than with the MPN system. The results demonstrate that the method can be implemented clinically to assist dental care providers. Two types of residual network were used as the base network for faster R-CNN, and the deeper ResNet-101 performed slightly better than its shallower counterpart, ResNet-50. However, the deeper network puts an additional computational burden on the system (139,556 s versus 60,879 s of training time in Table 2). Thus, where computational expense is a concern, ResNet-50 can be recommended in place of ResNet-101 as the base network of faster R-CNN. Finally, this section concludes with Fig. 7, which shows four examples of teeth detected in dental panoramic X-ray images using the ResNet-101-based faster R-CNN technique.
5 Conclusion

An automatic teeth recognition method was proposed in this study to assist dental care providers in dentistry. The deep learning-based faster R-CNN technique was used, with the residual networks ResNet-50 and ResNet-101 as its base networks. Transfer learning was used to improve the recognition results. An MPN system was proposed to make the results more convenient for dental care providers. The method achieved an mAP above 0.95, which demonstrates that it can solve this problem efficiently. Its performance is sufficient for clinical implementation, and it can therefore be used as a useful and reliable tool in dentistry. Future research should consider more training images as well as a post-detection refinement technique to improve the overall recognition results. The recognition system should also cover different conditions of teeth to assist dental care providers more effectively.

Fig. 7 Four examples of successful teeth detection in dental panoramic X-ray images using ResNet-101-based faster R-CNN technique
References

1. Lin PL, Lai YH, Huang PW (2010) An effective classification and numbering system for dental bitewing radiographs using teeth region and contour information. Pattern Recogn 43(4):1380–1392. https://doi.org/10.1016/j.patcog.2009.10.005
2. Hosntalab M, Aghaeizadeh Zoroofi R, Abbaspour Tehrani-Fard A, Shirani G (2010) Classification and numbering of teeth in multi-slice CT images using wavelet-Fourier descriptor. Int J Comput Assist Radiol Surg 5(3):237–249. https://doi.org/10.1007/s11548-009-0389-8
3. Rad AE, Rahim MSM, Norouzi A (2013) Digital dental X-ray image segmentation and feature extraction. Indones J Electr Eng Comput Sci 11:3109–3114
4. Shah S, Abaza A, Ross A, Ammar H (2006) Automatic tooth segmentation using active contour without edges. In: 2006 biometrics symposium: special session on research at the biometric consortium conference, 19 Sept–21 Aug 2006, pp 1–6
5. Miki Y, Muramatsu C, Hayashi T, Zhou X, Hara T, Katsumata A, Fujita H (2017) Classification of teeth in cone-beam CT using deep convolutional neural network. Comput Biol Med 80:24–29. https://doi.org/10.1016/j.compbiomed.2016.11.003
6. Mahoor MH, Abdel-Mottaleb M (2005) Classification and numbering of teeth in dental bitewing images. Pattern Recogn 38(4):577–586. https://doi.org/10.1016/j.patcog.2004.08.012
7. Yuniarti A, Nugroho AS, Amaliah B, Arifin AZ (2012) Classification and numbering of dental radiographs for an automated human identification system. TELKOMNIKA 10(1):137–146
8. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436. https://doi.org/10.1038/nature14539
9. Chen H, Zhang K, Lyu P, Li H, Zhang L, Wu J, Lee C-H (2019) A deep learning approach to automatic teeth detection and numbering based on object detection in dental periapical films. Sci Rep 9(1):3840. https://doi.org/10.1038/s41598-019-40414-y
10. Tuzoff DV, Tuzova LN, Bornstein MM, Krasnov AS, Kharchenko MA, Nikolenko SI, Sveshnikov MM, Bednenko GB (2019) Tooth detection and numbering in panoramic radiographs using convolutional neural networks. Dentomaxillofac Radiol 48(4):20180051. https://doi.org/10.1259/dmfr.20180051
11. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/tpami.2016.2577031
12. Girshick R (2015) Fast R-CNN. In: 2015 IEEE international conference on computer vision (ICCV), 7–13 Dec 2015, pp 1440–1448
13. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), 27–30 June 2016, pp 770–778
14. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE conference on computer vision and pattern recognition, 23–28 June 2014, pp 580–587
15. Liu W, Liao S, Hu W (2019) Perceiving motion from dynamic memory for vehicle detection in surveillance videos. IEEE Trans Circuits Syst Video Technol 29(12):3558–3567. https://doi.org/10.1109/TCSVT.2019.2906195
16. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2010) The PASCAL visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338. https://doi.org/10.1007/s11263-009-0275-4
17. Mahdi FP, Yagi N, Kobashi S (2020) Automatic teeth recognition in dental X-ray images using transfer learning based faster R-CNN. In: IEEE international symposium on multiple-valued logic (ISMVL), 9–11 Nov 2020, pp 1–6
Detection of Parkinson’s Disease from Hand-Drawn Images Using Deep Transfer Learning

Akalpita Das, Himanish Shekhar Das, Arijeet Choudhury, Anupal Neog, and Sourav Mazumdar
Abstract Parkinson’s disease mainly occurs in older people, and unfortunately no specific cure is available to date. With early detection and proper medication, a patient can lead a better life, which underscores the importance of detecting this disease early. In this paper, our aim is to use deep learning architectures to process hand-drawn images of spiral, wave, cube, and triangle shapes drawn by patients. Deep convolutional neural networks are investigated for computer-based diagnosis of Parkinson’s disease at an early stage. Three approaches are considered. In the first approach, all types of images are fed into the VGG19, ResNet50, MobileNet-v2, Inception-v3, Xception, and InceptionResNet-v2 architectures, which are trained from scratch. In the second approach, the same procedure is repeated, except that the pretrained models are fine-tuned using transfer learning. In the third approach, two shallow convolutional neural networks are proposed. For all three approaches, the experiments are conducted on two different datasets, and the results show that the fine-tuned VGG19, ResNet50, and MobileNet-v2 networks from the second approach perform better than the other models, with accuracies of 91.6% and 100% for dataset 1 and dataset 2, respectively.

A. Das (B) Girijananda Chowdhury Institute of Management and Technology, Guwahati, Assam 781017, India
H. S. Das Cotton University, Guwahati, Assam 781001, India
A. Choudhury · A. Neog · S. Mazumdar Jorhat Engineering College, Jorhat, Assam 785007, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_6
Keywords Parkinson’s disease · Image analysis · Deep learning · Convolutional neural networks
1 Introduction
Parkinson’s disease (PD) is a neurodegenerative disorder seen mostly in people older than 65 years. PD is characterized by the loss of dopamine production by neurons in a region of the brain called the “substantia nigra”. Dopamine acts as a neurotransmitter that carries the neural signals coordinating the brain and the body. PD affects the human motor system, causing disorders such as shaking, stiffness, and difficulty in walking, balancing, and coordinating movements [1]. PD also affects the voice and causes other cognitive difficulties. Because PD is progressive, the disease becomes more severe if patients are not diagnosed at an early stage, and many researchers are therefore seeking an effective way to detect PD early. Much work has used speech utterances, voice features, time-frequency features, and similar cues [2–6]. Various kinds of tremor are a common symptom of PD [7]; in particular, hand tremor is the symptom most often used to detect PD from hand-drawn images, sketches, and handwriting produced by PD patients. Sketches of patterns such as spirals and waves, as well as handwritten text, have been found useful for discriminating PD patients from healthy persons [8], and researchers and clinicians have found cues in hand-drawn patterns and handwritten text that help detect PD at early stages [9, 10]. In this paper, we develop models based on variants of convolutional neural network architectures, using different patterns drawn by healthy people and by PD patients. Three approaches are considered to discriminate PD patients from healthy persons. In the first approach, the CNN architectures are trained from scratch; in the second, fine-tuning is performed using transfer learning. Finally, in the third approach, shallow convolutional networks are proposed and compared with the other approaches.
The paper is organized as follows: Sect. 2 presents a detailed literature survey, Sect. 3 describes the datasets, Sect. 4 explains the proposed methodology, Sect. 5 presents the experimental results and discussion, and Sect. 6 concludes the paper.
2 Related Works
In the last few years, a good number of studies have detected PD using handwriting and hand-drawn images. Drotár et al. [11] used the PaHaW Parkinson’s disease handwriting database, comprising 37 PD patients and 38 healthy persons performing eight different handwriting activities, including drawing an Archimedean spiral and repeatedly writing simple syllables, words, and sentences. Three classifiers (KNN, SVM, and AdaBoost) were used to predict PD from conventional kinematic and handwriting-pressure features, yielding an accuracy of 81.3%. Loconsole et al. [12] investigated a different method that combines electromyography (EMG) signals with computer vision techniques such as morphological operators and image segmentation. Using an ANN in two settings (dataset 1 with two dynamic and two static features; dataset 2 with only two dynamic features), they obtained accuracies of 95.81% and 95.52%, respectively. Pereira et al. [13] used a convolutional neural network (CNN) to discriminate PD patients from healthy individuals based on data acquired by a pen fitted with several sensors; the data comprised spirals and meanders drawn by both PD and healthy individuals. In 2019, Folador et al. [14] classified handwritten drawings of PD and healthy individuals using a histogram of oriented gradients (HOG) descriptor and a random forest classifier, reaching a recognition rate of up to 84%.
Table 1 Detailed description of datasets
Class | Dataset 1: Spiral | Dataset 1: Wave | Dataset 2: Spiral | Dataset 2: Cube | Dataset 2: Triangle
Parkinson | 51 | 51 | 54 | 54 | 58
Healthy | 51 | 51 | 54 | 54 | 58
3 Database Description
Two different datasets are used in the experiments. Dataset 1 was obtained from Kaggle, a public data repository, and was shared by [15]; the data were collected at RMIT University, Melbourne, Australia. Dataset 2 was obtained from [16] by mutual agreement for this research work. It was prepared with the support of the Brazilian National Council for Research and Development (CNPq) via grants no. 304315/2017-6 and 430274/2018-1. Table 1 describes both datasets, and Fig. 1 shows a few sample images from each.
4 Proposed Methodology
The main strength of deep learning architectures is that they learn low-level to high-level features on their own, and convolutional neural networks (CNNs) are particularly well suited to extracting deep features from images. In this work, variants of pretrained CNNs, namely VGG19, ResNet50, MobileNet-v2, Inception-v3, Xception, and Inception-ResNet-v2, are used in two approaches. In the first approach, these architectures are trained from scratch on both datasets. In the second approach, transfer learning (TL) is applied. TL can be used in two ways. In feature extraction, the layers from the fully connected layer through the output layer are removed, and the remaining network serves as a deep feature extractor. In fine-tuning, some layers are frozen, the dimension of the fully connected layer is changed, and a new architecture is built on the pretrained model. This paper uses fine-tuning: weights learned by an already trained network are used to obtain deep features, which are then used to train a model for the new problem. In the third approach, a shallow convolutional neural network (SCNN) is proposed, and its efficiency is compared with that of the first two approaches.
Fig. 1 Sample images: (a) spiral and (b) wave images from dataset 1; (c) cube and (d) triangle images of PD patients from dataset 2
4.1 Training from Scratch
To explore the performance of different CNN architectures for PD detection, VGG19, ResNet50, MobileNet-v2, Inception-v3, Xception, and Inception-ResNet-v2 are trained from scratch. For all networks, the initial learning rate is 0.001 and training runs for 50 epochs. The learning rate is halved (reduction factor 0.5) whenever the validation loss does not improve for 3 epochs. For both datasets, the batch size is 16 for each individual image type and 32 for the mixture of image types. During training, model checkpointing keeps track of the best model parameters. Table 2 lists the best learning rate reached by each network for each pattern.
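The learning-rate schedule above can be illustrated with a small, self-contained sketch. It is our own illustration rather than the authors' code: the class name `PlateauHalving` and the helper `halvings_from` are hypothetical, a strict decrease in validation loss counts as an improvement (no tolerance), and the patience counter restarts after each reduction. In Keras, comparable behaviour is usually obtained with the `ReduceLROnPlateau` callback (`monitor="val_loss"`, `factor=0.5`, `patience=3`), whose default improvement test includes a small `min_delta`, so the two are not identical. The helper also shows that every learning rate in Table 2 equals 0.001 × 0.5^k for some whole number k of halvings.

```python
import math


class PlateauHalving:
    """Minimal reduce-on-plateau schedule (illustrative sketch, not the Keras class)."""

    def __init__(self, initial_lr=0.001, factor=0.5, patience=3):
        self.lr = initial_lr
        self.factor = factor
        self.patience = patience
        self.best = math.inf  # best validation loss seen so far
        self.wait = 0         # epochs since the last improvement

    def step(self, val_loss):
        """Record one epoch's validation loss; return the learning rate for the next epoch."""
        if val_loss < self.best:  # assumption: strict improvement, no min_delta
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor  # halve the learning rate
                self.wait = 0           # restart the patience counter
        return self.lr


def halvings_from(lr, initial_lr=0.001):
    """Number of factor-0.5 reductions that turn initial_lr into lr (hypothetical helper)."""
    return round(math.log2(initial_lr / lr))


if __name__ == "__main__":
    schedule = PlateauHalving()
    # Hypothetical validation-loss curve: two improvements, then a long plateau.
    losses = [0.9, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]
    for epoch, loss in enumerate(losses, start=1):
        print(f"epoch {epoch}: next lr = {schedule.step(loss):g}")
    for lr in (0.00025, 0.000125, 0.0000625, 0.00003125):
        print(f"{lr:g} = 0.001 * 0.5**{halvings_from(lr)}")
```

With this loss curve the rate stays at 0.001 for four epochs, drops to 0.0005 after three stalled epochs, and reaches 0.00025 three epochs later.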
4.2 Fine-Tuning Using Transfer Learning with CNNs
Table 3 lists the CNN architectures used for transfer learning in this work, together with their tuned parameter values.
Table 2 Best learning rate achieved for training from scratch
Network | Dataset 1: Spiral | Dataset 1: Wave | Dataset 1: Spiral–wave | Dataset 2: Spiral | Dataset 2: Cube | Dataset 2: Triangle | Dataset 2: Spiral–Cube–Triangle
VGG19 [17] | 0.000125 | 0.00025 | 0.00025 | 0.000125 | 0.000125 | 0.00025 | 0.000125
ResNet50 [18] | 0.00025 | 0.000125 | 0.00025 | 0.000125 | 0.00025 | 0.000125 | 0.0000625
MobileNet-v2 [19] | 0.0000625 | 0.0000625 | 0.0000625 | 0.0000625 | 0.0000625 | 0.0000625 | 0.0000625
Inception-v3 [20] | 0.0000625 | 0.0000625 | 0.0000625 | 0.0000625 | 0.0000625 | 0.0000625 | 0.0000625
Xception [21] | 0.00025 | 0.00025 | 0.00025 | 0.0000625 | 0.0000625 | 0.0000625 | 0.0000625
Inception-ResNet-v2 [22] | 0.0000625 | 0.0000625 | 0.0000625 | 0.00003125 | 0.00003125 | 0.00003125 | 0.00003125
Table 3 CNN architectures and their tuned parameters
Network | Number of layers cut off | Number of layers added | Number of nodes added
VGG19 [17] | 16 | 4 | 1024, Dropout (0.4)
ResNet50 [18] | 174 | 2 | 256, Dropout (0.4)
MobileNet-v2 [19] | 154 | 2 | 1024, Dropout (0.4)
Inception-v3 [20] | 310 | 2 | 1024, Dropout (0.4)
Xception [21] | 131 | 2 | 1024, Dropout (0.4)
Inception-ResNet-v2 [22] | 779 | 2 | 1024, Dropout (0.4)
4.3 Shallow Convolutional Neural Network (SCNN) Architecture
The essential parameters of the proposed shallow CNN architectures are image size, number of filters, pooling window size, and batch size. Two shallow networks are proposed to test how efficient shallow architectures can be.
Method 1. A single convolutional layer with 10 filters of size 4 × 4 is applied to 150 × 150 input images, followed by a 2 × 2 pooling window; the batch size is 25.
Method 2. The first convolutional layer applies 32 filters of size 3 × 3 to 150 × 150 input images, followed by batch normalization, a 2 × 2 pooling window, and ReLU activation. The second convolutional layer uses 64 filters of size 3 × 3, again with a 2 × 2 pooling window, batch normalization, and ReLU activation. After the features are flattened, a dense layer with 128 neurons and ReLU activation follows. The final layer has a single neuron with sigmoid activation for classification.
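The following sketch traces the output shape and parameter count of each Method 2 layer. It is our own illustration, and it assumes details the text leaves open: RGB (3-channel) input, unpadded ("valid") stride-1 convolutions, non-overlapping 2 × 2 max pooling, and four parameters per channel for batch normalization (scale, offset, moving mean, and moving variance, as Keras counts them). The function names are hypothetical. Under these assumptions the 128-neuron dense layer holds almost all of the roughly 10.6 million parameters, and Method 1's single 4 × 4 convolution with 2 × 2 pooling yields a 73 × 73 × 10 feature map.

```python
def conv_out(size, kernel, stride=1):
    """Spatial size after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, window=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // window


def method2_summary(side=150, in_channels=3):
    """List (layer, output shape, parameter count) for the Method 2 SCNN under stated assumptions."""
    layers = []
    h, c = side, in_channels
    for filters in (32, 64):  # two 3x3 convolutional blocks
        h = conv_out(h, 3)
        layers.append((f"conv3x3_{filters}", (h, h, filters), 3 * 3 * c * filters + filters))
        layers.append(("batch_norm", (h, h, filters), 4 * filters))
        h = pool_out(h)
        layers.append(("max_pool_2x2", (h, h, filters), 0))  # ReLU adds no parameters
        c = filters
    flat = h * h * c
    layers.append(("flatten", (flat,), 0))
    layers.append(("dense_relu", (128,), flat * 128 + 128))
    layers.append(("dense_sigmoid", (1,), 128 + 1))
    return layers


if __name__ == "__main__":
    for name, shape, params in method2_summary():
        print(f"{name:14s} {str(shape):16s} {params:>12,d}")
    print("total parameters:", sum(p for _, _, p in method2_summary()))
    # Method 1: one 4x4 convolution followed by 2x2 pooling on a 150x150 input.
    print("Method 1 feature map side:", pool_out(conv_out(150, 4)))
```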
5 Results and Discussion
To assess the proposed models in different scenarios, two evaluation settings are used for both datasets. In the first, each image category (spiral, wave, triangle, or cube) is trained and tested separately. In the second, a mixture of the image categories in each dataset is used for training, and the model is then tested on randomly chosen drawings. The second setting was adopted because few images are available for each category. Table 4 lists the performance measures used in this work.
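Table 4 Assessment of performance measures
Measure | Formula
Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$
Precision | $\frac{TP}{TP + FP}$
Recall/sensitivity | $\frac{TP}{TP + FN}$
Specificity | $\frac{TN}{TN + FP}$
Here TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.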
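The four measures in Table 4 can be computed from confusion-matrix counts as in the sketch below. Treating PD as the positive class and returning 0 when a denominator is zero (precision is undefined when a model never predicts the positive class) are our own assumptions, and the counts in the example are hypothetical. The example shows that a model that labels every drawing as PD on a balanced test set scores 50% accuracy, 50% precision, 100% recall, and 0% specificity, while one that labels everything healthy scores 50%, 0%, 0%, and 100%; these degenerate patterns appear in several from-scratch rows of the result tables.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and specificity (in %) from confusion-matrix counts."""

    def pct(num, den):
        # Convention assumed here: report 0 when the denominator is zero.
        return 100.0 * num / den if den else 0.0

    return {
        "accuracy": pct(tp + tn, tp + tn + fp + fn),
        "precision": pct(tp, tp + fp),
        "recall": pct(tp, tp + fn),  # also called sensitivity
        "specificity": pct(tn, tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical balanced test set: 15 PD drawings (positive) and 15 healthy ones.
    print(classification_metrics(tp=15, tn=0, fp=15, fn=0))  # every drawing labelled PD
    print(classification_metrics(tp=0, tn=15, fp=0, fn=15))  # every drawing labelled healthy
```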
5.1 Analysis for Dataset 1
Tables 5, 6, and 7 give the results for spiral, wave, and mixed spiral–wave images, respectively, and Fig. 2 shows the confusion matrices for the mixture. For spiral images, VGG19 reaches the highest recognition rate (93.3%); for wave images, the highest is 96.7% (ResNet50); and for the mixture of images, VGG19, ResNet50, and MobileNet-v2 each achieve 91.6% accuracy. Tables 5, 6, and 7 show that the fine-tuned models outperform the same models trained from scratch.
5.2 Analysis for Dataset 2
Tables 8, 9, 10, and 11 give the results for spiral, cube, triangle, and mixed spiral–cube–triangle images, and Fig. 3 shows the confusion matrices for the mixture. Each image category reaches a maximum accuracy of 100%, and for the mixture of images, VGG19, ResNet50, MobileNet-v2, and Inception-v3 achieve 100% accuracy. Tables 8–11 also show that fine-tuned models perform better than those trained from scratch and that, with little training data, shallower networks such as VGG19 perform better than deeper ones such as Inception-ResNet-v2.
Tables 12 and 13 report the performance of the two proposed shallow CNN methods. This analysis checks whether a shallow CNN can outperform the pretrained networks when training data are scarce. Taken together, the results in Tables 5–13 show that, of the three approaches, fine-tuning achieves the best accuracy: VGG19, ResNet50, and MobileNet-v2 reach 91.6% and 100% on the mixed-image sets of dataset 1 and dataset 2, respectively. By comparison, Folador et al. [14], who used dataset 1 with hand-crafted features and machine learning techniques, achieved a recognition rate of up to 84%, whereas the fine-tuning approach in this work achieves 91.6%.
Table 5 Result set for spiral images (in %)
Network | From scratch: Accuracy | Precision | Recall | Specificity | From fine-tuning: Accuracy | Precision | Recall | Specificity
VGG19 | 90 | 83 | 100 | 80 | 93.3 | 93 | 93 | 93
ResNet50 | 83.3 | 50 | 100 | 0 | 90 | 93 | 87 | 93
MobileNet-v2 | 50 | 0 | 0 | 100 | 90 | 100 | 80 | 100
Inception-v3 | 60 | 56 | 100 | 20 | 86.7 | 92 | 80 | 93
Xception | 53.3 | 52 | 80 | 27 | 80 | 80 | 80 | 80
Inception-ResNet-v2 | 63.3 | 100 | 27 | 100 | 70 | 69 | 73 | 67
Table 6 Result set for wave images (in %)
Network | From scratch: Accuracy | Precision | Recall | Specificity | From fine-tuning: Accuracy | Precision | Recall | Specificity
VGG19 | 86.7 | 92 | 80 | 93 | 93 | 93 | 93 | 93
ResNet50 | 73.3 | 100 | 47 | 100 | 96.7 | 100 | 93 | 100
MobileNet-v2 | 50 | 0 | 0 | 100 | 93.3 | 93 | 93 | 93
Inception-v3 | 60 | 100 | 20 | 100 | 93.3 | 88 | 100 | 87
Xception | 63.3 | 100 | 27 | 100 | 73.3 | 71 | 80 | 67
Inception-ResNet-v2 | 76.7 | 72 | 87 | 67 | 83.3 | 86 | 80 | 87
Table 7 Result set for the mixture of spiral wave images (in %)
Network | From scratch: Accuracy | Precision | Recall | Specificity | From fine-tuning: Accuracy | Precision | Recall | Specificity
VGG19 | 83.3 | 81 | 87 | 80 | 91.6 | 93 | 90 | 93
ResNet50 | 73.3 | 79 | 63 | 83 | 91.6 | 96 | 87 | 97
MobileNet-v2 | 50 | 0 | 0 | 100 | 91.6 | 96 | 87 | 97
Inception-v3 | 61.6 | 89 | 27 | 97 | 85 | 92 | 77 | 93
Xception | 66.7 | 86 | 40 | 93 | 76.7 | 77 | 77 | 77
Inception-ResNet-v2 | 75 | 71 | 83 | 67 | 71.6 | 67 | 87 | 57
Fig. 2 Confusion matrices of fine-tuning for mixture of spiral wave images from dataset 1: (a) VGG19, (b) ResNet50, (c) MobileNet-v2, (d) Inception-v3, (e) Xception, (f) Inception-ResNet-v2
Table 8 Result set for spiral images (in %)
Network | From scratch: Accuracy | Precision | Recall | Specificity | From fine-tuning: Accuracy | Precision | Recall | Specificity
VGG19 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
ResNet50 | 91.7 | 90 | 100 | 67 | 100 | 100 | 100 | 100
MobileNet-v2 | 75 | 75 | 100 | 0 | 100 | 100 | 100 | 100
Inception-v3 | 75 | 75 | 100 | 0 | 95.8 | 95 | 100 | 83
Xception | 75 | 75 | 100 | 0 | 95.8 | 95 | 100 | 83
Inception-ResNet-v2 | 100 | 100 | 100 | 100 | 83.3 | 85 | 94 | 50
Table 9 Result set for cube images (in %)
Network | From scratch: Accuracy | Precision | Recall | Specificity | From fine-tuning: Accuracy | Precision | Recall | Specificity
VGG19 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
ResNet50 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
MobileNet-v2 | 71.4 | 71 | 100 | 0 | 100 | 100 | 100 | 100
Inception-v3 | 71.4 | 71 | 100 | 0 | 95.2 | 94 | 100 | 83
Xception | 28.6 | 0 | 0 | 100 | 76.1 | 75 | 100 | 17
Inception-ResNet-v2 | 100 | 100 | 100 | 100 | 95.2 | 94 | 100 | 83
Table 10 Result set for triangle images (in %)
Network | From scratch: Accuracy | Precision | Recall | Specificity | From fine-tuning: Accuracy | Precision | Recall | Specificity
VGG19 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
ResNet50 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
MobileNet-v2 | 75 | 75 | 100 | 0 | 100 | 100 | 100 | 100
Inception-v3 | 75 | 75 | 100 | 0 | 100 | 100 | 100 | 100
Xception | 75 | 75 | 100 | 0 | 100 | 100 | 100 | 100
Inception-ResNet-v2 | 95.8 | 100 | 94 | 100 | 95.8 | 95 | 100 | 83
Table 11 Result set for the mixture of cube–spiral–triangle images (in %)
Network | From scratch: Accuracy | Precision | Recall | Specificity | From fine-tuning: Accuracy | Precision | Recall | Specificity
VGG19 | 98.5 | 100 | 98 | 100 | 100 | 100 | 100 | 100
ResNet50 | 86.9 | 90 | 92 | 72 | 100 | 100 | 100 | 100
MobileNet-v2 | 73.9 | 74 | 100 | 0 | 100 | 100 | 100 | 100
Inception-v3 | 73.9 | 74 | 100 | 0 | 100 | 100 | 100 | 100
Xception | 73.9 | 74 | 100 | 0 | 98 | 100 | 98 | 100
Inception-ResNet-v2 | 89.8 | 88 | 100 | 61 | 86 | 84 | 90 | 82
Fig. 3 Confusion matrices of fine-tuning for mixture of spiral–cube–triangle images from dataset 2: (a) VGG19, (b) ResNet50, (c) MobileNet-v2, (d) Inception-v3, (e) Xception, (f) Inception-ResNet-v2
Table 12 Performance evaluation of SCNN using Method 1 (in %)
Dataset | Pattern | Accuracy | Precision | Recall | Specificity
Dataset 1 | Spiral | 80 | 80 | 80 | 80
Dataset 1 | Wave | 50 | 50 | 100 | 0
Dataset 1 | Mixture | 65 | 60 | 90 | 40
Dataset 2 | Spiral | 100 | 100 | 100 | 100
Dataset 2 | Cube | 71 | 71 | 100 | 0
Dataset 2 | Triangle | 92 | 100 | 89 | 100
Dataset 2 | Mixture | 95 | 91 | 100 | 89
Table 13 Performance evaluation of SCNN using Method 2 (in %)
Dataset | Pattern | Accuracy | Precision | Recall | Specificity
Dataset 1 | Spiral | 77 | 83 | 67 | 87
Dataset 1 | Wave | 73 | 71 | 80 | 67
Dataset 1 | Mixture | 70 | 68 | 77 | 63
Dataset 2 | Spiral | 100 | 100 | 100 | 100
Dataset 2 | Cube | 100 | 100 | 100 | 100
Dataset 2 | Triangle | 100 | 100 | 100 | 100
Dataset 2 | Mixture | 100 | 100 | 100 | 100
6 Conclusion
This paper investigates the use of deep convolutional neural networks for detecting Parkinson’s disease from hand-drawn images. Patterns such as spirals, cubes, waves, and triangles from two different datasets are used, and three approaches are considered. In the first approach, the networks are trained from scratch with random weights. In the second, the weights of already trained networks are fine-tuned. In the third, two shallow convolutional neural networks are trained and tested. The first two approaches are evaluated with VGG19, ResNet50, MobileNet-v2, Inception-v3, Xception, and Inception-ResNet-v2. The experimental results show that fine-tuning gives better results than the other two approaches. Among all models, VGG19, ResNet50, and MobileNet-v2 perform best, with accuracies of 91.6% and 100% on dataset 1 and dataset 2, respectively.
Acknowledgements This work is part of a research project sanctioned by Assam Science and Technology University, Guwahati, under the Collaborative Research scheme of TEQIP-III. We are grateful to TEQIP-III and the sponsoring university for their financial support and for the opportunity to carry out this research work.
References
1. Mohamed GS (2016) Parkinson’s disease diagnosis: detecting the effect of attributes selection and discretization of Parkinson’s disease dataset on the performance of classifier algorithms. Open Access Libr J 3(11):1–11
2. Hariharan M, Polat K, Sindhu R (2014) A new hybrid intelligent system for accurate detection of Parkinson’s disease. Comput Methods Programs Biomed 113(3):904–913
3. Aich S, Younga K, Hui KL, Al-Absi AA, Sain M (2018) A nonlinear decision tree based classification approach to predict the Parkinson’s disease using different feature sets of voice data. In: 20th international conference on advanced communication technology (ICACT). IEEE, pp 638–642
4. Peker M, Sen B, Delen D (2015) Computer-aided diagnosis of Parkinson’s disease using complex-valued neural networks and mRMR feature selection algorithm. J Healthc Eng 6(3):281–302
5. Zhang YN (2017) Can a smartphone diagnose Parkinson disease? A deep neural network method and telediagnosis system implementation. Parkinson’s Disease
6. Grover S, Bhartia S, Yadav A, Seeja KR (2018) Predicting severity of Parkinson’s disease using deep learning. Procedia Comput Sci 132:1788–1794
7. Andrade AO, Pereira AA, Soares MF, de Almeida GLC, Paixão APS, Fenelon SB, Dionisio VC (2013) Human tremor: origins, detection and quantification. In: Andrade AO (ed) Pract Appl Biomed Eng. InTech, Croatia
8. Saunders-Pullman R, Derby C, Stanley K, Floyd A, Bressman S, Lipton RB, Deligtisch A, Severt L, Yu Q, Kurtis M, Pullman SL (2008) Validity of spiral analysis in early Parkinson’s disease. Movem Disorders: Off J Movem Disorder Soc 23(4):531–537
9. Graça R, Castro RS, Cevada J (2014) Parkdetect: early diagnosing Parkinson’s disease. In: IEEE international symposium on medical measurements and applications (MeMeA). IEEE, pp 1–6
10. San Luciano M, Wang C, Ortega RA, Yu Q, Boschung S, Soto-Valencia J, Bressman SB, Lipton RB, Pullman S, Saunders-Pullman R (2016) Digitized spiral drawing: a possible biomarker for early Parkinson’s disease. PLoS One 11(10):e0162799
11. Drotár P, Mekyska J, Rektorová I, Masarová L, Smékal Z, Faundez-Zanuy M (2016) Evaluation of handwriting kinematics and pressure for differential diagnosis of Parkinson’s disease. Artif Intell Med 67:39–46
12. Loconsole C, Trotta GF, Brunetti A, Trotta J, Schiavone A, Tatò SI, Losavio G, Bevilacqua V (2017) Computer vision and EMG-based handwriting analysis for classification in Parkinson’s disease. In: International conference on intelligent computing. Springer, Berlin, pp 493–503
13. Pereira CR, Weber SA, Hook C, Rosa GH, Papa JP (2017) Deep learning-aided Parkinson’s disease diagnosis from handwritten dynamics. In: 29th SIBGRAPI conference on graphics, patterns and images (SIBGRAPI). IEEE, pp 340–346
14. Folador JP, Rosebrock A, Pereira AA, Vieira MF, de Oliveira Andrade A (2019) Classification of handwritten drawings of people with Parkinson’s disease by using histograms of oriented gradients and the random forest classifier. In: Latin American conference on biomedical engineering. Springer, Cham, pp 334–343
15. Zham P, Kumar DK, Dabnichki P, Poosapadi Arjunan S, Raghav S (2017) Distinguishing different stages of Parkinson’s disease using composite index of speed and pen-pressure of sketching a spiral. Front Neurol 8:435
16. Bernardo LS, Quezada A, Munoz R, Maia FM, Pereira CR, Wu W, de Albuquerque VHC (2019) Handwritten pattern recognition for early Parkinson’s disease diagnosis. Pattern Recogn Lett 125:78–84
17. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
18. Akiba T, Suzuki S, Fukuda K (2017) Extremely large minibatch SGD: training ResNet-50 on ImageNet in 15 minutes. arXiv preprint arXiv:1711.04325
19. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861
20. Xia X, Xu C, Nan B (2017) Inception-v3 for flower classification. In: 2nd international conference on image, vision and computing (ICIVC). IEEE, pp 783–787
21. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258
22. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence
An Empirical Analysis of Hierarchical and Partition-Based Clustering Techniques in Optic Disc Segmentation J. Prakash and B. Vinoth Kumar
Abstract Optic disc segmentation in fundus images is a significant step in diagnosing eye diseases such as diabetic retinopathy and glaucoma. It isolates the bright yellowish region of the fundus image that corresponds to the optic disc. Automated optic disc segmentation is essential for diagnosing eye disease at the earliest stage and preventing loss of eyesight. The optic disc can be segmented using clustering techniques, and in this work hierarchical and partition-based clustering techniques are used for this purpose. Five datasets, namely DIARETDB1, CHASE DB, HRF DB, INSPIRE DB, and DRIONS DB, are used to evaluate the clustering techniques. A comparative study was made using the performance parameters accuracy, error rate, positive predictive value, precision, recall, false discovery rate, and F1 score. The results show that the hierarchical clustering technique performs better than partition-based clustering on all the datasets considered.
Keywords Optic disc segmentation · Partition-based clustering techniques · Hierarchical clustering technique · Positive predictive value · False discovery rate
1 Introduction
In recent years, eye disease in humans has been increasing rapidly, and over 150 million people worldwide have been found to be affected by it. Its major causes are diabetic retinopathy and glaucoma, which may lead to blindness. Early detection of eye disease from retinal images can be very useful for treating it and preventing blindness.
J. Prakash (✉) · B. V. Kumar PSG College of Technology, Coimbatore, Tamil Nadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_7
Glaucoma can be detected early by determining the optic disc and optic cup and computing the cup-to-disc ratio (CDR). The optic discs of healthy and affected eyes may differ in structure and size [1]. Segmenting the optic disc from the fundus image is therefore very useful in diagnosing eye disease, as it is the primary stage of glaucoma and diabetic retinopathy screening. Lesions and exudates in the fundus image make optic disc detection difficult because they look similar to the optic disc [2]. The main objective of this paper is to segment the optic disc effectively from fundus images using hierarchical (agglomerative) and partition-based (K-means) clustering techniques on five publicly available fundus datasets: DIARETDB1, CHASE DB, HRF DB, INSPIRE DB, and DRIONS DB. The clustering techniques are compared using the performance parameters accuracy, error rate, positive predictive value, precision, recall, false discovery rate, and F1 score. K-means clustering and agglomerative clustering are preferred here over other methods because K-means clustering is simple, robust, and fast [3–5], and agglomerative clustering is stable and does not require the number of clusters to be predefined [6]. The results are intended to help researchers choose between hierarchical and partition-based clustering in their own fields of research. The paper is organized as follows: ‘Related Work’ reviews the optic disc segmentation methods in the literature, ‘Methodology’ details the implementation, ‘Performance Analysis’ presents the evaluation results, and ‘Conclusion’ concludes the paper.
2 Related Work
A large number of research publications address optic disc segmentation and localization [7]; some of the work on optic disc detection and segmentation is discussed here. Panda et al. [8] compared fuzzy C-means (soft) clustering with K-means (hard) clustering and determined the influence of the distance measure on clustering performance using Euclidean and Manhattan distances. Performance was measured on the iris, wine, and lens datasets, and K-means clustering was found to be faster than fuzzy C-means. Mendonca et al. [9] proposed a method to detect the optic disc in fundus images automatically, localizing it from the blood vessels using the entropy of vascular directions. The method was evaluated on four datasets: DRIVE, STARE, MESSIDOR, and INSPIRE-AVR. Of 1361 images in total, the optic disc location was valid in 1357, a success rate of 99.7%.
Marin et al. [10] used morphological operations to produce an enhanced image with bright regions. The region of interest is obtained with a two-step automatic thresholding procedure, and the optic disc center is then found by applying the circular Hough transform to the optic disc boundary candidates. The method was evaluated on the MESSIDOR database (1200 images) and the MESSIDOR-2 database (1748 images), and the results show a very high accuracy in locating the optic disc center. Morales et al. [11] proposed optic disc contour extraction based on principal component analysis (PCA) and mathematical morphology, using operations such as the stochastic watershed, the generalized distance function (GDF), geodesic transformations, and a variant of the watershed transformation. PCA is first applied to obtain a grayscale input image. The method was validated on the publicly available DRIONS, DIARETDB1, DRIVE, MESSIDOR, and ONHSD datasets, attaining an accuracy of 99.47%, a true positive fraction of 0.9275, and a false positive fraction of 0.0036. Hsiao et al. [12] localized the optic disc using a supervised gradient vector flow (SGVF) snake model. Because the conventional GVF snake model has difficulty handling fuzzy disc boundaries and vessel occlusion, the SGVF snake model extends it to classify contour points and update the feature information, relying on statistical information and feature vectors of training images for classification. Evaluated on the publicly available DRIVE and STARE datasets, the method achieved an overall performance of 95% in localizing the optic disc and 91% in determining optic disc boundaries. Xue et al. [13] proposed a clustering-based saliency model for optic disc detection, in which K-means clustering is first applied to extract optic disc region candidates from fundus images.
The optic disc region is then selected as the region of maximum saliency, computed from two saliencies of the subregions. Ellipse fitting is used to detect the optic disc contour, and active contours are finally used to segment it accurately. The model was tested on four datasets: DRIVE, STARE, MESSIDOR, and DRISHTI-GS. On DRISHTI-GS, it achieved an accuracy of 94% in optic disc detection and 88% in segmentation.
3 Methodology
Optic disc detection from fundus images proceeds in the stages shown in Fig. 1. Fundus images are first obtained from publicly available datasets and preprocessed to remove noise and obtain good contrast [14]. As preprocessing steps, the images are resized for efficient processing, and dilation and Gaussian blur filters are applied to smooth the image and remove noise. The hierarchical (agglomerative) and partition-based (K-means) clustering techniques are then applied to obtain the optic disc, which is the brightest region of the fundus image.
Fig. 1 Stages of optic disc detection
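The dilation and smoothing steps can be illustrated on a small grayscale image stored as a list of rows. This sketch is our own and rests on assumptions the text does not state: a 3 × 3 neighbourhood for dilation, a 3 × 3 Gaussian with binomial weights [1, 2, 1]/4 applied along rows and then columns, and edge pixels replicated at the border; resizing is omitted. The function names are hypothetical. With OpenCV, such steps are typically performed with `cv2.resize`, `cv2.dilate`, and `cv2.GaussianBlur`, whose default border handling differs from the replication used here.

```python
def _clamp(i, n):
    """Replicate edge pixels by clamping an index into the range [0, n - 1]."""
    return max(0, min(i, n - 1))


def dilate3x3(img):
    """Grayscale dilation: each pixel becomes the maximum of its 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    return [
        [
            max(img[_clamp(r + dr, h)][_clamp(c + dc, w)]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            for c in range(w)
        ]
        for r in range(h)
    ]


def gaussian3x3(img):
    """Separable 3x3 Gaussian blur with binomial weights [1, 2, 1] / 4."""
    h, w = len(img), len(img[0])
    taps = ((-1, 0.25), (0, 0.5), (1, 0.25))
    # Horizontal pass, then vertical pass over the intermediate result.
    rows = [[sum(wt * img[r][_clamp(c + d, w)] for d, wt in taps) for c in range(w)]
            for r in range(h)]
    return [[sum(wt * rows[_clamp(r + d, h)][c] for d, wt in taps) for c in range(w)]
            for r in range(h)]


def preprocess(img):
    """Dilation followed by Gaussian smoothing (resizing is omitted in this sketch)."""
    return gaussian3x3(dilate3x3(img))


if __name__ == "__main__":
    # Hypothetical 5x5 patch with a single bright pixel.
    patch = [[0] * 5 for _ in range(5)]
    patch[2][2] = 200
    for row in preprocess(patch):
        print([round(v, 1) for v in row])
```

Dilation grows the bright pixel into a 3 × 3 block, and the blur then spreads it smoothly into its neighbours, which is the intended effect of suppressing small dark gaps before clustering.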
3.1 Hierarchical Clustering
Hierarchical clustering groups the data in a systematic order by building a cluster tree. Clusters are first formed and compared with each other, and clusters with nearly the same properties are merged repeatedly until a single cluster is obtained. Hierarchical clustering has two variants: divisive (top-down) and agglomerative (bottom-up). The agglomerative clustering technique is applied in this paper.
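A minimal sketch of the agglomerative (bottom-up) procedure on one-dimensional pixel intensities follows: every value starts as its own cluster, and the two clusters with the closest means are merged until a chosen number of clusters remains, which is equivalent to cutting the cluster tree at that level. Centroid (mean) linkage, clustering on intensity alone, and the function name `agglomerate` are our assumptions for illustration; the text does not state the linkage criterion or the stopping rule. This brute-force loop is only suitable for tiny inputs; library implementations such as scikit-learn's `AgglomerativeClustering` are far more efficient.

```python
def agglomerate(values, n_clusters):
    """Bottom-up clustering of scalar values with centroid (mean) linkage."""
    clusters = [[v] for v in values]  # every value starts as its own cluster

    def mean(cluster):
        return sum(cluster) / len(cluster)

    while len(clusters) > n_clusters:
        # Find the pair of clusters whose means are closest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gap = abs(mean(clusters[i]) - mean(clusters[j]))
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]                          # j > i, so index i stays valid
    return sorted(clusters, key=mean)           # darkest cluster first


if __name__ == "__main__":
    # Hypothetical intensities: dark background, mid-grey retina, bright optic disc.
    pixels = [12, 15, 10, 120, 118, 125, 240, 250, 245]
    groups = agglomerate(pixels, 3)
    print(groups)
    print("brightest cluster (optic disc candidate):", groups[-1])
```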
3.2 Partition-Based Clustering
Partition-based methods form clusters by partitioning the data according to their properties, and the number of partitions (i.e., clusters) must be specified by the user. K-means, fuzzy C-means, K-medoids, and CLARA are examples of partition-based clustering; the K-means technique is applied in this paper. Once a clustering technique has identified the optic disc region, the resulting image is masked by keeping the optic disc region and masking out all other regions, and the centroid of the optic disc region is then computed. The resulting optic disc region can be used in diagnosing eye disease.
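The partition-based pipeline (cluster the pixels, keep the brightest cluster as the optic disc mask, take its centroid) can be sketched as follows. Our assumptions, none of which the text specifies: clustering on grayscale intensity alone, k = 3, deterministic initial centres spread evenly over the intensity range, and an iteration cap of 20. The function names are hypothetical. `is_hit` applies the centre-distance criterion of Eq. (1) in Sect. 4; the ground-truth centre and mean radius in the example are invented purely to exercise it. scikit-learn's `KMeans` would normally replace the hand-written loop.

```python
import math


def kmeans_1d(values, k=3, iters=20):
    """Plain K-means on scalar intensities; returns (centres, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic initialisation: k centres spread evenly over the intensity range.
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)] if k > 1 else [float(lo)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        # Update step: each centre moves to the mean of its members (unchanged if empty).
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centres[j])
        if updated == centres:  # assignments are stable, so the centres stopped moving
            break
        centres = updated
    return centres, labels


def optic_disc_centroid(img, k=3):
    """Cluster pixel intensities, mask all but the brightest cluster, return (row, col) centroid."""
    h, w = len(img), len(img[0])
    flat = [img[r][c] for r in range(h) for c in range(w)]
    centres, labels = kmeans_1d(flat, k)
    brightest = max(set(labels), key=lambda j: centres[j])  # only non-empty clusters
    coords = [(i // w, i % w) for i, lab in enumerate(labels) if lab == brightest]
    return (sum(r for r, _ in coords) / len(coords),
            sum(c for _, c in coords) / len(coords))


def is_hit(found, truth, mean_radius):
    """Eq. (1): a hit if the detected centre lies within mean_radius of the true centre."""
    return math.dist(found, truth) < mean_radius


if __name__ == "__main__":
    # Hypothetical 6x6 grayscale patch: dark border, mid-grey retina, bright disc at top right.
    patch = [[20, 20, 90, 90, 90, 90],
             [20, 90, 90, 90, 230, 240],
             [20, 90, 90, 90, 235, 250],
             [20, 90, 90, 90, 90, 90],
             [20, 20, 90, 90, 90, 90],
             [20, 20, 20, 90, 90, 90]]
    centre = optic_disc_centroid(patch)
    print("estimated optic disc centre (row, col):", centre)
    print("hit:", is_hit(centre, truth=(1.4, 4.6), mean_radius=1.0))
```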
4 Performance Analysis The hierarchical and partition-based clustering techniques were evaluated on five publicly available datasets: DIARETDB1 with 89 fundus images [15], CHASE DB with 28 fundus images [16], HRF DB with 45 fundus images [17], INSPIRE DB with 40 fundus images [18], and DRIONS DB with 110 fundus images [19].
An Empirical Analysis of Hierarchical and Partition-Based Clustering …
The hit rate (i.e., correct detection of the optic disc) of the clustering techniques was obtained and is shown in Table 1, with its graphical representation in Fig. 2. A detection is counted as a hit when it satisfies Eq. (1):
$$\sqrt{(P_a - P_b)^2 + (Q_a - Q_b)^2} < A_{mean} \quad (1)$$
where (P_a, Q_a) are the coordinates of the ground-truth optic disc centre, (P_b, Q_b) are the coordinates of the detected centre, and A_mean is the standard average radius [20]. Figure 3 shows the input and the output obtained by applying the clustering techniques to a sample fundus image; the segmented optic disc region is marked with a green circle. The performance of hierarchical clustering and partition-based clustering is evaluated with the following metrics: accuracy, error rate, positive predicted value (PPV), precision, recall, false discovery rate (FDR), and F1 score [21]. The results for hierarchical clustering and partition-based clustering are shown in Tables 2 and 3, respectively. The average values of the performance parameters in these tables indicate that hierarchical clustering performs better than partition-based clustering.
Table 1 Hit rate of clustering technique
| Method | Dataset | Total images | Total hit | Total miss | Hit rate (%) |
|---|---|---|---|---|---|
| Hierarchical clustering | DIARETDB1 | 89 | 67 | 22 | 75 |
| | CHASE DB | 28 | 26 | 2 | 93 |
| | HRF DB | 45 | 45 | 0 | 100 |
| | INSPIRE DB | 40 | 33 | 7 | 82.5 |
| | DRIONS DB | 110 | 110 | 0 | 100 |
| Partition-based clustering | DIARETDB1 | 89 | 68 | 21 | 76 |
| | CHASE DB | 28 | 27 | 1 | 96 |
| | HRF DB | 45 | 45 | 0 | 100 |
| | INSPIRE DB | 40 | 32 | 8 | 80 |
| | DRIONS DB | 110 | 82 | 28 | 75 |
Fig. 2 Performance of clustering techniques on the dataset (hit rate (%) per dataset for hierarchical and partition-based clustering)
Fig. 3 Optic disc segmentation using clustering techniques (panels: input fundus image; segmented optic disc)
Table 2 Performance of hierarchical clustering
| Dataset | Precision (%) | Recall (%) | Accuracy (%) | F1 score (%) | Error rate (%) | PPV (%) | FDR (%) |
|---|---|---|---|---|---|---|---|
| DIARETDB1 | 92.54 | 73.81 | 69.66 | 82.12 | 30.34 | 92.53 | 7.40 |
| CHASE DB | 92.85 | 100.00 | 92.85 | 96.29 | 7.10 | 92.85 | 7.14 |
| HRF DB | 97.78 | 100.00 | 97.78 | 98.88 | 2.22 | 97.70 | 2.20 |
| INSPIRE DB | 89.10 | 91.60 | 82.50 | 90.40 | 17.50 | 89.18 | 10.81 |
| DRIONS DB | 95.45 | 100.00 | 95.45 | 97.67 | 4.55 | 95.45 | 4.50 |
| Average | 93.544 | 93.082 | 87.648 | 93.072 | 12.342 | 93.542 | 6.41 |
Table 3 Performance of partition-based clustering
| Dataset | Precision (%) | Recall (%) | Accuracy (%) | F1 score (%) | Error rate (%) | PPV (%) | FDR (%) |
|---|---|---|---|---|---|---|---|
| DIARETDB1 | 86.76 | 73.75 | 66.29 | 79.73 | 33.71 | 86.70 | 13.20 |
| CHASE DB | 92.85 | 100.00 | 92.85 | 96.29 | 7.10 | 92.85 | 7.14 |
| HRF DB | 91.11 | 100.00 | 91.11 | 95.35 | 8.89 | 91.11 | 8.80 |
| INSPIRE DB | 86.40 | 91.40 | 80.00 | 88.88 | 20.00 | 86.48 | 13.51 |
| DRIONS DB | 90.24 | 72.55 | 67.27 | 80.43 | 32.73 | 90.20 | 9.70 |
| Average | 89.472 | 87.54 | 79.504 | 88.136 | 20.486 | 89.468 | 10.47 |
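The hit criterion of Eq. (1) and the hit rates reported in Table 1 can be sketched as follows. The centre coordinates and the A_mean value in the test are hypothetical; in practice A_mean would be derived from the ground-truth optic disc radii of each dataset.

```python
import math


def is_hit(gt_centre, detected_centre, a_mean):
    """Eq. (1): a detection is a hit when the Euclidean distance between the
    ground-truth centre (Pa, Qa) and the detected centre (Pb, Qb) is
    strictly smaller than A_mean."""
    (pa, qa), (pb, qb) = gt_centre, detected_centre
    return math.sqrt((pa - pb) ** 2 + (qa - qb) ** 2) < a_mean


def hit_rate(gt_centres, detected_centres, a_mean):
    """Return (hits, misses, hit rate in %) over one dataset."""
    hits = sum(is_hit(g, d, a_mean) for g, d in zip(gt_centres, detected_centres))
    total = len(gt_centres)
    return hits, total - hits, 100.0 * hits / total
```

As a consistency check against Table 1, 67 hits out of 89 DIARETDB1 images give 100 × 67 / 89 ≈ 75.3%, which the table reports as 75.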
The confusion matrices over all the images used to evaluate hierarchical clustering and partition-based clustering are shown in Figs. 4 and 5, respectively. The results indicate that hierarchical clustering predicts the actual values correctly in a higher percentage of cases than partition-based clustering. Both techniques were applied to the DIARETDB1, CHASE DB, HRF DB, INSPIRE DB, and DRIONS DB databases, and the precision was determined. The results, shown in Fig. 6, indicate that the precision of hierarchical clustering in optic disc detection is higher than that of partition-based clustering on every dataset except CHASE DB, where both reach 92.85%.
Fig. 4 Confusion matrix of hierarchical clustering
Fig. 5 Confusion matrix of partition-based clustering
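The metrics in Tables 2 and 3 are all derived from confusion-matrix counts. The sketch below uses the standard definitions, under which PPV coincides with precision and FDR equals 100 − PPV. Treating true negatives as zero for image-level optic disc detection is an assumption, and the counts in the test are hypothetical.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Confusion-matrix metrics as percentages, using the standard definitions.
    tn defaults to 0 (an assumption for image-level optic disc detection)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / total
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "precision": 100 * precision,
        "recall": 100 * recall,
        "accuracy": 100 * accuracy,
        "f1": 100 * f1,
        "error_rate": 100 * (1 - accuracy),
        "ppv": 100 * precision,       # PPV is another name for precision
        "fdr": 100 * fp / (tp + fp),  # FDR = 1 - PPV
    }
```

With hypothetical counts tp = 8, fp = 2, fn = 2, the sketch gives precision = recall = F1 = 80% and accuracy ≈ 66.7%.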
Fig. 6 Comparison of clustering technique based on precision
Figure 7 shows the performance of the clustering techniques in terms of recall. Both techniques perform similarly on DIARETDB1, CHASE DB, HRF DB, and INSPIRE DB, whereas on DRIONS DB hierarchical clustering performs better. The accuracy and F1 score of hierarchical and partition-based clustering were determined and are shown in Figs. 8 and 9, respectively. The accuracy and F1 score of hierarchical clustering are higher than those of partition-based clustering on DIARETDB1, HRF DB, INSPIRE DB, and DRIONS DB, while on CHASE DB both techniques achieve the same accuracy and F1 score. Figure 10 shows the error rate of the clustering techniques on the considered datasets. The error rate of hierarchical clustering is lower than that of partition-based clustering on DIARETDB1, HRF DB, INSPIRE DB, and DRIONS DB; on CHASE DB the error rates are equal. The positive predicted value and false discovery rate of the clustering techniques on the considered datasets were determined and are shown in Figs. 11 and 12, respectively. The positive predicted value of hierarchical clustering is higher than that of partition-based clustering (equal on CHASE DB), and its false discovery rate is lower, which again indicates that hierarchical clustering performs better.
Fig. 7 Comparison of clustering technique based on recall
Fig. 8 Comparison of clustering technique based on accuracy
Fig. 9 Comparison of clustering technique based on F1 score
Fig. 10 Comparison of clustering technique based on error rate
Fig. 11 Comparison of clustering technique based on PPV
Fig. 12 Comparison of clustering technique based on FDR
5 Conclusion In this paper, the optic disc of the fundus image is segmented using hierarchical clustering and partition-based clustering, and a comparative study of the results is made. The results are obtained by applying the clustering techniques to the publicly available datasets DIARETDB1, CHASE DB, HRF DB, INSPIRE DB, and DRIONS DB. The accuracy of hierarchical clustering is higher than that of partition-based clustering on every dataset except CHASE DB, where the two are equal. The overall positive predicted value across all the considered datasets is 93.5% for hierarchical clustering and 89.5% for partition-based clustering. Hierarchical clustering also outperforms partition-based clustering on the average of every other parameter. From the overall results, we conclude that hierarchical clustering performs better on the considered datasets. In future work, the accuracy of optic disc segmentation could be improved using deep learning techniques, and evolutionary techniques could be used for more effective segmentation of the optic disc.
References
1. Setiawan AW, Mengko TR, Santoso OS, Suksmono AB (2013) Color retinal image enhancement using CLAHE. In: ICT for Smart Society (ICISS), pp 1–3. IEEE
2. Basit A, Fraz MM (2015) Optic disc detection and boundary extraction in retinal images. Appl Opt 54:3440–3447. https://doi.org/10.1364/AO.54.003440
3. Kumar BV, Karpagam GR, Rekha NV (2015) Performance analysis of deterministic centroid initialization method for partitional algorithms in image block clustering. Indian J Sci Technol 8(S7):63–73
4. Kumar BV, Janani K, Priya NM (2017) A survey on automatic detection of hard exudates in diabetic retinopathy. In: IEEE International Conference on Inventive Systems and Control, JCT College of Engineering and Technology, Coimbatore, Tamil Nadu
5. Kumar BV, Sabareeswaran S, Madumitha G (2018) A decennary survey on artificial intelligence methods for image segmentation. In: International Conference on Advanced Engineering Optimization through Intelligent Techniques, Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat, India
6. Makrehchi M (2016) Hierarchical agglomerative clustering using common neighbours similarity. In: IEEE/WIC/ACM International Conference on Web Intelligence (WI), Omaha, NE, pp 546–551. https://doi.org/10.1109/WI.2016.0093
7. Haleem MS, Han L, Van Hemert J, Li B (2013) Automatic extraction of retinal features from colour retinal images for glaucoma diagnosis: a review. Comput Med Imaging Graph 37:581–596. https://doi.org/10.1016/j.compmedimag.2013.09.005
8. Panda S, Sahu S, Jena P, Chattopadhyay S (2012) Comparing fuzzy-C means and K-means clustering techniques: a comprehensive study. In: Advances in Computer Science, Engineering & Applications, AISC, vol 166, pp 451–460
9. Mendonca AM, Sousa A, Mendonca L, Campilho A (2013) Automatic localization of the optic disc by combining vascular and intensity information. Comput Med Imaging Graph 37:409–417. https://doi.org/10.1016/j.compmedimag.2013.04.004
10. Marin D, Gegundez Arias ME, Suero A, Bravo JM (2015) Obtaining optic disc center and pixel region by automatic thresholding methods on morphologically processed fundus images. Comput Methods Programs Biomed 118:173–185. https://doi.org/10.1016/j.cmpb.2014.11.003
11. Morales S, Naranjo V, Angulo J, Alcañiz M (2013) Automatic detection of optic disc based on PCA and mathematical morphology. IEEE Trans Med Imaging 32:786–796. https://doi.org/10.1109/TMI.2013.2238244
12. Hsiao H-K, Liu C-C, Yu C-Y, Kuo S-W, Yu S-S (2012) A novel optic disc detection scheme on retinal images. Expert Syst Appl 39:10600–10606. https://doi.org/10.1016/j.eswa.2012.02.157
13. Xue L-Y, Lin J-W, Cao X-R, Yu L (2018) Retinal blood vessel segmentation using saliency detection model and region optimization. J Algorithms Comput Technol 3–12. https://doi.org/10.1177/1748301817725315
14. Prakash J (2018) Enhanced mass vehicle surveillance system. J Res 4(3):5–9
15. Kauppi T, Kalesnykiene V, Kamarainen J, Lensu L, Sorri I, Raninen A, Voutilainen R, Uusitalo H, Kalviainen H, Pietila J (2007) The DIARETDB1 diabetic retinopathy database and evaluation protocol. In: Proceedings of the British Machine Vision Conference 2007. https://doi.org/10.5244/c.21.15
16. Owen CG, Rudnicka AR, Mullen R, Barman SA, Monekosso D, Whincup PH, Ng J, Paterson C (2009) Measuring retinal vessel tortuosity in 10-year-old children: validation of the computer-assisted image analysis of the retina (CAIAR) program. Invest Ophthalmol Vis Sci 50(5):2004–2010
17. Budai A, Bock R, Maier A, Hornegger J, Michelson G (2013) Robust vessel segmentation in fundus images. Int J Biomed Imaging 1–11. https://doi.org/10.1155/2013/154860
18. Niemeijer M, Xu X, Dumitrescu A, Gupta P, van Ginneken B, Folk J, Abramoff M (2011) Automated measurement of the arteriolar-to-venular width ratio in digital color fundus photographs. In: IEEE Trans Med Imaging. [Epub ahead of print] PubMed PMID: 21690008
19. Carmona EJ, Rincon M, Garcia-Feijoo J, Martinez-de-la-Casa JM (2008) Identification of the optic nerve head with genetic algorithms. Artif Intell Med 43(3):243–259
20. Kumar BV, Karpagam GR, Zhao Y (2019) Evolutionary algorithm with memetic search capability for optic disc localization in retinal fundus images. Intell Data Anal Biomed Appl 191–207. https://doi.org/10.1016/b978-0-12-815553-0.00009-4
21. Almazroa A, Burman R, Raahemifar K, Lakshminarayanan V (2015) J Ophthalmol 2015:1–28. https://doi.org/10.1155/2015/180972
Multi-class Support Vector Machine-Based Household Object Recognition System Using Features Supported by Point Cloud Library Smita Gour, Pushpa B. Patil, and Basavaraj S. Malapur
Abstract The proposed system aims to design and develop an object recognition system with the help of the Point Cloud Library (PCL). The object recognition problem is addressed with a three-stage mechanism. In the first stage, the object image is segmented using PCL, which makes the image suitable for feature extraction. In the second stage, suitable shape-based features that can separate each object type are extracted from the segmented image. The last stage classifies/recognizes the object type using a support vector machine (SVM). Using PCL with an SVM classifier, the system achieved the expected results, giving 94% accuracy for 10 different household object types. Ten samples of each object type are used to train the SVM, and five further samples per object type, entirely different from the training samples, are used for testing. Keywords Object recognition · Point cloud library · Support vector machine
1 Introduction Object recognition is a major research area in computer vision because the appearance of an object, such as its size and shape, varies with the distance between the camera and the object. Its visual appearance also changes drastically with the illumination source and the capturing conditions during image acquisition. The main goal of an object recognition system is to enable a computer to recognize objects without human assistance. This is a major problem in robotics that has not yet been completely solved. Technology has now made it conceivable to use robots as assistants in our homes; for such robots to be accepted, they must be able to recognize household objects with high accuracy. The mobility of robots and the static home environment make recognition easier, because a robot can image an object from different views before making any judgement. Computer vision is a field that includes methods for acquiring, processing, analysing, and understanding images to produce the numerical or symbolic information needed for such decisions. Two main problems must be addressed when designing an object recognition system: selecting an appropriate set of features to represent the image, and choosing the best classifier for recognition. The first problem is still challenging because the extracted features should be insensitive to transformations such as scaling, translation, and rotation. The second problem can be addressed by choosing a classifier that has been shown to achieve high classification accuracy. The Point Cloud Library (PCL) is an emerging framework that supports the various tasks of an object recognition system, such as object segmentation in an image, feature extraction, and object classification. It is discussed in the following sections.
S. Gour (corresponding author) · B. S. Malapur, Basaveshwar Engineering College, Bagalkot 587102, Karnataka, India; P. B. Patil, BLDEA's V P Dr PG Halakatti College of Engineering & Technology, Vijayapura 586101, Karnataka, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_8
1.1 Point Cloud Library The Point Cloud Library is a freely available framework that includes a large number of state-of-the-art functions, defined mainly for the analysis of 2D and 3D data. Object recognition algorithms in PCL use a set of small- and large-scale features to represent the point "cloud". However, determining the exact number of key points required to form a cloud is a major challenge. A large-scale feature captures enough information about the neighbourhood of a key point to yield shape-invariant features, but it is not robust to occlusion and clutter. Such features in PCL are called global descriptors; examples include the Viewpoint Feature Histogram (VFH), Ensemble of Shape Functions (ESF), Global Fast Point Feature Histogram (GFPFH), and Global Radius-Based Surface Descriptor (GRSD). They represent the shape of an object by computing geometry for an entire cluster (cloud) rather than for individual points. These descriptors can be employed for geometric analysis, pose estimation, object recognition, and model retrieval. Small-scale features, called local descriptors, are on the other hand very robust to occlusion and clutter because they compute the local geometry associated with each point; however, they suffer from low descriptiveness. The local descriptors available in PCL include the Fast Point Feature Histogram (FPFH) and the Radius-Based Surface Descriptor (RSD). Global descriptors are therefore generally considered more desirable for vision systems than point-wise (local) descriptors. Both kinds of descriptor compute distinctive values for a point or cluster, such as the distances between points in the cloud or the differences between the angles formed by a point and its neighbours.
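To make the global/local distinction concrete, the sketch below computes a simple global descriptor: a normalised histogram of all pairwise point distances (a D2-style shape distribution, in the spirit of the distance-based shape functions that ESF combines). It is an illustration only, not PCL's ESF implementation; the bin count and the scaling by the largest distance are assumptions. Because it uses only inter-point distances, the histogram is unchanged by rotating or translating the whole cloud.

```python
import math
from itertools import combinations


def distance(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def d2_histogram(points, bins=8):
    """Global descriptor sketch: histogram of all pairwise distances in the
    cloud, scaled by the largest distance and normalised to sum to 1."""
    dists = [distance(p, q) for p, q in combinations(points, 2)]
    d_max = max(dists)
    hist = [0] * bins
    for d in dists:
        # d == d_max would map to index `bins`, so clamp to the last bin.
        hist[min(int(d / d_max * bins), bins - 1)] += 1
    return [h / len(dists) for h in hist]
```

For the eight corners of a unit cube, the 28 pairs fall into three bins (12 edges, 12 face diagonals, 4 space diagonals), and rotating the cube by 90 degrees about the z-axis and translating it leaves the histogram identical.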
Multi-class Support Vector Machine-Based Household Object Recognition …
1.2 Literature Survey Research on object recognition reported in the literature employs both local and global feature analysis. A few contemporary mechanisms for recognizing household objects found in the related literature are presented in this section. These mechanisms can be broadly categorized as follows. (1) Constituent part-based approaches: methods that attempt to recognize an object by recognizing its constituent parts. (2) Appearance-based methods [1]: methods that attempt to recognize objects by comparing them with template objects. (3) Methods that rely on invariant features: these methods identify and precisely extract invariant features of objects and make recognition decisions based on the extracted features. Very few works have employed the descriptors from PCL; these are discussed further in this section. The object recognition methods presented in [2–7] employ support vector machine (SVM) and K-nearest neighbour (KNN) classifiers with features that are invariant to size, translation, and rotation, and they use principal component analysis (PCA) for feature reduction. The method presented in [8] relies on local and global object features for 3D object recognition using the Point Cloud Library (PCL). It follows a hybrid approach employing the Viewpoint Feature Histogram (VFH) and the Fast Point Feature Histogram (FPFH): VFH is used as a global descriptor for object recognition, whereas FPFH is used as a local descriptor to estimate the position of the object in a scene. A system that learns a strategy for selecting 3D point cloud descriptors, based on a 3D classifier pipeline trained with reinforcement learning, is presented in [9]. The approach in [10] presents algorithms and data structures for segmenting 3D point clouds based on computing topological persistence.
Image-based object recognition involves several subtasks: preprocessing (removal of noise and other degradations in the image) [11–14], object segmentation in the image [15–19], and recognition using learnt mechanisms [20–22] based on features extracted from the image [23–29]. Efficient methods are still required to perform these subtasks. Current computer vision systems rely on machine learning [30] to recognize real-world objects, such as soil and plant leaves, in acquired images. Through machine learning, a well-designed system can acquire the knowledge to recognize an object in a scene. In designing such an intelligent system, the focus should be on building computer programs that are capable of learning the context, evolving, and adapting when exposed to new contexts. Deep learning [31, 32], a newer branch of machine learning, is attracting increasing attention from researchers, and the results of applying it in many applications [33, 34] are promising. However, deep learning models must be trained on a huge amount of labelled data, and a neural network structured with many layers requires computationally powerful processors. PCL provides techniques that reduce the burden of pre-processing, feature extraction, and segmentation, and its results can be combined with any classifier, not only neural networks. Hence, a methodology using PCL is proposed and presented in this paper.
1.3 Overview of the Work In the initial stage, the object image is segmented using PCL, which makes the image suitable for feature extraction. In the second stage, suitable shape-based features are extracted from the segmented image using the Fast Point Feature Histogram (FPFH) technique from PCL. The last stage classifies/recognizes the object type using a multi-class support vector machine (SVM). The system achieved the expected results using the Point Cloud Library (PCL) with an SVM classifier. The details of the proposed system are given in Sect. 2, the experiments conducted are discussed in Sect. 3, and Sect. 4 presents the conclusion and future work.
2 Proposed Methodology Based on the literature survey, the problem is defined as developing an efficient object recognition system using the Point Cloud Library that can accurately and reliably distinguish among different object types, such as a chair, keyboard, laptop, and mouse, regardless of viewpoint and illumination conditions. The system aims to overcome some of the pre-processing issues, and to classify objects accurately, by selecting efficient segmentation and feature extraction methods and a suitable classifier. The proposed model is shown in Fig. 1. It comprises two phases, a training phase and a testing phase, which include the stages explained in the following sections.
2.1 Object Image Database The proposed methodology is built for a database of real object images. Images of 10 different categories of household objects (cooling glass, chair, laptop, mouse, keyboard, fan, watch, alarm, speaker, and toy elephant) have been collected, as shown in Fig. 2. Fifteen samples of each category were captured using a 5 MP cell phone camera.
2.2 PCL-Based Object Segmentation Segmentation of objects in an image starts by grouping pixels according to a similarity metric, in an attempt to separate each object from its background and from other objects in the image. The performance of this step determines the accuracy of the subsequent object recognition tasks. PCL provides segmentation methods whose results are notable in other segmentation applications; among them, the proposed system uses the PCL-based watershed segmentation method because it suits our problem. The method is described below. Watershed Segmentation. Recent image processing toolkits such as the Point Cloud Library (PCL) include the watershed segmentation technique, one of the effective methods for solving image segmentation problems. To gain insight into the watershed technique, consider the image in Fig. 3 (left), which contains two dark objects. This image can be thought of as a surface, shown in Fig. 3 (right). The catchment basins of this surface are the objects to be distinguished, and the watershed line is the separation between the two objects. The ideas of watersheds and catchment basins are familiar from natural landscapes. There are two basic approaches to segmenting an image viewed as a surface. They are as follows.
Fig. 1 Block diagram of object recognition system using PCL
Fig. 2 Ten different objects considered in this study
Fig. 3 Sample image with two objects (left) and surface image (right)
The first approach finds a downstream path from every pixel of the image to a local minimum of the image surface elevation. A catchment basin is then defined as the set of pixels whose downstream paths all end at the same elevation minimum. The second approach is the dual of the first: rather than identifying the downstream paths, it fills the catchment basins from the bottom, assuming that there is a hole at each local minimum and that the topographic surface is immersed in water. Water then fills every catchment basin whose minimum lies below the water level, and the watershed lines are where water from neighbouring basins would meet. Each object image is segmented before its feature values are extracted. Figure 4 shows object images before and after segmentation using the watershed technique described above.
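The first (downstream-path) approach can be sketched directly: each pixel follows its steepest-descent neighbour until it reaches a local minimum, and pixels that share a minimum form one catchment basin. This is a simplified illustration, not PCL's or any library's watershed implementation; the 8-neighbourhood and the tie-breaking rule on flat regions are assumptions, and plateaus are handled only approximately.

```python
def watershed_by_descent(surface):
    """Label each pixel of a 2-D elevation map (list of rows) with the basin of
    the local minimum its steepest-descent path reaches; basins are numbered
    from 1 in the order their minima are first reached during a row-major scan."""
    h, w = len(surface), len(surface[0])

    def steepest_step(r, c):
        # Lowest of (r, c) and its 8 neighbours. Ties are broken by (row, col),
        # so each step strictly decreases (value, row, col) and walks terminate.
        best = (surface[r][c], r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    best = min(best, (surface[rr][cc], rr, cc))
        return best[1], best[2]

    basin_of_minimum = {}
    labels = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            pr, pc = r, c
            while True:
                nr, nc = steepest_step(pr, pc)
                if (nr, nc) == (pr, pc):  # reached a local minimum
                    break
                pr, pc = nr, nc
            labels[r][c] = basin_of_minimum.setdefault((pr, pc), len(basin_of_minimum) + 1)
    return labels
```

On the 1 × 5 elevation profile [1, 3, 5, 2, 0], the first two pixels drain to the minimum 1 and the remaining three drain to the minimum 0, giving labels [1, 1, 2, 2, 2].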
Fig. 4 Object images before and after segmentation
2.3 PCL-Based Feature Extraction The main goal of the feature extraction technique here is to accurately recover structure-based (shape) features. Shape features are insensitive to translation and rotation because, even if an object in an image is transformed, shape geometry such as the centroid distance, area, and chord length does not change drastically. In the proposed system, shape features are generated using the Global Fast Point Feature Histogram (GFPFH) technique from PCL, which extracts shape feature values based on boundary information and centroid distance. The centroid distance is the distance between the edge points and the centre of the shape, and it is translation and rotation invariant. Fast Point Feature Histogram (FPFH). The proposed system uses the FPFH feature extraction method to obtain the shape features that help classify the object type. The FPFH considers only the direct connections between the current key point and its neighbours, removing the additional links between neighbours; the resulting histogram is referred to as the Simplified Point Feature Histogram (SPFH). To reduce the complexity of computing the histogram features, the system performs two steps. First, for each query point P_q, a set of tuples describing the relationship between P_q and its neighbours is computed; this is the SPFH. Second, the k neighbours of each query point are determined again, and the neighbouring SPFH values computed in the first step are used to weight the final histogram of P_q, now called the Fast Point Feature Histogram (FPFH), as shown in Eq. (1).
Fig. 5 Region diagram to set centred neighbour point
$$\mathrm{FPFH}(P_q) = \mathrm{SPFH}(P_q) + \frac{1}{k}\sum_{i=1}^{k}\frac{1}{w_i}\,\mathrm{SPFH}(P_i) \quad (1)$$
Here, the weight w_i is the distance between the query point P_q and its neighbour P_i in a given metric space. To illustrate the significance of this weighting scheme, Fig. 5 shows the influence region diagram for the k neighbours centred at P_q.
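The two FPFH steps and the weighting of Eq. (1) can be sketched in pure Python. Each SPFH histograms the Darboux-frame angles (alpha, phi, theta) from the standard point feature histogram formulation between a point and its k nearest neighbours, and the FPFH then adds the distance-weighted SPFHs of those neighbours. The bin count, the brute-force neighbour search, and the omission of PCL's source/target ordering rule are simplifications; this is not PCL's `pcl::FPFHEstimation`, which outputs a 33-bin `pcl::FPFHSignature33`. Normals are assumed to be given as unit vectors.

```python
import math

BINS = 5  # bins per angle (assumed; PCL's FPFHSignature33 uses 11 per angle)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pair_angles(ps, ns, pt, nt):
    """Darboux-frame features (alpha, phi, theta) between a source point/normal
    (ps, ns) and a target point/normal (pt, nt); normals must be unit vectors."""
    d = sub(pt, ps)
    length = math.sqrt(dot(d, d))
    d = tuple(x / length for x in d)
    u = ns
    v = cross(u, d)
    v_len = math.sqrt(dot(v, v))
    if v_len == 0.0:  # normal parallel to the connecting line: frame undefined
        return 0.0, dot(u, d), 0.0
    v = tuple(x / v_len for x in v)
    w = cross(u, v)
    return dot(v, nt), dot(u, d), math.atan2(dot(w, nt), dot(u, nt))


def bin_index(value, lo, hi):
    """Map a value in [lo, hi] to one of BINS equal-width bins (clamped)."""
    return min(max(int((value - lo) / (hi - lo) * BINS), 0), BINS - 1)


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of point i (brute-force search)."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def spfh(points, normals, i, neighbours):
    """Step 1: SPFH of point i, i.e. normalised alpha/phi/theta histograms over
    its neighbours, concatenated into 3 * BINS values."""
    hist = [0.0] * (3 * BINS)
    for j in neighbours:
        alpha, phi, theta = pair_angles(points[i], normals[i], points[j], normals[j])
        hist[bin_index(alpha, -1.0, 1.0)] += 1
        hist[BINS + bin_index(phi, -1.0, 1.0)] += 1
        hist[2 * BINS + bin_index(theta, -math.pi, math.pi)] += 1
    return [h / len(neighbours) for h in hist]


def fpfh(points, normals, k):
    """Step 2, Eq. (1): FPFH(p_q) = SPFH(p_q) + (1/k) * sum_i SPFH(p_i) / w_i,
    where w_i is the Euclidean distance from p_q to neighbour p_i."""
    nbrs = [k_nearest(points, i, k) for i in range(len(points))]
    s = [spfh(points, normals, i, nbrs[i]) for i in range(len(points))]
    result = []
    for q in range(len(points)):
        f = list(s[q])
        for i in nbrs[q]:
            w_i = math.dist(points[q], points[i])
            f = [a + b / (k * w_i) for a, b in zip(f, s[i])]
        result.append(f)
    return result
```

On a flat 3 × 3 grid of points that all share the normal (0, 0, 1), every pair yields alpha = phi = theta = 0, so each FPFH has non-zero entries only in the middle bin of each of the three angle histograms.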
2.4 Support Vector Machine In the proposed system, a support vector machine (SVM) is used as the classifier. SVMs are a family of related supervised learning methods used for classification and regression. The object images belong to different classes, which are separated by decision functions called hyperplanes. The main objective of SVM modelling is to identify a hyperplane that splits the samples into groups; the samples closest to the hyperplane are called support vectors. SVM training starts from the training samples (x_i, y_i), where x_i is the feature vector of the i-th training sample and y_i is its target value. The training phase determines the weight vector w and the bias a of the hyperplane that optimally separates the samples in the training set, subject to Eq. (2):
$$y_i\left(w^{T}\phi(x_i) + a\right) \ge 1 - s_i, \quad s_i \ge 0, \quad \forall i \quad (2)$$
The objective function in Eq. (3) is minimized with respect to w and the slack variables s_i:
$$\Phi(w, s) = \frac{1}{2} w^{T} w + C \sum_{i=1}^{N} s_i \quad (3)$$
Here, the slack variables s_i measure the training error in the feature space, C is a constant that penalizes errors, and φ(·) is the kernel-based transformation function that maps the features to a high-dimensional feature space. Generally, a linear function is used to represent the hyperplane that separates the samples in the feature space; for training data of the form (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), where y_i is either +1 or −1, the linear kernel is given by Eq. (4):
$$K(x_i, x_j) = x_i^{T} x_j \quad (4)$$
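Eqs. (2) and (3) are linked through the slack variables: for a fixed w and a, the smallest s_i satisfying Eq. (2) is max(0, 1 − y_i(w^T x_i + a)), i.e. the hinge loss. The sketch below evaluates Eq. (3) with a linear kernel (φ(x) = x) on hypothetical data; it illustrates the formulation only and is not the classifier implementation used in the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def slacks(w, a, X, y):
    """Smallest slack values satisfying Eq. (2) with a linear kernel
    (phi(x) = x): s_i = max(0, 1 - y_i * (w.x_i + a)), i.e. the hinge loss.
    Labels y_i are +1 or -1."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + a)) for xi, yi in zip(X, y)]


def objective(w, a, X, y, C):
    """Eq. (3): 0.5 * w.w + C * (sum of the slack variables)."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, a, X, y))
```

With w = (1, 0), a = 0, and samples (2, 0), (−2, 0), (0.5, 0) labelled +1, −1, −1, the first two samples lie outside the margin (slack 0), while the third sits on the wrong side and needs slack 1.5, so Eq. (3) evaluates to 0.5 + 1.5 C.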
The SVM discussed above is a binary SVM. A multi-class SVM is built by combining k separately trained binary SVMs, where k is the number of classes in the problem. Because recognition in the proposed system is a one-to-many matching problem, a multi-class SVM is used rather than a two-class SVM.
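A minimal one-versus-rest sketch of this multi-class scheme follows: one linear binary SVM per class, each trained by full-batch subgradient descent on the primal objective of Eq. (3), with a sample assigned to the class whose SVM gives the largest decision value w^T x + a. The optimiser, learning rate, epoch count, and toy 2-D data are assumptions for illustration; a practical system would normally rely on a library solver such as LIBSVM or scikit-learn's `LinearSVC`.

```python
def decision(model, x):
    """Decision value w.x + a of one linear binary SVM."""
    w, a = model
    return sum(wj * xj for wj, xj in zip(w, x)) + a


def train_binary_svm(X, y, C=1.0, lr=0.001, epochs=3000):
    """Linear soft-margin SVM (labels +1/-1) trained by full-batch subgradient
    descent on 0.5 * w.w + C * sum(max(0, 1 - y_i * (w.x_i + a)))."""
    w, a = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_a = list(w), 0.0  # gradient of 0.5 * w.w is w
        for xi, yi in zip(X, y):
            if yi * decision((w, a), xi) < 1:  # slack s_i > 0: hinge term active
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_a -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        a -= lr * grad_a
    return w, a


def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per class: that class is +1, every other class is -1."""
    return {c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models, x):
    """Assign x to the class whose SVM gives the largest decision value."""
    return max(models, key=lambda c: decision(models[c], x))
```

With k classes this trains k binary SVMs, matching the description above; other combination schemes (for example one-versus-one) exist, and the text does not indicate that they were used.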
3 Results and Discussion In the proposed system, shape-based features are used for the different object types to achieve a high recognition accuracy. In the training stage, only 10 samples for each of the 10 household objects, that is, 10 × 10 = 100 images, are used. To evaluate the performance of the proposed method, 50 (10 × 5) object images of varying size, all different from the training samples, were tested; 47 of these 50 images were correctly recognized. The confusion matrix in Table 1 illustrates the overall classification accuracy. The proposed system achieved an average classification accuracy of 94% over the 10 object types, calculated from Table 1 as follows: Object classification accuracy = 47 ÷ 50 × 100 = 94%. The object-wise classification accuracy for all ten object types (cooling glass, chair, laptop, mouse, keyboard, fan, watch, alarm, speaker, and toy elephant) is shown in Table 2. This work can be compared with other methods, namely deep learning [32] and a backpropagation neural network (BPNN) [33], evaluated on the same 10 objects considered in this work, and with the existing SVM-KNN method with geometric invariant features [3]. The proposed PCL-SVM system shows satisfactory and comparable results; Table 3 gives a detailed comparison.
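The overall and object-wise accuracy figures can be computed from a confusion matrix as sketched below. The sketch assumes rows hold the actual class and columns the predicted class (an assumption about the layout of Table 1), and its test uses a small hypothetical 3-class matrix rather than the paper's data.

```python
def overall_accuracy(cm):
    """Percentage of test samples on the diagonal of a square confusion matrix
    (rows = actual class, columns = predicted class)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return 100.0 * correct / total


def per_class_accuracy(cm):
    """Correct predictions of each class divided by that class's test samples."""
    return [100.0 * cm[i][i] / sum(cm[i]) for i in range(len(cm))]
```

With five test samples per class, one misclassified sample lowers that class's accuracy to 80%, as for the chair, laptop, and speaker rows of Table 2, and 47 correct out of 50 gives the reported 94% overall.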
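Table 1 Confusion matrix for accuracy computation
| | Obj1 | Obj2 | Obj3 | Obj4 | Obj5 | Obj6 | Obj7 | Obj8 | Obj9 | Obj10 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Spects | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| Chair | 0 | 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| Laptop | 0 | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| Mouse | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| Keyboard | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 5 |
| Fan | 1 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 5 |
| Watch | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 5 |
| Alarms | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 5 | 0 | 0 | 5 |
| Speaker | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 1 | 5 |
| Toy | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 5 | 5 |
| | 6 | 5 | 5 | 5 | 5 | 4 | 6 | 4 | 5 | 5 | 47 |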
Table 2 Object-wise recognition accuracy
| Object type | Object name | Accuracy (%) |
|---|---|---|
| 1 | Cooling glass | 100 |
| 2 | Chair | 80 |
| 3 | Laptop | 80 |
| 4 | Mouse | 100 |
| 5 | Keyboard | 100 |
| 6 | Fan | 100 |
| 7 | Watch | 100 |
| 8 | Alarms | 100 |
| 9 | Speaker | 80 |
| 10 | Toy elephant | 100 |
Table 3 Comparison with other approaches
| Method/Approach | Number of samples in training set (no. of objects × samples per object) | Accuracy (%) |
|---|---|---|
| CNN-based | 10 × 50 = 500 | 96 |
| BPNN with shape features | 10 × 20 = 200 | 92 |
| SVM + KNN | 10 × 10 = 100 | 90 |
| PCL-SVM (Proposed) | 10 × 10 = 100 | 94 |
4 Conclusion The proposed system addresses some of the pre-processing issues, such as segmenting objects from images, and performs efficient classification of objects. The system uses the FPFH feature extraction method from the PCL framework to extract shape features from the segmented images, and an SVM classifier to classify/recognize the segmented objects. The experiment is carried out on 5 × 10 = 50 new object images, with 10 × 10 = 100 samples used for training. Of the 50 test samples, 47 are correctly classified, so the proposed PCL-SVM system achieves a satisfactory classification accuracy of 94%. Future work includes analysing more of the texture-based features available in PCL and combining them with the present feature vector to obtain higher accuracy. With that, the system could be extended to recognize more objects.
References
1. Mattern FW, Denzler J (2004) Comparison of appearance based methods for generic object recognition. Pattern Recogn Image Anal 14(2):255–261
2. Muralidharan R, Chandrasekar C (2011) Object recognition using support vector machine augmented by RST invariants. IJCSI Int J Comput Sci Issues 8(5):280–286
3. Muralidharan R, Chandrasekar C (2011) Object recognition using SVM-KNN based on geometric moment invariant. Int J Comput Trends Technol 215–219
4. Muralidharan R (2014) Object recognition from an image through features extracted from segmented image. Int J Adv Res Comput Sci Softw Eng 4(12):205–209
5. Muralidharan R (2014) Object recognition using k-nearest neighbour supported by eigen value generated from the features of an image. Int J Innov Res Comput Commun Eng 2(8):5521–5528
6. Muralidharan R, Chandrasekar C (2012) 3D object recognition using multiclass support vector machine-k-nearest neighbour supported by local and global feature. J Comput Sci 1380–1388
7. Muralidharan R, Chandrasekar C (2011) Scale invariant feature extraction for identifying an object in the image using moment invariants. J Eng Res Stud 2(1):99–10
8. Alhamzi K, Elmogy M, Barakat S (2015) 3D object recognition based on local and global features using point cloud library. Int J Adv Comput Technol (IJACT) 7(3):43–54
9. Garstka J, Peters G (2015, July 21–23) Adaptive 3-D object classification with reinforcement learning. In: International Conference on Informatics in Control, Automation and Robotics
10. Beksi WJ, Papanikolopoulos N (2016, May 16–21) 3D point cloud segmentation using topological persistence. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden
11. Faraji MR, Qi X (2014) Face recognition under varying illumination with logarithmic fractal analysis. IEEE Sig Process Lett 21(12):1457–1461
12. Du S, Ward RK (2010) Adaptive region-based image enhancement method for robust face recognition under variable illumination conditions. IEEE Trans Circ Syst Video Technol 20(9):1165–1175
13. Farooque MA, Rohankar JS (2013) Survey on various noises and techniques for denoising the color image. Int J Appl Innov Eng Manage (IJAIEM) 2(11):217–221
14. Sharma A, Chaturvedi R, Kumar S, Dwivedi UK (2020) Multi-level image thresholding based on Kapur and Tsallis entropy using firefly algorithm. J Interdisc Math 23(2):563–571
15. Srinivasan GN, Shobha G (2007) Segmentation techniques for target recognition. Int J Comput Commun 1(3):75–81
16. Oji R (2012) An automatic algorithm for object recognition and detection based on ASIFT keypoints. Sig Image Process Int J (SIPIJ) 3(5):29–39
17. Khurana K, Awasthi R (2013) Techniques for object recognition in images and multi-object detection. Int J Adv Res Comput Eng Technol (IJARCET) 2(4):1383–1388
18. Sharma N, Mishra M, Shrivastava M (2012) Colour image segmentation techniques and issues. Int J Sci Technol Res 1(4):9–12
19. Sharma A, Chaturvedi R, Dwivedi UK, Kumar S, Reddy S (2018) Firefly algorithm based effective gray scale image segmentation using multilevel thresholding and entropy function. Int J Pure Appl Math 118(5):437–443
20. Morgan & Claypool Publisher (2009) Algorithms for reinforcement learning. Draft of the lecture published in the Synthesis Lectures on Artificial Intelligence and Machine Learning, June 2009
21. Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. In: Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA
22. Prandi F, Brumana R, Fassi F (2010) Semi automatic objects recognition in urban areas based on fuzzy logic. J Geogr Inf Syst 2:55–62
23. Diplaros A (2003) Color-shape context for object recognition. In: IEEE Workshop on Color and Photometric Methods in Computer Vision, pp 1–8
24. Hsiao H, Collet A, Hebert M (2010) Making specific features less discriminative to improve point-based 3D object recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, pp 1–8, June 2010
25. Lemaitre C, Smach F, Miteran J, Gauthier JP, Atri M (2006) A comparative study of motion descriptors and Zernike moments in color object recognition. In: Conference: IEEE Industrial Electronics, IECON 2006—32nd Annual Conference, pp 1–6
26. Ananthashayana VK, Asha V (2008) Appearance based 3D object recognition using IPCA-ICA. In: The international archives of the photogrammetry, remote sensing and spatial information sciences, vol 37, Part B1, pp 1083–1090
27. Gour S, Patil PB (2016) A novel machine learning approach to recognize household objects. In: 2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), Paralakhemundi, pp 69–73
28. Kumar S, Sharma B, Sharma VK, Poonia RC (2018) Automated soil prediction using bag-of-features and chaotic spider monkey optimization algorithm. Evol Intel. https://doi.org/10.1007/s12065-018-0186-9
29. Kumar S, Sharma B, Sharma VK, Sharma H, Bansal JC (2018) Plant leaf disease identification using exponential spider monkey optimization. Sustain Comput Inf Syst. https://doi.org/10.1016/j.suscom.2018.10.004
30. Pawar VN, Talbar SN (2012) Machine learning approach for object recognition. Int J Model Optim 2(5):622–628
31. Wang J (2011) Deep learning: an artificial intelligence revolution. A white paper ARK Invest, pp 1–41
32. LeCun Y, Kavukcuoglu K, Farabet C (2010) Convolutional networks and applications in vision. In: International Symposium on Circuits and Systems, pp 253–256
33. Scherer D, Müller A, Behnke S (2010) Evaluation of pooling operations in convolutional architectures for object recognition. In: International Conference on Artificial Neural Networks, pp 92–101
34. Gour S, Patil PB (2020) An exploration of deep learning in recognizing house-hold objects. Grenze Int J Eng Technol, Special Issue
Initialization of MLP Parameters Using Deep Belief Networks for Cancer Classification
Barış Dinç, Yasin Kaya, and Serdar Yıldırım
Abstract A deep belief network (DBN) is a deep neural network structure consisting of a stack of restricted Boltzmann machines (RBMs). An RBM is a simple two-layer neural network formed by a visible layer and a hidden layer. Each RBM receives, in its visible layer, the lower-level feature set learned by the previous RBM and passes it on to the upper layers, which turn it into a more complex feature structure. In this study, the training parameters learned by a DBN are fed to a multilayer perceptron (MLP) as initial weights instead of starting from random values. The results obtained on a bioinformatics cancer dataset show that initial weights trained by a DBN yield better classification results than random initial parameters: with the proposed method, the test accuracy increased from 77.27% to 95.45%.
Keywords Deep belief networks · Restricted Boltzmann machine · Multilayer perceptron · Classification
B. Dinç (B) · Y. Kaya · S. Yıldırım
Department of Computer Engineering, Adana Alparslan Türkeş Science and Technology University, Adana, Turkey
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_9

1 Introduction
In recent years, especially since the completion of the Human Genome Project, progress in bioinformatics has accelerated. However, powerful analysis techniques are still needed to uncover the information hidden in genetic codes. Machine learning is one of the approaches used to analyze bioinformatics data, with the aim of obtaining satisfactory results in terms of gene–disease relationships
B. Dinç et al.
as well as the distinction between diseased and healthy individuals. To detect genetic disease with machine learning strategies, thousands of genes in microarray datasets must be evaluated [1]. This requires efficient and reliable classification techniques, especially in the field of disease diagnosis. Despite the high-dimensional feature sets, the number of samples in microarray data tends to be small, and increasing the number of features makes the search space expand exponentially [2]. This imbalance between the number of features and the sample size poses a major challenge for multivariate statistical machine learning. In recent years, especially after 2010, the introduction of deep architectures into machine learning has improved the learning of complex features: lower layers learn simple feature sets and pass them to higher layers, so that increasingly complex features are learned layer by layer. In a multilayer feed-forward neural network, one of the most important factors affecting the classification result is the choice of initial weights. Whether good starting parameters are selected affects the classification accuracy and the efficiency of the learning algorithm, as well as its ability to avoid local minima. The DBN, a probabilistic generative model, can be used to choose the initial parameters of a multilayer perceptron (MLP). In a DBN, the top two hidden layers form symmetric undirected connections, while the lower hidden layers form a directed top-down graph. A DBN is trained with contrastive divergence, which relies on an approximation of the gradient of the data log-likelihood. An RBM is a generative model composed of binary visible variables $v = \{v_i\}$ and hidden variables $h = \{h_j\}$.
The learning algorithm is greedy and unsupervised, and it can find suitable parameters even in a deep model with many hidden layers [3]. In the study of Abdel-Zaher and Eldeib [4], a DBN was used as an unsupervised pre-training phase on a breast cancer database before the supervised training phase, and it provided higher classification accuracy than a classifier trained with the supervised phase alone. In an artificial neural network, many parameters affect the success of the learning algorithm, such as the learning rate, the architecture, the training time, and the initial weights. Depending on the complexity of the data to be analyzed, short-fat or long-deep architectures may be preferred. However, no matter how well the architecture is designed, other factors can still prevent the algorithm from finding the global optimum. The main contribution of this study is to make MLP training more efficient by starting from initial weights that are as good as possible. For this purpose, initial weights pretrained by a DBN are used to improve classification performance. In addition, a good choice of initial weights can reduce training time and computational cost besides increasing classification performance. To mitigate the curse of dimensionality in the high-dimensional dataset, the top 100 features were selected in the experimental analysis by applying the Fisher score feature selection strategy to the original feature set. In the study conducted
Initialization of MLP Parameters Using Deep Belief Networks …
by Pérez-Ortiz et al. [5], subjective well-being factors were successfully extracted with the Fisher score method from data containing 56 different components of happiness in European countries. In another study, Gu et al. presented the generalized Fisher score, which maximizes the lower bound of the traditional Fisher score, and their experiments showed that it outperformed many state-of-the-art feature selection methods [6]. The paper is structured as follows: Sect. 2 introduces the ALLAML microarray dataset analyzed in the experiments, together with the RBM, DBN, and MLP structures. Section 3 describes the experimental tests and numerical results of the proposed method on the microarray data.
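Fisher score feature selection ranks each gene by how well its values separate the classes. The sketch below uses the classic per-feature Fisher score, not the generalized variant of Gu et al. [6]; the exact variant and implementation used in this study are not specified, so treat the formula and the helper names `fisher_scores` and `select_top_k` as assumptions.

```python
import numpy as np


def fisher_scores(X, y):
    """Classic per-feature Fisher score: between-class scatter of the feature
    means divided by the pooled within-class variance (higher = more discriminative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # The tiny constant avoids division by zero for features constant within every class.
    return between / (within + 1e-12)


def select_top_k(X, y, k=100):
    """Indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    return np.argsort(scores)[::-1][:k]


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = np.array([[0.0, 5.0], [0.2, 1.0], [-0.1, 3.0],
              [3.0, 4.0], [3.1, 2.0], [2.9, 0.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(select_top_k(X, y, k=1))  # [0]
```

For the ALLAML data, `X` would be the 72 × 7129 expression matrix, `y` the ALL/AML labels, and `k = 100`.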
2 Material and Methods
2.1 Material
ALLAML is a bioinformatics cancer dataset from a study that showed how new cancer cases could be classified by gene expression monitoring and assigned to known tumor classes. It contains two tumor types: acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). The dataset used in the experiments has 72 samples and 7129 gene expression features. It was introduced in the study by Golub et al. [7].
2.2 Methods
Restricted Boltzmann Machines. An RBM is an energy-based generative model, which means that the probability distribution of its variables is described by an energy function [8]. It is composed of a set of visible variables $v = \{v_i\}$ and hidden variables $h = \{h_j\}$. The energy function between visible and hidden units is defined as [9]:

$$E(v, h) = -b^{T} v - c^{T} h - h^{T} W v \qquad (1)$$

where b and c are the visible and hidden bias vectors and W is the weight matrix between the two layers.
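Equation (1) can be evaluated directly for any joint configuration. A minimal sketch, assuming W is stored with shape (number of hidden units × number of visible units) so that $h^{T} W v$ is well defined; the numeric values are arbitrary illustrations.

```python
import numpy as np


def rbm_energy(v, h, W, b, c):
    """Energy of a joint configuration, Eq. (1): E = -b^T v - c^T h - h^T W v.
    W has shape (n_hidden, n_visible); b and c are the visible and hidden biases."""
    v = np.asarray(v, dtype=float)
    h = np.asarray(h, dtype=float)
    return float(-b @ v - c @ h - h @ W @ v)


W = np.array([[1.0, -2.0],
              [0.5, 0.0]])   # 2 hidden x 2 visible
b = np.array([0.1, 0.2])
c = np.array([-0.3, 0.4])
# Lower energy means higher unnormalized probability exp(-E).
print(rbm_energy([1, 0], [1, 1], W, b, c))   # approximately -1.7
```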
Since there are no connections among units in the same layer, the conditional distributions factorize:

$$P(v \mid h) = \prod_i P(v_i \mid h) \qquad (2)$$

and

$$P(h \mid v) = \prod_j P(h_j \mid v) \qquad (3)$$
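Because the energy in Eq. (1) has no visible-visible or hidden-hidden terms, $P(h \mid v)$ obtained by normalizing $\exp(-E)$ over all hidden states should equal the product of independent per-unit sigmoids given by Eqs. (3) and (5). The sketch below checks this numerically on a tiny RBM with random parameters; it is an illustration, not part of the study.

```python
import itertools

import numpy as np


def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))


def energy(v, h, W, b, c):
    return -b @ v - c @ h - h @ W @ v


def conditional_by_enumeration(v, W, b, c):
    """P(h | v) for every binary h, normalizing exp(-E) over all 2^n_hidden states."""
    n_hidden = W.shape[0]
    states = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=n_hidden)]
    weights = np.array([np.exp(-energy(v, h, W, b, c)) for h in states])
    return states, weights / weights.sum()


def conditional_factorized(h, v, W, c):
    """Product of independent per-unit probabilities sigm(c_j + W_j . v)."""
    p_on = sigm(c + W @ v)
    return float(np.prod(np.where(h == 1, p_on, 1.0 - p_on)))


rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))   # 3 hidden x 4 visible
b = rng.normal(size=4)
c = rng.normal(size=3)
v = np.array([1.0, 0.0, 1.0, 1.0])

states, p_enum = conditional_by_enumeration(v, W, b, c)
max_gap = max(abs(p - conditional_factorized(h, v, W, c)) for h, p in zip(states, p_enum))
print(max_gap)  # numerically zero (floating-point round-off only)
```

The visible bias term $b^{T} v$ is constant for a fixed v, so it cancels in the normalization; this is why it does not appear in the hidden conditional.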
Each node in the visible layer is connected to all units in the hidden layer, and vice versa. For binary data, the probability of a visible variable is

$$P(v_i = 1 \mid h) = \mathrm{sigm}(b_i + W_i \cdot h) \qquad (4)$$

and the probability of a hidden variable is

$$P(h_j = 1 \mid v) = \mathrm{sigm}(c_j + W_j \cdot v) \qquad (5)$$
where $W_i$ and $W_j$ are the weights connected to visible unit i (the ith column of W) and to hidden unit j (the jth row of W), respectively, and $\mathrm{sigm}(x) = 1/(1 + e^{-x})$. To estimate the model distribution, the sampling algorithm has to run many times for each sample to obtain a useful reconstruction. If sampling is run for a sufficient number of iterations for all data points, a gradient for the model parameters can be obtained; however, this takes a long time. Instead of running the sampler until the chain converges, it can be run for a single iteration, in the expectation that the parameters still move toward values that minimize the reconstruction error. The rationale is that if the data distribution remains intact after a single iteration, it will also remain intact in the final reconstruction [8]. In this case, the update rules are:

$$\Delta w_{ij} = \varepsilon\left(\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{1}\right) \qquad (6)$$

$$\Delta v_i = \varepsilon\left(\langle v_i^{2} \rangle_{\text{data}} - \langle v_i^{2} \rangle_{\text{model}}\right) \qquad (7)$$

$$P(h_j \mid v) = \mathrm{sigm}(c_j + W_j \cdot v) \qquad (8)$$
where ε is the learning rate and $\langle \cdot \rangle_{1}$ denotes the expectation under the distribution of samples obtained after one step of Gibbs sampling initialized from the data [10].

Deep Belief Networks. A DBN is a deep neural network structure that models the joint distribution of input and hidden variables using a stack of RBMs. Let $R_1, R_2, \ldots, R_n$ be the successive RBMs in the DBN. Training starts with $R_1$, whose visible layer is the original input vector v and whose hidden layer is h. After the model parameters Q of $R_1$ are learned, the prior distribution of the hidden layer, $p(h \mid Q)$, can be replaced by $R_2$, which takes the hidden units $h^{1}$ of $R_1$ as its input vector. Figure 1 shows the architecture of a DBN.

Fig. 1 Architecture of DBN

Multilayer Perceptron. The multilayer perceptron consists of multiple simple, interconnected neurons that interact through weighted connections [11]. The relation between input and output signals is nonlinear: each output is a nonlinear function of the unit's total input [12]. Various nonlinear activation functions can be used in an MLP; one of the most common is the sigmoid, because it is easily differentiable. The output of a neuron is fed forward to the next layer as input via weight parameters, and an information-processing architecture of this kind is called a feed-forward neural network. A multilayer perceptron may have one or more hidden layers. Each node is fully connected to the nodes in adjacent layers, while there are no connections within a layer. The architecture of an MLP is shown in Fig. 2. In the training stage, the network is trained so that the output of the desired class approaches 1 while the other outputs are clamped to 0. Suppose $y_j^{l+1}$ is the total input received by neuron j in layer l + 1:

$$y_j^{l+1} = \sum_i w_{ji}\, y_i^{l} + w_{j0} \qquad (9)$$

where $y_i^{l}$ is the state of the ith unit in layer l, $w_{ji}$ is the weight of the connection between the ith node in layer l and the jth node in layer l + 1, and $w_{j0}$ is the bias term. The output of the neuron is:

$$y_i^{l} = \frac{1}{1 + e^{-z}} \qquad (10)$$
Fig. 2 Architecture of MLP
where z is the total input. The learning stage determines the parameters that minimize the difference between the desired and obtained outputs. In training a neural network, the cost of a sample is defined by an error function:

$$E^{a}(W \mid x^{a}, o^{a}) = \frac{1}{2} \sum_j \left(o_j^{a} - y_j^{a}\right)^{2} \qquad (11)$$
where $o^{a}$ is the desired output for sample $x^{a}$. One possible procedure for minimizing the error is gradient descent starting from an arbitrary weight matrix. The backpropagation training approach distributes the error backward using the chain rule:

$$\frac{\partial E}{\partial w_{ki}} = \frac{\partial E}{\partial y_j}\,\frac{\partial y_j}{\partial h_k}\,\frac{\partial h_k}{\partial w_{ki}} \qquad (12)$$

where $h_k$ is the output of the kth hidden unit and $y_j$ is the jth output. The update rules, also derived by gradient descent, are:

$$\Delta v_{jk} = \sigma \sum_a \left(o_j^{a} - y_j^{a}\right) h_k^{a} \qquad (13)$$

$$\Delta w_{ki} = \varepsilon \sum_a \Big[\sum_j \left(o_j^{a} - y_j^{a}\right) v_{jk}\Big] h_k^{a}\left(1 - h_k^{a}\right) x_i^{a} \qquad (14)$$

where $v_{jk}$ are the weights between the hidden and output layers, σ and ε are learning rates, and $\sum_j (o_j^{a} - y_j^{a}) v_{jk}$ is the total error conveyed back to the kth hidden unit. The steps of backpropagation are given in [13].
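Equations (9)–(14) can be sketched as batch gradient descent for a network with one sigmoid hidden layer. This is a minimal illustration under stated assumptions: linear output units (for which (13) and (14) are the exact gradient steps of the squared error (11)), bias terms stored as an extra weight column, and σ and ε passed as the hypothetical parameters `sigma` and `eps`. With a single sigmoid output, the same updates would instead be the gradient of the cross-entropy loss. In the study, the initial matrices would come from DBN pretraining rather than small random values.

```python
import numpy as np


def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, O, W, V, eps=0.1, sigma=0.1, epochs=1000):
    """Batch gradient descent for a one-hidden-layer MLP using updates (13)-(14).
    X: (n_samples, n_inputs) inputs x^a; O: (n_samples, n_outputs) targets o^a.
    W: (n_hidden, n_inputs + 1) input-to-hidden weights w_ki (last column: bias).
    V: (n_outputs, n_hidden + 1) hidden-to-output weights v_jk (last column: bias).
    Hidden units are sigmoid (Eq. 10); outputs are assumed linear."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])       # constant input for the bias
    losses = []
    for _ in range(epochs):
        H = sigm(X1 @ W.T)                               # hidden outputs h_k^a
        H1 = np.hstack([H, np.ones((H.shape[0], 1))])
        Y = H1 @ V.T                                     # network outputs y_j^a
        err = O - Y                                      # o_j^a - y_j^a
        losses.append(0.5 * float(np.sum(err ** 2)))    # Eq. (11), summed over samples
        dV = sigma * err.T @ H1                          # Eq. (13)
        back = (err @ V[:, :-1]) * H * (1.0 - H)         # error reaching hidden unit k
        dW = eps * back.T @ X1                           # Eq. (14)
        V = V + dV
        W = W + dW
    return W, V, losses


rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
O = np.array([[0.0], [0.0], [0.0], [1.0]])              # logical AND as a toy target
W0 = rng.normal(scale=0.1, size=(3, 3))                 # 3 hidden units, 2 inputs + bias
V0 = rng.normal(scale=0.1, size=(1, 4))                 # 1 output, 3 hidden units + bias
W, V, losses = train_mlp(X, O, W0, V0)
print(losses[0], losses[-1])
```

Both weight matrices are updated from the same forward pass, so the error sent back through `V[:, :-1]` uses the output weights before their update, as in the simultaneous gradient step of (13)–(14).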
3 Experimental Results
In this study, a DBN was applied to the ALLAML microarray dataset to obtain suitable initial parameters as a preprocessing step before classification with an MLP. To obtain more reliable training and classification results, the dataset was shuffled. The first experiments, carried out with RBMs of 500, 128, and 1 output units, showed that both the DBN and the MLP suffer from the curse of dimensionality, since both are neural network-based approaches. To overcome this weakness, Fisher score feature selection was applied to the dataset, and the analyses were performed on the top 100 features. This reduced the computational cost and overcame the underfitting caused by high dimensionality. 70% of the data was used for training both the DBN and the MLP. To obtain initial parameters for the 100-feature input vectors, two RBMs with 50 and 1 output units, respectively, were used. The parameters obtained by training the DBN were given to the MLP as initial weights. The learning rates were set to 0.001 for the DBN and 0.01 for the MLP, based on experimental observations. Increasing the number of DBN layers was not observed to improve classification performance. Similarly, increasing the number of Gibbs iterations did not change the classification results, and it was set to 200 for DBN training. The effect of initial parameters trained by the DBN was compared with that of random initial weights in classifying the ALLAML dataset. Table 1 shows both classification results.
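The pretraining step can be sketched as greedy layer-wise training of RBMs with one-step contrastive divergence, Eq. (6), using the layer sizes reported here (100 inputs, then RBMs with 50 and 1 hidden units) and a learning rate of 0.001. Several details are assumptions rather than the authors' code: inputs are scaled to [0, 1], the reconstruction uses probabilities instead of samples, the bias updates take the usual CD-1 form, and the reported 200 Gibbs iterations are read as 200 training epochs. The data below are random placeholders.

```python
import numpy as np


def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm_cd1(data, n_hidden, lr=0.001, epochs=200, rng=None):
    """Train a binary RBM with one-step contrastive divergence (CD-1).
    data: (n_samples, n_visible) array with values in [0, 1].
    Returns W (n_hidden, n_visible), visible bias b, hidden bias c."""
    rng = np.random.default_rng() if rng is None else rng
    n_samples, n_visible = data.shape
    W = rng.normal(scale=0.01, size=(n_hidden, n_visible))
    b = np.zeros(n_visible)
    c = np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigm(data @ W.T + c)                        # P(h = 1 | data), Eq. (5)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
        pv1 = sigm(h0 @ W + b)                            # reconstruction, Eq. (4)
        ph1 = sigm(pv1 @ W.T + c)                         # hidden probabilities after one Gibbs step
        # Eq. (6): <v_i h_j>_data - <v_i h_j>_1, averaged over the batch
        W += lr * (ph0.T @ data - ph1.T @ pv1) / n_samples
        b += lr * (data - pv1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c


def pretrain_dbn(data, layer_sizes, lr=0.001, epochs=200, seed=0):
    """Greedy layer-wise DBN pretraining: each RBM is trained on the hidden
    probabilities of the RBM below it. Returns [(W, c), ...], one pair per
    layer, to be copied into an MLP with the same layer sizes."""
    rng = np.random.default_rng(seed)
    params = []
    x = data
    for n_hidden in layer_sizes:
        W, _, c = train_rbm_cd1(x, n_hidden, lr=lr, epochs=epochs, rng=rng)
        params.append((W, c))
        x = sigm(x @ W.T + c)                             # input to the next RBM
    return params


rng = np.random.default_rng(42)
X_train = rng.random((50, 100))   # placeholder: 50 training samples x 100 selected features
layers = pretrain_dbn(X_train, [50, 1])
print([W.shape for W, _ in layers])   # [(50, 100), (1, 50)]
```

The returned `(W, c)` pairs would then initialize the hidden and output layers of a 100-50-1 MLP, in place of the random initial weights used in the baseline.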
Table 1 Classification results on ALLAML dataset using DBN and random initial weights

| Metrics | Initialized by DBN | Initialized randomly |
|---|---|---|
| Training accuracy | 100.00% | 94.00% |
| Testing accuracy | 95.45% | 77.27% |
| Training F-measure | 1.00 | 0.88 |
| Testing F-measure | 0.95 | 0.66 |
| Cost | 0.01 | 1.80 |
| Epoch | 26 | 200 |

Results marked in bold represent the superior results compared
As seen in Table 1, starting from weights pretrained by the DBN yields better classification performance than random initial parameters. Regarding training time, training from random initial weights was run for longer to offset the advantage given by the additional DBN training. Nevertheless, training with DBN initialization converged in just 26 epochs, whereas training from random parameters did not match it even though it ran longer. The number of misclassified samples during MLP training is given in Figs. 3 and 4. Although more samples are misclassified in the first epoch when DBN-pretrained weights are used, training completes quickly, which indicates that DBN initialization helps reach a good optimum rapidly. As seen in Fig. 4, when training starts from random weights, the number of misclassified samples does not change during the first 75 epochs and then decreases only slightly up to epoch 200. This indicates that random initial weights
Fig. 3 Number of misclassified data during training of MLP using DBN
Fig. 4 Number of misclassified data during training of MLP using random initial weights
can be regarded as a factor that makes training of the learning algorithm difficult, even though training starts with a small number of misclassified samples.
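The percentages in Table 1 are consistent with 50 training and 22 test samples, that is, 70% of the 72 samples rounded to a whole number. This split is an inference from the reported figures, not a value stated in the text; the sketch below only checks the arithmetic.

```python
N_SAMPLES = 72          # ALLAML samples
TRAIN_FRACTION = 0.7    # 70% used for training

n_train = round(N_SAMPLES * TRAIN_FRACTION)   # assumed rounding of 50.4 -> 50
n_test = N_SAMPLES - n_train                  # 22


def pct(correct, total):
    """Accuracy as a percentage, rounded to two decimals as in Table 1."""
    return round(100 * correct / total, 2)


print(n_train, n_test)       # 50 22
print(pct(21, n_test))       # 95.45 (test accuracy, DBN initialization)
print(pct(17, n_test))       # 77.27 (test accuracy, random initialization)
print(pct(47, n_train))      # 94.0  (training accuracy, random initialization)
```

Under this reading, the DBN-initialized model misclassifies 1 of 22 test samples and the randomly initialized model 5 of 22.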
4 Conclusions
In this study, the effect of using pretrained initial weights, instead of random initial parameters, on the classification of microarray data was analyzed, with the aim of avoiding the local minimum problem. Since both the DBN and the MLP are neural network-based approaches, feature selection was applied to the microarray data as a preprocessing step to overcome the curse of dimensionality and to reduce the computational cost. The results show that initial parameters trained by a DBN yield better classification results than random initial weights (95.45% versus 77.27% test accuracy).
References
1. Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A, Benítez J, Herrera F (2014) A review of microarray datasets and applied feature selection methods. Inf Sci 282:111–135
2. Debie E, Shafi K (2017) Implications of the curse of dimensionality for supervised learning classifier systems: theoretical and empirical analyses. Pattern Anal Appl 22(2):519–536
3. Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
4. Abdel-Zaher AM, Eldeib AM (2016) Breast cancer classification using deep belief networks. Expert Syst Appl 46:139–144
5. Pérez-Ortiz M, Torres-Jiménez M, Gutiérrez PA, Sánchez-Monedero J, Hervás-Martínez C (2016) Fisher score-based feature selection for ordinal classification: a social survey on subjective well-being. In: Lecture notes in computer science hybrid artificial intelligent systems, pp 597–608
6. Gu Q, Li Z, Han J (2011) Generalized fisher score for feature selection. In: UAI'11: Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pp 266–273
7. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
8. Noulas AK, Krose BJA (2008) Deep belief networks for dimensionality reduction. In: Belgian-Dutch Conference on Artificial Intelligence, pp 185–191
9. Larochelle H, Mandel M, Pascanu R, Bengio Y (2012) Learning algorithms for the classification restricted Boltzmann machine. J Mach Learn Res 13:643–669
10. Salama MA, Hassanien AE, Fahmy AA (2010) Deep belief network for clustering and classification of a continuous data. In: The 10th IEEE International Symposium on Signal Processing and Information Technology
11. Pal SK, Mitra S (1992) Multilayer perceptron, fuzzy sets and classification. IEEE Trans Neural Netw 3(5):683–696
12. Gardner MW, Dorling SR (1998) Artificial neural networks (the multilayer perceptron)—a review of applications in the atmospheric sciences. Atmos Environ 32(14–15):2627–2636
13. Ethem A (2013) Çok Katmanlı Algılayıcılar [Multilayer perceptrons]. In: Yapay öğrenme [Machine learning]. Boğaziçi Üniversitesi Yayınevi, İstanbul, pp 210–215
An Improved Inception Layer-Based Convolutional Neural Network for Identifying Rice Leaf Diseases
B. Baranidharan, C. N. S. Vinoth Kumar, and M. Vasim Babu
Abstract Technological intervention is essential for agriculture to thrive. In crops such as paddy in particular, delayed identification of disease causes major economic losses. Early identification of diseases helps farmers take the proper course of action and save their crops. In this paper, an improved inception CNN (I-CNN) model is proposed for the early identification of rice leaf diseases. Although many CNN models exist, none of them is sufficient to identify rice leaf diseases at an early stage, when appropriate action can still resolve the problem. For comparison, pre-trained models such as AlexNet and VGG16 are evaluated against the proposed I-CNN model. All models are tested with learning rates of 0.01, 0.001, and 0.0001 and with different optimizers, namely SGD and Adam. The proposed I-CNN achieved the highest accuracy of 81.25%, whereas the best accuracies of AlexNet and VGG16 are 72.5% and 62.5%, respectively.
Keywords Inception network · CNN · Rice leaf diseases
B. Baranidharan (B) · C. N. S. V. Kumar
Department of CSE, Faculty of Engineering and Technology, SRM Institute of Science and Technology, SRM Nagar, Kattankulathur, Kanchipuram, Chennai, TN 603203, India
M. V. Babu
Department of ECE, KKR & KSR Institute of Technology and Sciences, Guntur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_10

1 Introduction
In India, agriculture makes a large economic contribution, accounting for 16% of the overall GDP and 10% of total exports, and it plays a major role in the country's economic growth [1]. However, compared with other sectors, the share of agricultural production in the country's GDP has been
B. Baranidharan et al.
declining in recent years. The major cause of this decline is poor crop production due to diseases and heavy pesticide use. Therefore, early identification of disease is essential to address this problem. Oryza sativa, commonly called paddy, is one of the world's most important agricultural crops because it is a staple food for over 2.7 billion people [2]. In rice production, India ranks second only to China, the world's largest rice producer. Although India is the second-largest rice producer, every year its farmers face multiple problems such as plant diseases, water shortage, and pests, which lead to 37% crop loss [3]. The early signs of damage are difficult to detect because they are very minute and can be beyond the limits of human vision. Expert analysis would be a possible remedy for this problem; however, experts are often far away, which inconveniences farmers and makes the process more expensive [4]. Immediate diagnosis of plant diseases can therefore be achieved through technology-supported automatic early detection and categorization of plant symptoms, which can prevent major losses. Hence, systems are needed that automatically recognize the signs of disease and identify a suitable diagnosis. These signs mainly occur on different parts of the plant, such as the leaves and the stem. Consequently, identifying characteristic patterns on plants, leaves, and stems plays a significant part in the quality of diagnosis. An artificial intelligence (AI)-based system can identify these patterns in order to detect and classify diseased plants. AI systems bypass human involvement and lead to more precise and objective decisions about the infection and its further analysis [5].
Many diseases affect the rice crop, but this paper considers three major ones: (i) leaf blast, (ii) brown spot, and (iii) rice hispa. These three are chosen because they affect the crop most frequently. Of these, leaf blast and brown spot are true rice leaf diseases, whereas rice hispa is caused by a pest attack. The paper is organized as follows: Sect. 2 surveys related work, Sect. 3 describes the architecture of CNNs, Sect. 4 presents the proposed model, Sect. 5 discusses the results, and Sect. 6 concludes with the future scope.
2 Literature Survey
Thenmozhi et al. [6] designed an improved deep CNN (DCNN) for classifying pests that affect plants. The DCNN model aims to identify pest infection in the crop at an early stage and protect the crop from damage. Their DCNN model was compared with pre-trained AlexNet, ResNet, and VGG models. They used around 40 different kinds of pests from three different sources. The DCNN model comprises six convolutional layers, five max pooling layers, one fully connected layer, and an output layer with a softmax classifier.
An Improved Inception Layer-Based Convolutional Neural Network …
The DCNN achieved the highest accuracy of 97.47% for 24 classes of insects, whereas AlexNet achieved 94.23%, ResNet 95.87%, and VGG 96.25%. Thus, the DCNN improved on the pre-trained models. Lu et al. [7] put forward a new CNN model for predicting rice leaf diseases and used it to identify ten common rice leaf diseases. However, their dataset contains only 500 images, which is not sufficient for training and validating a CNN model for real-time deployment. They used stochastic pooling for dimensionality reduction, whereas max pooling and average pooling are more common in such CNN models. Suma et al. [8] developed a CNN model for identifying plant leaf diseases at an early stage. Images with a resolution below 500 pixels were excluded from training because too few useful features can be extracted from them. Preprocessing steps such as background noise removal and reflection removal were applied using the Python OpenCV library. However, their paper gives no proper details about the image source. Liang et al. [9] developed a CNN-based rice blast recognition approach and compared their CNN model with well-known feature extraction methods such as LBPH and Haar-WT. Two CNN models were proposed in their work: the first has three fully connected layers and the second has two. A support vector machine (SVM) is used for the final output classification. They found that the proposed CNN-SVM gives better results than the LBPH-SVM and Haar-WT-SVM combinations. Rahman et al. [10] designed a stacked CNN model suitable for mobile devices. The authors used nine classes in total for classifying rice leaf diseases: five rice leaf disease classes, three pest-attack classes, and one healthy leaf class.
Training follows two stages. In the first stage, the model is trained with 17 output classes obtained by subdividing the existing nine classes. In the second stage, the top layers of the previously trained model are retained, and the final layer, which classifies into 17 classes, is changed to classify into nine classes. Because of this two-stage training, the number of trainable parameters is reduced by 98% compared with the VGG model, while the accuracy obtained is 95%. Rajmohan et al. [11] proposed a mobile app-based rice plant disease detection and management mechanism. Their contribution has two phases: (i) disease identification using sensors in the field and (ii) disease management, in which information about disease infection is sent to the farmer's mobile device. For disease identification, they used a CNN for feature extraction and an SVM for classification, and they reported that the combined CNN-SVM model achieved 87.50% accuracy. Joshi et al. [12] proposed a system for detecting and characterizing rice diseases using a minimum distance classifier (MDC) and a KNN classifier. Their work includes preprocessing, segmentation, feature extraction, and classification. In total, they used 115 images belonging to four major rice diseases. In the training stage, core features were extracted from all the images, and this training dataset was then used as reference information for the classifiers to classify the test
B. Baranidharan et al.
set images. MDC achieves a maximum accuracy of 89.23%, and KNN achieves 87.02%. Ramesh et al. [13] developed a rice disease classification system based on KNN and ANN. The KNN-based classifier achieves 86% accuracy with K = 3, and the ANN achieves a maximum accuracy of 100% for rice blast disease identification. In total, they used 350 images for training, which is not sufficient for deploying such systems in real time. Barbedo [14] analyzed the effect of applying deep neural networks to plant disease classification and critically reviewed the effect of neural architectures on plant pathology. Barbedo's work gives the following recommendations: (i) data should be publicly available; (ii) a CNN model for identifying plant leaf diseases should be specific to a particular crop, i.e., there cannot be a generic CNN model that detects plant diseases for all crops; and (iii) for effective identification of plant diseases, images of plant parts other than leaves are desirable. Some authors have applied machine learning-based techniques to soil classification [15] and disease detection in plant leaves [16]. The above literature review shows that there are many CNN models for rice disease identification, but none of them has demonstrated the ability to achieve good accuracy with a small dataset and varying infection spread. A peculiarity of diseased rice leaves is that the spread of infection varies widely from image to image. Moreover, training a CNN model requires a huge volume of diseased leaf images, which is not practically available, and even datasets with a substantial number of images do not capture the different spreads of infection. An inception network, which applies different kernel sizes to the same image, is a useful approach for problems that suffer from a low volume of data.
The different kernels applied to the image capture different and distinctive features within the same layer, which a standard CNN layer with a single kernel size cannot do. In this paper, a new inception-based CNN (I-CNN) model is proposed and compared with other well-known CNN models such as AlexNet and VGG16.
3 General Architecture of Convolutional Neural Networks
3.1 Layers of CNN
A CNN is a deep learning algorithm [17] that takes images as input, learns from them through its weight parameters and differentiates objects on that basis. Spatial and temporal dependencies in an image are captured effectively using filters. A CNN performs well because its weights are reused across the image, which greatly reduces the number of parameters involved. A CNN has the following layers.
Convolutional layers: a convolutional layer consists of kernels (filters), usually 1 × 1, 3 × 3 or 5 × 5 matrices. Each kernel is convolved with the input image matrix, which results
An Improved Inception Layer-Based Convolutional Neural Network …
in a convolved matrix (feature map). The kernel slides over the image matrix until the entire matrix has been traversed. A CNN model contains many such convolutional layers. The first convolutional layer captures low-level features of an image, and subsequent layers capture higher-level features.
Max pooling layers: these are used for dimensionality reduction. As in a convolutional layer, a filter slides over the entire image matrix and keeps the most influential feature by selecting the highest value in each region. Another kind of pooling layer, average pooling, takes the average value within the kernel window instead.
Fully connected layers: these are the last layers. They flatten the output of the previous layers into a single vector, which is fed to the classification output of the CNN.
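The convolution and pooling operations described above can be illustrated with a minimal NumPy sketch. It assumes a single-channel image, stride 1 and no padding for the convolution, and non-overlapping 2 × 2 windows for pooling. The function names are illustrative and are not part of the authors' Keras code.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2-D `image` with stride 1 and no padding.

    Like most CNN libraries, this computes a cross-correlation (the kernel is
    not flipped); the result is the convolved feature map.
    """
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the kernel with the patch under it, summed.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size).

    Rows/columns that do not fill a complete window are dropped.
    """
    h, w = x.shape[0] // size, x.shape[1] // size
    windows = x[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))   # keep the strongest value per window
    return windows.mean(axis=(1, 3))      # average pooling

image = np.arange(16, dtype=float).reshape(4, 4)
fmap = conv2d_valid(image, np.ones((3, 3)))   # -> [[45, 54], [81, 90]]
pooled = pool2d(image, size=2)                # -> [[5, 7], [13, 15]]
```

A 3 × 3 kernel shrinks a 4 × 4 input to 2 × 2 without padding. Max pooling halves each spatial dimension while keeping the largest response in each window.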
3.2 Batch Normalization
Batch normalization standardizes the inputs to each layer over a mini-batch, which helps each layer learn more independently of the layers before it [18]. It thus stabilizes the learning process. It can also reduce the number of epochs needed for training and reduce generalization error.
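As a minimal sketch, assume a layer's activations are arranged as an (N, D) mini-batch. The training-time normalization step then looks as follows. This is a textbook illustration rather than the Keras BatchNormalization layer, which additionally tracks moving averages of the mean and variance for use at inference time.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for a mini-batch x of shape (N, D).

    Each feature is standardized with the mean and variance of the current
    mini-batch, then rescaled by the learnable parameters gamma and beta.
    """
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # eps avoids division by zero
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=10.0, size=(64, 3))
normalized = batch_norm_forward(activations, gamma=np.ones(3), beta=np.zeros(3))
# Each column of `normalized` now has mean close to 0 and standard deviation close to 1.
```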
3.3 Categorical Cross Entropy
Categorical cross-entropy is a combination of a softmax activation and the cross-entropy loss, used for multi-class classification [19]. It compares the predicted probability distribution with the true distribution, in which the true class is a one-hot encoded vector. The closer the output values are to this vector, the lower the loss. It is given by Eq. (1), where s_p denotes the CNN score for the positive class, s_j is the score for class j and C is the number of output classes.
$$\mathrm{CE} = -\log\left(\frac{e^{s_p}}{\sum_{j=1}^{C} e^{s_j}}\right) \qquad (1)$$
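Eq. (1) can be evaluated directly from the raw scores. The sketch below is a NumPy illustration, not the Keras loss used in the experiments. It subtracts the largest score before exponentiating, which leaves the softmax unchanged but avoids overflow for large scores.

```python
import numpy as np

def categorical_cross_entropy(scores, positive):
    """Eq. (1): CE = -log(exp(s_p) / sum_j exp(s_j)) for a single sample.

    scores:   raw CNN scores s_1..s_C, one per output class
    positive: index p of the true (positive) class
    """
    shifted = scores - np.max(scores)  # same softmax, but no overflow in exp
    log_softmax = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_softmax[positive]

scores = np.array([2.0, 1.0, 0.1])     # C = 3 classes, as in I-CNN
loss = categorical_cross_entropy(scores, positive=0)
```

With all scores equal, the softmax is uniform and the loss equals log C.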
3.4 Optimizers
Optimizers update the weight parameters. The loss function acts as a guide, telling the optimizer whether it is moving in the direction of the global
Fig. 1 Image before and after segmentation
minimum. SGD, Adagrad, Adadelta, ADAM, RMSprop are the most commonly used optimizers for CNN models.
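For reference, the two optimizers used in the experiments (SGD and Adam) differ in how they turn a gradient into a weight update. The sketch below applies textbook update rules to a hypothetical one-parameter toy problem. The Adam defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the commonly cited ones, and framework implementations may differ in small details such as where eps is added.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain stochastic gradient descent: step against the gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma and Ba); t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # correct the bias toward zero
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def grad_f(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w_sgd = 0.0
for _ in range(200):
    w_sgd = sgd_step(w_sgd, grad_f(w_sgd), lr=0.1)

w_adam, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    w_adam, m, v = adam_step(w_adam, grad_f(w_adam), m, v, t, lr=0.1)
```

SGD scales the raw gradient by a fixed learning rate. Adam divides a momentum-smoothed gradient by a running estimate of its magnitude, so its early steps have a size close to the learning rate regardless of the gradient's scale.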
4 Proposed Work
The proposed approach has two stages. The first stage applies suitable preprocessing techniques to the images, and the second stage trains the proposed I-CNN model.
4.1 Preprocessing
The input images are first converted from RGB to HSV and then smoothed with an averaging mask. HSV can be regarded as a transformation of RGB, since its components are derived from the RGB color space. HSV is preferred over RGB when color description is the key component, because it represents colors in a way similar to how the human eye perceives them. This is followed by a Gaussian filter, which reduces noise by smoothing the image and reducing contrast. The K-means clustering algorithm is then used to segment the input images. Segmentation separates foreground and background objects in an image and also removes much of the background noise. Here, it helps to separate the region of interest, the diseased part, from its surroundings. K is set to 3 to obtain three clusters: background, leaf and the infected area of the leaf. Figure 1 shows an image before and after segmentation.
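The K-means segmentation step can be sketched in NumPy as follows. This is an illustrative assumption, not the authors' implementation. It takes an image that has already been converted to HSV and smoothed; in OpenCV this would typically be done with cv2.cvtColor, cv2.blur and cv2.GaussianBlur. The farthest-point seeding, the iteration limit and the function name are choices made for this sketch. Cluster indices are arbitrary, so deciding which of the K = 3 clusters is background, leaf or infected area still requires inspecting the cluster centres.

```python
import numpy as np

def kmeans_segment(image, k=3, iters=20, seed=0):
    """Segment an (H, W, C) image into k pixel clusters with Lloyd's k-means.

    Returns an (H, W) label map and the k cluster centres. Centres are seeded
    by farthest-point selection so that distinct colour groups start apart.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)
    rng = np.random.default_rng(seed)
    centers = [pixels[rng.integers(len(pixels))]]
    for _ in range(1, k):
        # Next seed: the pixel farthest from every centre chosen so far.
        dist = np.min([((pixels - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(pixels[dist.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centre to the mean of its pixels (keep it if empty).
        new_centers = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels.reshape(h, w), centers

# Hypothetical 4 x 6 image: background, leaf and lesion stripes.
img = np.zeros((4, 6, 3))
img[:, 2:4] = [0, 200, 0]
img[:, 4:] = [150, 75, 0]
labels, centers = kmeans_segment(img, k=3)
```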
4.2 I-CNN
The proposed I-CNN model has four convolutional layers, three inception layers and three fully connected layers. Each convolutional and inception layer is followed by a max pooling layer, and a softmax classifier is used at the end. The number of output classes C in the softmax is set
to 3, since three leaf diseases are to be identified. A ReLU activation function is applied after each convolutional layer. The first, second, third and fourth convolutional layers have 92, 224, 386 and 128 kernels, respectively. Figure 2 depicts the layers of the proposed I-CNN. A naive inception layer concatenates the outputs of filters of three different sizes (1 × 1, 3 × 3 and 5 × 5), together with a max pooling branch, into a single output. The three kernel sizes allow the layer to capture complex features at different scales in the training images. Figure 3 depicts the inception layer of I-CNN. A batch size of 32 images is used for training I-CNN. The three fully connected layers have 1200, 600 and 150 neurons, respectively, with dropout ratios of 0.5, 0.4 and 0.4. The dropout ratios were chosen to reduce overfitting.
Fig. 2 I-CNN layered architecture
Fig. 3 Inception layer of I-CNN
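A naive inception layer of the kind shown in Fig. 3 can be sketched without a deep learning framework. The NumPy code below rests on stated assumptions: stride 1, 'same' padding, ReLU after each convolution and randomly initialized kernels. The per-branch filter counts are hypothetical, since the paper does not list them. In Keras, the same structure would normally be built from Conv2D and MaxPooling2D layers with padding="same", joined by a Concatenate layer.

```python
import numpy as np

def conv2d_same_relu(x, w):
    """Stride-1 'same' convolution followed by ReLU.

    x: (H, W, C_in) feature map; w: (k, k, C_in, C_out) kernels with odd k.
    """
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))          # zero padding keeps H and W
    h, wd, _ = x.shape
    out = np.empty((h, wd, w.shape[3]))
    for i in range(h):
        for j in range(wd):
            patch = xp[i:i + k, j:j + k, :]
            out[i, j] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return np.maximum(out, 0.0)

def max_pool3_same(x):
    """3 x 3 max pooling with stride 1 and 'same' padding."""
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    h, w, _ = x.shape
    shifted = [xp[i:i + h, j:j + w, :] for i in range(3) for j in range(3)]
    return np.max(np.stack(shifted), axis=0)

def naive_inception(x, w1, w3, w5):
    """Run 1x1, 3x3 and 5x5 convolutions and a max pooling branch in parallel
    on the same input and concatenate their outputs along the channel axis."""
    branches = [conv2d_same_relu(x, w1), conv2d_same_relu(x, w3),
                conv2d_same_relu(x, w5), max_pool3_same(x)]
    return np.concatenate(branches, axis=-1)

rng = np.random.default_rng(0)
x = rng.random((8, 8, 4))
out = naive_inception(x,
                      rng.normal(size=(1, 1, 4, 2)),   # hypothetical: 2 filters of 1 x 1
                      rng.normal(size=(3, 3, 4, 3)),   # hypothetical: 3 filters of 3 x 3
                      rng.normal(size=(5, 5, 4, 1)))   # hypothetical: 1 filter of 5 x 5
# out has 2 + 3 + 1 convolution channels plus the 4 pooled input channels.
```

Because every branch preserves the spatial size, the outputs can be stacked along the channel axis. The layer therefore sees the same region at three receptive-field sizes at once.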
5 Experimental Setup and Result Discussions
All segmented training and testing images were uploaded to Google Drive. The CNN models AlexNet and VGG16 and the proposed I-CNN model were developed with the Keras framework and run in the Google Colab environment. The dataset consists of 507 images for each of three diseases (leaf blast, brown spot and rice hispa), totaling 1521 images. These are split 80:20 into training and testing sets. The existing CNN models AlexNet and VGG16 are compared with the proposed I-CNN model on accuracy, with categorical cross-entropy as the loss function. All models are tested with the SGD and Adam optimizers at three learning rates: 0.01, 0.001 and 0.0001. All models are trained for 30 epochs. Table 1 shows the maximum validation accuracy of each model for each learning rate and optimizer, and Figs. 4 and 5 depict the same results graphically. In all cases, the proposed I-CNN model gives better results than AlexNet and VGG16.

Table 1 Maximum validation accuracy of CNN models

| CNN model | SGD, lr 0.01 | SGD, lr 0.001 | SGD, lr 0.0001 | Adam, lr 0.01 | Adam, lr 0.001 | Adam, lr 0.0001 |
|---|---|---|---|---|---|---|
| AlexNet (%) | 72.5 | 60.75 | 59.25 | 68.75 | 62.5 | 61.25 |
| VGG-16 (%) | 60.85 | 58.5 | 55.25 | 62.5 | 60.75 | 55.5 |
| I-CNN (%) | 73.5 | 70 | 72.75 | 81.25 | 72.5 | 75 |

Fig. 4 Maximum validation accuracy of the CNN models in different learning rates using SGD

Fig. 5 Maximum validation accuracy of the CNN models in different learning rates using Adam

I-CNN gives its overall maximum accuracy with the Adam optimizer at a learning rate of 0.01. With the SGD optimizer, I-CNN reaches its highest accuracy of 73.5% at a learning rate of 0.01, which is 1.36% and 17.21% better than AlexNet and VGG16, respectively. The improvement percentages in this section express the accuracy gap relative to the I-CNN accuracy. At a learning rate of 0.001, I-CNN is 13.21% and 16.42% better than AlexNet and VGG16, respectively. At 0.0001, it improves on them by 18.55% and 24.05%. With the Adam optimizer at a learning rate of 0.01, I-CNN gives its maximum accuracy of 81.25%, which is 15.38% and 23.07% better than AlexNet and VGG16. At a learning rate of 0.001, I-CNN improves on AlexNet and VGG16 by 13.79% and 16.20%, respectively, and at 0.0001 by 18.33% and 26%. For both SGD and Adam, the margin of I-CNN over AlexNet and VGG16 is largest at the lowest learning rate, 0.0001. In every configuration, I-CNN outperforms AlexNet and VGG16. This improvement is attributed to its inception layers, whose different kernel sizes capture distinctive hidden features in an image more effectively than the single-size filters of a standard CNN. I-CNN is therefore expected to be effective when only a small number of training images is available.
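The improvement percentages quoted above can be reproduced from Table 1. Judging from the numbers, they correspond to the accuracy gap divided by the I-CNN accuracy, truncated to two decimal places. A gap taken relative to the baseline accuracy would instead give, for example, 1.38% rather than 1.36% for SGD at 0.01. The script below is a minimal check under that assumption.

```python
# Maximum validation accuracy (%) from Table 1, in the column order
# SGD 0.01, SGD 0.001, SGD 0.0001, Adam 0.01, Adam 0.001, Adam 0.0001.
TABLE_1 = {
    "AlexNet": [72.5, 60.75, 59.25, 68.75, 62.5, 61.25],
    "VGG16": [60.85, 58.5, 55.25, 62.5, 60.75, 55.5],
    "I-CNN": [73.5, 70.0, 72.75, 81.25, 72.5, 75.0],
}

def gap_relative_to_proposed(proposed, baseline):
    """Accuracy gap expressed as a percentage of the proposed model's accuracy."""
    return 100.0 * (proposed - baseline) / proposed

gains = {
    name: [gap_relative_to_proposed(p, b)
           for p, b in zip(TABLE_1["I-CNN"], TABLE_1[name])]
    for name in ("AlexNet", "VGG16")
}

for name, values in gains.items():
    print(name, [f"{g:.2f}" for g in values])
```

Rounding rather than truncating gives a few values that differ in the last digit (for example 16.43 instead of 16.42). Every value stays within 0.01 of the figures in the text.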
6 Conclusions
In this paper, a new inception-based CNN (I-CNN) model is proposed for identifying rice leaf diseases. I-CNN is compared with the well-known pre-trained CNN models AlexNet and VGG-16. The models are compared at different learning rates
such as 0.01, 0.001 and 0.0001, using the SGD and Adam optimizers. I-CNN achieved a maximum classification accuracy of 81.25%, the best among the compared models given the small number of training images. In future, accuracy is expected to improve further with a substantial increase in data volume. Further hyperparameter tuning is also planned to improve performance.
References
1. Himani G (2014) An analysis of agriculture sector in Indian economy. IOSR J Humanit Soc Sci (IOSR-JHSS) 19(1):47–54
2. India at a glance. https://www.fao.org/india/fao-in-india/india-at-a-glance/en/. Accessed 17 Nov 2019
3. Sharma B, Yadav JKPS (2020) Predict growth stages of wheat crop using digital image processing. Int J Recent Technol Eng (IJRTE) 8(5):3026–3035
4. Pandya C, Sharma LK (2018) A review on recent methods to detect leaf disease symptoms using image processing. Int J Sci Res (IJSR) 7(4):1339–1341
5. Mavridou E, Vrochidou E, Papakostas GA, Pachidis T, Kaburlasos VG (2019) Machine vision systems in precision agriculture for crop farming. J Imaging 5(89):1–32
6. Thenmozhi K, Reddy US (2019) Crop pest classification based on deep convolutional neural network and transfer learning. Comput Electron Agric 164:104906
7. Lu Y, Yi S, Zeng N, Liu Y, Zhang Y (2017) Identification of rice diseases using deep convolutional neural networks. Neurocomputing 267:378–384
8. Suma V, Shetty RA, Tated RF, Rohan S, Pujar TS (2019) CNN based leaf disease identification and remedy recommendation system. In: 2019 3rd International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE, pp 395–399
9. Liang WJ, Zhang H, Zhang GF, Cao HX (2019) Rice blast disease recognition using a deep convolutional neural network. Sci Rep 9(1):1–10
10. Rahman CR, Arko PS, Ali ME, Khan MAI, Apon SH, Nowrin F, Wasif A (2020) Identification and recognition of rice diseases and pests using convolutional neural networks. Biosys Eng 194:112–120
11. Rajmohan R, Pajany M, Rajesh R, Raman DR, Prabu U (2018) Smart paddy crop disease identification and management using deep convolution neural network and SVM classifier. Int J Pure Appl Math 118(15):255–264
12. Joshi AA, Jadhav BD (2016) Monitoring and controlling rice diseases using Image processing techniques. In: 2016 International Conference on Computing, Analytics and Security Trends (CAST). IEEE, pp 471–476
13. Ramesh S, Vydeki D (2019) Application of machine learning in detection of blast disease in South Indian rice crops. J Phytolo 31–37
14. Barbedo JG (2018) Factors influencing the use of deep learning for plant disease recognition. Biosys Eng 172:84–91
15. Kumar S, Sharma B, Sharma VK, Poonia RC (2018) Automated soil prediction using bag-of-features and chaotic spider monkey optimization algorithm. Evol Intel. https://doi.org/10.1007/s12065-018-0186-9
16. Kumar S, Sharma B, Sharma VK, Sharma H, Bansal JC (2018) Plant leaf disease identification using exponential spider monkey optimization. Sustain Comput Inf Syst. https://doi.org/10.1016/j.suscom.2018.10.004
17. Stewart M (2019) Simple Introduction to Convolutional Neural Networks. https://towardsdatascience.com/simple-introduction-to-convolutional-neural-networks-cdf8d3077bac. Accessed 7 Nov 2019
18. Brownlee J (2019) A Gentle Introduction to Batch Normalization for Deep Neural Networks. https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/. Accessed 11 Nov 2019
19. Gomez R (2019) https://gombru.github.io/2018/05/23/cross_entropy_loss/. Accessed 5 Nov 2019
Design and Implementation of Traffic Sign Classifier Using Machine Learning Model
Samarth Patel, Pankaj Agarwal, Vijander Singh, and Linesh Raja
Abstract The past decade has witnessed major advances in computing hardware and power, alongside rapid progress in machine learning and artificial intelligence. One important outcome is the development of autonomous cars, a long-standing research goal for scientists. A crucial capability of any autonomous car is recognizing traffic signs, which serve as a non-verbal communication channel on the road. Recognizing them automatically is a challenging task. Moreover, as the number of drivers on the road rises, the need for a system that can automatically recognize traffic signs has increased. This manuscript therefore addresses the challenges of traffic sign recognition.
Keywords Machine learning · ANN · Traffic sign · SVM · Traffic
S. Patel · P. Agarwal
Amity University Rajasthan, Jaipur, India
V. Singh · L. Raja (B)
Manipal University Jaipur, Jaipur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_11

1 Introduction
Solutions for recognizing traffic signs span a wide range. At one end, road signs are standardized everywhere and kept compliant with those standards at all times, and a static system is built that recognizes only signs fitting these rigid criteria. At the other end, the system is modelled after human learning. Instead of being rigidly programmed on very specific representations of traffic signs, it is taught to recognize them by experience. This paper takes the latter approach and aims to build a system that learns through experience, where the experience is the training dataset. To build such a system, we create a neural network that learns during training by adjusting the internal weights associated with each node, loosely analogous to the human brain. We then test the system on a separate dataset to verify whether it has learned. If the results do not meet a threshold accuracy level, the model is revised to further tune and improve its learning. The major components of this project are:
• creating the neural network;
• acquiring, cleaning and consolidating the dataset;
• creating the pipeline that supplies the model with batches of data;
• tuning the hyperparameters;
• presenting the metrics clearly, so that the phenomena occurring inside the model can be understood.
Lastly, a classification report is computed to evaluate the model's performance. The report uses several metrics besides accuracy, such as precision, recall and F1-score, to assess what the model has truly learned. Loss and accuracy curves are also plotted to assess the model's performance.
The need for this research is listed below:
• Car accidents are one of the major causes of loss of human life, accounting for 1.25 million deaths every year, i.e. 3287 deaths each day globally [1], and around 4 lakh per year in India alone [2], even though car crashes are a preventable problem.
• Further, no low-cost classification solutions have been developed yet.
• Integration with vehicles could result in better compliance with traffic rules.
2 Literature Review
Over the years, many attempts have been made to recognize traffic signs automatically, and many published research papers have changed how cars read traffic signs. The Vienna Convention on Road Signs and Signals is a multilateral treaty signed on 8 November 1968. It aimed to increase road safety and aid international road traffic by standardizing the road signing systems used across countries. About 52 countries have signed this treaty, and it has been ratified by 15 countries. One of the first attempts at automated traffic sign detection was made in Japan in 1984. After this attempt, researchers introduced several new methods to develop efficient traffic sign recognition systems. Automatic road sign recognition can be divided into two stages: primary detection of candidate signs within the image, and secondary recognition of the type of sign present. Researchers have proposed various approaches for both stages.
Fig. 1 Early detectors
2.1 Primary Detection
Approaches to initial detection include colour segmentation in the hue, saturation and value (HSV) colour space, as proposed by Maldonado-Bascón et al. [3]. Another is the method of Moutarde et al. [4], which relies solely on shape characteristics without any prior colour segmentation. A paper by Andrzej Ruta et al., regarded as one of the most influential contributions of the past decade, proposed exploiting the specific characteristics of road signs to quickly identify candidate signs in any scene [5]. Departing from standard image recognition methods, they focused on the special characteristics of traffic signs. For example, road signs use a limited number of colours (red, blue, green, white, black and yellow) and have regular shapes such as circles, triangles or squares. There is therefore a relatively small number of possible candidate signs. In essence, they combined colour and shape cues to treat the problem as a process of elimination and quickly narrow in on the correct answer, as shown in Fig. 1. Other shape-based approaches have also been proposed in this area. These include a generalization of the classical Hough transform, as proposed by Ruta [5], template-based matching and even direct support vector machine-based detection.
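Colour-based candidate detection in the HSV space can be illustrated with a short sketch that flags strongly saturated red pixels, a typical first step for red-bordered signs. The hue, saturation and value thresholds are illustrative assumptions, not values taken from [3]. The per-pixel loop with the standard-library colorsys module favours clarity over speed.

```python
import colorsys

import numpy as np

def red_candidate_mask(rgb, s_min=0.5, v_min=0.3, hue_tol=0.05):
    """Boolean mask of saturated, bright, red-hued pixels in an (H, W, 3) uint8 RGB image.

    Thresholds are illustrative; colorsys returns hue, saturation and value in [0, 1].
    """
    h, w, _ = rgb.shape
    mask = np.zeros((h, w), dtype=bool)
    for i in range(h):
        for j in range(w):
            r, g, b = rgb[i, j] / 255.0
            hue, sat, val = colorsys.rgb_to_hsv(r, g, b)
            # Red hues wrap around 0, so accept hues near both 0 and 1.
            near_red = hue <= hue_tol or hue >= 1.0 - hue_tol
            mask[i, j] = near_red and sat >= s_min and val >= v_min
    return mask

img = np.array([[[220, 20, 30], [20, 40, 200]],
                [[255, 255, 255], [200, 10, 10]]], dtype=np.uint8)
mask = red_candidate_mask(img)   # red pixels are True; blue and white are False
```

Thresholding saturation as well as hue matters: white and grey pixels have an undefined or zero hue, and the saturation test rejects them.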
2.2 Secondary Recognition
Recognition is usually performed with a machine learning-based classification algorithm. Commonly, these algorithms are based on artificial neural networks (ANN), or, in some more recent works, approaches based on support vector machines (SVM) are
Fig. 2 Similar looking signs
used. These algorithms can also be divided according to the classifier input. Torresen et al. [6] proposed extracting single digits from the sign candidates and using only the leftmost one as input to the classification process. Moutarde et al. [4] proposed recognizing multiple extracted digits separately. However, both methods can misclassify similar-looking signs; Fig. 2 presents one such example. Damavandi et al. [7] proposed using the whole sign as input, which achieves a much higher recognition rate with a neural network-based approach. Various methods have also been proposed in the literature for identifying age and gender [8], race [9] and shape [10], for thresholding [11], segmentation [12] and watermarking [13], and for classifying soil [14] and plant leaves [15].
2.3 Research Gap
Numerous approaches have been taken to build traffic sign classifiers. They range from geometrical methods that model a traffic sign to neural networks that learn the features of a sign. The following gaps have been identified:
• Most of the proposed models are trained on a small dataset with limited categories.
• Moreover, the dataset is either synthetically generated or specific to one region [16].
• Few attempts have been made to combine traffic signs from different regions to create a more generalized model.
Here, the authors propose a recognition system (classifier) based on the previously proven whole-sign approach. It uses a modular convolutional neural network that can be scaled to accommodate future classes. It also applies adaptive histogram equalization (AHE), more specifically contrast-limited AHE (CLAHE), to improve performance under different lighting conditions.
3 Problem Formulation
This work involves understanding the mechanism behind human vision and creating a system inspired by it. The main challenges include finding the transformation
methods that reduce the dimensionality of the image while retaining important features such as shape, contrast and edge information. After reducing the dimensions of the images, we need a neural network that can learn from up to 40,000 images across 43 categories. It must then reach a testing accuracy above 85% while accounting for real-life scenarios such as rotated signs, small differences in dimensions and varying distance from the sign. This paper aims at the following objectives:
• To create a machine learning model capable of recognizing traffic signs.
• To tune the hyperparameters effectively.
• To achieve an accuracy above 85%.
The methodology of the research work is shown in Fig. 3.
Fig. 3 Proposed methodology
4 Derivations
The proposed approach to classifying traffic signs begins with consolidating various datasets and splitting the final dataset into training and test sets. It then cleans and processes the label information from the different datasets and pre-processes the images with appropriate transformations such as histogram equalization. It also involves creating a neural network whose layers can learn the complexity and depth of a dataset of over 40,000 images. Finally, the hyperparameters are tuned appropriately to achieve high accuracy.
4.1 Dataset
Two datasets are considered: GTSRB (European traffic signs) and TSRD-mini (Asian traffic signs). The TSRD-mini dataset contains only about 500 images, all belonging to a single category. This dataset was therefore divided into a 75% training and 25% testing split and added as a 43rd category. The GTSRB dataset contains over 50,000 images in total.
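The 75/25 split of the TSRD-mini images could be done with a reproducible shuffle, as in the minimal sketch below. The file names and the seed are hypothetical; the paper does not state how the split was performed.

```python
import random

def split_dataset(paths, train_fraction=0.75, seed=42):
    """Shuffle image paths reproducibly and split them into train and test lists."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)   # a dedicated RNG leaves global state untouched
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the ~500 TSRD-mini images.
tsrd_paths = [f"tsrd_mini/img_{i:03d}.png" for i in range(500)]
train_paths, test_paths = split_dataset(tsrd_paths)
```

Fixing the seed makes the split repeatable across runs, so the same images always end up in the test set.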
4.2 Model Design
The model needs to be complex enough to ingest a dataset of 50,000+ images of different categories and learn the important features that distinguish each category. It therefore uses a structure that understands images through a mechanism modelled after biological vision. The authors use convolutional neural networks (CNNs), a structure that functions in a manner akin to human sight. CNNs are currently the state of the art in image classification tasks. They first extract basic features such as edges from an image. They then build more complex features such as shapes from those basic features, and finally compose objects from shapes. The model has the following layers (Fig. 4):
• Input layer: the layer where a vectorized image is fed to the subsequent layer. In our architecture, this layer has the shape 32 × 32 × 3. Here 32 × 32 are the width and height of the image, respectively, and 3 is the channel depth, which corresponds to RGB.
• Hidden layers: these layers are called hidden because their outputs are not directly observable by the user. Each of them learns a number of weights that help classify images.
Fig. 4 Basic architecture of SignBot model
• Output layer: this layer outputs a 43-element vector of class probabilities for each image, using a softmax activation function. It is called the output layer because its output is directly observable.
4.3 Training the Model on the Dataset
First, the command-line arguments are parsed with an argument parser, and their switches are used as keys to reference the arguments. The images stored in the dataset directory are then read into dedicated lists along with their labels. These lists are later passed to the fit_generator method to fit the model to the dataset. After reading the image paths from the training and testing sets, we shuffle them. This keeps the network from being fed consecutive images from the same category, which would skew its weights. The images are converted to matrix representations so that various transformations can be applied. First, each image is resized to 32 × 32. After resizing, the image undergoes CLAHE histogram equalization. Contrast-limited adaptive histogram equalization (CLAHE) computes separate histograms for distinct regions of the image and uses them to redistribute its lightness values. Unlike plain adaptive histogram equalization (AHE), it also limits the amplification of noise [17]. The resulting image has enhanced contrast around edges, which helps in detecting the shape of the traffic sign. Each transformed image is then converted to a NumPy array and appended to the ImageMatrix list (Fig. 5). The shuffle_and_load() function consolidates all images from the different category directories, reads the image labels and returns the consolidated lists. After loading, the images are scaled to the range [0, 1]. This normalizes the pixel data so that the network weights do not need to span a large range. The images are also converted to float32 so that they can be supplied to the flow() method. Following the rescaling operation, we one-hot encode the labels. All label categories, ranging from 0 to 43, are vectorized. As a result, class 3 is represented as a vector of size 43
Fig. 5 CLAHE transformed signal image
with the value at index 3 set to 1 and all other values set to 0. This is one-hot encoding; most classification algorithms accept labels in this format. Plotting the dataset distribution shows that the dataset is skewed, as each category contains a different number of images. To correct this skew, a weight is assigned to each class during training. Variations in the alignment of traffic signs on the road are handled by augmenting the dataset with transformed images: rotation, changes in distance from the sign, shearing, width shifting and height shifting. This augmentation is performed with TensorFlow's ImageDataGenerator class. The trained network is then saved for later evaluation and testing by serializing it to a protocol buffer (protobuf) file, which contains the weights of the trained model as well as its graph representation.
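Several of the preprocessing steps above can be sketched in NumPy. The code rests on stated assumptions and is not the authors' pipeline:
• clip_limited_equalize performs the contrast-limited equalization that CLAHE applies to each tile. It omits the tiling and the bilinear blending between tiles. In OpenCV, full CLAHE is available via cv2.createCLAHE(clipLimit=..., tileGridSize=...) and its apply method.
• Resizing to 32 × 32 is omitted.
• The "balanced" class-weight formula is an assumption, since the paper does not say how the weights are computed. The returned dictionary has the form expected by the class_weight argument of Keras' Model.fit.

```python
import numpy as np

def clip_limited_equalize(gray, clip_limit=4.0):
    """Contrast-limited histogram equalization of a single uint8 tile.

    Full CLAHE repeats this on a grid of tiles and blends neighbouring tiles
    with bilinear interpolation; that part is omitted here.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    limit = clip_limit * gray.size / 256            # clip height relative to a flat histogram
    excess = np.maximum(hist - limit, 0.0).sum()
    hist = np.minimum(hist, limit) + excess / 256   # redistribute clipped counts uniformly
    cdf = np.cumsum(hist)
    span = cdf[-1] - cdf[0]
    if span == 0:                                   # nothing to equalize
        return gray.copy()
    lut = np.round((cdf - cdf[0]) / span * 255).astype(np.uint8)
    return lut[gray]                                # map every grey level through the CDF

def to_unit_float32(img):
    """Scale uint8 pixels to [0, 1] and convert to float32."""
    return img.astype(np.float32) / 255.0

def one_hot(labels, num_classes):
    """Row i is all zeros except a 1 at column labels[i]."""
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out

def balanced_class_weights(labels, num_classes):
    """weight_c = n_samples / (num_classes * count_c); rarer classes weigh more.

    Classes absent from `labels` get weight 0.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(np.float64)
    weights = np.zeros(num_classes)
    present = counts > 0
    weights[present] = len(labels) / (num_classes * counts[present])
    return {c: float(w) for c, w in enumerate(weights)}

tile = np.tile(np.arange(100, 111, dtype=np.uint8), (11, 1))  # low-contrast 11 x 11 patch
equalized = clip_limited_equalize(tile)
x = to_unit_float32(equalized)
y = one_hot(np.array([3, 0]), num_classes=43)
weights = balanced_class_weights(np.array([0, 0, 0, 1]), num_classes=2)
```

Clipping the histogram before building the cumulative distribution is what limits contrast amplification. With a very large clip_limit, the function reduces to ordinary histogram equalization.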
4.4 Parameter Selection
The following parameters were selected for the experiments:
• Loss function: categorical cross-entropy (CC)
• Optimization function: adaptive moment estimation (Adam)
• Learning rate:
• No. of epochs: 30
• Batch size: 64
5 Result Analysis and Discussion
After training, the model must be tested to ensure its efficacy. During this phase, the model receives previously unseen images and must classify them correctly, which shows how well it performs. In this project, the model is tested on 12,652 images, each of which it must classify into one of the 43 categories.
To check how the model performs on both the training and testing datasets, we use the following quantities:
• Total positive (P): the total number of real positive cases in the data.
• Total negative (N): the total number of real negative cases in the data.
• True positive (TP): equivalent to a hit.
• True negative (TN): equivalent to a correct rejection.
• False positive (FP): equivalent to a false alarm (Type I error).
• False negative (FN): equivalent to a miss.
Precision is the fraction of true positives among all samples that the model labels as positive [18]:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall is the fraction of true positives among all samples that are actually positive [18]:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
The F1-score is the harmonic mean of precision and recall:
$$F_1 = \frac{2}{\frac{1}{\mathrm{precision}} + \frac{1}{\mathrm{recall}}}$$
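The three formulas translate directly into code. The sketch below computes one-vs-rest precision, recall and F1 for a single class from plain label lists. Treating each of the 43 categories in turn as the positive class gives the per-class rows of a classification report, such as the one produced by scikit-learn's classification_report. The labels in the example are made up.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall and F1 for class `positive`.

    In a multi-class problem, each class is treated in turn as the positive
    class and all other classes as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean; defined as 0 when either precision or recall is 0.
    f1 = 2 / (1 / precision + 1 / recall) if precision and recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 2]
y_pred = [1, 1, 0, 1, 0, 2]
p, r, f = precision_recall_f1(y_true, y_pred, positive=1)   # TP = 2, FP = 1, FN = 1
```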
Testing yields a weighted accuracy of 0.95, i.e. 95% (weighted accuracy is a metric that takes into account the skew in the dataset). To check whether the model underfits or overfits, the loss and accuracy curves are plotted (Fig. 6). The validation loss curve closely follows the training loss curve, which indicates that the model is not overfitted to the training dataset. The val_acc curve likewise follows the training accuracy curve closely and approaches 1, the ideal value for a classifier.
6 Conclusion and Future Scope
This paper addressed the challenge of classifying traffic signs, one of the important aspects of realizing fully autonomous driving, with the goal of an accuracy greater than 85%. The proposed model handles a large dataset with a wide variety of classification categories, and its hyperparameters were tuned effectively. As observed during the testing and evaluation phases of
140
S. Patel et al.
Fig. 6 Loss/accuracy versus epoch plot
the project, we observe that the model has achieved an accuracy of 95%. Moreover, as observed by the graph plotted during the testing phase, we note that the model is not underfitting or overfitting the data heavily.
Design and Implementation of Traffic Sign Classifier …
References
1. Association for Safe International Road Travel (2019) Road safety facts - Association for Safe International Road Travel. [online] Available at: https://www.asirt.org/safe-travel/road-safety-facts/. Accessed 4 June 2020
2. Morth.nic.in (2019) Ministry of Road Transport & Highways, Government of India. [online] Available at: https://morth.nic.in/. Accessed 4 June 2020
3. Maldonado-Bascón S, Lafuente-Arroyo S, Gil-Jimenez P, Gómez-Moreno H, López-Ferreras F (2007) Road-sign detection and recognition based on support vector machines. IEEE Trans Intell Transp Syst 8(2):264–278
4. Moutarde F, Bargeton A, Herbin A, Chanussot L (2007) Robust on-vehicle real-time visual detection of American and European speed limit signs, with a modular Traffic Signs Recognition system. In: 2007 IEEE intelligent vehicles symposium. IEEE, pp 1122–1126
5. Ruta A, Porikli F, Li Y, Watanabe S, Kage H, Sumi K (2009) A new approach for in-vehicle camera traffic sign detection and recognition. MERL Report: TR2009-027, pp 1–7
6. Torresen J, Bakke JW, Sekanina L (2004) Efficient recognition of speed limit signs. In: Proceedings. The 7th international IEEE conference on intelligent transportation systems (IEEE Cat. No. 04TH8749). IEEE, pp 652–656
7. Damavandi YB, Mohammadi K (2004) Speed limit traffic sign detection and recognition. In: IEEE conference on cybernetics and intelligent systems, 2004, vol 2. IEEE, pp 797–802
8. Gupta R, Kumar S, Yadav P, Shrivastava S (2018) Identification of age, gender, & race SMT (scare, marks, tattoos) from unconstrained facial images using statistical techniques. In: 2018 international conference on smart computing and electronic enterprise (ICSCEE). IEEE, pp 1–8. https://doi.org/10.1109/ICSCEE.2018.8538423
9. Gupta R, Yadav P, Kumar S (2017) Race identification from facial images using statistical techniques. J Stat Manage Syst 20(4):723–730
10. Yadav P, Gupta R, Kumar S (2019) Video image retrieval method using dither-based block truncation code with hybrid features of color and shape. In: Engineering vibration, communication and information processing. Springer, Singapore, pp 339–348
11. Sharma A, Chaturvedi R, Kumar S, Dwivedi UK (2020) Multi-level image thresholding based on Kapur and Tsallis entropy using firefly algorithm. J Interdiscip Math 23(2):563–571
12. Sharma A, Chaturvedi R, Dwivedi UK, Kumar S, Reddy S (2018) Firefly algorithm based effective gray scale image segmentation using multilevel thresholding and entropy function. Int J Pure Appl Math 118(5):437–443
13. Chaturvedi R, Sharma A, Dwivedi U, Kumar S, Praveen A (2016) Security enhanced image watermarking using mid-band DCT coefficient in YCbCr space. Int J Control Theory Appl 9(23):277–284
14. Kumar S, Sharma B, Sharma VK, Poonia RC (2018) Automated soil prediction using bag-of-features and chaotic spider monkey optimization algorithm. Evol Intel. https://doi.org/10.1007/s12065-018-0186-9
15. Kumar S, Sharma B, Sharma VK, Sharma H, Bansal JC (2018) Plant leaf disease identification using exponential spider monkey optimization. Sustain Comput Inf Syst. https://doi.org/10.1016/j.suscom.2018.10.004
16. Moiseev B, Konev A, Chigorin A, Konushin A (2013) Evaluation of traffic sign recognition methods trained on synthetically generated data. In: International conference on advanced concepts for intelligent vision systems. Springer, Cham, pp 576–583
17. Pizer SM, Amburn EP, Austin JD, Cromartie R, Geselowitz A, Greer T, ter Haar Romeny BM, Zimmerman JB, Zuiderveld K (1987) Adaptive histogram equalization and its variations. Comput Vis Graph Image Process 39:355–368
18. En.wikipedia.org (2020) Precision and recall. [online] Available at: https://en.wikipedia.org/wiki/Precision_and_recall. Accessed 28 May 2020
Designing Controller Parameter of Wind Turbine Emulator Using Artificial Bee Colony Algorithm Ajay Sharma, Harish Sharma, Ashish Khandelwal, and Nirmala Sharma
Abstract Given the limited availability of conventional energy sources, wind energy is a pressing need. It is plentiful and pollution-free. Present wind energy conversion systems (WECSs) built around wind turbines (WTs) perform exceptionally well, and designing more accurate WTs and more efficient WECSs has attracted researchers for many years. However, because of the size and remote location of WECSs, it is not feasible to vary operating conditions in on-site experiments. Researchers therefore emulate WTs in the laboratory with motors, known as wind turbine emulators (WTEs), which makes configuring a motor as a WTE an interesting problem. In this paper, a separately excited DC (SEDC) motor is used to design a WTE. The parameters of the PI controller installed in the WTE are designed using the artificial bee colony (ABC) algorithm, which is selected because it performs well on design optimization problems. Keywords Wind energy · WTE · Swarm intelligence · ABC · Wind turbines · PI controller
A. Sharma (B) Government Engineering College, Jhalawar, India
H. Sharma Rajasthan Technical University, Kota, India
A. Khandelwal Government Engineering College, Jhalawar, India
N. Sharma Rajasthan Technical University, Kota, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_12
A. Sharma et al.
1 Introduction Wind energy is currently a leading area of research interest. It is available in abundance and does not cause pollution hazards in the environment. On-site research in this field is tedious because of the location and placement of wind turbines. To overcome this problem, researchers first emulate the wind turbine by other means in the laboratory, as a wind turbine emulator (WTE), and then apply the results at the wind farm where the wind turbines are placed in a wind energy conversion system (WECS). Recent advances reported in the literature in this area are summarized in Table 1, which shows that researchers are continuously working in this area. To achieve higher efficiency and a more exact emulation of a WT by a DC motor, the control parameters of the proportional integral (PI) controller may be optimized using recent optimization techniques. Many studies have also demonstrated that swarm intelligence (SI) based soft computing strategies now play a vital role in solving real-world optimization problems [11–14]. Here, therefore, a prominent algorithm of the SI family, the artificial bee colony (ABC) algorithm [15], is applied to optimize the controller parameters. The ABC algorithm is very popular and has been modified and hybridized with different strategies, such as global best guidance [16–18] and spiral-based local search [19–21]. In this paper, the WTE is realized using a separately excited DC (SEDC) motor. The paper is organized as follows: Sect. 2 describes the WTE using a DC motor. The ABC algorithm and its mapping to the optimization of the WTE parameters are presented in Sect. 3. Sect. 4 concludes the work and presents the future scope.
2 Wind Turbine Emulator Using SEDC Motor The block diagram of the WTE using an SEDC motor and the regulation strategies are shown in Fig. 1a and b, respectively. The kinetic energy (KE) of the air is given in Eq. 1, and the power density P_w, obtained by differentiating KE with dx/dt = v, is given in Eq. 2.

$$KE = \tfrac{1}{2}\rho\,(A\,dx)\,v^2 \quad (1)$$

$$P_w = \frac{1}{A}\frac{d(KE)}{dt} = \tfrac{1}{2}\rho v^3 \quad (2)$$
Here, ρ, v, and A represent the air density, wind speed, and swept area, respectively. Because a wind turbine (WT) extracts only part of this power, the efficiency of converting wind energy to mechanical energy is expressed by the power coefficient C_p. C_p depends on the tip speed ratio λ (the ratio of the linear blade tip velocity to the wind velocity). The output power P_o of a turbine with a predefined pitch angle is given by Eq. 3.
Designing Controller Parameter of Wind …
Table 1 Significant contributions in wind turbine emulator research

| S. No. | Year/References | Title | Research contribution |
|---|---|---|---|
| 1 | Hardy et al. (2011) [1] | "Emulation of a 1.5 MW wind turbine with a DC motor" | To design a mathematical model for the 1.5 MW wind turbine generator (WTG) |
| 2 | Nouria et al. (2012) [2] | "A contribution to the design and the installation of an universal platform of a WTE using a DC motor" | To propose the design of electrical and mechanical sensors and pulse width modulation (PWM) control |
| 3 | Sahoo et al. (2013) [3] | "DC motor based WTE using LabVIEW for WECS laboratory setup" | An experimental setup was prepared, with a feedback survey form to customize the experiments |
| 4 | Chaurasia et al. (2014) [4] | "Emulation of a 1 MW wind turbine system (WTS) with separately excited DC motor using MATLAB Simulink" | Design of a 1 MW WT using the power coefficient |
| 5 | Bhayo et al. (2015) [5] | "Modelling of WTS for analysis of the WECS using MATLAB" | To direct the DC motor to follow the theoretical reference speed |
| 6 | Garg et al. (2016) [6] | "Modelling and development of WTE for the condition monitoring of wind turbine" | Designed a low-cost WT test rig |
| 7 | Benaaouinate et al. (2017) [7] | "Development of a useful wind turbine emulator based on permanent magnet DC motor" | Modelling of a permanent magnet (PM) DC motor with PWM |
| 8 | Sirouni et al. (2018) [8] | "Design and control of a small scale WTE with a DC motor" | Pitch angle control of the WT during high wind speed zones |
| 9 | Dekali et al. (2019) [9] | "Experimental emulation of a wind turbine under operating modes using DC motor" | Chopper-controlled DC motor with a dSPACE test bench |
| 10 | Oliveira et al. (2019) [10] | "Design and simulation of a controller for emulation of a wind turbine by a DC motor" | Consideration of time-series-based wind speed variation |
$$P_o = C_p A P_w = \tfrac{1}{2}\rho C_p A v^3 \quad (3)$$

$$A = \pi R^2 \quad (4)$$

$$\lambda = \frac{\omega_w R}{v} \quad (5)$$
Fig. 1 a Block diagram of the WTE using a separately excited DC (SEDC) motor, b regulation strategies (C_p versus λ curve) [1]
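Here, C_p depends on λ and the pitch angle β as follows:

$$C_p = C_a\left(\frac{C_b}{\lambda_i} - C_c\beta - C_d\right)e^{-C_e/\lambda_i} + C_f\lambda \quad (6)$$

$$\frac{1}{\lambda_i} = \frac{1}{\lambda + 0.008\beta} - \frac{0.035}{\beta^3 + 1} \quad (7)$$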
The values of the coefficients C_a, C_b, …, C_f depend on the WT design; those for the case considered in this paper are given in Table 2. The generated torque is given by Eq. 8; the mechanical torque is directly proportional to the ratio C_p/λ (Fig. 2).

$$T_w = \frac{P_o}{\omega_w} = \frac{1}{2\lambda}\rho C_p \pi R^3 v^2 \quad (8)$$
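To make Eqs. 3–8 concrete, the following Python sketch evaluates C_p and the resulting power and torque with the Table 2 coefficients. It is a minimal illustration, not the authors' MATLAB model; the helper names and the default air density of 1.225 kg/m³ (standard sea-level value) are our own assumptions. With β = 0 it reproduces the optimum power coefficient of about 0.411 at λ = 8 listed in Table 3.

```python
import math

# Coefficients C_a ... C_f from Table 2.
CA, CB, CC, CD, CE, CF = 0.5, 116.0, 0.4, 5.0, 21.0, 0.0


def power_coefficient(lam, beta=0.0):
    """C_p(lambda, beta) following Eqs. (6)-(7)."""
    inv_lam_i = 1.0 / (lam + 0.008 * beta) - 0.035 / (beta ** 3 + 1.0)
    return CA * (CB * inv_lam_i - CC * beta - CD) * math.exp(-CE * inv_lam_i) + CF * lam


def turbine_power_and_torque(v, omega_w, radius, beta=0.0, rho=1.225):
    """Return (P_o [W], T_w [N*m]) from Eqs. (3)-(5) and (8).

    rho = 1.225 kg/m^3 (standard sea-level air density) is an assumed default.
    """
    area = math.pi * radius ** 2           # Eq. (4): swept area
    lam = omega_w * radius / v             # Eq. (5): tip speed ratio
    cp = power_coefficient(lam, beta)
    p_o = 0.5 * rho * cp * area * v ** 3   # Eq. (3): output power
    t_w = p_o / omega_w                    # Eq. (8): turbine torque
    return p_o, t_w


# Table 3 check: at the optimum TSR (lambda = 8, beta = 0), C_p is about 0.411.
print(round(power_coefficient(8.0), 3))  # 0.411
# Operating point at the rated wind speed and optimum TSR for R = 1.25 m.
p_o, t_w = turbine_power_and_torque(v=7.5, omega_w=8.0 * 7.5 / 1.25, radius=1.25)
```

Computing the torque as P_o/ω_w rather than from the closed form of Eq. 8 keeps the two expressions cross-checkable: both must agree once λ = ω_w R/v is substituted.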
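To emulate a WT with an SEDC motor [1], the mathematical model is fed with the turbine angular speed, wind velocity, and pitch angle as inputs. The output of a WTE can be modelled as power or torque; in this WTE it is modelled as torque, and the power is represented in terms of the minimum error. The mathematical model equations are as follows:

$$V_a = I_a R_a + L_a \frac{dI_a}{dt} + E_b \quad (9)$$

$$E_b = K_e \phi\, \omega_m \quad (10)$$

$$\tau_e = K_t \phi\, I_a \quad (11)$$

where V_a, I_a, R_a, and L_a are the armature voltage, current, resistance, and inductance, E_b is the back EMF, φ is the field flux, ω_m is the motor speed, τ_e is the electromagnetic torque, and K_e and K_t are the motor constants.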
Table 2 Respective values of C_p coefficients

| C_a | C_b | C_c | C_d | C_e | C_f |
|---|---|---|---|---|---|
| 0.5 | 116 | 0.4 | 5 | 21 | 0 |
Fig. 2 Generated power versus wind speed curve [8]
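$$J_m \frac{d\omega_m}{dt} + \beta \omega_m = \tau_e - \tau_l \quad (12)$$

where J_m is the motor inertia, β here denotes the viscous friction coefficient (not the pitch angle), and τ_l is the load torque.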
The reference current is obtained from Eq. 11. Comparing it with the measured armature current gives an error, which is passed through the PI controller; in this manner, the SEDC motor behaves like a WT. The output of the PI controller is applied to a PWM control circuit, which ultimately generates the gating pulses that drive the power MOSFET to set the armature voltage. The block diagram of the WTE using a DC motor and the regulation strategies are shown in Fig. 1a and b, respectively, and the system parameters considered in this research work are presented in Table 3. The corresponding motor speed voltage signal is given to the WT model as input. The error between the measured armature current and the reference current is supplied to the PI controller, which in turn supplies the input to the PWM stage for current control; that is, the PI acts as a current controller. A squirrel cage induction generator (IG) is adopted to set up a grid-connected WECS environment; the IG parameters are presented in Table 4.

A PI controller continuously computes an error value as the difference between a reference and a measured value, and applies a correction based on proportional and integral actions. The structure of the PI controller is:

$$u(t) = K_P\, e(t) + K_I \int_0^t e(t')\,dt' \quad (13)$$

Here, u(t) is the controller output, and K_P and K_I are the proportional and integral gains, which are the tuning parameters. The objective function of this problem is defined as:

$$F(x) = \min f(e) \quad (14)$$
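The current loop described above can be illustrated with a short simulation. The sketch integrates Eqs. 9–12 with forward Euler and closes the loop with the PI law of Eq. 13. The motor constants, the speed-proportional load torque standing in for the generator, and the idealized PWM stage (PI output applied directly as V_a) are hypothetical assumptions, not the values of Tables 3–4 or the authors' MATLAB setup. The returned integral of absolute error is one possible choice of f(e) in Eq. 14.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotorParams:
    """Hypothetical SEDC motor values, chosen only for illustration."""
    Ra: float = 1.0      # armature resistance [ohm]
    La: float = 0.01     # armature inductance [H]
    k_phi: float = 0.5   # K_e*phi = K_t*phi (equal in SI units) [V*s/rad]
    Jm: float = 0.05     # rotor inertia [kg*m^2]
    beta: float = 0.01   # viscous friction coefficient [N*m*s/rad]


def simulate_current_loop(torque_ref, Kp, Ki, load_coeff=0.1,
                          p=MotorParams(), dt=1e-4, t_end=3.0):
    """Forward-Euler simulation of Eqs. (9)-(12) under the PI law of Eq. (13).

    The load torque tau_l = load_coeff * omega is a hypothetical generator load.
    Returns (final I_a, final omega_m, integral of |error| over the run).
    """
    i_ref = torque_ref / p.k_phi          # reference current from Eq. (11)
    ia = omega = integral = iae = 0.0
    for _ in range(int(round(t_end / dt))):
        error = i_ref - ia
        integral += error * dt
        va = Kp * error + Ki * integral                        # Eq. (13)
        eb = p.k_phi * omega                                   # Eq. (10): back EMF
        tau_e = p.k_phi * ia                                   # Eq. (11): motor torque
        tau_l = load_coeff * omega
        ia += dt * (va - ia * p.Ra - eb) / p.La                # Eq. (9)
        omega += dt * (tau_e - p.beta * omega - tau_l) / p.Jm  # Eq. (12)
        iae += abs(error) * dt
    return ia, omega, iae


ia, omega, iae = simulate_current_loop(torque_ref=1.0, Kp=10.0, Ki=1000.0)
# Expected steady state: I_a -> 1.0 / 0.5 = 2 A and
# omega_m -> k_phi * I_a / (beta + load_coeff) = 1.0 / 0.11, about 9.09 rad/s.
```

The integral term is what removes the steady-state current error despite the growing back EMF; with K_I = 0 the current would settle below its reference.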
Fig. 3 Generated power versus wind speed curve

Table 3 WT specifications

| Parameter | Value |
|---|---|
| Rated power (R_P) | 500 W |
| Cut-in wind speed (V_cutin) | 3 m/s |
| Rated wind speed (V_rated) | 7.5 m/s |
| Turbine radius (R_T) | 1.25 m |
| Turbine inertia coefficient (T_C) | 0.07 kg·m² |
| Optimum power coefficient (β = 0°) | 0.411 |
| Optimum TSR (β = 0°) | 8 |
Table 4 Induction generator specifications

| Parameter | Value |
|---|---|
| Rated power (R_PG) | 1 hp |
| Stator resistance (R_S) | 16.05 Ω |
| Stator leakage inductance (R_X) | 753.43 mH |
| Mutual inductance (MI) | 703.43 mH |
| Rotor resistance (R_R) | 7.46 Ω |
| Rotor leakage inductance (R_X) | 753.43 mH |
| Excitation capacitance (X_C) (at full load in connection) | 6.52 µF |
3 Mapping of ABC Algorithm to WTE The ABC algorithm is an iterative mathematical simulation of the food foraging behaviour of honeybees [15]. It is divided into four phases. In the first, initialization phase, each candidate solution w_i (i = 1, …, N) is generated by Eq. 15: all N solutions are initialized in the D-dimensional search space within the specified lower and upper bounds.

$$w_{ij} = w_{low,j} + \text{rand}[0,1]\,(w_{upper,j} - w_{low,j}) \quad (15)$$
where w_{low,j} and w_{upper,j} are the lower and upper bounds of w_i in the jth dimension, and rand[0, 1] is a uniformly distributed random number in [0, 1]. In the second, employed bee phase, the ith candidate solution is altered using information from another solution, as in Eq. 16. The fitness value (FV) of the new solution is compared with that of the previous solution, and the fitter of the two is selected for the next generation.

$$w'_{ij} = w_{ij} + \phi_{ij}\,(w_{ij} - w_{neigh,j}) \quad (16)$$
where neigh ∈ {1, 2, …, N} (neigh ≠ i) and j ∈ {1, 2, …, D} are randomly selected indices, and φ_ij is a random number (drawn uniformly from [−1, 1] in the standard ABC). In the third, onlooker bee phase, a food source (FS) is chosen according to the probability Prob_i given by Eq. 17:

$$\text{Prob}_i = \frac{\text{Fitness}_i}{\sum_{n=1}^{N} \text{Fitness}_n} \quad (17)$$
where Fitness_i is the FV of the ith solution. The solutions selected according to Eq. 17 are given a chance to update their positions using Eq. 16, and the fitter solution is chosen for the next generation. The fourth, scout bee phase comes into play if an FS does not change its position within a predefined number of trials. If this abandoned FS is w_i, it is re-initialized for every j ∈ {1, 2, …, D} as in Eq. 18:

$$w_{ij} = w_{low,j} + \text{rand}[0,1]\,(w_{upper,j} - w_{low,j}) \quad (18)$$
The K_P and K_I parameters of the PI controller are initialized using Eq. 15 of the ABC algorithm, and the iterative ABC process is then executed. The pseudocode of the ABC algorithm used to obtain the WTE controller parameters is presented in Algorithm 1. For the system parameters of the separately excited DC motor considered in this work (Table 3), simulated as a WTE, the optimized values of K_P and K_I yield the generated power output shown in Fig. 3. This power is obtained when the tuned K_P and K_I parameters produce the minimum error. The output in Fig. 3 shows an improvement in the power output when the PI controller parameters are optimized in MATLAB using the ABC algorithm.
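The four phases and Algorithm 1 can be sketched in Python as follows. This is a minimal, generic ABC implementation, not the authors' MATLAB code; it assumes minimization of a non-negative error, the fitness mapping 1/(1 + error), φ drawn from [−1, 1], clipping to the bounds, and one scout per cycle, which are common ABC conventions rather than details confirmed by the paper. The toy objective only stands in for f(e) of Eq. 14; in the WTE setting it would instead return the controller error obtained by simulating the emulator for a given (K_P, K_I) pair.

```python
import numpy as np


def abc_minimize(objective, lower, upper, n_sources=10, max_gen=100, limit=20, seed=0):
    """Minimal artificial bee colony (ABC) sketch following Eqs. (15)-(18).

    objective maps a parameter vector (e.g. [K_P, K_I]) to a non-negative error.
    Returns (best vector, best error).
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # Eq. (15): initialize every food source uniformly inside the bounds.
    foods = lower + rng.random((n_sources, dim)) * (upper - lower)
    errors = np.array([objective(f) for f in foods])
    trials = np.zeros(n_sources, dtype=int)

    def try_update(i):
        # Eq. (16): move one random dimension relative to a random neighbour k != i.
        k = rng.choice([n for n in range(n_sources) if n != i])
        j = rng.integers(dim)
        candidate = foods[i].copy()
        candidate[j] += rng.uniform(-1.0, 1.0) * (foods[i, j] - foods[k, j])
        candidate = np.clip(candidate, lower, upper)
        err = objective(candidate)
        if err < errors[i]:               # greedy selection keeps the lower-error source
            foods[i], errors[i], trials[i] = candidate, err, 0
        else:
            trials[i] += 1

    best_idx = int(np.argmin(errors))
    best, best_err = foods[best_idx].copy(), errors[best_idx]
    for _ in range(max_gen):
        for i in range(n_sources):                      # Step 1: employed bees
            try_update(i)
        fitness = 1.0 / (1.0 + errors)                  # common ABC fitness for errors >= 0
        prob = fitness / fitness.sum()                  # Eq. (17)
        for i in rng.choice(n_sources, size=n_sources, p=prob):  # Step 2: onlookers
            try_update(i)
        stale = int(np.argmax(trials))                  # Step 3: scout bee
        if trials[stale] > limit:
            foods[stale] = lower + rng.random(dim) * (upper - lower)  # Eq. (18)
            errors[stale] = objective(foods[stale])
            trials[stale] = 0
        idx = int(np.argmin(errors))                    # Step 4: memorize the best
        if errors[idx] < best_err:
            best, best_err = foods[idx].copy(), errors[idx]
    return best, best_err


def toy_error(x):
    """Hypothetical stand-in for f(e): known minimum at K_P = 3, K_I = 50."""
    return (x[0] - 3.0) ** 2 + (x[1] - 50.0) ** 2


best, best_err = abc_minimize(toy_error, lower=[0.0, 0.0], upper=[10.0, 100.0],
                              n_sources=20, max_gen=200)
```

With D = 2 (K_P and K_I), each update changes only one gain at a time, which is the standard ABC move and keeps the search cheap per objective evaluation.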
Algorithm 1 ABC algorithm for WTE parameter optimization:
The parameters are initialized: N = number of FSs; dimension D = 2; MGN = maximum number of generations; CurrentIndex = 1.
Initialize each FS (solution) using Eq. 15.
While (CurrentIndex ≤ MGN) do
• Step 1: Employed Bee Phase
– Update the position of each FS using Eq. 16.
– Calculate the error for the newly generated FS using Eq. 13.
– Replace the old FS with the new FS if the new FS has a smaller error.
• Step 2: Onlooker Bee Phase
– Calculate prob_i for each FS using Eq. 17.
– Update the positions of the FSs selected as per Eq. 17 using Eq. 16.
– Calculate the error for the newly generated FS using Eq. 13.
– Replace the old FS with the new FS if the new FS has a smaller error.
• Step 3: Scout Bee Phase
– If an FS has not changed its state within the predefined limit, re-initialize it using Eq. 15.
– Calculate the error for the newly generated FS.
• Step 4: Memorize the best solution found so far.
• CurrentIndex = CurrentIndex + 1
end while
Output the best solution.

4 Conclusion and Future Works In this paper, the parameters of the proportional integral (PI) controller for a wind turbine emulator (WTE) using a separately excited DC (SEDC) motor are tuned. To determine the PI parameters, a prominent swarm intelligence (SI) based algorithm, the artificial bee colony (ABC) algorithm, is applied. The results reveal that the optimized WTE parameters for the wind energy conversion system (WECS), obtained using MATLAB, are suitable for experimental purposes. In future, the same setup may be implemented in hardware. Acknowledgements This research work is funded by the TEQIP III CRS RTU(ATU) fund (TEQIP III /RTU/(ATU)/CRS/2019-20/29).
References
1. Hardy T, Jewell W (2011) Emulation of a 1.5 MW wind turbine with a DC motor. In: 2011 IEEE power and energy society general meeting. IEEE, pp 1–8
2. Imen N, Adel K, Adel B (2012) A contribution to the design and the installation of an universal platform of a wind emulator using a DC motor. Int J Renew Energ Res (IJRER) 2(4):797–804
3. Sahoo NC, Satpathy AS, Kishore NK, Venkatesh B (2013) DC motor-based wind turbine emulator using LabVIEW for wind energy conversion system laboratory setup. Int J Electr Eng Educ 50(2):111–126
4. Chaurasia K, Chaurasia K (2014) Emulation of a 1 MW wind turbine system with a separately excited DC motor using MATLAB-Simulink. Int J Innovative Res Dev 3(10):103–109
5. Bhayo MA, Yatim AHM, Khokhar S, Aziz MJA, Idris NRN (2015) Modeling of wind turbine simulator for analysis of the wind energy conversion system using MATLAB/Simulink. In: 2015 IEEE conference on energy conversion (CENCON). IEEE, pp 122–127
6. Himani G, Ratna D (2015) Modeling and development of wind turbine emulator for the condition monitoring of wind turbine. Int J Renew Energ Res 5(2):591–597
7. Benaaouinate L, Khafallah M, Mesbahi A, Martinez A (2017) Development of a useful wind turbine emulator based on permanent magnet DC motor. In: 2017 14th international multi-conference on systems, signals and devices (SSD). IEEE, pp 44–48
8. Sirouni Y, El Hani S, Naseri N, Aghmadi A, El Harouri K (2018) Design and control of a small scale wind turbine emulator with a DC motor. In: 2018 6th international renewable and sustainable energy conference (IRSEC). IEEE, pp 1–6
9. Dekali Z, Baghli L, Boumediene A (2019) Experimental emulation of a small wind turbine under operating modes using DC motor. In: 2019 4th international conference on power electronics and their applications (ICPEA). IEEE, pp 1–5
10. Oliveira MC, Ramos RA (2019) Design and simulation of a controller for emulation of a wind turbine by a DC motor. In: 2019 workshop on communication networks and power systems (WCNPS). IEEE, pp 1–5
11. Nirmala S, Harish S, Ajay S (2018) Beer froth artificial bee colony algorithm for job-shop scheduling problem. Appl Soft Comput 68:507–524
12. Sharma N, Sharma H, Sharma A (2019) An effective solution for large scale single machine total weighted tardiness problem using lunar cycle inspired artificial bee colony algorithm. IEEE/ACM Trans Comput Biol Bioinf
13. Ajay S, Harish S, Annapurna B, Nirmala S (2016) Optimal design of PIDA controller for induction motor using spider monkey optimization algorithm. Int J Metaheuristics 5(3–4):278–290
14. Ajay S, Harish S, Annapurna B, Nirmala S (2017) Fibonacci series-based local search in spider monkey optimisation for transmission expansion planning. Int J Swarm Intell 3(2–3):215–237
15. Karaboga D (2005) An idea based on honey bee swarm for numerical optimization. Techn Rep TR06, Erciyes Univ Press Erciyes
16. Sharma S, Kumar S, Sharma K (2019) Improved gbest artificial bee colony algorithm for the constraints optimization problems. Evol Intell 1–7
17. Sharma H, Sharma S, Kumar S (2016) Lbest gbest artificial bee colony algorithm. In: 2016 International conference on advances in computing, communications and informatics (ICACCI). IEEE, pp 893–898
18. Bhambu P, Sharma S, Kumar S (2018) Modified gbest artificial bee colony algorithm. In: Soft computing: theories and applications. Springer, pp 665–677
19. Sonal S, Sandeep K, Kavita S (2019) Archimedean spiral based artificial bee colony algorithm. J Stat Manage Syst 22(7):1301–1313
20. Sharma S, Kumar S, Nayyar A (2018) Logarithmic spiral based local search in artificial bee colony algorithm. In: International conference on industrial networks and intelligent systems. Springer, pp 15–27
21. Tiwari P, Kumar S (2016) Weight driven position update artificial bee colony algorithm. In: 2016 2nd international conference on advances in computing, communication, and automation (ICACCA) (Fall). IEEE, pp 1–6
Text Document Orientation Detection Using Convolutional Neural Networks Shivam Aggarwal, Safal Singh Gaur, and Manju
Abstract Identifying the orientation of scanned text documents is a key problem today, when every department of any organization deals with documents in one way or another. In this paper, we focus on the more challenging task of identifying the disorientation of general text documents and correcting it back to the normal orientation. Our work aims to solve the real-world problem of orientation detection for documents in PDF form, whose output can then be used by further document processing techniques; all such tasks depend on the document being correctly oriented. To do this, a convolutional neural network (CNN) is used to learn salient features for predicting the standard orientation of the page images. Unlike earlier research works, which mostly distinguish between horizontal and vertical orientation of non-text images only, our model works at page level on text documents and is more robust and explainable. We also support the model with explanations and interpretability analysis. The proposed approach runs in real time and can therefore be applied in various organizations as well. Keywords Text documents · Convolutional neural network · Orientation detection · Model interpretability
S. Aggarwal · S. S. Gaur · Manju (B) Jaypee Institute of Information Technology, Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_13
S. Aggarwal et al.
1 Introduction Proper maintenance of source documents is important: today we deal with documents regularly in every sphere, whether as PDFs, images, etc., and retrieving useful information from them is crucial. With growing interest in deep learning and artificial neural networks, various document analysis problems, such as character recognition, layout analysis, and orientation detection of documents, have come to the fore. To process or retrieve anything from these documents, correctly oriented pages are required for users to work on. A basic step of any document processing is therefore to restore the original orientation of a page that is not in the proper format. For example, a conventional OCR system needs correctly oriented pages before recognition and cannot be applied directly to pages in any other orientation. We therefore aim to build a model that detects the orientation of disoriented documents and converts clockwise and anticlockwise oriented pages to the normal position, as shown in Fig. 1. In this automated approach, we first detect the orientation of the text at page level and then automatically correct the page direction based on visual information (such as text patches and lines). The input is a PDF of documents with disoriented pages, and the expected output is a PDF with all pages correctly oriented, which can later be used in various applications.
Fig. 1 Description of the proposed model
Text Document Orientation Detection Using Convolutional …
Many research works aim to solve the problem stated above [1]. However, most of them detect the orientation of non-text images such as family pictures, landscapes, and nature scenes [2–4], and several others estimate head pose using deep learning techniques [5, 6]. Most recent cameras include built-in sensors that adjust the direction of pictures in 90° steps, but this function is often not used because it is slow and not automated. To resolve this issue, various solutions have been proposed [7]. In our proposed work, we do not focus only on detecting the orientation of non-text images; instead, we address the more challenging task of text documents. The task is challenging because text pages offer few features that reveal the direction of the text, whereas scenic pictures contain many horizontal and vertical lines and shapes that help identify direction [8]. Various traditional methods solve the problem at a basic character level, which is slow because a single page contains a large number of characters [9]. Other computer vision methods scan the text lines and classify the direction of the text [10]. In such cases, the outcome relies on subtle features of scanned documents that require some understanding of the image content. Over the last decade, deep convolutional networks have proven very good at learning such features [11, 12]. The core problem is thus a standard classification among the different orientations of documents at page level. Convolutional neural networks are effective at detecting salient features [13]; these feature vectors are then used to classify the direction of a page, which is rotated by the opposite angle to restore it to normal [14]. In this study, we train a convolutional neural network to classify the orientation of an image of a document. First, all PDFs are converted into an image dataset, which is a prerequisite for training the CNN model. The model classifies each page into three classes: clockwise (90°), anticlockwise (−90°), and normal (0°) relative to the vertical. Once the imaged page is classified, the corresponding PDF page is simply rotated by the opposite of the classified angle.
Training powerful deep convolutional neural networks usually requires a large amount of labeled training data. This is often a strong limitation, because collecting such data can be difficult. For the present task, however, labeled data can be produced easily from any organization's documentation. Any unlabeled set of images can become a training set by manually labeling it, and training samples can be created simply by rotating these pictures by the required angles (−90°, 0°, 90°). Moreover, our work focuses on the interpretability of the convolutional network through the proposed algorithm. Several guidelines allow one to inspect the internal working of neural networks [15, 16]. Intermediate visualizations of the input image after every layer show that the learned representations are semantically meaningful, as depicted in the simulation section. The rest of the paper is organized as follows: Sect. 2 discusses related work on document orientation detection, and Sect. 3 presents the proposed methodology for the discussed problem. Sect. 4 discusses model interpretability and describes the convolutional network in detail. Sect. 5 discusses the simulation outcomes achieved by the proposed model. Finally, Sect. 6 concludes with the accuracy of the proposed model and directions for future work.
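Because labels come for free from rotation, the augmentation step can be sketched with NumPy. The helper names and the sign convention (positive angles are clockwise) are assumptions for illustration, not the authors' code; the same rotation, applied with the opposite angle, also performs the final correction once a page has been classified.

```python
import numpy as np

# Orientation classes used in the text, in degrees (positive = clockwise).
ANGLES = (-90, 0, 90)


def rotate_clockwise(page, degrees):
    """Rotate an image array clockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    # np.rot90 turns counterclockwise for positive k, so negate the quarter-turn count.
    return np.rot90(page, k=-(degrees // 90))


def make_training_samples(upright_pages):
    """Create (image, label) pairs by rotating correctly oriented pages."""
    return [(rotate_clockwise(page, angle), angle)
            for page in upright_pages for angle in ANGLES]


def correct_orientation(page, predicted_angle):
    """Undo a predicted rotation by rotating by the opposite angle."""
    return rotate_clockwise(page, -predicted_angle)


page = np.arange(6).reshape(2, 3)          # stand-in for a grayscale page image
samples = make_training_samples([page])
```

`np.rot90` rotates in the plane of the first two axes by default, so the same helper also works on (height, width, channels) color pages.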
2 Related Work Orientation detection is a common challenge in document analysis and processing for certain applications. Most existing research works address non-text images and focus on identifying skew angles using various computer vision algorithms. Fischer et al. [1] applied a CNN to the orientation classification of general images only; in their method, the CNN predicts the angle of landscape photos without any preprocessing. Their work also makes clear that a CNN can highlight the image regions responsible for different orientations. However, the authors of [1] did not provide any explanation of their CNN model's decisions, and its robustness is poor for text images. Further, Jeungmin et al. [7] proposed a document orientation detection approach that detects document-capturing moments to help users correct orientation errors. They inferred the document orientation by tracking the gravity direction and gyroscope data, and gave visual feedback on the inferred direction for manual correction. Wei et al. [2] exploited the interpolation artifacts introduced by applying a rotation to digital images; however, this strategy would not work for pictures that were not taken upright. Baird [3] defined an algorithm for detecting the page orientation (portrait/landscape) and the degree of skew of documents available as binary images, using a new approach based on the Hough transform and local analysis. Later, Solanki et al. [17] proposed a method for printed images that analyzes the patterns of printer dots and then estimates the rotation of the images; clearly, this strategy is not applicable to pictures taken with a digital camera. By restricting rotations to multiples of 90°, the continuous-valued angle prediction task reduces to a discrete one, which can be solved efficiently [4, 18].
Over time, much more research has addressed text documents as well. These techniques exploit the particular structure of text document images, for example, the arrangement of text in lines and the exact shapes of letters; the task is comparatively harder because fewer features contribute to orientation detection. Chen et al. [9] were the first to apply neural network techniques to the recognition of document language type and obtained better results than conventional strategies. They used text lines as input data to their CNN model to recognize document properties, but did not provide any model interpretability, so their model can only be used as a black box. The algorithm of Chen et al. [19] uses recursive morphological transforms for text skew estimation; automated experiments indicate that the estimated text skew angles are within 0.5° of the true skew angles 99% of the time. Avila et al. [20] developed a fast algorithm for orientation and skew detection in complex monochromatic document images that can detect any document rotation with high precision. Sun et al. [11] developed an algorithm that employs only gradient information; their results show that the statistics of the gradient orientation suffice to obtain the skew angle of a document image. The algorithm works on various document images containing text, possibly together with tables, graphics, or photographs. Yan et al. [10] worked on a model for gray-scale and color images as well as binary scanned images. In this work, the cross-correlation between two lines of the image a fixed distance apart is calculated, the maximum is chosen to calculate the skew angle, and the image is then rotated in the opposite direction for skew correction. Chen et al. [12] proposed a new deep learning approach to language and orientation recognition in document analysis, in which a CNN determines the features/document properties. The authors also proposed a voting process to reduce the system scale and make full use of the text-line information. Several other works based on deep learning and various image processing algorithms address the same problem. However, these approaches either work on text-line datasets or identify the orientation of non-text data such as simple landscape images. We therefore build a new CNN classifier trained on an image dataset of document pages to improve on existing orientation detection schemes. Our aim is to enhance speed and robustness while keeping results accurate, using a deep convolutional approach that extracts document properties at page level and classifies the discrete angles.
3 Proposed Methodology
In this section, we discuss the proposed methodology for text document orientation detection. The proposed method is a deep learning convolutional neural network that classifies the whole document page into one of three categories, namely clockwise (90°), anticlockwise (−90°), and normal (0°), and then rotates the document by the corresponding angle. With the many advances in image processing, convolutional neural networks (CNNs) are the first deep learning model that comes to mind, and image classification is one of their most important applications. Owing to their simplicity and good results, CNNs are widely used for image classification problems. A CNN has three major parts: an input layer that takes the image to be classified; hidden layers that make the network deeper and, for image data, typically more effective than other machine learning models; and an output layer that assigns a class to the input image. Most existing methods use text lines of the document as input to the CNN. This requires an additional step to locate text patches in the document, which increases computation time, and these methods handle only two orientations, the upright and reversed directions. To address this, we feed the whole document page directly to the CNN as an image, with its orientation as the label, which results in better and faster computation. Our model provides a complete end-to-end solution: the input is a wrongly oriented PDF of a text document in any form, and the output is the same PDF with correctly oriented pages, as depicted in Fig. 2.
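The last step of the end-to-end pipeline, turning a predicted class into a corrective rotation, can be sketched in plain Python on a page stored as a list of pixel rows. This is an illustrative sketch rather than the authors' code: the helper names are hypothetical, and the sign convention (a +90° label meaning the page content is turned 90° clockwise, so it is rotated back counter-clockwise) is an assumption, since the text does not define it.

```python
def rotate_cw(grid):
    """Rotate an image stored as a list of rows 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate_ccw(grid):
    """Rotate an image stored as a list of rows 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def correct_orientation(grid, predicted_label):
    """Undo the rotation implied by a predicted label in {-90, 0, 90}.

    Assumed convention (hypothetical): +90 means the page content is turned
    90 degrees clockwise, so it is rotated back counter-clockwise, and -90
    is handled the opposite way.
    """
    if predicted_label == 0:
        return [list(row) for row in grid]
    if predicted_label == 90:
        return rotate_ccw(grid)
    if predicted_label == -90:
        return rotate_cw(grid)
    raise ValueError(f"unexpected orientation label: {predicted_label}")
```

A production pipeline would more likely rely on an imaging library, for example Pillow's `Image.rotate(angle, expand=True)`, where positive angles rotate counter-clockwise and `expand=True` keeps the whole rotated page.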
S. Aggarwal et al.
Fig. 2 Input is a disoriented PDF; output is a correctly oriented PDF
3.1 Data Description
Ideally, training a system to predict orientation requires a dataset of natural images annotated with how far their rotation angles deviate from the upright direction. We used a dataset of text documents provided by an organization. The training set consists of 8618 JPEG images taken from different PDF documents, where each PDF page is first converted to an image. The images were manually divided into three categories by rotating them: 2908 images oriented at −90°, 2927 at 0°, and 2783 at +90°. They are saved with names of the form orientation.id.jpg so that labels can be read easily during training. For example, 90.458.jpg is a 90°-oriented JPEG image with unique id 458, and 0.1563.jpg is a 0°-oriented JPEG image with unique id 1563.
In the proposed methodology, the CNN model has five convolutional layers with 32, 64, 128, 256, and 512 convolution kernels, all of size 3 × 3 with the rectified linear unit (ReLU) activation function, and five max-pooling layers of size 2 × 2 with a stride of 2. Batch normalization follows every convolution, and dropout with a drop probability of 0.25 follows each pooling layer. These are followed by a flattening layer and two dense layers, the first with an output size of 512 and the last with an output size of 3, one for each class. A softmax layer sits on top of the network. The proposed model is shown in Fig. 3.
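Because each label is encoded in the file name, reading it back is a small parsing step. The sketch below assumes that −90° images are written with an ASCII minus sign (for example `-90.12.jpg`) and that classes are indexed in the order −90, 0, 90; both are assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches names such as "90.458.jpg", "0.1563.jpg" or (assumed) "-90.12.jpg".
FILENAME_PATTERN = re.compile(r"^(-?\d+)\.(\d+)\.jpg$")
CLASSES = (-90, 0, 90)  # assumed order of the three softmax outputs


def parse_label(filename):
    """Return (orientation, image_id) from an orientation.id.jpg file name."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not an orientation.id.jpg name: {filename}")
    orientation, image_id = int(match.group(1)), int(match.group(2))
    if orientation not in CLASSES:
        raise ValueError(f"unknown orientation {orientation} in {filename}")
    return orientation, image_id


def class_index(orientation):
    """Map an orientation in degrees to a class index 0..2."""
    return CLASSES.index(orientation)


def count_classes(filenames):
    """Count how many images carry each orientation label."""
    return Counter(parse_label(name)[0] for name in filenames)
```

Applied to the full training folder, `count_classes` should reproduce the class sizes given above (2908, 2927 and 2783 images).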
Fig. 3 Block structure of the proposed convolutional neural network
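To make the architecture concrete, the sketch below works out the tensor shapes and parameter counts it implies for the 400 × 400 × 3 inputs used in Section 5. The text does not state the padding, so "same" padding is assumed; with "valid" padding each spatial size would shrink by 2 before pooling and the numbers would change. Dropout, flattening and the softmax add no parameters, and the function names are illustrative.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def summarize(height=400, width=400, channels=3,
              filters=(32, 64, 128, 256, 512), kernel=3,
              dense_units=512, n_classes=3):
    """Output shape and parameter counts of each conv block, then the dense head."""
    blocks = []
    c_in = channels
    for c_out in filters:
        p_conv = conv_params(kernel, c_in, c_out)  # assumed 'same' padding keeps H x W
        p_bn = 4 * c_out                           # gamma, beta, moving mean, moving variance
        height, width = height // 2, width // 2    # 2x2 max-pooling with stride 2
        blocks.append(((height, width, c_out), p_conv, p_bn))
        c_in = c_out
    flat = height * width * c_in
    return (blocks, flat,
            dense_params(flat, dense_units),
            dense_params(dense_units, n_classes))
```

Under these assumptions the five blocks output 200×200×32, 100×100×64, 50×50×128, 25×25×256 and 12×12×512 maps, the flattened vector has 73,728 values, and the first dense layer alone holds 37,749,248 parameters, far more than the roughly 1.57 million in all five convolutions combined. If the network had to run faster, that layer would be the obvious place to shrink; this is a design observation, not something the authors report.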
RMSProp is used as the optimizer; it is a variant of gradient descent that scales each parameter update by a running average of recent squared gradients, which usually gives faster and more stable convergence. Categorical cross-entropy is used as the loss function, as this is a multi-class problem. The model uses callbacks provided by Keras at given stages of the training procedure. Early stopping halts training when a monitored quantity has stopped improving, with the patience parameter (the number of epochs with no improvement after which training is stopped) set to 10. In addition, ReduceLROnPlateau reduces the learning rate when a metric has stopped improving: this callback monitors the quantity, and if no improvement is seen for a "patience" number of epochs (set to 2), the learning rate is reduced to 0.00001. Data augmentation is applied to the training set to expand the data, improve performance and accuracy, and reduce overfitting. Augmentation creates variations of the images that help the fitted model generalize what it has learned to new images. Rescaling, shearing, zooming, and width and height shifting are applied, with a batch size of 32 and 10 training epochs.
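The interaction of the two callbacks can be illustrated with a simplified, pure-Python re-implementation of their logic on a list of per-epoch validation losses. This is not the Keras source (the real classes are `keras.callbacks.EarlyStopping` and `keras.callbacks.ReduceLROnPlateau`, which also have `min_delta`, `cooldown` and other options). The sketch assumes the monitored quantity is a loss to be minimised, an initial learning rate of 0.001 (the Keras RMSprop default), a reduction factor of 0.1 (the ReduceLROnPlateau default), and it reads the 0.00001 in the text as the minimum learning rate (`min_lr`).

```python
def simulate_callbacks(val_losses, lr=1e-3, lr_patience=2, factor=0.1,
                       min_lr=1e-5, stop_patience=10):
    """Simplified early stopping + learning-rate reduction on a loss curve.

    Returns (learning rate used in each epoch, index of the stopping epoch
    or None if training ran to the end).
    """
    best = float("inf")
    wait_lr = wait_stop = 0
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)                   # rate used while training this epoch
        if loss < best:                  # improvement resets both counters
            best = loss
            wait_lr = wait_stop = 0
            continue
        wait_lr += 1
        wait_stop += 1
        if wait_lr >= lr_patience:       # plateau: shrink the learning rate
            lr = max(lr * factor, min_lr)
            wait_lr = 0
        if wait_stop >= stop_patience:   # no improvement for too long: stop
            return lrs, epoch
    return lrs, None
```

Under this simplified logic, a run limited to 10 epochs can never accumulate 10 consecutive non-improving epochs after the first, so with the settings above only the learning-rate reduction could take effect.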
4 Model Interpretability
Many research works provide solutions that work to a great extent, but explaining why a solution works is still challenging. Model interpretability is key because it provides a basis for defining standards and procedures. Since correlation often does not imply causation, a solid understanding of the model is required when making decisions and explaining them. Model interpretability is important for checking that the model does what we expect, and it helps build trust with users and ease the transition from manual to automated procedures.
4.1 Intermediate Feature Visualization at Every Convolutional Layer
With deep neural networks, interpretability becomes rather difficult. Fortunately, the inputs of convolutional neural networks (ConvNets or CNNs) are images, which people can interpret visually, so there are various techniques for understanding how CNNs work, what they learn, and why they behave in a given way. To move the CNN from a black box toward a white box, intermediate results are recorded as images after multiple steps. Two algorithms are used to explain the model. The first algorithm measures the correlation of every kernel with the input image and finds the kernels (filters) that contribute most to predicting the class. Since every kernel has different trained weights, each kernel establishes a different correlation with an image: it may help predict the class (positive correlation), have no effect on the classification score (zero correlation), or lower the score and push the prediction toward a wrong class (negative correlation). In this algorithm, each kernel (channel) is removed in turn by setting its weights to all zeros, and the resulting change in the classification score is observed. The channel importance score is then the drop in score caused by removing the channel, that is, the normal classification score minus the new classification score:

$$
\text{imp}_c = S_{\text{normal}} - S_{\text{new},c}, \qquad
\text{impact}_c =
\begin{cases}
\text{positive}, & \text{imp}_c > 0,\\
\text{no impact}, & \text{imp}_c = 0,\\
\text{negative}, & \text{imp}_c < 0.
\end{cases}
$$

If the channel importance score is positive, the channel is important and has a positive impact on classification, and it is called a positively correlated channel. If the score is negative, the channel has a negative impact on classification and is called a negatively correlated channel. If the score is zero, the channel has no impact on classification and is called a zero-correlated channel. The second algorithm visualizes the image after every set of layers. Feature images are saved for every kernel after each block of four operations: convolution, max-pooling, batch normalization, and dropout. For example, after the first block we obtain 32 images from 32 different kernels, after the second block 64 images from 64 different kernels, and so on.
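The ablation idea behind the first algorithm can be shown on a toy model in which the class score is a linear function of globally averaged channel activations. The toy scoring function and all names are hypothetical stand-ins for the trained CNN. Zeroing a channel's activations here plays the role of setting its kernel weights to zero; the two coincide only if the kernel's bias and any following normalization are ignored. The score is computed as the normal score minus the score after ablation, so a positive value marks a channel whose removal lowers the class score, matching the positive/negative/no-impact reading given above.

```python
def class_score(channel_maps, head_weights):
    """Toy class score: global-average-pool each channel, then apply a linear head."""
    pooled = [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in channel_maps]
    return sum(w * p for w, p in zip(head_weights, pooled))


def channel_importance(channel_maps, head_weights):
    """Importance of each channel = normal score - score with that channel zeroed."""
    normal = class_score(channel_maps, head_weights)
    importances = []
    for c in range(len(channel_maps)):
        # Replace channel c by an all-zero map of the same size; keep the others.
        ablated = [
            [[0.0] * len(row) for row in fmap] if i == c else fmap
            for i, fmap in enumerate(channel_maps)
        ]
        importances.append(normal - class_score(ablated, head_weights))
    return importances


def impact(score, tol=1e-12):
    """Classify a channel importance score as positive, negative or no impact."""
    if score > tol:
        return "positive"
    if score < -tol:
        return "negative"
    return "no impact"
```

In the real network the same loop would re-run the CNN once per channel, so its cost grows with the number of kernels (32 to 512 per layer here).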
4.2 Interpretations
The early layers retain most of the input image's content, and the convolution channels appear to be activated across all parts of the input image. Put simply, the model treats text patches as features, and these patches appear as the bright parts of our intermediate images. As we go deeper (layer 3 and layer 4), the features extracted even by the most correlated channels become visually less interpretable, and only bright horizontal and vertical lines can be seen. An intuition for this is that the intermediate representation abstracts away the visual details of the input image and transforms it into the space needed for the output classification: text in the normal orientation forms mostly horizontal patches, while for 90° and −90° these patches are rotated to vertical, which helps the model distinguish horizontal from vertical text.
These layers act as an information distillation pipeline, in which the input image is transformed into a space that is visually less interpretable (noise is removed) but mathematically useful for the convolutional network to choose among the output classes in its last layer [16]. In the end, this model interpretability method provides a satisfactory explanation for how the model classifies horizontal and vertical text document images.
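The intuition that upright text forms horizontal bands while ±90° pages form vertical bands can be checked with a classical projection-profile heuristic. This is not the authors' CNN and not part of their method; it is a swapped-in, hand-crafted illustration on a binary page stored as a list of rows (1 for ink, 0 for background), and the function name is hypothetical.

```python
from statistics import pvariance


def text_direction(binary_page):
    """Guess whether text lines on a binary page run horizontally or vertically.

    binary_page: list of rows, 1 for ink and 0 for background.
    """
    row_profile = [sum(row) for row in binary_page]        # ink per row
    col_profile = [sum(col) for col in zip(*binary_page)]  # ink per column
    # Text lines make the profile taken across them alternate between ink
    # and gaps, so that profile has the larger variance.
    if pvariance(row_profile) >= pvariance(col_profile):
        return "horizontal"  # consistent with a 0 degree page
    return "vertical"        # consistent with a +90 or -90 degree page
```

Like the deep-layer activations described above, the profiles separate horizontal from vertical text but say nothing about whether a vertical page is at +90° or −90°; that distinction needs finer cues such as letter shapes.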
5 Simulation Results
Training set: the training data are the 8618 JPEG page images described in Section 3.1 (2908 at −90°, 2927 at 0°, and 2783 at +90°, named in the orientation.id.jpg format). Manually verified test set: the test set consists of 1517 new JPEG images extracted from PDF documents entirely different from those in the training set. It is likewise divided into three categories: 515 images at −90°, 500 at 0°, and 502 at +90°. For training, images are resized to (400, 400, 3), i.e., 400 × 400 pixels with three (RGB) channels. Experiments were run online using Kaggle kernels (a free platform for running Jupyter notebooks) with Python 3, on a computer system with an AMD A8-7410 processor and 4 GB of RAM, running Windows 10 Home. On the test set of 1517 images, our model predicts the correct orientation for 1486 images, while the other 31 images are wrongly predicted. To show on which orientations the model fails, the wrong predictions are divided into six subsections:
1. Images with normal orientation (0°) predicted as 90°.
2. Images with normal orientation (0°) predicted as −90°.
3. Images with 90° orientation predicted as 0°.
4. Images with 90° orientation predicted as −90°.
5. Images with −90° orientation predicted as 0°.
6. Images with −90° orientation predicted as 90°.
The confusion matrix of the results on the test set is shown in Fig. 4; the accuracy is 97.956%. As Fig. 4 shows, no normal-orientation (0°) images are wrongly predicted, which is consistent with our interpretability analysis, as it gives a satisfactory explanation for classifying horizontal and vertical text document images.
Fig. 4 Confusion matrix
All wrongly predicted images come from either the clockwise or the anticlockwise orientation, and the interpretability algorithm does not yet clearly explain how the model separates 90° from −90°. There are 17 wrongly predicted 90° images, of which seven are predicted as 0° and the other 10 as −90°. There are 14 wrongly predicted −90° images, of which eight are predicted as 0° and the other six as 90°. Some cases, such as blank pages, could be classified into any of the three categories. Some examples of both wrong and correct predictions are given below.
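The per-class breakdown above is enough to reconstruct the full confusion matrix (rows are true orientations, columns are predictions, both in the order −90°, 0°, +90°), from which the reported accuracy and the per-class recall and precision follow. The counts below are derived only from the numbers stated in the text; the helper names are illustrative.

```python
LABELS = (-90, 0, 90)
# Rows: true orientation; columns: predicted orientation (order as LABELS).
CONFUSION = [
    [501, 8, 6],   # true -90: 515 images, 8 predicted as 0, 6 as 90
    [0, 500, 0],   # true 0:   500 images, no errors
    [10, 7, 485],  # true 90:  502 images, 10 predicted as -90, 7 as 0
]


def accuracy(cm):
    """Fraction of all images on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Correct predictions divided by the number of true images per class."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def per_class_precision(cm):
    """Correct predictions divided by the number of predictions per class."""
    cols = list(zip(*cm))
    return [cm[j][j] / sum(cols[j]) for j in range(len(cm))]
```

With these counts, recall is 1.0 for 0° pages, about 0.973 for −90° and about 0.966 for +90°, while precision is lowest for the 0° class (500/515 ≈ 0.971), because all 15 rotated pages mistaken for upright land in that column.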
6 Conclusions
In this study, we have presented an approach based on convolutional networks that converts disoriented text documents into their correctly oriented form. Our model addresses the real-world problem of orientation detection for documents in PDF format, and its output can be used by further document processing techniques, since such tasks depend on the document being correctly oriented. Much related work has addressed document orientation, but, to our knowledge, none of it works at the page level. We have also gone further by providing an explanation of the model's behavior. The proposed model, a five-layer deep convolutional neural network, achieves an accuracy of approximately 98%. This study also explains the model through a proposed algorithm that visualizes the intermediate image at every layer, and these visualizations produced semantically meaningful results. In future work, the network architecture will be optimized to run faster in real time. We also plan to use the interpretability results to analyze misclassified cases and to fix them, for example by creating more training data, and to devise interpretability methods that explain how the model distinguishes 90° from −90°.
Acknowledgements The first author thanks Mr. Vamsi Krishna AV for useful guidance and suggestions during problem identification.
References
1. Fischer P, Dosovitskiy A, Brox T (2015) Image orientation estimation with convolutional networks. LNCS, vol 9358, pp 368–378. https://doi.org/10.1007/978-3-319-24947-6_30
2. Wei W, Wang S, Zhang X, Tang Z (2010) Estimation of image rotation angle using interpolation-related spectral signatures with application to blind detection of image forgery. IEEE Trans Inf Forensics Secur 5(3):507–517
3. Baird HS (1995) Measuring document image skew and orientation. In: Proceedings of SPIE—The International Society for Optical Engineering
4. Vailaya A, Zhang H, Yang C, Liu FI, Jain AK (2002) Automatic image orientation detection. IEEE Trans Image Process, pp 600–604
5. Voit M, Nickel K, Stiefelhagen R (2006) Neural network-based head pose estimation and multi-view fusion. In: Proceedings of the CLEAR workshop, LNCS, pp 299–304
6. Peake GS, Tan TN (1997) A general algorithm for document skew angle estimation. In: ICIP, vol 2, pp 230–233
7. Oh J, Choi W, Kim J, Lee U (2015) ScanShot: detecting document capture moments and correcting device orientation. In: ACM conference on human factors in computing systems, pp 953–956
8. Pingali GS, Zhao L, Carlbom I (2002) Real-time head orientation estimation using neural networks. In: ICIP, pp 297–300
9. Chen L, Wang S, Fan W, Sun J, Satoshi N (2015) Deep learning-based language and orientation recognition in document analysis. In: International conference on document analysis and recognition, pp 436–440
10. Yan H (1993) Skew correction of document images using interline cross-correlation. CVGIP: Graph Model Image Process 55(6):538–543
11. Sun C, Si D (1997) Skew and slant correction for document images using gradient direction. In: 4th international conference on document analysis and recognition (ICDAR'97), pp 142–146
12. Wang R, Wang S, Sun J (2018) Offset neural network for document orientation identification. In: 13th IAPR international workshop on document analysis systems (DAS), Vienna, pp 269–274
13. Fefilatyev S, Smarodzinava V, Hall LO, Goldgof DB (2006) Horizon detection using machine learning techniques. In: ICMLA, pp 17–21
14. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: NIPS, pp 1106–1114
15. How convolutional neural networks see the world. https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
16. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: Computer vision, ECCV 2014, 13th European conference, proceedings, part I, pp 818–833
17. Solanki K, Madhow U, Manjunath BS, Chandrasekaran S (2004) Estimating and undoing rotation for print-scan resilient data hiding. In: ICIP, pp 39–42
18. Wang YM, Zhang H (2004) Detecting image orientation based on low-level visual content. Comput Vis Image Underst 93(3):328–346
19. Chen SS, Haralick RM (1994) An automatic algorithm for text skew estimation in document images using recursive morphological transforms. In: ICIP, pp 139–143
20. Avila BT, Lins RD (2005) A fast orientation and skew detection algorithm for monochromatic document images. In: Proceedings of the 2005 ACM symposium on document engineering, pp 118–126
A Deep Learning-Based Segregation of Housing Image Data for Real Estate Application
Annu Kumari, Vinod Maan, and Dhiraj
Abstract Pictures have become an important part of our lives. Humans tend to analyze the images they come across to determine which category they belong to and to annotate them. This analysis becomes difficult for large volumes of data, so we introduce a machine learning methodology to perform such classification more efficiently. The input dataset contains images of different parts of houses used in real estate work; classifying them automatically reduces an agent's workload and increases his or her working efficiency. The classification approach uses a convolutional network with a fully connected layer. Because the quality of the dataset images might not be sufficient for the architecture, the algorithm also includes an image enhancement step: the preprocessing technique contrast limited adaptive histogram equalization (CLAHE) is used to enhance the images. The image classification results of the CNN-based architecture are superior to those of other traditional classification methods.
Keywords Image classification · Convolutional neural network · Real estate image database
A. Kumari (corresponding author)
National Institute of Technology Warangal, Warangal, Telangana, India
V. Maan
Mody University, Lakshmangarh, Rajasthan, India
Dhiraj
CSIR—Central Electronics Engineering Research Institute, Pilani, Rajasthan, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_14
A. Kumari et al.
1 Introduction
Advertising matters a great deal today: a good picture can bring in a good number of sales for a product. Real estate agents therefore spend considerable time collecting pictures of thousands of houses put up for sale. The best pictures are chosen to advertise a house and help determine its value, which creates a lot of work for real estate agents. To make their work more efficient, we need image classification algorithms based on machine learning, which take input images from a dataset and classify them into different categories. Several other factors also determine the value of a property, including the material the house is made of, its age and condition (old or new), its location, and its neighborhood. Other factors include the flooring material, with higher-valued homes tending to have better-quality flooring rather than cheaper materials. The same holds for the materials used for kitchen counters, bathroom floors, stairs, ceilings, and several other parts of the house, as shown by the sample images of the REI dataset [1] in Fig. 1. Hence, we need computer vision technology that helps annotate images in a large collection. In this paper, we study a computer vision algorithm and its implementation for real estate image classification. In deep learning, a convolutional neural network, commonly referred to as a ConvNet or CNN, is a class of deep neural networks most often used for analyzing images. CNNs are also known as shift-invariant or space-invariant artificial neural networks, based on their shared-weight architecture and translation-invariance characteristics, and they have applications in image and video recognition, object detection, image classification, medical image analysis, natural language processing, etc.
A Deep Learning-Based Segregation of Housing …
Fig. 1 Sample images from the real estate image database [1]
Fig. 2 Overall framework of our approach
CNNs are regularized versions of multilayer perceptrons, which are fully connected networks. A convolutional network consists of an input layer, an output layer, and multiple hidden layers [2]. CNNs typically need relatively little preprocessing of the data before classification. CNNs, which are widely used in image recognition systems, learn surprisingly quickly but are computationally expensive. Because CNNs have millions of parameters, training them on a small dataset leads to overfitting and poor performance; well-trained models that perform strongly require massive amounts of data [3]. Deep neural networks are the state of the art for image recognition and perform well. In this paper, we implement a miniaturized version of a popular deep learning convolutional architecture, together with a fully connected network, to classify enhanced images, as shown in Fig. 2. There currently exist a few datasets for classifying scenes into different categories, such as SUN and MIT 67, which are considered standard benchmarks for scene classification with hundreds of object classes. The Scene Understanding (SUN) database [4] is used to evaluate state-of-the-art scene classification algorithms and to set benchmarks for their performance. Another standard database, MIT 67 [5], is a collection of indoor scene categories containing multiple images; indoor scene recognition is a high-level vision problem. These databases contain a few real estate categories, but they do not list all the categories required for real estate classification and do not exhibit intra-class variance. Previously, no dataset existed for image classification in real estate work. To fill this gap, the real estate image (REI) dataset was introduced in [1] for image recognition; samples are shown in Fig. 1.
Scene classification consists of classifying images and labeling them with scene categories such as kitchen or bedroom. We build a scene classification model that predicts the labels of the input images using CNNs. Before the labels are predicted, a little preprocessing is applied to the input images, as their quality may not be adequate for the classification approach. To improve image quality, the images are enhanced using contrast limited adaptive histogram equalization (CLAHE) [6], with the aim of letting the scene classifier classify the multiple classes in the data accurately. The input is given to the CNN architecture, whose convolutional layers extract the image features. The extracted features are then passed to fully connected layers and finally to a classifier, which predicts the output labels.
1.1 Framework Overview and Contribution
The framework consists of two modules: an image enhancement module and an image classification module. The real estate image dataset contains some images of poor quality that might not be appropriate for the classification approach, so they are first enhanced using the contrast limited adaptive histogram equalization (CLAHE) technique. The enhanced images are then given to the classification algorithm, which predicts their labels. The labels are chosen to describe the content of the images accurately. In the classification framework, we introduce a CNN that learns image features through feature extraction. The features obtained from the CNN are passed to the fully connected layers. At the last stage, a softmax classifier predicts the category label of the image. The main contributions of this paper are as follows:
• We define a segregation-based image recognition approach that uses a deep convolutional architecture with fully connected layers to classify images and predict their annotations, and we demonstrate its recognition performance on large sets of data.
• To evaluate our classification algorithm, we use the real estate image (REI) dataset, which comprises different scene categories. The system works well on the large REI dataset, which contains numerous images from different scene categories. The REI dataset can be used for further research on image classification.
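The final softmax stage can be sketched as follows: six raw scores (logits) are turned into probabilities over the REI scene categories, and the most probable category is picked. The label order and function names are assumptions for illustration; in the actual network the logits would come from the last fully connected layer.

```python
import math

# Assumed order of the six REI scene categories at the network output.
REI_LABELS = ("bedroom", "bathroom", "front yard", "backyard", "kitchen", "living room")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=REI_LABELS):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

Subtracting the maximum logit does not change the probabilities but prevents `math.exp` from overflowing on large scores.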
2 Survey of Related Work
Image classification has become prominent in recent years. The content of an image carries a lot of information, and the information carried by a classified image can affect house price estimation: the predicted labels of an image influence the attributes considered when valuing real estate. For example, suppose we estimate the prices of two properties whose features are almost identical except that one has much more space than the other. This difference will lead to the property with more space being priced higher. The same reasoning extends to several other features, such as the age of the house, its location, the availability of transportation, the type of house, and neighborhood features such as nearby convenience stores.
In image processing, features are pieces of information that are useful for computational tasks in certain applications. In the same sense, machine learning learns features for recognition systems that provide rich information about the image content. An image can thus be described as a collection of features, where specific structures such as edges, points, and objects are termed features. Image feature extraction involves two steps: detecting areas of interest in the image, such as contours, and describing local regions of the image. Detecting features such as interest points is a necessary step in various computer vision applications. Interest points are locations where the intensity function changes discontinuously, for instance corners, T-junctions, or points with high texture variation. Approaches to interest point detection fall roughly into three categories: contour-based, intensity-based, and model-based. The intensity-based approach is considered the most reliable because it does not depend on contour detection or on the type of interest point. Feature detection is thus based on the derivatives of the intensity function and on detecting local extrema of the gradient. Several techniques have been implemented to enhance the quality of images affected by shadows, illumination, texture, and geometry [7]. Contrast limited adaptive histogram equalization (CLAHE), which computes the histograms of individual sections of an image and redistributes the intensity values, is used to enhance poor-quality images. Almost every existing image classification approach uses feature-based algorithms.
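A bare-bones illustration of the intensity-based idea (take derivatives of the intensity function and keep local extrema of the gradient) is given below for a gray-scale image stored as a list of rows. It is a simplified edge-point detector written for illustration, not a full interest-point detector such as Harris, and not part of the authors' pipeline; the function names are hypothetical.

```python
import math


def gradient_magnitude(img):
    """Central-difference gradient magnitude; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0  # horizontal derivative
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0  # vertical derivative
            mag[y][x] = math.hypot(gx, gy)
    return mag


def local_maxima(mag, threshold):
    """Interior pixels whose magnitude exceeds threshold and is >= all 8 neighbours."""
    h, w = len(mag), len(mag[0])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = mag[y][x]
            if v <= threshold:
                continue
            neighbours = [mag[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if all(v >= n for n in neighbours):
                points.append((y, x))
    return points
```

On an image with a vertical step from 0 to 10, the detected points all lie in the two columns straddling the step, which is where the intensity function changes abruptly.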
CNNs dominate the field of classification because they need only a local understanding of the image to annotate it, with the practical benefit of having fewer parameters, which reduces training time and the amount of training data required. Nevertheless, a large collection of data is still required to train a model for classification. In [8], the goal of designing ensembles of error-independent networks is addressed through an approach for the automatic design of effective neural network ensembles. In [9], a CNN with multistage features is described. Most CNNs use only top-layer features for classification, and those features might not contain all the information useful for classification. To use a CNN's potential to its full extent, features from multiple layers therefore need to be fused. Some classification algorithms use recurrent neural networks for scene labeling. In [10], the house dataset includes a text file listing the attributes of each house: the number of bedrooms and bathrooms, the area of the house, its price, and its zip code. A pairwise comparison approach using deep learning was presented in [11]. The paper [12] presents an image classification method using neural networks fed with characteristic data extracted from images. To extract the characteristic data, image pixels are divided by a clustering method in the three-dimensional HSI color space and labeled to select domains. The information extracted from the domains (color, position, and area information) forms the characteristic data of the image. The REI dataset for real estate image classification contains all kinds of related images, collected from two kinds of sources: (1) real estate image records and (2) online information sources.
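One common way to fuse features from several layers, sketched here with hypothetical names, is to summarize each layer's feature maps by global average pooling and concatenate the resulting vectors before the classifier. This is a generic illustration, not necessarily the fusion scheme used in [9].

```python
def global_average_pool(feature_maps):
    """Average each channel (a list of rows) down to a single number."""
    return [sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])) for fmap in feature_maps]


def fuse_multilayer(features_per_layer):
    """Concatenate pooled descriptors from several layers into one feature vector."""
    fused = []
    for layer_maps in features_per_layer:
        fused.extend(global_average_pool(layer_maps))
    return fused
```

The fused vector has one entry per channel of every selected layer, so its length is the sum of those layers' channel counts, independent of their spatial sizes.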
A. Kumari et al.
3 Real Estate Image Dataset
Real estate is a thriving field of business nowadays. The dataset used for the classification task is a house dataset of real estate images. This house dataset is a standard benchmark for house price estimation. It contains both visual and listing information, with images from different parts of the house. The REI dataset used for this study was the first dataset of its kind to contain all kinds of house images for classification in support of house price estimation. Real Estate Record. The real estate image data is a collection of pictures from real-time data and other sources that contains real estate listing information. The whole dataset has been divided into six categories for the scene classification task: bedroom, bathroom, front yard, backyard, kitchen and living room. Each category contains a large number of images. Large datasets generally suit image classification, but intra-class variation makes the task difficult. Nevertheless, the approach used here works well on this dataset despite that variation.
4 Framework Formulation
Our framework is divided into two modules, an image enhancement module and an image classification module. Both are described in detail below.
4.1 Image Enhancement Module
There is no standard protocol for the quality of the images collected for visual and classification tasks. Some pictures are of such poor quality that they are unsuitable for recognition algorithms, which makes it hard for the system to predict their labels. Low-quality images are therefore preprocessed by an image enhancement module. This module reduces uneven illumination, shadow effects and blurred edges, improving overall image quality. Better-quality images in turn improve the effectiveness of the CNN. To enhance a poor-quality image, we use the contrast limited adaptive histogram equalization (CLAHE) technique [13]. CLAHE computes a separate histogram for each distinct region of the image and uses these histograms to redistribute the image's intensity values, which makes it well suited to the enhancement process.
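To make the per-region histogram idea concrete, the following simplified NumPy sketch implements CLAHE for a grayscale uint8 image. It is not the chapter's code, and the function name and defaults are assumptions. The sketch builds one clipped-histogram lookup table per tile, spreads the clipped counts uniformly over all bins, and bilinearly interpolates between the tables of the four nearest tiles so that tile borders stay invisible. For simplicity it assumes that the image height and width are divisible by the grid size. In practice, OpenCV's `cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))` followed by `.apply(gray)` is the usual library route.

```python
import numpy as np

def clahe(img: np.ndarray, clip_limit: float = 2.0, grid=(8, 8)) -> np.ndarray:
    """Simplified CLAHE for a 2-D uint8 image whose sides are divisible by grid."""
    h, w = img.shape
    gy, gx = grid
    th, tw = h // gy, w // gx
    n_bins = 256
    # Clip height per bin, relative to a perfectly uniform histogram of one tile.
    clip = max(1.0, clip_limit * th * tw / n_bins)

    # 1. One lookup table (LUT) per tile from its clipped, redistributed histogram.
    luts = np.empty((gy, gx, n_bins), dtype=float)
    for i in range(gy):
        for j in range(gx):
            tile = img[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            hist = np.bincount(tile.ravel(), minlength=n_bins).astype(float)
            excess = np.maximum(hist - clip, 0.0).sum()
            hist = np.minimum(hist, clip) + excess / n_bins  # limit contrast
            cdf = np.cumsum(hist)
            luts[i, j] = cdf * (n_bins - 1) / cdf[-1]

    # 2. Bilinear interpolation between the LUTs of the four nearest tile centres.
    fy = (np.arange(h) + 0.5) / th - 0.5
    fx = (np.arange(w) + 0.5) / tw - 0.5
    y0 = np.clip(np.floor(fy).astype(int), 0, gy - 1)
    x0 = np.clip(np.floor(fx).astype(int), 0, gx - 1)
    y1 = np.minimum(y0 + 1, gy - 1)
    x1 = np.minimum(x0 + 1, gx - 1)
    wy = np.clip(fy - y0, 0.0, 1.0)[:, None]
    wx = np.clip(fx - x0, 0.0, 1.0)[None, :]
    Y0, Y1, X0, X1 = y0[:, None], y1[:, None], x0[None, :], x1[None, :]
    top = (1 - wx) * luts[Y0, X0, img] + wx * luts[Y0, X1, img]
    bottom = (1 - wx) * luts[Y1, X0, img] + wx * luts[Y1, X1, img]
    out = (1 - wy) * top + wy * bottom
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

# A dim, low-contrast image: two flat halves only 10 grey levels apart.
dim = np.zeros((32, 32), dtype=np.uint8)
dim[:, :16], dim[:, 16:] = 100, 110
enhanced = clahe(dim, clip_limit=1000.0, grid=(1, 1))
print(int(enhanced.min()), int(enhanced.max()))
```

The clip limit is the design knob. A low value keeps the mapping close to the identity and suppresses noise amplification in flat regions, while a very high value (as in the demo) reduces to ordinary histogram equalization within each tile.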
A Deep Learning-Based Segregation of Housing …
4.2 Recognition Module
Our approach consists of a convolutional neural network and a fully connected layer network for image classification. The architecture is a stack of distinct layers that transforms the input layer into the output layer, i.e., it contains an input layer, an output layer and several hidden layers. The CNN implemented in this paper for the classification task is the MiniGoogLeNet architecture shown in Fig. 3. MiniGoogLeNet is a miniaturized version of GoogLeNet [14], a 22-layer-deep convolutional neural network that can be pretrained on either ImageNet [15] or Places365 [16]. GoogLeNet won the ILSVRC 2014 classification challenge with a top-5 error rate of 6.67%, very close to human-level performance. It uses small convolutions to reduce the number of parameters and therefore has fewer parameters than previous architectures. Features of GoogLeNet The GoogLeNet architecture differs considerably from previous state-of-the-art architectures in its use of 1 × 1 convolutions and global average pooling. The main layers used in the architecture are: Convolution Layer: The architecture makes extensive use of 1 × 1 convolution layers. These convolutions serve as dimension-reduction modules that reduce the number of parameters of the architecture, which in turn allows the depth of the architecture to be increased.
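The parameter saving from 1 × 1 dimension reduction can be checked with simple arithmetic. The following snippet is a sketch with hypothetical channel counts (256 input and output channels, reduced to 64), not values taken from Fig. 3. It compares a direct 3 × 3 convolution with a 1 × 1 reduction followed by a 3 × 3 convolution.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Trainable parameters of a k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# Direct 3x3 convolution: 256 -> 256 channels.
direct = conv_params(3, 256, 256)
# 1x1 reduction to 64 channels, then a 3x3 convolution back to 256 channels.
reduced = conv_params(1, 256, 64) + conv_params(3, 64, 256)

print(direct, reduced)  # 590080 164160
```

In this example the bottleneck uses roughly 3.6 times fewer parameters for the same output width. This is the budget that lets GoogLeNet-style networks grow deeper.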
Fig. 3 Individual convolution (left), inception (middle) and downsample (right) modules, followed by the MiniGoogLeNet architecture (bottom) built from these individual modules
Global Average Pooling: Whereas previous architectures use fully connected layers at the end of the network, GoogLeNet uses a global average pooling layer there. This layer averages each 7 × 7 feature map down to a 1 × 1 value. Averaging reduces the number of trainable parameters and was reported in [14] to improve top-1 accuracy by about 0.6%. Inception Module: In this module, a 1 × 1 convolution with 1 × 1 stride and a 3 × 3 convolution with 1 × 1 stride are performed in parallel, and their outputs are concatenated along the channel dimension to give the module's final output. Downsample Module: The downsample module concatenates a 3 × 3 convolution with 2 × 2 stride and a 3 × 3 max pooling layer with 2 × 2 stride [14]. Classifier for training: GoogLeNet uses intermediate classifier branches in the middle of the architecture for training purposes only. Each branch consists of a 5 × 5 average pooling layer with a stride of 3, a 1 × 1 convolution with 128 filters, two fully connected layers and a softmax classification layer. MiniGoogLeNet Architecture MiniGoogLeNet is a smaller version of GoogLeNet. Figure 3 (bottom) shows the MiniGoogLeNet architecture constructed from the convolution, inception and downsample building blocks. The inception module in MiniGoogLeNet is a variation of the original inception module. Classification: The classification task is carried out by extracting features from the hidden layers of the architecture. The output of the fully connected layer is passed to the softmax classifier, which predicts the labels. Softmax generalizes logistic regression to multiple classes. It normalizes its input into a vector of values that forms a probability distribution, as shown in Eq. 1:
$$P\left(y = j \mid \theta^{(i)}\right) = \frac{e^{\theta_j^{(i)}}}{\sum_{k=1}^{K} e^{\theta_k^{(i)}}} \tag{1}$$
where $\theta = w_0 x_0 + w_1 x_1 + \cdots + w_n x_n = w^T x$ is the score of a class, computed from the feature vector $x$ and the weight vector $w$ of that class. $\theta_j^{(i)}$ is the score of class $j$ for the $i$-th sample, and $K$ is the number of classes. In vector form, $\theta$ is the transpose of the weight vector $w$ multiplied by the feature vector $x$. With $x_0 = 1$, the term $w_0 x_0$ acts as the bias term. Training the network: The backpropagation algorithm is commonly used, in which the loss function and the gradient descent optimization algorithm play essential roles. The cross-entropy loss is shown in Eq. 2:
$$L = -\ln p_c \tag{2}$$
where $p_c$ is the predicted probability of the correct class $c$. For the backward propagation phase, we need the gradient of the loss with respect to the softmax outputs $\mathrm{out}_s(i) = p_i$. Since only $p_c$ appears in the loss, this gradient is given by Eq. 3:
$$\frac{\partial L}{\partial\, \mathrm{out}_s(i)} = \begin{cases} 0 & \text{if } i \neq c \\ -\dfrac{1}{p_i} & \text{if } i = c \end{cases} \tag{3}$$
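The classification head and Eqs. 1 to 3 can be traced end to end in a few lines. The following NumPy sketch is an illustration rather than the authors' Keras model. It global-average-pools a hypothetical 7 × 7 × 16 feature map, applies a dense layer for the six REI classes (with an explicit bias `b` standing in for $w_0 x_0$), takes the softmax and cross-entropy, and back-propagates Eq. 3 through the softmax Jacobian. The well-known result of that chain is $\partial L / \partial \theta = p - \text{onehot}(c)$.

```python
import numpy as np

def global_average_pool(fmap):
    """(H, W, C) -> (C,): e.g. a 7 x 7 x C map is averaged to 1 x 1 x C."""
    return fmap.mean(axis=(0, 1))

def softmax(theta):
    """Eq. 1; subtracting max(theta) leaves the result unchanged but avoids overflow."""
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

def cross_entropy(p, c):
    """Eq. 2: L = -ln p_c."""
    return -np.log(p[c])

def grad_wrt_outputs(p, c):
    """Eq. 3: dL/dout_i = -1/p_i for i == c and 0 otherwise."""
    g = np.zeros_like(p)
    g[c] = -1.0 / p[c]
    return g

def grad_wrt_scores(p, c):
    """Chain Eq. 3 through the softmax Jacobian J_ij = p_i * (delta_ij - p_j)."""
    jac = np.diag(p) - np.outer(p, p)
    return jac.T @ grad_wrt_outputs(p, c)

# Toy forward pass with a hypothetical feature map and randomly initialized weights.
rng = np.random.default_rng(0)
fmap = rng.normal(size=(7, 7, 16))
W = 0.1 * rng.normal(size=(6, 16))   # one weight vector w per class (rows)
b = np.zeros(6)                       # bias, the role of w_0 * x_0 in the text
theta = W @ global_average_pool(fmap) + b
p = softmax(theta)
c = 2                                 # index of the correct class, e.g. "Bedroom"
loss = cross_entropy(p, c)
print(round(float(p.sum()), 6), grad_wrt_scores(p, c) - (p - np.eye(6)[c]))
```

The printed difference is zero up to rounding. In practice, frameworks therefore fuse softmax and cross-entropy and use $p - \text{onehot}(c)$ directly instead of forming the Jacobian.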
Gradient descent is an optimization algorithm that iteratively updates the learnable parameters of the network (weights and biases) to minimize the loss function. The gradient of the loss function points in the direction of steepest ascent, so the gradient descent algorithm takes a step in the negative direction to reduce the loss. Every parameter is updated in this negative direction at a rate set by a hyperparameter called the learning rate. Mathematically, the gradient is the partial derivative of the loss with respect to each learnable parameter, and a single update of a parameter is formulated in Eq. 4:
$$w := w - \alpha \frac{\partial L}{\partial w} \tag{4}$$
where $w$ stands for each learnable parameter, $\alpha$ for the learning rate and $L$ for the loss function. In practice, the learning rate is one of the most important hyperparameters to set before training starts. The gradients of the loss function are used to update the parameters of each layer by passing a subset of the training set, called a mini-batch, through the hidden layers of the neural network. This method is called mini-batch gradient descent, also frequently referred to as stochastic gradient descent (SGD) [17], and the mini-batch size is also a hyperparameter. Finally, we compile the model with the SGD optimizer and a categorical cross-entropy loss function.
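Mini-batch gradient descent with the update of Eq. 4 can be illustrated on a plain softmax classifier. This is a NumPy sketch on synthetic 2-D data, not the MiniGoogLeNet training run. The defaults mirror the hyperparameters reported in Sect. 5.2 (learning rate 1e−2, batch size 32, 100 epochs), and the function name `train_sgd` is hypothetical.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_sgd(x, y, n_classes, lr=1e-2, batch_size=32, epochs=100, seed=0):
    """Mini-batch gradient descent (Eq. 4) for softmax regression with cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(epochs):
        order = rng.permutation(n)                   # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            p = softmax_rows(x[idx] @ w + b)
            g = (p - onehot[idx]) / len(idx)         # dL/dtheta averaged over the batch
            w -= lr * x[idx].T @ g                   # w := w - alpha * dL/dw
            b -= lr * g.sum(axis=0)
        p_all = softmax_rows(x @ w + b)
        losses.append(-np.log(p_all[np.arange(n), y]).mean())
    return w, b, losses

# Three well-separated synthetic clusters, 100 points each.
rng = np.random.default_rng(1)
centers = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]])
x = np.concatenate([c + 0.5 * rng.normal(size=(100, 2)) for c in centers])
y = np.repeat(np.arange(3), 100)
w, b, losses = train_sgd(x, y, 3)
accuracy = (np.argmax(x @ w + b, axis=1) == y).mean()
```

Each epoch visits every sample once in random mini-batches. The batch size trades gradient noise against the cost of a single update.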
5 Experimental Result
In this paper, our approach to image classification has been evaluated on the real estate image (REI) dataset. The approach consists of the image enhancement module and the classification module. The REI dataset contains a collection of relevant images of the inside and outside of houses. We perform scene classification using all the available house categories. The dataset is divided into six categories: backyard, bathroom, bedroom, front yard, kitchen and living room.
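Sect. 5.2 splits each of these six categories at random into 60% training, 15% testing and 25% validation images. A per-category (stratified) split of that kind can be sketched with the standard library. The file names and the function name `stratified_split` below are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(samples, fractions=(0.60, 0.15, 0.25), seed=42):
    """Split (item, label) pairs into train/test/validation lists, category by category."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append((item, label))
    rng = random.Random(seed)
    train, test, val = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)                       # random selection within the category
        n = len(group)
        n_train = round(fractions[0] * n)
        n_test = round(fractions[1] * n)
        train += group[:n_train]
        test += group[n_train:n_train + n_test]
        val += group[n_train + n_test:]          # remainder goes to validation
    return train, test, val

categories = ["backyard", "bathroom", "bedroom", "front yard", "kitchen", "living room"]
samples = [(f"{c}_{i}.jpg", c) for c in categories for i in range(100)]
train, test, val = stratified_split(samples)
print(len(train), len(test), len(val))  # 360 90 150
```

Splitting inside each category keeps the class proportions identical across the three sets, so the validation accuracy is not skewed by an unlucky class mix.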
5.1 Image Enhancement Result
Preprocessing is required because some images in the dataset are of poor quality, with uneven illumination, shadows and blur. To enhance a poor-quality image, we use the contrast limited adaptive histogram equalization (CLAHE) technique, which is well suited to the REI dataset. Figure 4 shows a few examples of the image
Fig. 4 Outcome of image enhancement on the real estate images dataset, first column representing the original image while second column represents the enhanced image
enhancement, applied to sample images taken from the dataset.
5.2 Image Classification Result
Before training the model, we split the dataset into three sets: training, testing and validation. The training, testing and validation sets contain 60%, 15% and 25%, respectively, of the images of each category, selected at random. The entire network model is trained on the REI dataset. The model is based on supervised learning, in which it is trained with input data and the expected output data. To create such a model, we first construct the neural network with Keras and TensorFlow. The initial training hyperparameters are set as follows: learning rate = 1e−2, batch size = 32 and number of epochs = 100. After construction, the model is trained and then tested. Training took nearly 5664 s. The model is then evaluated on new data to demonstrate the classification approach, and it took nearly 6 s to predict the labels. To describe image classification on the real estate dataset with the proposed algorithm, Fig. 5 plots the training loss and accuracy against the number of epochs. The plot shows that the model achieves higher
Fig. 5 Graph showing the image classification result: variation of training and validation loss and accuracy with the number of epochs
accuracy after ten epochs, after which it improves only slightly and occasionally decreases between epochs. The validation and training accuracy curves intersect in the plot (see Fig. 5). Validation accuracy reflects the ability of the model to generalize to new data. Overfitting occurs if validation accuracy keeps getting worse while training accuracy keeps improving. Beyond a certain epoch, further training therefore no longer improves the model, and it is sufficient to train the model for a limited number of epochs. For comparison, we consider other methods: CNN with the AlexNet model, the VGG model, fine-tuned AlexNet [18] and fine-tuned VGGNet. CNNs perform better than hand-crafted descriptors such as SIFT [19] and GIST [20]. The classification results of MiniGoogLeNet are shown in Fig. 6. In terms of validation accuracy, the proposed CNN-based approach has been compared with the other techniques in Table 3, and it outperforms them.
Fig. 6 REI data category recognition outputs
Classification Result. In this paper, we apply the classification approach to scene classification on the REI dataset. A few results of the classification task are shown in Fig. 6. A classification report measuring the quality of the algorithm's predictions is shown in Table 1, and the confusion matrix for the six classes is presented in Table 2.
Table 1 Classification report to show the quality of predictions
Class          Precision  Recall  F1-score  Support
Backyard       0.81       0.82    0.81      147
Bathroom       0.91       0.70    0.79      145
Bedroom        0.87       0.74    0.80      337
Front yard     0.91       0.83    0.87      183
Kitchen        0.82       0.88    0.85      190
Living room    0.60       0.89    0.72      170
Table 2 Confusion matrix to show the performance of the classification model (rows: actual class; columns: predicted class)
Actual \ Predicted  Backyard  Bathroom  Bedroom  Front yard  Kitchen  Living room
Backyard            121       0         0        14          6        6
Bathroom            2         102       26       1           10       4
Bedroom             6         9         248      0           4        70
Front yard          21        0         0        151         6        5
Kitchen             0         1         4        0           168      17
Living room         0         0         6        0           12       152
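The classification report in Table 1 follows directly from the confusion matrix in Table 2. Precision divides each diagonal entry by its column sum (everything predicted as that class), recall divides it by its row sum (the support), and F1 is their harmonic mean. The short NumPy sketch below recomputes the report. Rounded to two decimals, its output matches Table 1.

```python
import numpy as np

classes = ["Backyard", "Bathroom", "Bedroom", "Front yard", "Kitchen", "Living room"]
# Table 2: rows are the actual class, columns the predicted class.
cm = np.array([
    [121,   0,   0,  14,   6,   6],
    [  2, 102,  26,   1,  10,   4],
    [  6,   9, 248,   0,   4,  70],
    [ 21,   0,   0, 151,   6,   5],
    [  0,   1,   4,   0, 168,  17],
    [  0,   0,   6,   0,  12, 152],
])

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)          # column sums: all predictions of the class
recall = tp / cm.sum(axis=1)             # row sums: support of the class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)
accuracy = tp.sum() / cm.sum()           # overall accuracy on this matrix: 942 / 1172

for name, p, r, f, s in zip(classes, precision, recall, f1, support):
    print(f"{name:<12} {p:.2f} {r:.2f} {f:.2f} {s}")
```

The same computation also shows where the errors concentrate. For example, 70 bedroom images are predicted as living room, which explains the low living-room precision.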
Table 3 Performance comparison between various classifiers
Method                          Scene classification accuracy (%)
SIFT descriptor                 69.88
GIST descriptor                 73.29
CNN with AlexNet                86.74
CNN with fine-tuned VGGNet      91.32
Proposed CNN (MiniGoogLeNet)    93.2
6 Conclusion
This paper demonstrates a classification approach for real estate images, evaluated on the real estate image (REI) dataset. The approach works well on this large dataset, whose categories each contain many images, and achieves better accuracy than the other networks compared. It can be advantageous in e-commerce applications where images need to be segregated automatically. In future work, we plan to use a dataset with extended categories. The approach can also be extended to further analyses. These include the condition of the house (e.g., defective floors or roofs), the type of neighborhood (e.g., access to a subway or nearby convenience stores) and the kind of maintenance the house requires, i.e., whether the house is old or new, a new house requiring very little maintenance. All these factors affect the real estate business, and our future research will extend the approach to these further analyses of houses.
References
1. Bappy H, Barr JR, Srinivasan N, Roy-Chowdhury AK (2017) Real estate image classification. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, pp 373–381. https://doi.org/10.1109/WACV.2017.48
2. Jmour N, Zayen S, Abdelkrim A (2018) Convolutional neural networks for image classification. In: 2018 international conference on advanced systems and electric technologies (IC_ASET), Hammamet, pp 397–402. https://doi.org/10.1109/ASET.2018.8379889
3. Sharma N, Jain V, Mishra A (2018) An analysis of convolutional neural networks for image classification. Procedia Comput Sci 132:377–384. ISSN 1877-0509. https://doi.org/10.1016/j.procs.2018.05.198
4. Choi MJ, Lim JJ, Torralba A, Willsky AS (2010) Exploiting hierarchical context on a large database of object categories. In: CVPR. IEEE, pp 129–136
5. Quattoni A, Torralba A (2009) Recognizing indoor scenes. In: IEEE conference on computer vision and pattern recognition (CVPR)
6. Setiawan W, Mengko TR, Santoso OS, Suksmono AB (2013) Color retinal image enhancement using CLAHE. In: International conference on ICT for smart society (ICISS), pp 1–3
7. Yue J, Li Z, Liu L, Fu Z (2011) Content-based image retrieval using color and texture fused features. Math Comput Modelling 54(3):1121–1127
8. Giacinto G, Roli F (2001) Design of effective neural network ensembles for image classification purposes. Image and Vision Computing
9. Yim J, Ju J, Jung H, Kim J (2015) Image classification using convolutional neural networks with multi-stage feature. In: Kim JH, Yang W, Jo J, Sincak P, Myung H (eds) Robot intelligence technology and applications 3. Advances in Intelligent Systems and Computing, vol 345. Springer, Cham. https://doi.org/10.1007/978-3-319-16841-8_52
10. Ahmed E, Moustafa M (2016) House price estimation from visual and textual features. In: Proceedings of the 8th international joint conference on computational intelligence (IJCCI 2016), pp 62–68. ISBN 978-989-758-201-1
11. Wang X, Takada Y, Kado Y, Yamasaki T (2019) Predicting the attractiveness of real-estate images by pairwise comparison using deep learning. In: 2019 IEEE international conference on multimedia & expo workshops (ICMEW), Shanghai, China, pp 84–89. https://doi.org/10.1109/ICMEW.2019.0-106
12. Shinmoto M, Mitsukura Y, Fukumi M, Akamatsu N (2002) A neural network approach to color image classification. In: Proceedings of the 9th international conference on neural information processing, 2002. ICONIP ’02, vol 2. Singapore, pp 675–679. https://doi.org/10.1109/ICONIP.2002.1198143
13. Yadav G, Maheshwari S, Agarwal A (2014) Contrast limited adaptive histogram equalization-based enhancement for real time video system. In: 2014 international conference on advances in computing, communications and informatics (ICACCI), New Delhi, pp 2392–2397. https://doi.org/10.1109/ICACCI.2014.6968381
14. Szegedy C et al (2015) Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR), Boston, MA, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
15. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition, Miami, FL, pp 248–255. https://doi.org/10.1109/CVPR.2009.5206848
16. Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A (2014) Learning deep features for scene recognition using places database. In: Advances in neural information processing systems 27 (NIPS)
17. Ruder S (2016) An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747
18. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
19. Liu C, Yuen J, Torralba A (2011) SIFT flow: dense correspondence across scenes and its applications. IEEE Transactions on pattern analysis and machine intelligence 33(5):978–994. https://doi.org/10.1109/TPAMI.2010.147
20. Li Z, Itti L (2011) Saliency and GIST features for target detection in satellite images. TIP 20(7):2017–2029
Improved Image Super-resolution Using Enhanced Generative Adversarial Network: A Comparative Study
B. V. Balaji Prabhu and Omkar Subburao Jois Narasipura
Abstract Super-resolution using generative adversarial networks is an approach for improving the quality of imaging systems. With the advances in deep learning, models based on convolutional neural networks have become a favorite choice of researchers in image processing and analysis, as they generate more accurate results than conventional methods. Recent works on image super-resolution have mainly focused on minimizing the mean squared reconstruction error and achieve high signal-to-noise ratios. However, their outputs often lack high-frequency details and are not as accurate as expected at producing high-resolution images. With the aim of generating perceptually better images, this paper implements the enhanced generative adversarial model and compares it with the super-resolution generative adversarial model. Quantitative measures, namely the peak signal-to-noise ratio and the structural similarity index, were used to assess the quality of the super-resolved images. The results show that the enhanced GAN model recovers more texture details than the super-resolution GAN model.
Keywords Super-resolution · Deep learning · Convolutional neural network · Peak signal-to-noise ratio · Structural similarity · Generative adversarial network · Enhanced generative adversarial network
B. V. Balaji Prabhu (B) · O. S. J. Narasipura Computational Intelligence Lab, Department of Aerospace Engineering, Indian Institute of Science Bangalore, Bangalore, Karnataka, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_15
B. V. Balaji Prabhu and O. S. J. Narasipura
1 Introduction
Super-resolution is an approach for improving imaging systems. Image super-resolution is defined as reconstructing a high-resolution image from the details contained in low-resolution images [1]. Super-resolution models built with convolutional neural networks outperform conventional approaches such as bicubic interpolation, which is fast but less accurate in terms of image quality because it gives blurry images [2, 3]. In this work, generative adversarial network (GAN) models are used for image super-resolution [4]. Figure 1 shows the architecture of a generative adversarial network, which comprises two networks: a generator network and a discriminator network. The generator produces high-resolution images from given low-resolution images, while the discriminator tries to distinguish the generated super-resolution images from real high-resolution images. The generator optimizes its weights to make the super-resolution images match the high-resolution images, so that the discriminator fails to tell super-resolved and real high-resolution images apart [5]. Through multiple cycles of generation and discrimination, the two networks train each other while simultaneously trying to beat each other, as shown in Fig. 2. Previous works on GAN-based super-resolution have used the SRGAN model to generate super-resolution images from low-resolution images. Ledig et al. [6] proposed a photo-realistic image super-resolution method using a GAN, which can recover photo-realistic textures from a low-resolution image to generate
Fig. 1 Working flow of generative adversarial networks
Fig. 2 GAN for super-resolution and their content loss and GAN loss
Improved Image Super-resolution Using Enhanced …
a high-resolution image. Mahapatra et al. [7] proposed an image super-resolution GAN model for the analysis of retinal images. Local saliency maps were used to define the loss function, and the results show that the model can generate high-resolution images close to the original images. Xie et al. [8] proposed tempoGAN for super-resolution of fluid flows, which generates high-resolution images with more detailed, more realistic and temporally coherent features. A progressive multi-scale super-resolution (ProSR) model was proposed by Wang et al. [9]. It improves reconstruction quality while upscaling images, achieves good results and works faster than other models. Mahapatra et al. [10] proposed a multi-stage progressive GAN for super-resolution of medical images at a preferred scaling factor. Their model generated good-quality high-resolution images at high scaling factors and outperformed existing GAN models. Chenyu et al. [11] proposed CTSR-GAN for super-resolution of tomography images, and their results show that it is a robust and efficient method for reconstructing high-resolution CT images from low-resolution ones. Although the SRGAN model generates quantitatively accurate results, a clear gap is observed between the quality of SRGAN-generated images and the original images. To overcome this issue, this work uses the enhanced super-resolution GAN (ESRGAN) [12], an improvement over the SRGAN model. The rest of the paper is organized as follows: Sect. 2 describes the methodology, Sect. 3 explains the experiments conducted, Sect. 4 presents the results and discussion, and Sect. 5 concludes the work.
2 Methodology
The architectures of the SRGAN and ESRGAN models are explained in the following subsections, along with the loss functions used during training.
2.1 SRGAN Network Architecture
The network architecture of the generator is given in Fig. 3a. It consists of 16 residual blocks (RBs), and each RB comprises two convolutional layers with 3 × 3 filters and 64 feature maps. The convolutions are followed by batch normalization and a ReLU activation layer. The architecture of the discriminator network is shown in Fig. 3b. This network has 8 convolutional layers with 3 × 3 kernels, and the number of kernels increases by a factor of 2 from 64 to 512. To obtain the probability for sample classification, the model has two dense layers followed by a final sigmoid layer. For nonlinear activation, the discriminator uses LeakyReLU.
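The sizes implied by this description can be tallied with a little arithmetic. The sketch below counts the convolution parameters of one generator residual block (batch-normalization parameters are ignored) and traces the discriminator's spatial size for a 512 × 512 input. The alternating stride pattern (1, 2, 1, 2, ...) is not stated in the text. It is an assumption taken from the original SRGAN design of Ledig et al. [6].

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + c_out

# Generator residual block: two 3x3 convolutions with 64 feature maps each.
rb_params = 2 * conv_params(3, 64, 64)

# Discriminator: filters double from 64 to 512 over 8 layers.
# Assumption: strides alternate 1, 2 as in Ledig et al. [6].
channels = [64, 64, 128, 128, 256, 256, 512, 512]
strides = [1, 2, 1, 2, 1, 2, 1, 2]

def feature_map_sizes(size, strides):
    """Spatial size after each layer, assuming 'same' padding (ceil division)."""
    sizes = []
    for s in strides:
        size = -(-size // s)
        sizes.append(size)
    return sizes

print(rb_params)                          # 73856
print(feature_map_sizes(512, strides))    # [512, 256, 256, 128, 128, 64, 64, 32]
```

With 16 such blocks, the residual trunk alone holds about 1.18 million convolution parameters. The discriminator reduces its input by a factor of 16 before the dense layers.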
Fig. 3 a Network architecture of generator. b Network architecture of discriminator
2.1.1 SRGAN Loss Functions
Earlier works measured similarity in pixel space, which led to perceptually unsatisfying, blurry images. To overcome this problem, this work uses a perceptual loss function that encompasses content loss, total variation loss and adversarial loss. The loss function for the super-resolved images is given in Eq. 1. The perceptual loss and the total variation loss are used for training the generator. Perceptual loss is the weighted sum of content loss and adversarial loss, and for content loss we use both MSE loss and VGG loss.
$$l^{SR} = l^{SR}_{MSE} + 10^{-3}\, l^{SR}_{Adversarial} + 6 \times 10^{-3}\, l^{SR}_{VGG} + 2 \times 10^{-3}\, l^{SR}_{TV} \tag{1}$$
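MSE loss is the pixel-wise difference between the super-resolved image and the high-resolution image, as given in Eq. 2, where $r$ is the upscaling factor and $W$ and $H$ are the width and height of the low-resolution image:
$$l^{SR}_{MSE} = \frac{1}{r^2 W H} \sum_{x=1}^{rW} \sum_{y=1}^{rH} \left( I^{HR}_{x,y} - G_{\theta_G}\!\left(I^{LR}\right)_{x,y} \right)^2 \tag{2}$$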
The content loss may produce highly pixelated and noisy images. To prevent this, a total variation loss (Eq. 3) is used, which ensures spatial continuity and smoothness in the output images.
$$l^{SR}_{TV} = \sum_{x} \left\lVert \nabla I(x) \right\rVert_1 \tag{3}$$
Fig. 4 a Low resolution. b High resolution with artefacts
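Eqs. 2 and 3 translate directly into array operations. This NumPy sketch is an illustration, not the authors' PyTorch code. It computes the pixel-wise MSE and an anisotropic total variation, in which the 1-norm of the gradient is approximated by absolute forward differences along the two image axes.

```python
import numpy as np

def mse_loss(sr, hr):
    """Eq. 2: mean squared pixel difference over the rW x rH output image."""
    return np.mean((np.asarray(hr, dtype=float) - np.asarray(sr, dtype=float)) ** 2)

def tv_loss(img):
    """Eq. 3: sum over pixels of the L1 norm of the forward-difference gradient."""
    img = np.asarray(img, dtype=float)
    dx = np.abs(np.diff(img, axis=1)).sum()    # horizontal differences
    dy = np.abs(np.diff(img, axis=0)).sum()    # vertical differences
    return dx + dy

checker = np.array([[0.0, 1.0], [1.0, 0.0]])
print(tv_loss(checker))            # 4.0
print(mse_loss(checker, checker))  # 0.0
```

A noisy, pixelated output has many large neighbour differences and therefore a high TV value, which is why adding this term pushes the generator toward smoother results.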
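SRGAN-VGG is a loss defined to focus on image content. It uses higher-level feature maps from deeper layers of the VGG network, where $\phi_{i,j}$ denotes a VGG feature map and $W_{i,j}$ and $H_{i,j}$ are its dimensions:
$$l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\!\left(I^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}\!\left(I^{LR}\right)\right)_{x,y} \right)^2 \tag{4}$$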
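These losses allow us to achieve high PSNR levels. To improve the perceptual quality of upscaled images, we include the adversarial loss produced by the discriminator, summed over the $N$ training samples:
$$l^{SR}_{Adversarial} = \sum_{n=1}^{N} -\log D_{\theta_D}\!\left(G_{\theta_G}\!\left(I^{LR}\right)\right) \tag{5}$$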
This loss pushes the upscaled image closer to the original image by fooling the discriminator. With this technique, SRGAN significantly improves the overall visual quality of super-resolution images. However, SRGAN results are still not close to the ground-truth images. Moreover, when more layers are added to the network, noise appears, as shown in Fig. 4a, b, which is undesirable. To overcome this problem, we use another model, ESRGAN, which is an improvement over SRGAN.
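Eqs. 5 and 1 can be combined in a short sketch. The adversarial term sums $-\log D$ over the discriminator's "real" probabilities for the super-resolved batch, and the total generator loss applies the weights stated in Eq. 1. The function names are hypothetical, and the small `eps` guards against $\log 0$. The other loss terms are assumed to be computed elsewhere and passed in as numbers.

```python
import numpy as np

def adversarial_loss(d_fake, eps=1e-12):
    """Eq. 5: sum over the N samples of -log D(G(I_LR)).

    d_fake holds the discriminator's probabilities that each super-resolved image is real.
    """
    d_fake = np.asarray(d_fake, dtype=float)
    return -np.log(d_fake + eps).sum()

def srgan_generator_loss(l_mse, l_adv, l_vgg, l_tv):
    """Eq. 1: weighted sum with weights 1, 1e-3, 6e-3 and 2e-3."""
    return l_mse + 1e-3 * l_adv + 6e-3 * l_vgg + 2e-3 * l_tv

# The better the generator fools D (probabilities near 1), the smaller the penalty.
print(adversarial_loss([0.5, 0.5]) > adversarial_loss([0.99, 0.99]))  # True
```

The small weight on the adversarial term keeps the pixel and VGG content losses dominant while still nudging outputs toward the distribution of real images.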
2.2 ESRGAN
The ESRGAN network is obtained by improving the SRGAN network architecture with high-capacity residual-in-residual dense blocks (RRDB), as shown in Fig. 5. Since a deeper model can capture semantic information more effectively,
Fig. 5 Residual block without batch normalization and residual in residual dense blocks
the RRDB also reduces noise and thereby improves the quality of the recovered image. The batch normalization layers of SRGAN are also removed, as they generate unpleasant artifacts and limit the generalization ability. Further, the discriminator network is improved with a "Relativistic GAN", which can learn sharper edges and more detailed textures. The relativistic discriminator makes this possible because it estimates whether one image is relatively more realistic than another, whereas the SRGAN discriminator only decides whether a generated image is real or fake.
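The relativistic idea can be written in a few lines. The following NumPy sketch follows the relativistic average formulation used in ESRGAN, $D_{Ra}(x_r, x_f) = \sigma\left(C(x_r) - \mathbb{E}_{x_f}[C(x_f)]\right)$, where $C$ is the raw (pre-sigmoid) discriminator output. The discriminator loss shown is the standard ESRGAN counterpart of the generator loss in Eq. 9 below. The scores here are plain numbers standing in for network outputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_ra(c_a, c_b):
    """D_Ra(x_a, x_b) = sigmoid(C(x_a) - mean C(x_b)) for every sample in batch a."""
    return sigmoid(np.asarray(c_a, dtype=float) - np.mean(c_b))

def discriminator_loss_ra(c_real, c_fake, eps=1e-12):
    """Relativistic average discriminator loss:
    -E[log D_Ra(x_r, x_f)] - E[log(1 - D_Ra(x_f, x_r))]."""
    return (-np.mean(np.log(d_ra(c_real, c_fake) + eps))
            - np.mean(np.log(1.0 - d_ra(c_fake, c_real) + eps)))

# Indistinguishable batches give D_Ra = 0.5 and a loss of 2 ln 2.
print(discriminator_loss_ra([0.0, 0.0], [0.0, 0.0]))
```

Because every real score is compared with the average fake score (and vice versa), both real and generated images contribute gradients. The ESRGAN authors credit this for the sharper edges and richer textures.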
2.2.1 Loss Functions of ESRGAN
To make the loss function of ESRGAN more effective, the features are constrained before activation rather than after activation, avoiding the sparse and inconsistent features observed in SRGAN. The loss function of ESRGAN is given in Eq. 6:
$$l^{SR} = l^{SR}_{VGG} + 10^{-2}\, l^{SR}_{MSE} + L^{Ra}_{G} + L_1 \tag{6}$$
The MSE loss in Eq. 7 assesses the pixel-wise difference between the super-resolved and original images:
$$l^{SR}_{MSE} = \frac{1}{r^2 W H} \sum_{x=1}^{rW} \sum_{y=1}^{rH} \left( I^{HR}_{x,y} - G_{\theta_G}\!\left(I^{LR}\right)_{x,y} \right)^2 \tag{7}$$
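ESRGAN-VGG is a loss defined to focus on the content of the images:
$$l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left( \phi_{i,j}\!\left(I^{HR}\right)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}\!\left(I^{LR}\right)\right)_{x,y} \right)^2 \tag{8}$$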
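The adversarial loss for the generator, $L^{Ra}_{G}$, is defined as
$$L^{Ra}_{G} = -\mathbb{E}_{x_r}\!\left[\log\left(1 - D_{Ra}(x_r, x_f)\right)\right] - \mathbb{E}_{x_f}\!\left[\log D_{Ra}(x_f, x_r)\right] \tag{9}$$
where $x_r$ and $x_f$ denote real and generated (fake) images, respectively.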
Here $L_1$ is the content loss that evaluates the 1-norm distance between the super-resolved and ground-truth high-resolution images, and it is defined as
$$L_1 = \mathbb{E}_{x_i} \left\lVert G(x_i) - y \right\rVert_1 \tag{10}$$
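Eqs. 9, 10 and 6 together define the ESRGAN generator objective. This NumPy sketch illustrates them and is not the authors' implementation. The critic scores stand in for raw discriminator outputs $C(\cdot)$, the $L_1$ term sums absolute differences per sample and averages over the batch (axis 0), and the VGG and MSE terms are assumed to be computed elsewhere and passed in as numbers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def generator_loss_ra(c_real, c_fake, eps=1e-12):
    """Eq. 9: -E_xr[log(1 - D_Ra(x_r, x_f))] - E_xf[log D_Ra(x_f, x_r)]."""
    c_real = np.asarray(c_real, dtype=float)
    c_fake = np.asarray(c_fake, dtype=float)
    d_real = sigmoid(c_real - c_fake.mean())   # D_Ra(x_r, x_f)
    d_fake = sigmoid(c_fake - c_real.mean())   # D_Ra(x_f, x_r)
    return -np.mean(np.log(1.0 - d_real + eps)) - np.mean(np.log(d_fake + eps))

def l1_loss(sr, hr):
    """Eq. 10: per-sample 1-norm of G(x_i) - y, averaged over the batch."""
    diff = np.abs(np.asarray(sr, dtype=float) - np.asarray(hr, dtype=float))
    return diff.reshape(diff.shape[0], -1).sum(axis=1).mean()

def esrgan_generator_loss(l_vgg, l_mse, l_ra, l_1):
    """Eq. 6 as stated in this chapter: l_VGG + 1e-2 * l_MSE + L_G^Ra + L_1."""
    return l_vgg + 1e-2 * l_mse + l_ra + l_1

# The generator's adversarial loss falls when its images score above the real ones.
print(generator_loss_ra([0.0], [5.0]) < generator_loss_ra([0.0], [0.0]))  # True
```

Unlike the SRGAN term of Eq. 5, this loss also depends on the real images' scores, so the generator benefits from gradients on both batches.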
3 Experiment
This section describes the dataset used for training the models and the details of training.
3.1 Dataset
This work uses the RAISE dataset for experimentation. The dataset contains 8137 high-resolution raw images in total, mainly intended for digital image forensics. The images are guaranteed to be uncompressed, and they cover diverse colors and scenery.
3.2 Data Pre-processing
We used 1000 images of size 512 × 512 from the RAISE dataset to train both SRGAN and ESRGAN for 1000 epochs at 2× resolution. The high-resolution images are loaded by a data loader, which downsamples them into low-resolution images of size 256 × 256. The returned high-resolution and low-resolution image pairs are used to train the models.
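The chapter does not state which downsampling kernel the data loader uses. Bicubic resizing is common in super-resolution work. As a dependency-free stand-in, the following sketch forms 2× training pairs by averaging non-overlapping 2 × 2 blocks, which turns a 512 × 512 image into a 256 × 256 one. The function names are hypothetical.

```python
import numpy as np

def make_lr(hr: np.ndarray, scale: int = 2) -> np.ndarray:
    """Downsample an (H, W, C) image by averaging non-overlapping scale x scale blocks."""
    h, w, c = hr.shape
    h, w = h - h % scale, w - w % scale          # crop so both sides divide evenly
    blocks = hr[:h, :w].astype(float).reshape(h // scale, scale, w // scale, scale, c)
    return blocks.mean(axis=(1, 3))

def make_pair(hr: np.ndarray, scale: int = 2):
    """Return the (HR, LR) pair that a data loader would hand to the training loop."""
    return hr, make_lr(hr, scale)

hr = np.random.default_rng(0).integers(0, 256, size=(512, 512, 3)).astype(np.uint8)
hr_img, lr_img = make_pair(hr)
print(lr_img.shape)  # (256, 256, 3)
```

Generating the low-resolution input on the fly from each high-resolution image guarantees perfectly aligned pairs, which the pixel-wise losses (Eqs. 2, 7 and 10) require.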
3.3 Training
The networks are trained on an NVIDIA Tesla GPU, and PyTorch is used as the framework to implement the models. Our generator network has 5 residual blocks in SRGAN and 5 dense residual blocks in ESRGAN. The Adam optimizer is used for both the generator and the discriminator, and the learning rate for both models was set to 0.01. SRGAN has no warm-up period, whereas ESRGAN has a warm-up period of 50 epochs on MSE loss only. During SRGAN training, we train the generator and the discriminator alternately with data from the data loader for 1000 epochs. The generator uses perceptual and adversarial losses, and the discriminator uses adversarial loss.
Fig. 6 Generator loss of SRGAN with respect to epochs
During ESRGAN training, we first train only the generator with pixel loss for 50 epochs. We then train the generator with perceptual and adversarial losses and the discriminator with adversarial loss, alternately, for 1000 epochs.
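The two-phase procedure can be outlined as a small training-loop skeleton. In the sketch below, `step_g_pixel`, `step_d` and `step_g_adv` are hypothetical caller-supplied callbacks, each performing one optimizer step on a batch. The order "discriminator first, then generator" within each batch is an assumption, because the text only says that the two networks are trained alternately. Passing `warmup_epochs=0` gives the SRGAN schedule, and 50 gives the ESRGAN one.

```python
def train_schedule(batches, epochs, warmup_epochs, step_g_pixel, step_d, step_g_adv):
    """Optional pixel-loss warm-up for G, then alternating D/G updates on every batch."""
    for _ in range(warmup_epochs):      # ESRGAN: 50 epochs; SRGAN: 0
        for batch in batches:
            step_g_pixel(batch)         # generator only, pixel (MSE) loss
    for _ in range(epochs):             # 1000 epochs in this study
        for batch in batches:
            step_d(batch)               # discriminator: adversarial loss
            step_g_adv(batch)           # generator: perceptual + adversarial loss

# Dry run with logging callbacks instead of real optimizer steps.
log = []
train_schedule(
    batches=[0, 1], epochs=3, warmup_epochs=2,
    step_g_pixel=lambda b: log.append("g_pix"),
    step_d=lambda b: log.append("d"),
    step_g_adv=lambda b: log.append("g_adv"),
)
print(log[:6])  # ['g_pix', 'g_pix', 'g_pix', 'g_pix', 'd', 'g_adv']
```

The warm-up gives the generator a reasonable PSNR-oriented starting point, so the discriminator does not dominate the first adversarial epochs.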
3.4 Training Losses
This section presents graphs of the discriminator and generator losses with respect to epochs for SRGAN and ESRGAN.
3.4.1 SRGAN Losses
Figure 6 depicts the generator loss with respect to epochs during training. The graph shows no considerable change in the generator loss as training approaches 1000 epochs, so training is stopped at 1000 epochs. The discriminator loss during training is plotted for each epoch in Fig. 7. It starts at 1.5, decreases to about 0.5 and then fluctuates around 1.
3.4.2 ESRGAN Losses
Figures 8 and 9 show the generator and discriminator losses of ESRGAN with respect to epochs. The generator loss shows no considerable change as training approaches 1000 epochs, so training was stopped at 1000 epochs. For ESRGAN, the discriminator loss starts at 0.8, decreases to about 0 and then fluctuates around 0.
Improved Image Super-resolution Using Enhanced …
Fig. 7 Discriminator loss of SRGAN with respect to epochs
Fig. 8 Generator loss of ESRGAN with respect to epochs
Fig. 9 Discriminator loss of ESRGAN with respect to epochs
4 Results and Discussion The performance of the ESRGAN model was compared with that of SRGAN using PSNR and SSIM scores on 10 low-resolution images of size 256 × 256.

The peak signal-to-noise ratio (PSNR) is computed in decibels between two images; the higher the PSNR, the better the quality of the reconstructed image. The structural similarity (SSIM) index measures the similarity between two images. It assesses the quality of one image relative to another image that is regarded as being of perfect quality, and a higher value indicates greater similarity.

Table 1 lists the PSNR and SSIM values calculated for SRGAN and ESRGAN. SRGAN has the higher SSIM on all ten images and the higher PSNR on six of the ten.

The models were also evaluated by mean opinion score (MOS) testing with three raters (Table 2). The MOS of ESRGAN-generated images is higher than that of SRGAN-generated images. This indicates that ESRGAN generates more natural structures and texture details, whereas SRGAN produces images with undesired textures.

Figure 10 compares the low-resolution images (a), the SRGAN-generated super-resolution images (b), the ESRGAN-generated super-resolution images (c), and the high-resolution images (d). Below each image, a zoomed portion is shown. These show that the ESRGAN-generated super-resolution images are perceptually more accurate than the SRGAN-generated images.

Table 1 PSNR and SSIM values

| Image | PSNR (dB), SRGAN | PSNR (dB), ESRGAN | SSIM, SRGAN | SSIM, ESRGAN |
|---|---|---|---|---|
| 1 | 30.602 | 30.160 | 0.812 | 0.752 |
| 2 | 28.808 | 29.416 | 0.817 | 0.799 |
| 3 | 30.258 | 28.787 | 0.838 | 0.816 |
| 4 | 30.834 | 31.277 | 0.866 | 0.807 |
| 5 | 30.066 | 29.315 | 0.780 | 0.709 |
| 6 | 33.996 | 29.495 | 0.948 | 0.903 |
| 7 | 29.634 | 29.247 | 0.754 | 0.686 |
| 8 | 30.856 | 31.782 | 0.866 | 0.838 |
| 9 | 33.106 | 34.064 | 0.914 | 0.867 |
| 10 | 31.460 | 31.255 | 0.860 | 0.759 |
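For reference, PSNR and a simplified SSIM can be computed as below. This NumPy sketch assumes 8-bit images with a peak value of 255. The function `global_ssim` is a hypothetical simplification: it evaluates the SSIM formula once over the whole image. The standard SSIM index instead averages the same formula over local windows, so its values will generally differ from those in Table 1. The constants C1 = (0.01 L)² and C2 = (0.03 L)² are the usual SSIM defaults.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

def global_ssim(x, y, peak=255.0):
    """SSIM formula evaluated once over the whole image (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)

# Usage: every pixel off by 16 grey levels gives MSE = 256.
ref = np.zeros((8, 8))
print(round(psnr(ref, ref + 16.0), 2))  # 24.05
```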
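Table 2 MOS values

| Image | SRGAN | ESRGAN | HR |
|---|---|---|---|
| 1 | 2 | 4 | 5 |
| 2 | 3 | 5 | 4 |
| 3 | 3 | 3 | 5 |
| 4 | 4 | 4 | 5 |
| 5 | 3 | 5 | 5 |
| 6 | 3 | 4 | 5 |
| 7 | 3 | 5 | 4 |
| 8 | 2 | 4 | 5 |
| 9 | 1 | 3 | 5 |
| 10 | 2 | 4 | 5 |
| AVG | 3.1 | 4.1 | 4.8 |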
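MOS is the arithmetic mean of the raters' scores. The sketch below assumes a hypothetical matrix of 1 to 5 ratings, with one row per rater and one column per image; the paper reports only one score per image and model. The function name `mean_opinion_scores` and the example scores are illustrative only.

```python
from statistics import mean

def mean_opinion_scores(ratings):
    """ratings[r][i] is rater r's 1-5 score for image i.

    Returns (per-image MOS list, overall MOS over all images).
    """
    n_images = len(ratings[0])
    if any(len(row) != n_images for row in ratings):
        raise ValueError("every rater must score every image")
    per_image = [mean(row[i] for row in ratings) for i in range(n_images)]
    return per_image, mean(per_image)

# Hypothetical scores from three raters for four images (not the paper's data).
example = [
    [4, 5, 3, 4],
    [4, 4, 3, 5],
    [5, 4, 3, 4],
]
per_image, overall = mean_opinion_scores(example)
```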
5 Conclusion The main objective of this work is to assess the perceptual quality of super-resolution images as judged by human vision. To this end, this work implemented the existing SRGAN model for image super-resolution and compared its results with those of the enhanced SRGAN (ESRGAN), an improvement over SRGAN. The results of both models are compared using SSIM, PSNR, and MOS as quantitative measures, with the aim of showing that ESRGAN captures more perceptual quality than SRGAN. SRGAN achieves better PSNR and SSIM values than ESRGAN. In terms of MOS, however, ESRGAN achieves values closer to those of the original high-resolution images. This indicates that PSNR and SSIM are not good measures of image quality as perceived by the human visual system, so the models were also validated using MOS to assess the perceptual performance of SRGAN and ESRGAN. The results show that the ESRGAN model generates perceptually more convincing results than SRGAN.
Fig. 10 a Low-resolution images. b SRGAN-generated super-resolution images. c ESRGAN-generated super-resolution images. d High-resolution images
Acknowledgements The authors would like to thank Space Technology Cell (STC), Department of Aerospace Engineering, Indian Institute of Science, Bangalore, Karnataka, India, for funding this project.
References
1. Dong C, Loy CC, He K, Tang X (2016) Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 38(2):295–307
2. Tong T, Li G, Liu X, Gao Q (2017) Image super-resolution using dense skip connections. In: Proceedings of the IEEE international conference on computer vision, pp 4799–4807
3. Lim B, Son S, Kim H, Nah S, Lee KM (2017) Enhanced deep residual networks for single image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 136–144
4. Ledig C et al (2017) Photo-realistic single image super-resolution using a generative adversarial network. Comput Vis Pattern Recogn 2(3):4
5. Bulat A, Yang J, Tzimiropoulos G (2017) To learn image super-resolution, use a GAN to learn how to do image degradation first. Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol 11210, pp 187–202
6. Ledig C, Theis L, Huszar F, Caballero J, Cunningham A (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, pp 105–114. https://doi.org/10.1109/CVPR.2017.19
7. Mahapatra D, Bozorgtabar B, Hewavitharanage S, Garnavi R (2017) Image super resolution using generative adversarial networks and local saliency maps for retinal image analysis. Lect Notes Comput Sci. https://doi.org/10.1007/978-3-319-66179-7_44
8. Xie Y, Franz E, Chu M, Thuerey N (2018) tempoGAN: a temporally coherent, volumetric GAN for super-resolution fluid flow. ACM Trans Graph 37(4), Article 95
9. Wang Y, Perazzi F, McWilliams B, Sorkine-Hornung A, Sorkine-Hornung O, Schroers C (2018) A fully progressive approach to single-image super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 864–873
10. Mahapatra D, Bozorgtabar B, Garnavi R (2019) Image super-resolution using progressive generative adversarial networks for medical image analysis. Comput Med Imaging Graph 71:30–39
11. You C, Li G, Zhang Y, Li M, Ju S, Zhao Z, Zhang Z, Cong W (2020) CT super-resolution GAN constrained by the identical, residual, and cycle learning ensemble (GAN-CIRCLE). IEEE Trans Med Imag 39(1)
12. Wang X, Yu K, Wu S, Gu J, Liu Y, Dong C, Qiao Y, Change Loy C (2018) ESRGAN: enhanced super-resolution generative adversarial networks. In: Proceedings of the European conference on computer vision (ECCV)
Comparative Study of Supervised Machine Learning Algorithms for Healthcare Dataset Using Orange Vaibhav Bhatnagar and Ramesh C. Poonia
Abstract Machine learning algorithms are computer programs that improve their performance through experience. Machine learning combines statistics and computer algorithms to identify hidden patterns and make forecasts. In this paper, seven supervised machine learning algorithms are implemented in Orange and compared on three datasets of heart disease patients, diabetes patients, and depression patients. The algorithms are decision tree, logistic regression, K-nearest neighbor, support vector machine, Naïve Bayes, random forest, and Adaboost. Orange is an open-source graphical user interface platform for implementing machine learning algorithms. The study finds that logistic regression and Naïve Bayes give better results than the other algorithms, with average accuracies of 81.23% and 79.65%, respectively. Support vector machine does not fit this type of classification, with an average accuracy of 56.34%. Hence, SVM should not be used for such classification problems. Keywords Machine learning · Healthcare data · Orange · Logistic regression · Naïve Bayes · Support vector machine
1 Introduction Machine learning [1] is defined as the study of computer programs that leverage algorithms and statistical models to learn through inference and patterns without being explicitly programmed. The field has developed significantly in the last decade. Machine learning is a subfield of artificial intelligence (AI). Its general objective is to understand the structure of data and fit that data to models that people can understand and use. Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed [2].

In contrast to conventional computational approaches, machine learning algorithms allow computers to train on data inputs and use statistical analysis to output values that fall within a specific range. Machine learning is often regarded as an advanced form of statistics, but the two differ significantly in purpose. ML models are designed to predict accurately, whereas statistical models are designed to infer the relationships among variables. Statistical models also make predictions, but these are generally not as accurate as those of ML models. Machine learning deals with supervised and unsupervised learning, whereas statistics deals with populations, sampling, and hypothesis formulation. The two also have different origins: ML is a subbranch of AI, whereas statistics derives from core mathematics. Nevertheless, a good ML expert should have a good knowledge of statistics. Machine learning thus helps computers build models from sample data to automate decision-making processes based on data inputs [3].

Machine learning is used in advertisement popularity prediction, spam classification, face recognition, prediction systems, recommender systems, analysis of buying habits, grouping of user logs, learning associations, development of video games, industrial simulation, resource management, and agriculture [4, 5]. Languages for implementing machine learning algorithms include Python, Prolog, Java, R, and Julia. Experts also directly use frameworks such as Rattle, TensorFlow, Weka, RapidMiner, and Orange. The flow diagram of machine learning is shown in Fig. 1. Machine learning algorithms can be classified into five categories [6]: supervised, unsupervised, reinforcement, deep learning, and deep reinforcement learning.

V. Bhatnagar, Department of Computer Applications, Manipal University Jaipur, Jaipur, India. R. C. Poonia (B), Amity Institute of Information Technology, Amity University Rajasthan, Jaipur, India, e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_16
In supervised machine learning, a labeled dataset is used to train the algorithm. A target attribute is present, and it is affected by the other attributes. In unsupervised learning, no target attribute is specified; instead, the system infers a hidden structure from an unlabeled dataset. In reinforcement learning, models are trained to make a sequence of decisions. The input is an initial state, and multiple outputs are possible. Given the input, the model returns a state, and the user decides whether to reward or punish the model based on its output. Deep learning is a subset of machine learning whose networks are capable of learning without supervision from unstructured or unlabeled data. Finally, deep reinforcement learning combines deep learning and reinforcement learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for machines.

Fig. 1 Working of machine learning

The objective of this paper is to compare the efficiency of supervised machine learning algorithms on heart disease, diabetes, and depression datasets. Decision tree, logistic regression, KNN, Naïve Bayes, support vector machine, random forest, and Adaboost are compared, and the Orange software is used to perform the comparative study. The rest of the paper is organized as follows. Section 2 discusses Orange, Sect. 3 discusses supervised machine learning algorithms, and Sect. 4 describes data collection. Section 5 covers the implementation of the algorithms in Orange, Sect. 6 presents the discussion, and Sect. 7 gives the conclusion and future work.
2 Orange Orange [7] is an open-source framework for implementing machine learning and data mining techniques. It is a visual programming tool for data visualization, data analysis, and machine learning. Orange was first conceived as a C++ library of machine learning algorithms and related procedures, such as preprocessing, sampling, and other data manipulation. The current version of the framework uses open-source Python libraries such as NumPy, scikit-learn, and SciPy. Its GUI is built on the cross-platform Qt framework, and it runs on Windows, macOS, and Linux. The framework is developed at the University of Ljubljana. The evolution of Orange is shown in Table 1.

Orange is a user-friendly tool for applying machine learning algorithms and is suitable for learners [8]. Its user interface is divided into two parts: the canvas and the widgets. The canvas is where data analysts and machine learning experts place widgets. Widgets are used for reading and showing data, selecting features, applying algorithms, and visualizing outputs.

Table 1 Evolution of Orange [7]

| S. No. | Year | Evolution |
|---|---|---|
| 1 | 1997 | Development started by Janez Demšar and Blaž Zupan |
| 2 | 2002 | First prototype of the GUI created with Pmw (Python megawidgets) |
| 3 | 2003 | Redesigned using the Qt framework |
| 4 | 2005 | Bioinformatics analysis introduced |
| 5 | 2008 | macOS-based packages developed |
| 6 | 2009 | More than 100 widgets created |
| 7 | 2013 | Redesign for heavy GUI |
| 8 | 2015 | Version 3.0 released |
| 9 | 2016 | Version 3.3 released |

The widgets of Orange version 3.3 are divided into five groups:
- Data, used for importing data and for data wrangling.
- Visualize, used for visualizing the data through different graphs and plots.
- Model, which provides different supervised machine learning algorithms.
- Evaluate, which helps evaluate the machine learning models applied to the data.
- Unsupervised, used to apply unsupervised machine learning algorithms.

Orange is a versatile and rich tool that provides several add-on packages:
- Bioinformatics, for biological data analysis.
- Educational, for teachers.
- Geo, for spatial analysis.
- ImageAnalytics, for image classification.
- Spectroscopy, for cell studies.
- Text, for text analysis.
- Timeseries, for time-series analysis.
3 Supervised Machine Learning Algorithms This section discusses the supervised machine learning algorithms used in this study: decision tree, random forest, logistic regression, KNN, support vector machine, Naïve Bayes, and Adaboost.

Decision Tree: A decision tree [9] is a supervised machine learning algorithm used for both classification and regression. It is a flowchart-like representation that tries to identify different ways to split the given dataset based on various conditions. The target attribute is generally a categorical dichotomous variable with at most two values, e.g., Yes/No, True/False, Present/Absent, or Live/Deceased. The input to a decision tree is of categorical type. Decision trees are easy to implement and interpret, but they have a serious disadvantage: they are very unstable. A slight change in the data can affect the entire structure of the tree.

Random Forest: Random forest [10] classification extends the decision tree by creating a large number of decision trees. Each tree in the forest makes a class prediction, and the class with the most votes becomes the model's prediction. Compared with a decision tree, a random forest has higher predictive capacity and better accuracy. However, it is more complex to implement and takes more time to train.

Logistic Regression: Logistic regression [11] is a supervised machine learning algorithm used for prediction. As the name suggests, it is based on the logistic function, which is S-shaped when plotted. It estimates the probability of an event by applying the logistic function to a linear combination of the input attributes. The function takes real values as input and maps them into the range 0 to 1. Like a decision tree, logistic regression can predict categorical binary values. It assumes no or only moderate collinearity among the independent attributes.
However, logistic regression has some disadvantages: it is affected by outliers, it can overfit, and it is not designed for nonlinear analysis.

KNN: KNN stands for the k-nearest neighbor algorithm [12], a supervised machine learning algorithm used for both regression and classification. It works on the principle that similar data points lie near each other, and it classifies data based on the distances between points. An integer value K is chosen, and the K nearest data points are used for classification.
When new data is input to the model, it is assigned to the category that best fits its neighbors. However, KNN is not suitable for large amounts of high-dimensional data.

Support Vector Machine (SVM): Support vector machine [13] is also a supervised machine learning algorithm, mostly used for classification. The data points are plotted in an n-dimensional space, where n is the number of features in the dataset. Classification is performed by constructing a hyperplane that separates the two groups. SVM chooses the extreme points (vectors) that help in creating the hyperplane. These extreme cases are called support vectors, which gives the algorithm its name. SVM is also not suitable for large amounts of data.

Naïve Bayes: The Naïve Bayes algorithm [14] is a supervised machine learning algorithm based on Bayes' theorem, with the assumption that the predictors are independent. Bayes classification works on conditional probability, that is, the probability that an event occurs given that another event has already occurred. It is used for classification, is well suited to large datasets, and is very useful for text and image classification. It assumes that all variables are independent of each other. Naïve Bayes cannot make a prediction when a category has zero frequency in the training data.

Adaboost: Adaboost is an abbreviation of adaptive boosting. Boosting [15] is a technique that creates a strong classifier from weak classifiers. A model is first created from the training data, and then a second model is created that corrects the misclassifications of the first. This process is repeated until the training dataset is classified without any misclassification, or until a maximum number of models is reached. Adaboost [16] is designed to increase the performance of decision trees.
The idea behind Adaboost is to assign weights to the classifiers and to arrange the training in each iteration so that unusual observations are predicted accurately. Like logistic regression, Adaboost can suffer from overfitting, and its performance is affected by outliers and noisy data.
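To make the logistic regression description concrete, the following sketch fits a one-feature model by batch gradient descent on the mean log-loss. It is a minimal pure-Python illustration. The toy data, learning rate, and number of iterations are hypothetical, and this is not the solver that Orange (which builds on scikit-learn) uses.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss: mean((p - y) * x) for w, mean(p - y) for b.
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return class 1 when the estimated probability reaches the threshold."""
    return int(sigmoid(w * x + b) >= threshold)

# Hypothetical toy data: one standardized risk score per patient, label 1 = disease.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
```

The S-shaped `sigmoid` turns the linear score `w * x + b` into a probability, which is what lets the model output a categorical binary prediction via the threshold.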
4 Data Collection In this paper, secondary data on heart disease patients, diabetic patients, and depression patients were taken from Kaggle.com. The datasets contain 304, 768, and 1429 records, respectively. Their attributes are listed in Tables 2, 3, and 4, respectively, which give each attribute's name, its nature (scale or categorical), and its possible values.
5 Implementation of Machine Learning Through Orange In this section, the machine learning algorithms are implemented through Orange. The following steps are performed:
Table 2 Attributes of heart patient data

| S. No. | Name of attribute | Scale/categorical | Possible values |
|---|---|---|---|
| 1 | Age | Scale | Real number |
| 2 | Sex | Categorical | Male, Female |
| 3 | Severity of Chest Pain | Categorical | 1: low severity and 3: high severity |
| 4 | Trestbps | Scale | As per standard medical values |
| 5 | Chol | Scale | As per standard medical values |
| 6 | Fbs | Scale | As per standard medical values |
| 7 | Restecg | Categorical | 0: Normal, 1: moderate, 2: hypertrophy |
| 8 | Thalach | Scale | As per standard medical values |
| 9 | Exang | Scale | Real number |
| 10 | Oldpeak | Scale | Real number |
| 11 | Slope | Categorical | 0 to 2 (low to high) |
| 12 | Ca | Categorical | 0 to 2 (low to high) |
| 13 | Thal | Categorical | 1 to 3 (low to high) |
| 14 | Target (Target Attribute) | Categorical | 0 and 1, suffered or not suffered |
Table 3 Attribute of diabetic patient data

| S. No. | Name of attribute | Scale/categorical | Possible values |
|---|---|---|---|
| 1 | NTP | Scale | Real number |
| 2 | PGC | Scale | As per standard medical values |
| 3 | BP | Scale | As per standard medical values |
| 4 | TSFT | Scale | As per standard medical values |
| 5 | Insulin | Scale | As per standard medical values |
| 6 | BMI | Scale | As per standard medical values |
| 7 | Diabetes | Scale | As per standard medical values |
| 8 | Age | Scale | Real number |
| 9 | Class Variable (Target Attribute) | Categorical | 0 and 1, suffered or not suffered |
5.1 Data Wrangling Data wrangling [17] is the process of transforming raw data into another format to make it more appropriate and valuable for analysis. The operations performed during data wrangling are shown in Table 5 and Fig. 2.
Table 4 Attribute of depression patient

| S. No. | Name of attribute | Scale/categorical | Possible values |
|---|---|---|---|
| 1 | Sex | Categorical | 1 male, 0 female |
| 2 | Age | Scale | Real number |
| 3 | Married | Categorical | Marital status (1 and 0) |
| 4 | Number of Children | Scale | Real number |
| 5 | Educational Level | Scale | Real number |
| 6 | Total_Number | Scale | Real number |
| 7 | Gained_Assest | Scale | Real number |
| 8 | Living Expenses | Scale | Real number |
| 9 | Incoming_Scale | Categorical | 1 and 0 |
| 10 | incoming_own_farm | Categorical | 1 and 0 |
| 11 | incoming_business | Categorical | 1 and 0 |
| 12 | Depressed (Target Attribute) | Categorical | 1 and 0, suffered or not suffered |
Table 5 Data wrangling of all three datasets

| S. No. | Name of method | Description | Applied to the datasets |
|---|---|---|---|
| 1 | Impute | Fill the missing values in the datasets | Missing values are replaced with the average/most frequent value |
| 2 | Purge Domain | Remove redundant values in the datasets | All duplicate values are removed from the datasets |
| 3 | Feature Statistics | Descriptive statistics of datasets | Descriptive statistics are applied to all datasets |
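The two cleaning steps in Table 5 can be sketched in plain Python. The sketch makes three assumptions: records are dictionaries, `None` marks a missing value, and every column has at least one non-missing value. It fills numeric columns with the column mean and other columns with the most frequent value, mirroring the "average/most frequent value" rule in Table 5. It then drops exact duplicate records. The helper names are hypothetical; in the paper these steps were carried out with Orange widgets.

```python
from collections import Counter
from statistics import mean

def impute(records):
    """Replace None with the column mean (numeric) or most frequent value (other)."""
    columns = records[0].keys()
    fill = {}
    for col in columns:
        present = [r[col] for r in records if r[col] is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fill[col] = mean(present)
        else:
            fill[col] = Counter(present).most_common(1)[0][0]
    return [{c: (fill[c] if r[c] is None else r[c]) for c in columns} for r in records]

def drop_duplicates(records):
    """Keep the first occurrence of each identical record, preserving order."""
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))  # order-independent, hashable record key
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```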
5.2 Calculation of Correlation in the Datasets The next step is to calculate the correlation among all the variables in all three datasets. Orange can calculate both Pearson and Spearman correlation. As a sample, the correlation of the age attribute with all other attributes of the heart dataset is shown in Table 6, exported directly from Orange. Correlation is easy to compute in Orange with the Correlations widget. In addition, any Orange result can be exported to a spreadsheet by connecting it to the Save Data widget. Scatter plots show how one attribute affects another. Figure 3 shows scatter plots of Age and Chest Pain with respect to the target value (dataset 1), Diabetes Function and Age (dataset 2), and Age and Number of Children (dataset 3).
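Both coefficients offered by Orange can be computed directly, which makes the difference between them explicit. Pearson measures linear association on the raw values. Spearman is the Pearson coefficient of the ranks, with tied values given their average rank. The functions below are a plain-Python sketch, not Orange's implementation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For a monotonic but nonlinear relation such as y = x², Spearman gives 1 while Pearson stays below 1.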
Fig. 2 Data wrangling of all datasets

Table 6 Correlation of the age attribute with all other attributes of the heart dataset

| Correlation | Feature 1 | Feature 2 |
|---|---|---|
| −0.398 | Age | Thalach |
| 0.341 | Age | Ca |
| 0.286 | Age | Trestbps |
| 0.268 | Age | Oldpeak |
| 0.196 | Age | Chol |
| −0.184 | Age | Slope |
| −0.133 | Age | Restecg |
| −0.087 | Age | Cp |
| 0.087 | Age | Thal |
Fig. 3 a Scatter plot between Age and Chest Pain. b Scatter plot between Age and Diabetes Function. c Scatter plot between Age and Number of Children
5.3 Implementing Machine Learning Algorithms In this stage, the supervised machine learning algorithms are applied to all three datasets: decision tree, logistic regression, support vector machine, KNN, Naïve Bayes, random forest, and Adaboost. The implementation is shown in Fig. 4.
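The text does not state which validation protocol was selected when the models were evaluated in Orange. Orange's Test and Score widget offers k-fold cross-validation among other options, and a k-fold split can be sketched as follows. `k_fold_indices` is a hypothetical helper, and the shuffling seed is arbitrary.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

With the 304 heart-disease records and k = 10, this gives four test folds of 31 records and six of 30. Each model is trained on the other nine folds in turn.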
Fig. 4 Implementation of supervised machine learning algorithms on all three datasets
Table 7 Confusion matrix of all three datasets

| S. No. | Dataset | Algorithm | Correct classification (%) | Misclassification (%) |
|---|---|---|---|---|
| 1 | Heart Disease Dataset | Decision Tree | 75.90 | 24.09 |
| | | Logistic Regression | 83.49 | 16.5 |
| | | KNN | 65.35 | 34.65 |
| | | Naïve Bayes | 83.82 | 16.17 |
| | | SVM | 73.2 | 26.73 |
| | | Random Forest | 80.85 | 19.14 |
| | | Adaboost | 77.22 | 22.44 |
| 2 | Diabetes Dataset | Decision Tree | 70.83 | 29.16 |
| | | Logistic Regression | 76.95 | 23.04 |
| | | KNN | 71.09 | 28.90 |
| | | Naïve Bayes | 73.56 | 26.4 |
| | | SVM | 47.91 | 52.02 |
| | | Random Forest | 76.17 | 23.82 |
| | | Adaboost | 70.96 | 29.03 |
| 3 | Depression Disease | Decision Tree | 74.59 | 25.40 |
| | | Logistic Regression | 83.3 | 16.65 |
| | | KNN | 81.45 | 18.54 |
| | | Naïve Bayes | 81.59 | 18.40 |
| | | SVM | 48.84 | 51.15 |
| | | Random Forest | 81.59 | 18.40 |
| | | Adaboost | 78.31 | 16.65 |
5.4 Evaluation of Algorithms Based on Confusion Matrix A confusion matrix is an easy way to assess the effectiveness of a classifier. It clearly shows the correct classifications and the misclassifications, including Type I and Type II errors. Table 7 shows the correct-classification and misclassification rates of each algorithm.
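The percentages in Table 7 can be derived from a confusion matrix. For a binary problem with counts TP, FP, FN, and TN, correct classification is (TP + TN)/N and misclassification is (FP + FN)/N. False positives are Type I errors and false negatives are Type II errors. The function names and the labels in the test are illustrative assumptions, not Orange output.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1  # Type I error: predicted positive, actually negative
        elif t == positive:
            fn += 1  # Type II error: predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn

def rates(tp, fp, fn, tn):
    """Correct-classification and misclassification percentages."""
    n = tp + fp + fn + tn
    correct = 100.0 * (tp + tn) / n
    return correct, 100.0 - correct
```

Because the two rates are complementary, a model's correct-classification and misclassification percentages add up to 100%.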
6 Discussion Seven machine learning algorithms were applied to three different datasets. Average accuracies across the three datasets were as follows:
- Decision tree: 73.56%, which is acceptable.
- Logistic regression: 81.23%, which is quite good.
- KNN: 72.6%, which is again acceptable.
- Naïve Bayes: 79.65%, which can be regarded as a high success rate.
- Support vector machine: 56.34%, which cannot be accepted.
- Random forest: 79.5%, which is also acceptable.
- Adaboost: 75.50%, which is also acceptable.

The individual performance of each algorithm is shown in Fig. 5 using ROC analysis. On the heart disease dataset, Naïve Bayes and logistic regression obtained the highest accuracies (83.82% and 83.49%), whereas KNN obtained the lowest (65.35%), followed by SVM (73.2%). On the diabetes dataset, logistic regression obtained the highest accuracy (76.95%) and SVM the lowest (47.91%). On the depression dataset, logistic regression again obtained the highest accuracy (83.3%) and SVM the lowest (48.84%). Overall, logistic regression performs best, but it cannot be concluded that logistic regression is always the best choice for this type of classification problem. However, it can be inferred that support vector machine is not suitable for this type of classification problem.
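The ROC curves in Fig. 5 are commonly summarised by the area under the curve (AUC). AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. The sketch below computes AUC directly from that definition. This pairwise O(n_pos × n_neg) form is chosen for clarity rather than speed, and the scores in the test are hypothetical.

```python
def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every positive case is ranked above every negative case; 0.5 corresponds to random ranking.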
7 Conclusion and Future Work Machine learning is the part of artificial intelligence that helps computer programs identify hidden patterns and predict values in datasets. It is an amalgam of statistics and computer programming. Several languages and platforms are available for implementing machine learning, such as Python, R, and IBM Watson, and frameworks are also available for applying machine learning algorithms directly. Orange is one such framework. It is open-source and UI-based, is well suited to beginners, supports drag-and-drop placement of machine learning algorithms, and presents its output in a user-friendly way. Seven supervised machine learning algorithms were applied to three datasets of heart disease patients, diabetes patients, and depression patients. The algorithms were decision tree, KNN, Naïve Bayes, SVM, logistic regression, random forest, and Adaboost. Logistic regression and Naïve Bayes were found to perform better on these datasets. SVM did not obtain satisfactory results, so it can be inferred that SVM should not be used on such healthcare datasets. As future work, this comparison can be extended with more datasets and more application domains.
Fig. 5 a ROC of diabetes disease dataset. b ROC of heart disease dataset. c ROC of depression disease dataset
References
1. Michie D, Spiegelhalter DJ, Taylor CC (1994) Machine learning. Neur Stat Classification 13(1994):1–298
2. Alpaydin E (2020) Introduction to machine learning. MIT Press
3. Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Machine Learn Res 12:2825–2830
4. Kumar S, Sharma B, Sharma VK, Poonia RC (2018) Automated soil prediction using bag-of-features and chaotic spider monkey optimization algorithm. Evol Intel. https://doi.org/10.1007/s12065-018-0186-9
5. Kumar S, Sharma B, Sharma VK, Sharma H, Bansal JC (2018) Plant leaf disease identification using exponential spider monkey optimization. Sustain Comput Inf Syst. https://doi.org/10.1016/j.suscom.2018.10.004
6. Ayodele TO (2010) Types of machine learning algorithms. New Adv Machine Learn 3:19–48
7. Demšar J, Zupan B (2012) Orange: Data mining fruitful and fun. Informacijska Družba − IS 2012:6
8. Demšar J et al (2004) Orange: from experimental machine learning to interactive data mining. In: European conference on principles of data mining and knowledge discovery. Springer, Berlin, Heidelberg
9. Ben-Haim Y, Tom-Tov E (2010) A streaming parallel decision tree algorithm. J Machine Learn Res 11(2)
10. Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22
11. Cameron TA (1988) A new paradigm for valuing non-market goods using referendum data: maximum likelihood estimation by censored logistic regression. J Environ Econ Manage 15(3):355–379
12. Soucy P, Mineau GW (2001) A simple KNN algorithm for text categorization. In: Proceedings 2001 IEEE international conference on data mining. IEEE
13. Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neur Process Lett 9(3):293–300
14. Catal C, Sevim U, Diri B (2011) Practical development of an Eclipse-based software fault prediction tool using Naive Bayes algorithm. Expert Syst Appl 38(3):2347–2353
15. Schapire RE (2003) The boosting approach to machine learning: an overview. In: Nonlinear estimation and classification. Springer, New York, NY, pp 149–171
16. Schapire RE (2013) Explaining adaboost. In: Empirical inference. Springer, Berlin, Heidelberg, pp 37–52
17. McKinney W (2012) Python for data analysis: data wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.
Maximum Power Point Tracking of Photovoltaic System Using Artificial Neural Network Kusum Lata Agarwal and Shubham Sharma
Abstract The use of renewable energy sources is increasing day by day to meet ever-increasing energy demand while keeping carbon emissions low. Among the available renewable energy alternatives, solar photovoltaic (PV) systems offer the best solution. Their drawbacks are high installation cost and low energy efficiency at commercial scale. The nonlinear I-V characteristic of a PV array depends on temperature and solar insolation, so MPPT techniques are used to track the maximum power point continuously as these vary. In this paper, a DC-DC boost converter topology is used to track the maximum power point of the PV array. The “Perturb and Observe” method is generally used for MPPT. This paper instead presents a backpropagation-based artificial neural network (ANN) technique for maximum power point tracking and compares its results with those of the conventional “P and O” method. The complete model is simulated in MATLAB/Simulink. The results obtained with the ANN technique are satisfactory, and this MPPT gives better results than the conventional “P and O” method. Keywords MPPT · Artificial neural network · DC-DC boost converter · MATLAB/Simulink
1 Introduction The Sun is a virtually unlimited source of energy on the Earth and provides much more energy than the global energy demand. However, the capital cost of a solar PV system is very high, and its energy conversion efficiency at commercial scale is about 15% for polycrystalline technology, which is low in comparison with other energy sources. To compete in price with other energy sources, the efficiency of the solar PV module needs to be enhanced. Maximum power point tracking (MPPT) methods are very useful for this purpose, as they continuously track the maximum power available from the SPV module. Our goal is to increase the efficiency of the PV system using an ANN-based MPPT algorithm. The “perturb and observe” method is the one most commonly implemented for MPPT, but when weather conditions change continuously, voltage fluctuations occur that this method handles poorly [1], so the improvement in energy efficiency is not as good as expected. To deal with this problem, an artificial neural network (ANN) approach is presented in this paper. For different temperatures and solar insolation, the maximum power point occurs at different voltages, so MPPT requires the operating voltage to be changed; a boost converter topology is used for this purpose. The paper also presents a detailed analysis of the perturb and observe method. Besides the ANN-based MPPT presented here, many other MPPT techniques can be used, such as random forest [2], a neural network-based incremental conductance algorithm [3] and the cuckoo search algorithm [4].

K. L. Agarwal (B) Department of Electrical Engineering, JIET, Jodhpur, Rajasthan, India e-mail: [email protected] S. Sharma JIET, Jodhpur, Rajasthan, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_17
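2 PV Array Modeling The output power of a solar PV array is the product of its output voltage and current, and both depend on the temperature and solar insolation. The equivalent circuit of the solar PV array is shown in Fig. 1. The output current is given by
$$I = I_L - I_0\left[\exp\left(\frac{V + I R_S}{n V_T}\right) - 1\right] - \frac{V + I R_S}{R_{SH}} \qquad (1)$$
where I_L is the light-generated current, I_0 the diode saturation current, n the diode ideality factor, V_T the thermal voltage, R_S the series resistance and R_SH the shunt resistance.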
Equation (1) is implicit in I, so it does not give the current directly; an explicit solution for I requires the Lambert W function. Neglecting the shunt term (R_SH → ∞), Eq. (1) can be solved for V:
$$V = n V_T \ln\left(\frac{I_L - I}{I_0} + 1\right) - I R_S \qquad (2)$$
Fig. 1 Electric model of PV array
Table 1 PV module specification
PV module parameter | Value
Modules connected in series | 1
Modules connected in parallel | 1
Maximum power (P_MPP) | 213.15 W
Open-circuit voltage (V_OC) | 36.3 V
Short-circuit current (I_SC) | 7.84 A
Maximum power voltage (V_MPP) | 29 V
Maximum power current (I_MPP) | 7.35 A
Current temperature coefficient (α) | 0.102
Voltage temperature coefficient (β) | −0.36099
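Setting I = 0 in Eq. (2) gives the open-circuit voltage:
$$V_{OC} \approx \frac{n k T}{q}\ln\left(\frac{I_L}{I_0} + 1\right) \qquad (3)$$
where k is Boltzmann's constant, q is the electron charge and T is the cell temperature, so that V_T = kT/q.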
The photocurrent I_ph (the light-generated current I_L in Eq. (1)) at temperature T and solar irradiance G is obtained from its value at the nominal temperature T_n and nominal irradiance G_n as
$$I_{ph} = \left[I_{scn} + \alpha (T - T_n)\right]\frac{G}{G_n} \qquad (4)$$
where I_scn is the nominal short-circuit current and α is the current temperature coefficient. The main task in modeling the PV array is to compute the values of R_S and R_SH, which were computed by the Newton-Raphson method. The specifications of the PV array used for the MATLAB simulation are given in Table 1. The V-I and P-V characteristics for different solar irradiances at 25 °C are shown in Fig. 2, which shows that the short-circuit current changes linearly, whereas the open-circuit voltage changes logarithmically, with the solar irradiance.
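As a sketch of how Eqs. (1) to (3) can be evaluated numerically, the Python code below solves the implicit Eq. (1) for I at each voltage by Newton-Raphson iteration (used here for the current, not for extracting R_S and R_SH as in the paper) and sweeps the voltage to locate the maximum power point. Only I_SC (taken as I_L) and V_OC come from Table 1; n·V_T, R_S and R_SH are assumed values for illustration, and I_0 is derived from Eq. (3).

```python
import math

# Hypothetical single-diode parameters for illustration only. I_SC and V_OC follow
# Table 1; n*V_T (whole module), R_S and R_SH are assumed, not taken from the paper.
I_L = 7.84     # light-generated current (A), approximated by I_SC
V_OC = 36.3    # open-circuit voltage (V)
N_VT = 1.6     # n * V_T for the series-connected cells (V), assumed
R_S = 0.3      # series resistance (ohm), assumed
R_SH = 300.0   # shunt resistance (ohm), assumed
I_0 = I_L / (math.exp(V_OC / N_VT) - 1.0)  # from Eq. (3), neglecting R_SH


def pv_current(v, iters=50):
    """Solve the implicit Eq. (1) for I at terminal voltage v with Newton-Raphson."""
    i = I_L  # f(I_L) <= 0 for v >= 0, so the iterates decrease monotonically to the root
    for _ in range(iters):
        e = math.exp((v + i * R_S) / N_VT)
        f = I_L - I_0 * (e - 1.0) - (v + i * R_S) / R_SH - i
        df = -I_0 * e * R_S / N_VT - R_S / R_SH - 1.0
        step = f / df
        i -= step
        if abs(step) < 1e-12:
            break
    return i


def find_mpp(steps=2000):
    """Sweep V from 0 to V_OC and return (P_max, V_mpp, I_mpp)."""
    best = (0.0, 0.0, 0.0)
    for k in range(steps + 1):
        v = V_OC * k / steps
        i = pv_current(v)
        if v * i > best[0]:
            best = (v * i, v, i)
    return best


p_max, v_mpp, i_mpp = find_mpp()
print(f"P_max = {p_max:.1f} W at V = {v_mpp:.2f} V, I = {i_mpp:.2f} A")
```

With these assumed values the computed peak lands in the same range as the 213.15 W rating in Table 1, but it is not a fit to the paper's module.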
3 MPPT Techniques This section discusses the “perturb and observe” technique and the neural network backpropagation algorithm.
Fig. 2 Characteristics of solar array (array type: 1Soltech 1STH-215-P; 1 series module; 1 parallel string). Panels: current (A) versus voltage (V) and power (W) versus voltage (V), for irradiances of 1, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2 and 0.1 kW/m².
3.1 “Perturb and Observe” Method The “perturb and observe” technique is essentially a trial-and-error technique: the voltage is perturbed and the corresponding change in power is observed. The slope of the P-V curve is positive to the left of the MPP and negative to the right of it, so the sign of the power change tells the algorithm in which direction to move the voltage to track the maximum power point. The flowchart of the “perturb and observe” technique is shown in Fig. 3. Once the maximum power point is reached, the operating point oscillates around the peak value. To avoid large oscillations, the perturbation steps are kept very small, at the cost of slower tracking. The voltage at the MPP is called the reference voltage.
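The procedure described above can be sketched in Python as follows. To give the tracker a P-V curve to climb, the sketch uses a hypothetical ideal single-diode module (Eq. (1) with R_S = 0 and R_SH → ∞) with Table 1's short-circuit current and open-circuit voltage and an assumed n·V_T of 1.6 V; because series and shunt losses are ignored, its peak lies above the 213.15 W rating. The step size and iteration count are also assumptions. The returned operating point keeps oscillating within a step or two of the MPP, which is the oscillation described in the text.

```python
import math

# Hypothetical ideal single-diode module (R_S and R_SH neglected), used only to
# give the tracker a P-V curve; I_SC and V_OC follow Table 1, N_VT is assumed.
I_L, V_OC, N_VT = 7.84, 36.3, 1.6
I_0 = I_L / (math.exp(V_OC / N_VT) - 1.0)


def pv_power(v):
    """Power at operating voltage v (the current is explicit because R_S = 0 here)."""
    i = I_L - I_0 * (math.exp(v / N_VT) - 1.0)
    return v * max(i, 0.0)


def perturb_and_observe(v_start, step, iterations):
    """Hill-climb on the P-V curve: keep the perturbation direction while the
    power rises, reverse it when the power falls. Returns the last (V, P)."""
    v_prev, p_prev = v_start, pv_power(v_start)
    direction = 1.0
    v = v_start + direction * step          # first perturbation
    for _ in range(iterations):
        p = pv_power(v)                     # observe
        if p < p_prev:                      # moved away from the MPP
            direction = -direction
        v_prev, p_prev = v, p
        v += direction * step               # perturb again
    return v_prev, p_prev


v_ref, p = perturb_and_observe(v_start=20.0, step=0.2, iterations=200)
print(f"reference voltage ~ {v_ref:.2f} V, power ~ {p:.1f} W")
```

A larger `step` reaches the peak in fewer iterations but widens the steady-state oscillation, which is the trade-off behind keeping the perturbation small.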
3.2 Neural Network Backpropagation Algorithm The backpropagation algorithm is a supervised learning method: training depends on a teacher, which supplies the desired outputs, and on the environmental conditions presented as inputs. Because backpropagation is an iterative process, training can be time-consuming; to speed it up, the data can be divided into batches that are presented to the network in turn. The neural network was trained in MATLAB/Simulink, and its architecture is shown in Fig. 4. The inputs to the neural network are temperature and solar irradiance. The trained backpropagation network processes these inputs according to its training examples and provides the control signal to a pulse generator, which in turn supplies the firing pulses to the switch of the boost converter. The block diagram of the NN control is shown in Fig. 5. The backpropagation algorithm is as follows:
Step 1: Initialize the weights to random values.
Step 2: Present the network with the set of training examples, and repeat Step 3 for each example.
Fig. 3 Flowchart of “perturb and observe” method
Step 3: For neuron j at iteration n, compute the following terms.
Error:
$$e_j(n) = d_j(n) - y_j(n) \qquad (5)$$
where d_j(n) is the desired output and y_j(n) the actual output.
Instantaneous error cost function:
$$E(n) = \frac{1}{2}\sum_{j} e_j^2(n) \qquad (6)$$
Average value of the cost function over the N training examples:
$$E_{av} = \frac{1}{N}\sum_{n=1}^{N} E(n) \qquad (7)$$
Induced local field:
$$V_j(n) = \sum_{i} W_{ji}(n)\, y_i(n) \qquad (8)$$
Output:
$$y_j(n) = \varphi\left(V_j(n)\right) \qquad (9)$$
Step 4: Adjust the weights to reduce the error. For this, the gradient $\partial E(n)/\partial W_{ji}(n) = -\delta_j(n)\, y_i(n)$ is computed, where the local gradient is
$$\delta_j(n) = e_j(n)\, \varphi'\left(V_j(n)\right) \qquad (10)$$
The weight adjustment is given as
$$\Delta W_{ji}(n) = \eta\, \delta_j(n)\, y_i(n) \qquad (11)$$
where η is the learning rate.
Step 5: If neuron j belongs to the output layer, δ_j(n) is computed directly from Eq. (10). If j belongs to a hidden layer, δ_j(n) is calculated from the local gradients δ_k(n) of the neurons k in the next layer as
$$\delta_j(n) = \varphi'\left(V_j(n)\right) \sum_{k} \delta_k(n)\, W_{kj}(n) \qquad (12)$$
Fig. 4 NN architecture
Fig. 5 Block diagram of NN control
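Steps 1 to 5 can be sketched in plain Python for a network with one tanh hidden layer and a single linear output neuron, updating the weights after every example as in Eq. (11) (online updates rather than the batches mentioned above). The layer sizes, learning rate, activation functions, bias handling and the synthetic training target are all assumptions for illustration; the paper's network (Fig. 4) was trained in MATLAB and its training data are not given.

```python
import math
import random


def train_backprop(samples, n_hidden=6, eta=0.05, epochs=500, seed=0):
    """Online backpropagation, Eqs. (5)-(12), for a tanh hidden layer and a
    linear output neuron. Returns the average cost E_av of each epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1: random initial weights; the last weight in each row multiplies a bias input of 1
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, d in samples:                                   # Step 2: present each example
            xb = list(x) + [1.0]
            v_hid = [sum(w * xi for w, xi in zip(row, xb)) for row in w_hid]  # Eq. (8)
            y_hid = [math.tanh(v) for v in v_hid] + [1.0]                      # Eq. (9)
            y = sum(w * yi for w, yi in zip(w_out, y_hid))     # linear output neuron
            e = d - y                                          # Eq. (5)
            total += 0.5 * e * e                               # Eq. (6)
            delta_out = e                                      # Eq. (10) with phi' = 1
            # Eq. (12): tanh'(v) = 1 - tanh(v)**2, using output weights before updating them
            delta_hid = [(1.0 - y_hid[j] ** 2) * delta_out * w_out[j] for j in range(n_hidden)]
            for j in range(n_hidden + 1):                      # Eq. (11), output layer
                w_out[j] += eta * delta_out * y_hid[j]
            for j in range(n_hidden):                          # Eq. (11), hidden layer
                for i in range(n_in + 1):
                    w_hid[j][i] += eta * delta_hid[j] * xb[i]
        history.append(total / len(samples))                   # Eq. (7)
    return history


# Purely synthetic toy target on normalised (temperature, irradiance) inputs in [0, 1].
samples = [((t / 4, g / 4), 0.4 + 0.3 * (g / 4) - 0.1 * (t / 4))
           for t in range(5) for g in range(5)]
history = train_backprop(samples)
print(f"average cost: first epoch {history[0]:.4f}, last epoch {history[-1]:.6f}")
```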
4 Simulation Results The “perturb and observe” method and the neural network-based MPPT were both simulated in MATLAB/Simulink, and the backpropagation-based neural network was found to give better results than the “perturb and observe” method. The validation performance of the neural network is shown in Fig. 6: the best validation performance (mean squared error) is 0.0013294, reached at epoch 73. The training state of the neural network is shown in Fig. 7. The neural network is trained by error correction. When the temperature and solar irradiance were changed suddenly, the neural network-based MPPT again gave better results than the perturb and observe method. Table 2 shows the maximum power point tracking results, obtained at a constant temperature of 25 °C. Figure 8 shows the output power for a solar irradiance of 1000 W/m² and a temperature of 25 °C.
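The margin between the two trackers in Table 2 can be made explicit with a little arithmetic. The Python snippet below uses only the values reported in Table 2 and the 213.15 W rating from Table 1; the relative gain of the ANN tracker over P&O grows from roughly 0.8 % at 1000 W/m² to roughly 11 % at 200 W/m², and at 1000 W/m² the ANN output is within 0.05 W of the rated power.

```python
# Values from Table 2 (constant 25 °C): irradiance (W/m^2), P&O output (W), ANN output (W)
TABLE_2 = [
    (1000, 211.5, 213.1),
    (800, 168.5, 171.6),
    (600, 125.2, 129.4),
    (400, 82.1, 86.2),
    (200, 38.2, 42.4),
]
RATED_W = 213.15  # maximum power P_MPP from Table 1


def ann_gain(rows):
    """Absolute (W) and relative (%) gain of the ANN tracker over P&O for each row."""
    return [(g, nn - po, 100.0 * (nn - po) / po) for g, po, nn in rows]


for g, dw, pct in ann_gain(TABLE_2):
    print(f"{g:5d} W/m^2: +{dw:.1f} W ({pct:.2f} %)")
print(f"At 1000 W/m^2: P&O {100 * 211.5 / RATED_W:.2f} % and "
      f"ANN {100 * 213.1 / RATED_W:.2f} % of rated power")
```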
Fig. 6 Validation data of neural network (mean squared error versus epoch for the training, validation and test sets over 79 epochs; best validation performance is 0.0013294 at epoch 73)
Fig. 7 Training state of neural network (gradient = 0.0090515, Mu = 0.0001 and validation checks = 6, all at epoch 79)
Table 2 Maximum power point tracking result
S. No. | Solar irradiance (W/m²) | Power output, P & O (W) | Power output, neural network (W)
1 | 1000 | 211.5 | 213.1
2 | 800 | 168.5 | 171.6
3 | 600 | 125.2 | 129.4
4 | 400 | 82.1 | 86.2
5 | 200 | 38.2 | 42.4
Fig. 8 Output power of different MPPT methods: (a) perturb & observe result, (b) neural network result (output power in W versus time in s, from 0 to 0.1 s)
5 Conclusion This paper compares the “perturb and observe” and artificial neural network-based MPPT methods. At a solar irradiance of 1000 W/m² and a temperature of 25 °C, the power transferred to the load connected to the SPV module was 150.9 W without MPPT, 211.5 W with the “perturb and observe” MPPT and 213.1 W with the artificial neural network-based MPPT. Since the rated maximum power of the solar PV module is 213.15 W, the ANN-based MPPT algorithm tracks approximately the rated maximum power. The ANN-based MPPT algorithm can also be used successfully for tracking the maximum power under non-uniform insolation conditions. Although this is beyond the scope of the present paper, the technique is expected to be applicable more widely. Acknowledgements The author is grateful to TEQIP-III at RTU-ATU, Kota for providing a project grant under sanction no. TEQIP-III/RTU(ATU)/CRS/2019-20/30.
References
1. Suganya J, Carolin Mabel M (2012) Maximum power point tracker for a photovoltaic system. In: International Conference on Computing, Electronics and Electrical Technologies (ICCEET), Kumaracoil, pp 463–465
2. Jour M, Shareef E, Mohamed H, Azah (2017) Random forest-based approach for maximum power point tracking of photovoltaic systems operating under actual environmental conditions. In: Computational intelligence and neuroscience, Hindawi, sp 1673864
3. Punitha K, Devaraj D, Sakthivel S (2013) Artificial neural network based modified incremental conductance algorithm for maximum power point tracking in photovoltaic system under partial shading conditions. Energy 62:330–340
4. Ibrahim A, Obukhov S, Aboelsaud R (2019) Determination of global maximum power point tracking of PV under partial shading using cuckoo search algorithm. Appl Sol Energy 55:367–375
IoT Security: A Survey of Issues, Attacks and Defences Vinesh Kumar Jain and Jyoti Gajrani
Abstract The Internet of things (IoT) is a disruptive technology used in computer automation. In IoT, embedded electronic hardware is connected to the Internet, which makes IoT security an important issue. In this paper, we explain recent case studies of attacks on IoT devices and networks, along with the vulnerability behind each attack. We list the reasons behind these attacks along with preventive measures to make IoT devices and networks more secure. The paper aims to draw the attention of the research community toward IoT security. Keywords IoT · Security · Attacks · Vulnerabilities
1 Introduction The Internet of Things (IoT), and cyberattacks exploiting IoT devices, have become an important research area in recent times. A continuous stream of cyberattacks and data breaches has hit almost every industry, from large to small businesses. According to estimates, there will be approximately 75 billion devices connected to the Internet around the world by 2025 [1]. The exponential growth in the use of IoT devices has attracted many cybercriminals to exploit their vulnerabilities for various attacks. This surge of interest is due to the vulnerabilities in many IoT devices; a further factor is that IoT devices are connected with each other through the Internet. Attackers who build botnets use IoT devices as cyberweapons to deliver DDoS attacks. IoT devices exist in huge numbers all across the world; moreover, most of them are easily accessible via telnet and can be easily hacked owing to their lack of security features. Attackers therefore prefer to build botnets from huge numbers of IoT devices, which cost them nothing, instead of renting expensive resources for a hosting environment. A study indicates that in the near future, the darknet world will make IoT botnets a driving source of its business. Manufacturers of IoT devices tend to patch vulnerabilities very slowly. Each model of each device is unique, running inscrutable proprietary code, which makes it difficult to create a commonly accepted security scanning tool. Meanwhile, a large number of institutions and industrial environments already struggle to prioritize the patching of PCs and servers [2]. Finding and cataloging IoT devices and rushing to apply every update quickly therefore becomes unmanageable, and the devices remain connected to the open Internet with little oversight and few protections. A major part of the problem is that every IoT device is a black box running proprietary code, so we are unaware of what code these smart things are executing. In this paper, we explain recent attacks that leverage the vulnerabilities of IoT devices and networks. We list the objectives behind these attacks, their targets and preventive measures. We also demonstrate a proof-of-concept attack that leverages weak username and password settings of a Raspberry Pi (CoAP server) [3] to transmit disguised information about values sensed by an ultrasonic sensor to the CoAP client.

V. Kumar Jain (B) · J. Gajrani Engineering College Ajmer, Ajmer, India e-mail: [email protected] J. Gajrani e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_18
2 IoT Security in a Nutshell We consider five major components of IoT security: IoT devices, vulnerabilities, the attack surface, operating systems and security algorithms.
2.1 IoT Devices IoT includes all devices that have sensors attached to them and can transmit data from one device to another through the Internet. These devices can be remotely monitored and controlled over the Internet, which makes them a preferred target for attackers. Examples include smart watches, pacemakers, Wi-Fi-controlled coffee makers, vacuum cleaners, cars and air conditioners.
2.2 Vulnerabilities The convenience that IoT devices bring to people's lives increases their popularity. Unfortunately, manufacturers of these devices are not much concerned about their security, which makes “smart” devices vulnerable and thus potentially dangerous. Networks of IoT devices have many vulnerabilities, such as outdated software/firmware, weak passwords, weak encryption in transmitters and open ports.
2.3 Attack Surface The IoT attack surface encompasses not only IoT devices but also the associated software and network infrastructure. The attack surface of the IoT ecosystem can be divided broadly into four types: (1) the network attack surface, which includes attacks delivered through weak network protocols; (2) the software attack surface, which primarily includes attacks on Web applications, associated software, mobile applications, encryption algorithms, the backend, firmware and weak passwords; (3) the human attack surface, which includes social engineering and trusted insiders; and (4) the hardware attack surface, which includes firmware and hardware. Table 1 briefly describes the major attacks along with their objectives.
2.4 Operating System A large number of operating systems (commercial and open-source) are used by IoT devices. They are designed with resource constraints such as memory, processing capability and battery in mind. These operating systems offer various features, such as key communication and networking protocols, but they also contain some implementation and protocol flaws. These flaws increase the chances of various kinds of attacks, including denial of service, fragmentation attacks and black hole attacks. Table 2 summarizes various operating systems used in IoT devices along with the vulnerability observed in each OS.
Table 1 IoT attacks with objectives
Attack type | Objective
DoS | Disrupt legitimate requests to IoT devices by flooding them with thousands of fake requests
DDoS | Disrupt legitimate requests to IoT devices by flooding them with thousands of fake requests originating from a distributed network
Jamming (fake signal) | Interrupt the ongoing radio transmissions of IoT devices and further deplete their bandwidth, energy, central processing unit (CPU) and memory resources
Spoofing | Impersonate legitimate IoT devices to feed false data into the IoT network
MITM attack | Secretly eavesdrop on and then alter the private communication between IoT devices
Software attacks | Cause privacy leakage, economic loss, power depletion and network performance degradation of IoT systems by targeting software flaws
Table 2 IoT operating systems along with vulnerabilities
IoT OS | Vulnerability
Windows 10 for IoT Core [4] | Privilege escalation vulnerability
Windriver VxWorks [5] | DoS, exec code, overflow; open ports for HTTP, SSH and telnet services
Nucleus RTOS [6] | Freedom of connectivity and enabled HTTP, FTP services
Huawei LiteOS [7] | Heap-based buffer overflow in the high fidelity drive
Ostro Linux [8] | Privilege escalation
Raspbian [9] | Weak SSH host keys
Tizen [10] | Enabled root, system and shell privileges
uClinux [11] | Stored credentials, enabled code rewriting
Arm Mbed [12] | Buffer overflow
Contiki [13] | http_state structure is not deallocated properly
FreeRTOS [14] | Memory corruption
TinyOS [15] | No default validation of acquisition and transmission nodes
2.5 Security Algorithms Because IoT devices are resource-constrained, they require cryptographic operations with limited energy consumption. Lightweight symmetric-key algorithms allow lower energy consumption in the end devices.
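The paper does not name a specific lightweight algorithm. As one concrete illustration (the choice is ours), the Python sketch below implements the Speck64/128 block cipher: 64-bit blocks, a 128-bit key and 27 rounds of an add-rotate-xor (ARX) round function designed for constrained devices. It needs only 32-bit additions, rotations and XORs, which is why ARX ciphers suit small microcontrollers. This is a raw block function with no mode of operation or authentication; verify it against the designers' published test vectors before relying on it, and treat it as a teaching sketch rather than production cryptography.

```python
MASK = 0xFFFFFFFF   # Speck64 works on 32-bit words
ROUNDS = 27         # number of rounds for Speck64/128


def ror(x, r):
    """Rotate a 32-bit word right by r bits."""
    return ((x >> r) | (x << (32 - r))) & MASK


def rol(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def key_schedule(k0, l0, l1, l2):
    """Expand a 128-bit key, given as four 32-bit words, into ROUNDS round keys."""
    round_keys = [k0]
    l = [l0, l1, l2]
    for i in range(ROUNDS - 1):
        new_l = ((round_keys[i] + ror(l[i], 8)) & MASK) ^ i
        l.append(new_l)
        round_keys.append(rol(round_keys[i], 3) ^ new_l)
    return round_keys


def encrypt(x, y, round_keys):
    """Encrypt one 64-bit block held as two 32-bit words (x, y)."""
    for k in round_keys:
        x = ((ror(x, 8) + y) & MASK) ^ k   # add, rotate, xor
        y = rol(y, 3) ^ x
    return x, y


def decrypt(x, y, round_keys):
    """Invert encrypt() by undoing the rounds in reverse order."""
    for k in reversed(round_keys):
        y = ror(y ^ x, 3)
        x = rol(((x ^ k) - y) & MASK, 8)
    return x, y


# Arbitrary example key and plaintext words.
keys = key_schedule(0x03020100, 0x0B0A0908, 0x13121110, 0x1B1A1918)
ciphertext = encrypt(0x3B726574, 0x7475432D, keys)
print([hex(w) for w in ciphertext])
print([hex(w) for w in decrypt(*ciphertext, keys)])
```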
3 Attacks: Case Studies This section presents various case studies of attacks targeting IoT devices, networks and protocols.
3.1 One-Click Attack
• Timeline: June 2020.
• Target: Amazon's Alexa devices.
• Vulnerability Targeted: Cross-site scripting (XSS) flaw and cross-origin resource sharing (CORS) misconfiguration in Amazon's Alexa virtual assistant platform.
• Objective: Access users' banking data history or home addresses.
• Attack Type: Ransomware attack.
• Details: Cybersecurity researchers revealed a severe security bug in Amazon's Alexa virtual assistant that made it vulnerable to a number of malicious attacks. According to a report by Check Point Research, the exploits allowed an attacker to remove or install skills on the targeted victim's Alexa account, access their voice history and collect personal information through skill interaction when the user invoked the installed skill. Because smart speakers and virtual assistants are now everywhere, it is easy to overlook how much personal data these devices hold and their role in controlling other smart devices in homes [16–19].
• Prevention: Prevent cross-site requests, for example by fixing XSS flaws and restricting CORS to trusted origins.
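The exact CORS configuration involved is not described here, so the following Python sketch only illustrates the general prevention: a server that answers credentialed cross-origin requests should echo back only origins on an exact allowlist. The origin `https://app.example.com` and all function names are hypothetical; the suffix check shows a common mistake that also accepts look-alike hosts.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only origin permitted to make credentialed requests.
TRUSTED_ORIGINS = {"https://app.example.com"}


def naive_is_allowed(origin):
    """Flawed check: a suffix match also accepts look-alike hosts."""
    return origin.endswith("example.com")


def strict_is_allowed(origin):
    """Accept only an exact https origin from the allowlist."""
    parts = urlsplit(origin)
    if parts.scheme != "https" or parts.path or parts.query or parts.fragment:
        return False
    return f"{parts.scheme}://{parts.netloc}" in TRUSTED_ORIGINS


def cors_headers(origin):
    """Response headers for a request carrying the given Origin header."""
    headers = {"Vary": "Origin"}
    if strict_is_allowed(origin):
        headers["Access-Control-Allow-Origin"] = origin
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers


print(naive_is_allowed("https://evilexample.com"))   # True: the flaw
print(strict_is_allowed("https://evilexample.com"))  # False
```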
3.2 BIAS Bluetooth Attack
• Timeline: May 2020.
• Target: Smartphones, laptops and smart home devices.
• Vulnerability Targeted: Lack of integrity protection, encryption and mutual authentication.
• Objective: Stealing user data.
• Attack Type: Interception attack.
• Details: BIAS is the first attack capable of bypassing Bluetooth's authentication procedure during secure connection establishment. BIAS attacks allow an attacker to impersonate Bluetooth master and slave devices and establish secure connections without knowing the long-term key shared between the victim and the impersonated device. The attack exploits several flaws identified in the Bluetooth standard, such as the lack of integrity protection, encryption and mutual authentication. It is stealthy because no messages are shown to the end user [20].
• Prevention: Use a secure authentication method, which requires updating the Bluetooth Core Specification, and check the encryption type to avoid a downgrade of secure connections to legacy encryption.
3.3 Silex Malware
• Timeline: June 2019.
• Target: IoT devices.
• Vulnerability Targeted: Default credentials of IoT devices and weak firewall rules.
• Objective: Ruin smart devices by gaining access to and destroying a device's storage, eliminating its firewall and removing its network configuration.
• Attack Type: Ransomware attack.
• Details: A 14-year-old hacker used a new strain of malware to brick up to 4,000 insecure IoT devices before abruptly shutting down his command-and-control server. Silex is built to trash IoT devices: it breaks the firewall, removes the network configuration and disconnects the system. It uses guessable passwords to gain access to the device and writes random data to its storage. It then deletes the network configuration and the firewall table entries, adds a rule that drops all connections and reboots the device. The attack mainly targets Unix and Linux systems, and a Bash shell version of the malware is available for download that can target any application running on Unix [21].
• Prevention: Change the default passwords.
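Changing default passwords is easiest to enforce with a routine audit. The following Python sketch flags devices in a hypothetical provisioning inventory whose credentials are a known factory default or shorter than a hypothetical 12-character policy. The short default list is illustrative only (it includes `pi`/`raspberry`, the historical Raspbian default, relevant to the Raspberry Pi proof of concept mentioned in the introduction); botnet malware tries far longer lists. In practice such a check would run at provisioning time rather than on a stored plaintext list.

```python
# Illustrative factory defaults only; real credential lists used by botnets are far longer.
KNOWN_DEFAULTS = {
    ("admin", "admin"),
    ("root", "root"),
    ("admin", "password"),
    ("root", "12345"),
    ("pi", "raspberry"),   # historical Raspbian default
}
MIN_PASSWORD_LENGTH = 12   # hypothetical local policy


def weak_credentials(inventory):
    """Return the names of devices whose (user, password) pair is a known
    default or whose password is shorter than the policy minimum."""
    flagged = []
    for name, user, password in inventory:
        if (user, password) in KNOWN_DEFAULTS or len(password) < MIN_PASSWORD_LENGTH:
            flagged.append(name)
    return flagged


# Hypothetical inventory captured at provisioning time.
inventory = [
    ("camera-1", "admin", "admin"),
    ("pi-coap", "pi", "raspberry"),
    ("router", "admin", "v7#Lq9!mZ2pR"),
]
print(weak_credentials(inventory))  # ['camera-1', 'pi-coap']
```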
3.4 Smart Deadbolts
• Timeline: June 2019.
• Target: Smart homes.
• Vulnerability Targeted: Insecure communication between the lock and the mobile app, and insecure storage in the mobile app.
• Objective: Pose a physical risk to people and property.
• Attack Type: Interception attack.
• Details: Smart deadbolt attacks target the vulnerabilities of smart locks. Smart locks make opening doors more convenient by allowing them to be unlocked through mobile apps. However, these locks have major security issues: they can easily be hacked by intercepting the network traffic between the mobile app and the lock, thereby stealing the key to someone's house. Unfortunately, the lock's design makes it fairly easy for attackers to eavesdrop on the messages exchanged between the lock and the app, leaving it vulnerable even to relatively simple attacks [22–24].
• Prevention: Buy smart locks from reputable companies. Make sure the smart lock uses two-factor authentication and supports longer passwords. Also keep software and applications up to date.
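One common way to provide the second factor recommended above is a time-based one-time password (TOTP, RFC 6238), built on the HOTP construction of RFC 4226. The sketch below uses only Python's standard library; the function names are ours, and a real lock would also need secure per-device secret provisioning and rate limiting of attempts, which are not shown.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226): HMAC-SHA1 plus dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                      # low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time=None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the current time step."""
    if unix_time is None:
        unix_time = time.time()
    return hotp(secret, int(unix_time // step), digits)


def verify(secret: bytes, submitted: str, unix_time=None, step: int = 30, window: int = 1) -> bool:
    """Accept a code from the current step or +/- `window` steps (clock drift)."""
    now = time.time() if unix_time is None else unix_time
    return any(
        hmac.compare_digest(totp(secret, now + k * step, step), submitted)
        for k in range(-window, window + 1)
        if now + k * step >= 0
    )


secret = b"12345678901234567890"   # RFC test secret; a real lock needs a random per-device secret
print(totp(secret, unix_time=59, digits=8))  # 94287082, the RFC 6238 test value
```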
3.5 Satori Attack
• Timeline: December 2017.
• Target: D-Link DSL-2750B routers and Huawei's HG532e Home Gateway.
• Vulnerability Targeted: Two remote code execution vulnerabilities: one in the miniigd daemon of the Realtek SDK Universal Plug and Play (UPnP) SOAP interface, and one in the Huawei HG532e home router.
• Objective: To perform UDP flood, SYN flood, TCP_ACK flood and GRE flood attacks using vulnerable IoT devices.
• Attack Type: DDoS attack.
• Details: Satori moved toward classic zero-day attacks against unknown and unpatched vulnerabilities. It engaged in various bot activities, such as flooding targets with manually crafted UDP or TCP packets. The bot had wormlike functionality and performed scanning itself rather than relying on a separate loader or scanner to infect devices [25].
• Prevention: A threat intelligence service that monitors active threats and provides actionable information in real time.
3.6 Crypto-Mining Attack
• Timeline: December 2017.
• Target: Businesses and organizations globally; bitcoin and other cryptocurrencies.
• Vulnerability Targeted: Remote code execution.
• Objective: Use the computing power of all the compromised gadgets to mine the digital coin Monero [26].
• Attack Type: Cross-site scripting (XSS) attack.
• Details: Mining is the process of verifying the transactions of a cryptocurrency network by solving complex mathematical problems with computers of high computational capability. Bitcoin is very hard to mine without high computing power, but another cryptocurrency, Monero, can be mined with a network of connected devices. A theoretical real-world attack would begin with hackers taking over a network of devices and then using the combined computing power of those devices to mine Monero. While a profit of $1,000 might not sound like much, the potential is huge because, according to a forecast by research firm Gartner, there will be more than 20 billion IoT devices by the year 2020 [27].
• Prevention: Patch all systems and applications.
3.7 Hajime Attack
• Timeline: February 2017.
• Target: Devices with weak (factory-set) credentials.
• Vulnerability Targeted: Poor network and credential management.
• Objective: To create a peer-to-peer botnet with an unknown end goal.
• Attack Type: Brute-force attack.
• Details: Hajime was built on a peer-to-peer network in which the controller pushes command messages to the peer network and the messages propagate to all peers over time. The attackers could open a shell on infected machines. Hajime built a network of 300,000 compromised devices, with the majority of infections in Brazil, Iran, Thailand and the Russian Federation. Hajime also blocked access to ports 23, 7547, 5555 and 5358, which were entry points for the rival Mirai worm [28].
• Prevention: Modify the default privacy and security settings of IoT devices.
3.8 Cold Attack in Finland
• Timeline: November 2016.
• Target: Two apartment buildings in the city of Lappeenranta, Finland.
• Vulnerability Targeted: No security measures in place in the network.
• Objective: To direct heavy traffic at the computer system that controls the heating of the apartments so as to make it non-functional.
• Attack Type: DDoS attack.
• Details: Cybercriminals were able to halt the heating of two buildings in the city of Lappeenranta, Finland. The massive amount of traffic caused the heating controllers to reboot the system continually, which halted the heating process. As temperatures in Finland dip below freezing at that time of year, this attack could endanger the lives of the people living in the buildings [29].
• Prevention: Protect the IoT devices and the network connecting them to the Internet.
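The heating controllers above failed because nothing limited how much traffic reached them. One generic protection (our illustration; the cited report does not describe a fix) is to rate-limit requests before they reach the controller. The Python sketch below is a token bucket: tokens refill at `rate` per second up to `capacity`, and each accepted request spends one. The class name and parameters are hypothetical, and a limiter on the device alone will not absorb a large distributed flood, which must also be filtered upstream.

```python
import time


class TokenBucket:
    """Token-bucket rate limiter: refills `rate` tokens per second up to
    `capacity`; each accepted request spends one token."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)   # start full so short legitimate bursts pass
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                    # drop or defer the request


# Usage with a fake clock: a burst of 100 requests at t = 0, then 100 more at t = 1 s.
t = [0.0]
bucket = TokenBucket(rate=5.0, capacity=10, clock=lambda: t[0])
print(sum(bucket.allow() for _ in range(100)))  # 10
t[0] = 1.0
print(sum(bucket.allow() for _ in range(100)))  # 5
```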
3.9 Reaper Attack
• Timeline: October 2016.
• Target: IoT devices, namely routers and wireless IP cameras, manufactured by companies including TP-Link, Avtech, MikroTik, Linksys, Synology and GoAhead.
• Vulnerability Targeted: Insecure devices with known security flaws in their code.
• Objective: To compromise whole networks of IoT devices by making their services unavailable.
• Attack Type: DDoS attack.
• Details: Reaper borrowed some code from the Mirai botnet, but it was more dangerous than Mirai. Instead of trying to crack weak passwords as Mirai did, Reaper looked for insecure devices. Reaper was designed and implemented using a flexible Lua engine and scripts, which let it update its code easily on the fly, in contrast to static pre-programmed attacks. This allowed Reaper to use its huge number of IoT bots to run new attacks as soon as the bots became available [30].
• Prevention: Update devices and patch all security flaws.
3.10 A Rube Goldberg Attack
• Timeline: July 2017.
• Target: Unpatched Axis M3004-V network security cameras.
• Vulnerability Targeted: Slow patching of vulnerabilities in each model of each device.
• Objective: Compromise a security camera.
• Attack Type: DoS attack.
• Details: An attack on an IP camera can give the hacker complete access to the video inside an organization's building. The attackers can then watch activities such as employees' access areas, employees' security codes, the schedules of security officers and much more. In this attack, attackers could jump from one IoT device to another without moving through PCs and servers, making their path even more difficult to discover, and a single vulnerable IoT device can cause complete network disruption.
• Prevention: Secure remote configuration services and ports.
3.11 Linux.ProxyM Attack
• Timeline: February 2017.
• Target: IoT devices running on various architectures, such as x86, MIPS, MIPSEL, PowerPC, ARM, SuperH, Motorola 68000 and SPARC.
• Vulnerability Targeted: Default login credentials.
• Objective: Anonymously perform destructive actions, such as sending spam e-mails and performing phishing attacks, through a proxy server.
• Attack Type: Spam and phishing attacks.
• Details: The Trojan launched a SOCKS proxy server on infected devices, giving the attackers a proxy from which they could perform nefarious operations while hiding their tracks. It can target any Linux device, including routers, set-top boxes and similar equipment. The proxy network created through these SOCKS proxy servers was used to relay malicious traffic, disguising its real source. A cybersecurity team observed 20,000 attacks launched by this botnet [31].
• Prevention: Secure the network.
3.12 Sonic Cyberattack
• Timeline: 2016–17.
• Target: Critical sensors used for decision-making in a broad array of technologies, including smartphones, automobiles, medical devices and the Internet of things.
• Vulnerability Targeted: Circuit-level security flaws in MEMS accelerometers.
• Objective: To deceive MEMS accelerometers into reporting signals for events that never occurred.
• Attack Type: Acoustic attack.
• Details: The attack was discovered by University of Michigan engineering students. The team used precisely tuned acoustic tones (sound waves) to deceive 15 different models of accelerometers, including the one in the Samsung Galaxy S5, into registering movement that never occurred [32].
• Prevention: Better built-in security in sensors will help reduce such attacks.
3.13 Mirai Attack
• Timeline: September 2016.
• Target: Blog Web sites such as KrebsonSecurity,1 Twitter,2 Netflix,3 Amazon,4 etc.
• Vulnerability Targeted: Weak or default user names and passwords in various IoT devices such as printers, routers, sensors and IoT cameras, and devices with open inbound telnet access on ports TCP/23, TCP/2323 and TCP/103.
• Objective: To make the sites unresponsive by overwhelming the Web servers of the target Web sites.
• Attack Type: DDoS attack.
• Details: About 2.5 million vulnerable IoT devices were cracked by attack scripts and turned into bots controlled by a command-and-control (C&C) server with a hardcoded address. These bots were then used to launch distributed denial-of-service (DDoS) attacks against the targeted Web sites. Mirai takes advantage of default user name and password combinations to brute-force its way into unsecured devices with open telnet ports. Mirai infected devices in 164 countries, with Vietnam taking the top spot at 12.8%; other heavily affected countries include Brazil, the USA, China, Mexico, South Korea, Taiwan, Russia and Romania. The mean time to compromise a vulnerable IoT device is 10 min or less [33].
• Prevention: A proactive step is to block access through ports TCP/23, TCP/2323 and TCP/103.
1 https://krebsonsecurity.com/.
2 https://blog.twitter.com/.
3 https://medium.com/netflix-techblog.
4 https://blog.aboutamazon.com/.
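To check your own devices for the exposure described above, a simple TCP connect test on ports 23, 2323 and 103 is enough. The Python sketch below uses only the standard `socket` module; the function name and timeout are our choices, and it should be run only against hosts you own or are authorized to test. A successful connection means the port accepts connections and should be closed or firewalled.

```python
import socket

MIRAI_PORTS = (23, 2323, 103)  # telnet-related ports named in Sect. 3.13


def open_ports(host, ports, timeout=0.5):
    """Return the ports on `host` that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            try:
                if s.connect_ex((host, port)) == 0:   # 0 means the connection succeeded
                    found.append(port)
            except OSError:
                pass                                  # e.g. unresolvable host: treat as not reachable
    return found


if __name__ == "__main__":
    exposed = open_ports("127.0.0.1", MIRAI_PORTS)    # replace with a device you own
    print("exposed:", exposed or "none")
```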
3.14 Stuxnet Attack
• Timeline: 2014.
• Target: Programmable logic controllers (PLCs).
• Vulnerability Targeted: Windows operating system vulnerability.
• Objective: Disrupt uranium enrichment facilities in Iran.
• Attack Type: Cyberattack.
• Details: Although Stuxnet was not a pure IoT attack, since it targeted PLCs, which are not actual smart devices, it serves as a basis for IoT-based attacks. It was installed on computers running the Windows OS through an infected USB device and had the ability to take control of thousands of factory assembly lines and centrifuges [34].
• Prevention: Devices, whether electronic or IoT, that run mission-critical applications must not be attached to a wired LAN unless it is absolutely necessary. There should be proper safeguards and preventive measures to prevent access by unknown persons. The passwords of IoT devices must be updated on a regular basis.
Table 3 summarizes the attacks on IoT networks and devices discussed above.
4 Inference and Futuristic Solutions
IoT devices are diverse in hardware, software and protocols because of the variety of IoT applications, which increases the attack surface. It is therefore a challenging task for vendors to design strong security solutions. Attackers leverage vulnerabilities such as default passwords, weak encryption, third-party libraries, weak firmware, weak software implementations and open connectivity to perform various attacks. DoS, DDoS, buffer overflow, XSS, spoofing, poisoning, SQL injection, privilege escalation, social engineering and many other methods are used to disrupt the proper functioning of IoT devices. In Sect. 3, we presented a study of recent cyberattacks on IoT networks, and we observe that most of these attacks leverage the security weaknesses of IoT devices. The resource-constrained nature of IoT devices motivates researchers to design lightweight security solutions. Large encryption keys require high computational power, whereas small keys are easy for attackers to guess. Optimizing the key size to fit a low-power computing environment is therefore a key issue in designing strong security solutions. At the operating-system level, resource constraints again impede full-fledged security features that would prevent the compromise of IoT devices. Unlike desktops and mobile phones, where users frequently interact with the device through a user interface (UI), IoT devices have little human interaction, which hinders continuous logging and monitoring.
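To make the key-size trade-off described above concrete, the sketch below looks up the smallest key that reaches a target security strength. It assumes the comparable-strength values published in NIST SP 800-57 Part 1 (RSA versus elliptic-curve keys); the table and the function name `smallest_key_bits` are illustrative, not part of the surveyed work. The 80-bit rows are included only for comparison, as NIST no longer approves that strength.

```python
# Comparable key sizes (bits) per security strength, after NIST SP 800-57 Part 1.
# The 256-bit ECC entry is listed there as "512+".
COMPARABLE_KEY_BITS = {
    "RSA": {80: 1024, 112: 2048, 128: 3072, 192: 7680, 256: 15360},
    "ECC": {80: 160, 112: 224, 128: 256, 192: 384, 256: 512},
}


def smallest_key_bits(algorithm: str, target_strength: int) -> int:
    """Smallest tabulated key size whose security strength meets the target."""
    table = COMPARABLE_KEY_BITS[algorithm]
    eligible = [strength for strength in table if strength >= target_strength]
    if not eligible:
        raise ValueError(f"no {algorithm} key size reaches {target_strength}-bit strength")
    return table[min(eligible)]


for alg in ("RSA", "ECC"):
    print(alg, smallest_key_bits(alg, 128))
```

For a 128-bit target the lookup gives 3072-bit RSA keys but 256-bit elliptic-curve keys, which is one reason elliptic-curve schemes are commonly considered for constrained devices.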
Table 3 Statistics of attacks on IoT networks

| Attack | Timeline | Targets | Vulnerability targeted | Objective | Attack type | Prevention |
|---|---|---|---|---|---|---|
| One-click attack [17, 19] | June 2020 | Amazon Alexa devices | XSS flaws | Banking data history or home addresses | Ransomware | Prevent cross-site requests |
| BIAS Bluetooth attack [20] | May 2020 | Smartphones, smart homes or laptops | Bluetooth impersonation attack | Steal personal data | Interception attack | Secure authentication method |
| Silex [21] | June 2019 | IoT devices | Weak firewall rules | Destroy the storage of IoT devices | Ransomware | Change the default passwords |
| Smart deadbolts [22–24] | June 2019 | Smart homes | Insecure communication | Physical risk to people and property | Interception | 2-factor authentication |
| Linux.ProxyM [31] | Feb 2018 | IoT device architectures such as x86, MIPS, MIPSEL, PowerPC, ARM, SuperH | Default login credentials | Send email spam and perform phishing attacks through a proxy server | Phishing attacks | Securing the network |
| Satori attack [25] | Dec 2017 | Routers and home gateways | Remote code execution vulnerability | Halt the proper functioning of IoT devices | DDoS attack | Security update services |
| Crypto-mining attack [27] | Dec 2017 | Businesses and organizations | Remote code execution | Steal the computing power of all the gadgets | XSS attack | Patch all systems and applications |
| Rube Goldberg | July 2017 | Unpatched Axis M3004-V network cameras | Inveterate IoT bug | Target a security camera | DoS attack | More secure remote configuration services |
| Hajime attack [28] | Feb 2017 | Devices with weak factory-set credentials | Poor network and credential management | Create a peer-to-peer botnet | Brute-force attack | Modify default privacy and security settings |
| Sonic cyber [32] | 2016-17 | Smartphones, automobiles, medical devices | Security flaws in MEMS accelerometers | Deceive by sending signals of events that never occurred through the MEMS accelerometer | Acoustic attack | Improve built-in security of sensors |
| Cold attack [29] | Nov 2016 | Heating processes of apartment buildings in Finland | No network security | Overwhelm the computing resources that control heating by packet flooding | DDoS attack | Protection of IoT devices |
| Reaper [30] | Oct 2016 | Routers and wireless IP cameras | Security flaws in the code | Make IoT device services unavailable | DDoS attack | Update devices and patch all security flaws |
| Mirai attack [33] | Sept 2016 | IoT devices such as printers, routers, sensors, cameras | Telnet access on open ports | Make sites unresponsive by overwhelming the Web servers of target Web sites | DDoS attack | Block access through TCP/23, TCP/2323 and TCP/103 ports |
| Stuxnet [34] | 2014 | Programmable logic controllers (PLCs) | Windows operating system vulnerability | Uranium enrichment facilities in Iran | Worm (cyberattack) | Maintaining an air gap |
V. Kumar Jain and J. Gajrani
5 Reverse Shell Attack
We developed and executed an attack to show the seriousness of the problem. The attack exploits the fact that the Raspberry Pi device accepts updates from untrusted sources.
5.1 Scenario and Vulnerabilities Targeted
Figure 1 shows the attack scenario for a waste management system designed to demonstrate the attack. The system is composed of an ultrasonic sensor, a Raspberry Pi with a CoAP server mounted on the waste collection bin, a Wi-Fi access point and a system running a CoAP client. The ultrasonic sensor senses the status of the bin and notifies the Raspberry Pi when the bin overflows; the Pi sends this status to the CoAP client through the Wi-Fi access point, and the CoAP client then directs the garbage collection center to collect the garbage. The vulnerability exploited in our attack is that the Raspberry Pi accepts software updates from an untrusted server that is under the attacker's control. In the attack scenario of Fig. 1, instead of the correct data (overflow of the waste bin) being forwarded to the CoAP client, the adversary modifies the bin data to report wrong information (underflow of the waste bin). Therefore, the CoAP client is not aware that the bin has overflowed.
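The paper does not give the Raspberry Pi's code, so the following is only a sketch of the status logic the scenario implies: an ultrasonic distance reading is converted into a fill level and reported as "overflow" or "underflow". The 90% threshold, the JSON message format and the names `bin_status` and `status_payload` are hypothetical, and the CoAP transport itself is omitted. In these terms, the attack in Sect. 5.2 amounts to the compromised device reporting "underflow" regardless of the reading.

```python
import json

# Hypothetical threshold: the bin counts as overflowing when it is at least 90% full.
FULL_FRACTION = 0.9


def bin_status(distance_cm: float, bin_depth_cm: float) -> str:
    """Classify the bin from an ultrasonic distance measured from the lid downward."""
    if bin_depth_cm <= 0:
        raise ValueError("bin depth must be positive")
    fill = 1.0 - distance_cm / bin_depth_cm  # 0.0 = empty, 1.0 = full to the lid
    return "overflow" if fill >= FULL_FRACTION else "underflow"


def status_payload(bin_id: str, distance_cm: float, bin_depth_cm: float) -> bytes:
    """JSON body that the Pi's CoAP resource could return to the client."""
    body = {"bin": bin_id, "status": bin_status(distance_cm, bin_depth_cm)}
    return json.dumps(body).encode()
```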
Fig. 1 Attack scenario
5.2 Attack Technique and Work Flow
The aim of the attack is to access the shell of the victim IoT device by exploiting its acceptance of updates from untrusted sources. A Kali Linux system running an Apache server (httpd service) is used to attack the victim IoT device. The attack is performed in the following steps:
[Step-1] Payload Generation: We generate a payload for the victim IoT device running Raspbian OS using the msfvenom tool in Kali Linux. The payload is constructed to establish a reverse connection from the victim to our Kali machine (server). Listing 1.1 shows the snippet of the script for generating the payload.
This creates an Executable and Linkable Format (ELF) file on the host machine.
[Step-2] Listening for the Reverse Connection: Listing 1.2 shows the snippet of the script, run on the Kali machine, that listens for the reverse connection from the victim.
[Step-3] Payload Delivery: The victim is lured into visiting the attacker's server. As soon as the victim visits the server, a pop-up on the victim machine prompts the download of an update file. When the victim executes the update file, a reverse connection is established to the Kali machine, giving us control of the victim machine. We then modified the files so that, instead of forwarding correct data to the CoAP client (overflow of the waste bin), the device forwards wrong information (underflow of the waste bin). Therefore, the CoAP client is not aware that the bin has overflowed.
5.3 Attack Analysis and Defense Discussion
The reverse shell attack succeeds when the victim IoT device does not validate the source of an update or download. Reverse shell payloads are delivered to victims in the same way as other malicious code: attackers trick victims into opening email attachments, downloads or software updates that secretly execute the payload. Many victims are also tricked by social engineering into clicking malicious HTML links that install the reverse shell's payload. To prevent the attack, users should ensure that updates and downloads come only from trusted sources, and email attachments and hyperlinks from untrusted sources must not be clicked.
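One concrete form of "updates only from trusted sources" is to refuse any update image whose digest does not match a value obtained over a trusted channel. The sketch below assumes the vendor publishes a SHA-256 digest and that the device already trusts it (for example, pinned in firmware); the names `verify_update` and `install_update` are hypothetical. Production systems usually verify an asymmetric signature instead, and a bare hash only helps if the attacker cannot also replace the published digest.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an update image."""
    return hashlib.sha256(data).hexdigest()


def verify_update(image: bytes, expected_sha256_hex: str) -> bool:
    """True only if the image digest matches the trusted, vendor-published digest.

    hmac.compare_digest performs a constant-time comparison of the two hex strings.
    """
    return hmac.compare_digest(sha256_hex(image), expected_sha256_hex.strip().lower())


def install_update(image: bytes, expected_sha256_hex: str) -> None:
    """Reject tampered images before anything is written or executed."""
    if not verify_update(image, expected_sha256_hex):
        raise PermissionError("update rejected: digest mismatch")
    # A real device would now hand the verified image to its installer (device specific).
```

With this check in place, the ELF payload served by the attacker's server in Sect. 5.2 would be rejected, because its digest cannot match the one the device trusts.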
6 Related Work
The authors of [35] present the fundamentals of IoT security. They identify the attack surface in four components: (i) physical objects, (ii) protocols across the IoT stack, (iii) data and (iv) software. This work also defines the security goals of confidentiality, integrity and availability: an object is secure if it fulfills these goals, and compromising any of them leads to an attack. The authors classify attacks according to the security goal they violate. The National Institute of Standards and Technology (NIST) [36] defines standards to assess the security of network devices, and the Open Web Application Security Project (OWASP) [37] aims to improve the security of software. The authors in [38] used the NIST and OWASP standards to assess the security issues of the Raspberry Pi. During the risk assessment, the Raspberry Pi was found susceptible to seven of the OWASP Top 10 vulnerabilities (the Top 10 being injection, broken authentication and session management, sensitive data exposure, XML external entities, broken access control, security misconfiguration, cross-site scripting, insecure deserialization, using components with known vulnerabilities, and insufficient logging and monitoring). The aim of the authors was to list the major security issues of the Raspberry Pi across these vulnerability classes, based on the risk assessment. The authors hypothesized that many risks exist in the Raspberry Pi and tested this through the risk assessment, which found 53 vulnerabilities: 31 rated low risk, 13 moderate risk and 9 high risk. In conclusion, the authors accepted the null hypothesis, indicating that the Raspberry Pi has a lower chance of compromise. The authors in [39] proposed diversifying the internal interfaces and software layers of IoT operating systems so that each device's interfaces are unique. This prevents an adversary from exploiting a large number of devices with the same exploit.
A successful attack in this scenario would require the attacker to design multiple versions of the exploit, which is time-consuming and costly. Diversification therefore provides a good protection mechanism that suits IoT devices with limited capacity and resources. However, the authors found that diversification is not applicable to all IoT operating systems, because some of them lack the interfaces on which diversification is usually applied. They concluded that diversification can be comprehensively applied to all devices running Linux and Google's Brillo.5 Protocols such as CoAP also provide features for implementing diversification.
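The diversification idea of [39] can be illustrated with a toy model; this is not the authors' implementation. Each device renames its internal call names using a device-specific secret, so an exploit written against one device's names fails on another. The call names, the `diversify` helper and the `DiversifiedDevice` class are hypothetical, and `random.Random` is used only to keep the example deterministic; a real system would derive aliases from a cryptographically secure secret.

```python
import random

# Hypothetical canonical interface of an IoT operating system.
CANONICAL_CALLS = ("open", "read", "write", "exec", "reboot")


def diversify(calls, device_secret: int) -> dict:
    """Map each canonical call name to a device-unique alias derived from a secret."""
    rng = random.Random(device_secret)  # deterministic for the example only
    return {name: f"{name}_{rng.getrandbits(32):08x}" for name in calls}


class DiversifiedDevice:
    """A toy device that only understands its own diversified call names."""

    def __init__(self, device_secret: int):
        aliases = diversify(CANONICAL_CALLS, device_secret)
        self._alias_to_call = {alias: name for name, alias in aliases.items()}

    def invoke(self, alias: str) -> str:
        if alias not in self._alias_to_call:
            raise PermissionError(f"unknown interface {alias!r}")
        return f"executed {self._alias_to_call[alias]}"
```

An exploit crafted against device A's alias for `reboot` works on A, but is rejected both on device B and when it uses the canonical name, which is exactly the per-device cost that diversification imposes on the attacker.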
5 https://www.theverge.com/2015/5/28/8677119/google-project-brillo-iot-google-io-2015.
The authors in [40] analyzed various security solutions for IoT devices and proposed some improved techniques for further analysis. The authors in [41] presented a security analysis of the Arduino Yun, showed that it is vulnerable to a number of attacks, and demonstrated a proof-of-concept attack that exploits these vulnerabilities. They analyzed the Linux environment from an external attacker's point of view using two popular tools, Nmap [42] and Nessus [43]. The Arduino IDE compiler (avr-gcc version 4.8.1) performs no security validation during compilation, so it reports no error when a program runs out of memory or a segmentation violation occurs. The authors exploited such out-of-memory conditions to develop ATmega32u4 heap buffer overflow and ATmega32u4 stack buffer overflow attacks. The authors in [44] proposed an approach that leverages edge computing to deploy edge functions, which gather information about incoming traffic and communicate it via a fast path to a nearby detection service. This accelerates the detection of attacks and thus limits their harmful impact. Preliminary investigations showed ten times faster detection, reducing the Internet traffic caused by IoT DDoS attacks by up to 82%. The proposed ShadowNet accelerates detection enough to respond to an attack before it reaches its target, which is a significant contribution to the defense against IoT DDoS attacks. The authors in [45] demonstrated two Web-based attacks against local IoT devices using a malicious Web page and a third-party script. These attacks work even when the devices are behind network address translation (NAT), because IoT devices on the local network have open HTTP ports. The authors showed that malicious scripts exploiting error messages can circumvent the same-origin policy on an HTML5 interface.
They demonstrated that an attacker can gather sensitive information from IoT devices, such as unique device identifiers and location, track their owners, and control the devices by playing random videos and rebooting them. The authors also proposed promising countermeasures that users, browsers, DNS providers and IoT vendors can adopt. In IP-based IoT device detection [46], the authors detect IoT devices from Internet traffic using knowledge of the servers, operated by IoT manufacturers, that the devices contact. The authors claim that the method detects IoT devices on both public and private IP addresses; for devices on private IP addresses, it preserves users' privacy by extracting minimal information. The approach was developed with ten device models from seven vendors in controlled experiments. The research also detects real-world IoT devices by analyzing traffic flows between an intranet and its Internet service provider. IP-based IoT device detection helps in understanding the risks of DDoS attacks.
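The core idea of IP-based detection [46] can be sketched as matching flow destinations against the server addresses each device model is known to contact. This is a simplified illustration, not the method of [46]: the knowledge base `MODEL_SERVERS` and the model names are hypothetical, and the networks are IETF documentation ranges rather than real vendor addresses. Only destination addresses are inspected, which mirrors the privacy argument above.

```python
import ipaddress

# Hypothetical knowledge base: manufacturer server ranges contacted by each device model.
MODEL_SERVERS = {
    "camera-model-A": [ipaddress.ip_network("192.0.2.0/28")],
    "plug-model-B": [ipaddress.ip_network("198.51.100.16/29")],
}


def detect_iot_devices(flows) -> dict:
    """Map each internal source IP to the device models whose servers it contacted.

    `flows` is an iterable of (src_ip, dst_ip) string pairs; payloads are never read.
    """
    detected = {}
    for src, dst in flows:
        dst_addr = ipaddress.ip_address(dst)
        for model, networks in MODEL_SERVERS.items():
            if any(dst_addr in net for net in networks):
                detected.setdefault(src, set()).add(model)
    return detected
```

For example, a host at 10.0.0.5 that talks to 192.0.2.3 would be reported as a likely `camera-model-A`, while traffic to addresses outside the knowledge base is ignored.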
7 Conclusion
Despite the benefits offered by IoT, a major drawback is that many IoT devices are wide open to attacks. Various case studies show that most IoT attacks are DDoS attacks that use a large number of available IoT devices as botnets. These attacks leverage poor network authentication and weak encryption algorithms. We suggest that strong network authentication and strong credentials in IoT devices would reduce the frequency of attacks. The heterogeneity of IoT applications makes it difficult to design a strong, common solution. A multi-dimensional effort covering all parts of the ecosystem, such as protocols, security algorithms and network components, is necessary to reduce the risk of future attacks. The first step is to design IoT products with built-in security. Beyond that, the open problems in IoT security give researchers the opportunity to develop strong security solutions that take IoT constraints into account.
Acknowledgements This study has been carried out with support from the RTU, CRS Project under TEQIP-III India.
References
1. Amazon Alexa 'one-click' attack can steal PII and voice recordings. https://securereading.com/amazon-alexa-one-click-attack/. Accessed Aug 2020
2. Amazon Alexa 'one-click' attack can divulge personal data. https://threatpost.com/amazon-alexa-one-click-attack-can-divulge-personal-data/158297. Accessed Aug 2020
3. Arm Mbed OS. https://www.mbed.com/en/platform/mbed-os/
4. Contiki. http://www.contiki-os.org/
5. An elaborate hack shows how much damage IoT bugs can do. https://www.wired.com/story/elaborate-hack-shows-damage-iot-bugs-can-do/
6. FreeRTOS. https://www.freertos.org/
7. Hack of high-end hotel smart locks shows IoT security fail. https://threatpost.com/hack-of-high-end-hotel-smart-locks-shows-iot-security-fail/147178/. Accessed Aug 2019
8. Huawei LiteOS: concept and value. https://developer.huawei.com/ict/en/site-iot/article/liteosoverview
9. Internet of things (IoT) connected devices installed base worldwide from 2015 to 2025 (in billions). https://www.statista.com/statistics/471264/iot-number-of-connected-devices-worldwide/. Accessed Nov 2016
10. Keeping the gate locked on your IoT devices: vulnerabilities found on Amazon's Alexa. https://research.checkpoint.com/2020/amazons-alexa-hacked/. Accessed Aug 2020
11. Monero. https://getmonero.org/
12. The National Institute of Standards and Technology (NIST). https://csrc.nist.gov/
13. Nessus. https://www.tenable.com/products/nessus/nessus-professional
14. Nmap. https://nmap.org/
15. Nucleus RTOS. https://www.mentor.com/embedded-software/nucleus/
16. Open Web Application Security Project. https://www.owasp.org/index.php/Main_Page
17. Ostro. https://ostroproject.org/
18. R7-2019-18: Multiple Hickory smart lock vulnerabilities. https://blog.rapid7.com/2019/08/01/r7-2019-18-multiple-hickory-smart-lock-vulnerabilities/. Accessed Aug 2019
19. Rizvi S, Orr R, Cox A, Ashokkumar P, Rizvi MR (2020) Identifying the attack surface for IoT network. Internet of Things 9:100162
20. TinyOS. www.tinyos.net/
21. Tizen. https://www.tizen.org/
22. uClinux. http://www.uclinux.org/
23. Windows 10. https://www.microsoft.com/en-in/software-download/windows10ISO
24. Wind River VxWorks. https://www.windriver.com/products/vxworks/
25. Chadd A (2018) DDoS attacks: past, present and future. Network Secur 2018(7):13–15
26. Acar G, Huang DY, Li F, Narayanan A, Feamster N (2018) Web-based attacks to discover and control local IoT devices. In: Proceedings of the 2018 workshop on IoT security and privacy. ACM, pp 29–35
27. Sigler K (2018) Crypto-jacking: how cyber-criminals are exploiting the crypto-currency boom. Comput Fraud Secur 2018(9):12–14
28. Angrishi K (2017) Turning internet of things (IoT) into internet of vulnerabilities (IoV): IoT botnets. arXiv preprint arXiv:1702.03681
29. Antonioli D, Tippenhauer NO, Rasmussen K (2020) BIAS: Bluetooth impersonation attacks. In: Proceedings of the IEEE symposium on security and privacy (S&P)
30. Vlajic N, Zhou D (2018) IoT as a land of opportunity for DDoS hackers. Computer 51(7):26–34
31. Bormann C, Ersue M, Keranen A (2014) Terminology for constrained-node networks. Internet Engineering Task Force (IETF), Fremont, CA, USA, ISSN 2070-1721
32. Guri M, Elovici Y (2018) Bridgeware: the air-gap malware. Commun ACM 61(4):74–82
33. Kolias C, Kambourakis G, Stavrou A, Voas J (2017) DDoS in the IoT: Mirai and other botnets. Computer 50(7):80–84
34. Costin A, Zaddach J (2015) IoT malware: comprehensive survey, analysis framework and case studies
35. Edwards S, Profetis I (2016) Hajime: analysis of a decentralized internet worm for IoT devices. Rapidity Networks 16
36. Guo H, Heidemann J (2018) IP-based IoT device detection
37. Guri M, Elovici Y (2018) Bridgeware: the air-gap malware. Commun ACM 61(4):74–82
38. Williams MG (2015) A risk assessment on Raspberry Pi using NIST standards. Int J Comput Sci Network Secur (IJCSNS) 15(6):22
39. Koivunen L, Rauti S, Leppänen V (2016) Applying internal interface diversification to IoT operating systems. In: 2016 International conference on software security and assurance (ICSSA). IEEE, pp 1–5
40. Kolias C, Kambourakis G, Stavrou A, Voas J (2017) DDoS in the IoT: Mirai and other botnets. Computer 50(7):80–84
41. Lounis K, Zulkernine M (2019) Bluetooth low energy makes "just works" not work. In: 2019 3rd Cyber Security in Networking Conference (CSNet). IEEE, pp 99–106
42. Rizvi S, Orr R, Cox A, Ashokkumar P, Rizvi MR (2020) Identifying the attack surface for IoT network. Internet of Things 9:100162
43. Sadeghi A-R, Wachsmann C, Waidner M (2015) Security and privacy challenges in industrial internet of things. In: 2015 52nd ACM/EDAC/IEEE Design automation conference (DAC). IEEE, pp 1–6
44. Sigler K (2018) Crypto-jacking: how cyber-criminals are exploiting the crypto-currency boom. Comput Fraud Secur 2018(9):12–14
45. Vlajic N, Zhou D (2018) IoT as a land of opportunity for DDoS hackers. Computer 51(7):26–34
46. Williams MG (2015) A risk assessment on Raspberry Pi using NIST standards. Int J Comput Sci Network Secur (IJCSNS) 15(6):22
Detecting the Nuclei in Different Pictures Using Region Convolutional Neural Networks
Naiswita Parmar
Abstract Identifying cell nuclei is the first and most crucial step in most analyses. This is because the human body comprises about 30 trillion cells, each containing a nucleus full of DNA, the genetic code that programs the cell. Identifying nuclei allows researchers to distinguish every individual cell under investigation, and by measuring how cells respond to different treatments, scientists can understand the underlying biological processes at work. Automating nucleus detection could speed up research for almost every disease, from lung cancer and heart disease to rare disorders. We propose an image segmentation approach to this task based on the region convolutional neural network (R-CNN) algorithm, which we adapt specifically to detecting microscopic nuclei in images that vary in size and modality. Our dataset contains a large number of nuclei images acquired under varied conditions, such as different imaging modalities (brightfield versus fluorescence), cell types and magnification levels. The training data is selected to maximize the algorithm's ability to detect nuclei across these diverse images.
Keywords ROI (Region of Interest) · Bioscience · IHC (Immunohistochemistry) · Cell segmentation · Image segmentation · RCNN (Region Convolutional Neural Network)
N. Parmar
Indus University, Ahmedabad, Gujarat 382115, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_19

1 Introduction
Microscopy images and digital pathology play a major role in decision-making for disease diagnosis. They provide rich information for computer-aided diagnosis (CAD) and enable high-throughput processing through quantitative analysis of digital images. Nowadays, automated digital pathology, including image analysis, which can greatly benefit both pathologists and patients, has attracted considerable attention in clinical practice and research [1, 2]. Compared with manual assessment, which is labor-intensive and time-consuming, automated methods [3–6] provide faster and reproducible image analysis, freeing basic science researchers and clinician scientists from tedious, repetitive routine work. More importantly, the complexity of microscopy and pathology images makes manual interpretation difficult and can lead to large inter-observer variability [7]; computer-aided diagnosis, in contrast, can greatly reduce such bias and yield a precise characterization of disease [8]. It also enables personalized treatments that can significantly benefit patients. To handle large-scale image datasets in high-throughput pathology image analysis, grid computing [9–11] and computationally scalable algorithms [12–14] have been reported. Another advantage of automated methods is that they can easily provide reproducible and rigorous measurements of important image features, which can be used with clinical follow-up and thus allow comparative studies, potential prognosis and personalized medicine. In CAD, a basic prerequisite is cell or nucleus detection and segmentation, which is usually considered the basis of automated image analysis. It supports various quantitative analyses, including the measurement of cellular morphology such as shape, texture, size and other image-related parameters. However, robust and accurate cell or nucleus segmentation is hard to achieve.
First, pathology (particularly histopathology) and microscopy images often exhibit background clutter with considerable noise, artifacts (e.g., blurred regions) introduced during image acquisition, and potentially poor contrast between the foreground and the background. Second, there are significant variations in cell or nucleus shape, size and intracellular intensity. Third, cells or nuclei are often clustered, so they may partially overlap one another. Many efforts have been made toward automated cell or nucleus detection and segmentation, each aiming to address some or all of these challenges. Several surveys on automated pathology image analysis already exist. The work in [15] summarizes CAD system technologies for histopathology image analysis, covering gland and nucleus segmentation, preprocessing (color and illumination normalization), feature extraction and classification. Other CAD systems for histopathological images are discussed in [16, 17]. A dedicated survey [18] reviews histopathological image analysis for breast cancer, including mitosis detection and proliferation assessment. The work in [19] gives a broader overview of the computational pathology workflow, which consists of three components: ground-truth generation and image data acquisition; image analysis, including object detection, segmentation and recognition; and medical statistics for survival analysis. Kothari et al. [20] surveyed informatics methods for histopathological whole-slide imaging, covering image quality control, feature extraction, predictive modeling and visualization. These publications do not specifically focus on cell or nucleus detection and segmentation, so many recent state-of-the-art detection and segmentation algorithms are not discussed. Recently, Irshad et al. [21] reported a survey of methods for nucleus detection, segmentation, feature extraction and classification on immunohistochemistry (IHC) and hematoxylin and eosin (H&E) stained histopathology images; however, many recent cell or nucleus detection and segmentation algorithms for other types of stained images are still missing. In bioscience, nucleus identification is normally performed by individual clinicians manually going through scans. Automating this process with deep learning could speed up the search for cures for diseases ranging from cancer to Alzheimer's. This paper focuses on building an effective deep learning model based on the region convolutional neural network algorithm.
2 Method
2.1 Dataset Image Analysis
The dataset comprises a large number of images of nucleus scans. The images vary in magnification, cell type and imaging modality, which tests the generalization ability of the AI systems applied to the problem. Figure 1 shows examples of the images used for this algorithm, and Fig. 2 shows the distribution of image dimensions. The training set used for the model designed in this paper contains 669 images with 29,441 individual masks, and the test set contains 64 images. The distribution of masks per image is right-skewed, with a mean of 43 masks (distinct nuclei) per image and a standard deviation of 47. A few images contain as many as 375 masks, which may make it hard for the system to recognize each distinct nucleus precisely. About 540 of the images have the same dimensions. The distribution of image pixel area is also right-skewed, with a mean of 154,747 px² and a standard deviation of 190,836 px²; the largest image has a pixel area of 1,443,519 px². Such wide variation in pixel area could affect the performance of certain neural network architectures. However, since we aim for an algorithm that generalizes well, it should perform well on images of varying pixel area. The more exposure the network has to a given modality of nuclei image, the better it will be at detecting nuclei in that type of image. Hence, we considered it important to examine the frequency of the various image modalities in our dataset. We defined five classes based on the colors of the image background and foreground and found the following distribution: dark foreground and white background (16), white foreground and dark background (599), purple foreground and white background (41), purple background and purple foreground (71), and purple foreground and yellow background (8). We would therefore expect our model to be better at detecting nuclei in images with a white foreground and dark background, and worse at those with a purple foreground and yellow background. Figure 1 shows examples of the different modalities in our dataset.
The dataset consists of cell images and nucleus masks. The segmented masks correspond to the distinct nuclei in an image, and each mask contains exactly one nucleus. The masks are included only in the training set and were manually annotated by a team of researchers, so there is some room for human error in the original annotation of the nuclei. The test set includes only the cell images.
Fig. 1 Images used for RCNN model
Fig. 2 Example of image dimension distribution
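The paper does not state how images were assigned to the five color classes above. One plausible heuristic, sketched here as an assumption rather than the author's procedure, is to average the RGB color inside and outside the union of the nucleus masks and name each average with coarse rules. The thresholds and the names `modality_class` and `_color_name` are hypothetical.

```python
import numpy as np


def _color_name(rgb) -> str:
    """Coarse color name for a mean RGB value (hypothetical thresholds)."""
    r, g, b = (float(c) for c in rgb)
    if max(r, g, b) - min(r, g, b) < 30:  # low saturation: treat as grey scale
        return "white" if (r + g + b) / 3 >= 128 else "dark"
    if r > 150 and g > 150 and b < 120:  # strong red and green, little blue
        return "yellow"
    if r > g and b > g:  # red and blue dominate green
        return "purple"
    return "other"


def modality_class(image: np.ndarray, nuclei_mask: np.ndarray) -> str:
    """image: H x W x 3 RGB array; nuclei_mask: H x W union of all nucleus masks."""
    mask = nuclei_mask.astype(bool)
    if not mask.any() or mask.all():
        raise ValueError("mask must contain both nucleus and background pixels")
    foreground = image[mask].mean(axis=0)  # mean RGB of nucleus pixels
    background = image[~mask].mean(axis=0)  # mean RGB of all other pixels
    return f"{_color_name(foreground)} foreground, {_color_name(background)} background"
```

Because the masks exist only for the training set, a rule like this could label training images directly; test images would need a mask-free variant, for example thresholding on brightness.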
2.2 Mask R-CNN Configuration
See Table 1.
2.3 Evaluation
The model was evaluated using mean average precision (mAP) over a range of intersection over union (IoU) thresholds. For each image, we compute the IoU of every predicted nucleus mask with each ground-truth mask in the image and determine whether the prediction matches at each IoU threshold in the range. At each threshold, we calculate the precision over all predicted masks, and we then average the precision across thresholds. Finally, the mean of the per-image average precision is computed over the whole dataset.
IoU measures the overlap between two regions; we use it to quantify how much our predicted region overlaps with the ground truth (the actual object region). Some datasets predefine an IoU threshold (say 0.5) for classifying whether a prediction is a true positive or a false positive. Figure 3 shows an example of the intersection and union. As the example in Fig. 3 shows, the union is always greater than or equal to the size of the ground-truth object, and the intersection is always less than or equal to the size of the prediction. For the nucleus in Fig. 3, the IoU is

$$\mathrm{IoU}(X, Y) = \frac{|X \cap Y|}{|X \cup Y|} = \frac{565}{848} \approx 0.666 \qquad (1)$$
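The evaluation procedure above can be sketched in Python as follows. Two details are assumptions, since the paper does not list them: the thresholds run from 0.5 to 0.95 in steps of 0.05, and precision at a threshold is computed as TP / (TP + FP + FN), the convention used by the 2018 Data Science Bowl nucleus metric. The sketch also assumes that nuclei within the predictions (and within the ground truth) do not overlap, so at thresholds of 0.5 or above each prediction matches at most one ground-truth nucleus.

```python
import numpy as np


def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over union of two boolean masks, as in Eq. (1)."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return float(inter) / float(union) if union else 0.0


def average_precision(preds, truths, thresholds=np.linspace(0.5, 0.95, 10)) -> float:
    """Mean over IoU thresholds of TP / (TP + FP + FN) for one image."""
    ious = np.array([[iou(p, t) for t in truths] for p in preds], dtype=float)
    ious = ious.reshape(len(preds), len(truths))  # keeps shape valid for empty lists
    scores = []
    for t in thresholds:
        matches = ious > t
        tp = int(matches.any(axis=1).sum())  # predictions matched to a nucleus
        fp = len(preds) - tp  # unmatched predictions
        fn = len(truths) - int(matches.any(axis=0).sum())  # missed nuclei
        denom = tp + fp + fn
        scores.append(tp / denom if denom else 1.0)
    return float(np.mean(scores))
```

The dataset-level score is then the mean of `average_precision` over all images. For instance, a prediction that covers a 5 × 5 ground-truth nucleus plus one extra column has IoU 25/30 ≈ 0.83, so it counts as a match at the seven thresholds from 0.5 to 0.8 and the image scores 0.7.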
Table 1 Configuration details
Configuration | Value | Description
Backbone | 'resnet50' | A ResNet feature extractor: early layers detect low-level features and later layers capture finer details
Backbone strides | [4, 8, 16, 32, 64] | Strides of each layer of the FPN pyramid; these values are based on a ResNet-101 backbone
GPU count | 1 | Number of GPUs used
Images per GPU | 2 | Adjust according to GPU memory and image sizes; use the highest number your GPU can handle for best performance
Steps per epoch | 301 | Training steps per epoch. Validation statistics are computed at the end of every epoch, so a lower value makes validation run more often and increases total training time. TensorBoard updates are also saved at each epoch
Validation steps | 33 | Number of validation steps run at the end of each epoch; a higher number gives a more accurate estimate but takes longer
Number of classes | 2 | Number of classification classes
RPN anchor scales | (8, 16, 32, 64, 128) | RPN: region proposal network. Side length of the square anchors, in pixels
RPN anchor ratios | [0.5, 1, 2] | Width/height ratio of the anchors at each cell
RPN anchor stride | 1 | Anchors are created for every cell of the backbone feature map
RPN NMS threshold | 0.7 | NMS: non-max suppression threshold used to filter RPN proposals
RPN train anchors per image | 320 | Number of anchors per image used for region proposal network training
Post-NMS ROIs (training) | 2048 | ROI: region of interest. ROIs kept after non-max suppression during training
Post-NMS ROIs (inference) | 2048 | ROIs kept after non-max suppression during inference
Use mini mask | True | When enabled, instance masks are resized to a smaller size to reduce memory load
Mini mask shape | (56, 56) | Height and width of the mini mask
Image min dimension | 512 | Image dimensions must be at least this value
Image max dimension | 512 | Image dimensions must be at most this value
(continued)
Detecting the Nuclei in Different Pictures …
Table 1 (continued)
Configuration | Value | Description
Image padding | True | Pad the image with zeros to reach the required dimension
Image color | RGB | Color space used for the images
Train ROIs per image | 512 | Number of ROIs per image fed to the classifier head; the RPN NMS threshold must be tuned according to this value
ROI positive ratio | 0.33 | Fraction of positive ROIs used to train the classifier
Pool size | 7 | Configuration for the ROI pools
Mask pool size | 14 | Configuration for the ROI pools
Mask shape | [28] |
Max GT instances | 256 | Maximum number of ground-truth instances in one image
RPN bbox std dev | [0.1, 0.1, 0.2, 0.2] | Bounding-box refinement standard deviation for the RPN
BBox std dev | [0.1, 0.1, 0.2, 0.2] | Bounding-box refinement standard deviation for the final detections
Detection max instances | 400 | Maximum number of final detections
Detection min confidence | 0.75 | Confidence threshold; ROIs below this value are ignored
Detection NMS threshold | 0.3 | Non-max suppression threshold for the detections
Detection mask threshold | 0.35 | Threshold applied when detecting the mask
Learning rate | 0.001 | Learning rate set for the algorithm; since TensorFlow is used, a learning rate higher than this degrades the algorithm's performance
Learning momentum | 0.9 | Momentum set for the algorithm
Optimizer | ADAM | Optimizer used for training
Weight decay | 0.0001 | Value set for weight decay regularization

Fig. 3 Example of image dimension distribution
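Several Table 1 entries interact, and a few derived quantities make the configuration easier to sanity-check. The sketch below is hedged: it assumes, as in common FPN-based Mask R-CNN implementations, that each pyramid level uses one anchor scale combined with every aspect ratio, that images are resized to 512 × 512, and that each feature-map side is the image side divided by the stride, rounded up. The class and field names are hypothetical; only the values come from Table 1.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class NucleiConfig:
    """Subset of the Table 1 values; field names are illustrative only."""
    backbone: str = "resnet50"
    backbone_strides: tuple = (4, 8, 16, 32, 64)
    gpu_count: int = 1
    images_per_gpu: int = 2
    rpn_anchor_scales: tuple = (8, 16, 32, 64, 128)
    rpn_anchor_ratios: tuple = (0.5, 1, 2)
    rpn_anchor_stride: int = 1
    image_dim: int = 512  # min and max image dimension are both 512

    @property
    def batch_size(self) -> int:
        # Images processed per training step across all GPUs.
        return self.gpu_count * self.images_per_gpu

    def anchors_per_level(self) -> list:
        # One scale per pyramid level, paired with every aspect ratio (assumption).
        counts = []
        for stride in self.backbone_strides:
            side = math.ceil(self.image_dim / stride)  # feature-map side length
            cells = math.ceil(side / self.rpn_anchor_stride) ** 2
            counts.append(cells * len(self.rpn_anchor_ratios))
        return counts

    def anchor_shapes(self, level: int) -> list:
        # (width, height) per ratio r = w/h at one level: w = s*sqrt(r), h = s/sqrt(r).
        s = self.rpn_anchor_scales[level]
        return [(s * math.sqrt(r), s / math.sqrt(r)) for r in self.rpn_anchor_ratios]

cfg = NucleiConfig()
print(cfg.batch_size)                # 2
print(cfg.anchors_per_level())       # [49152, 12288, 3072, 768, 192]
print(sum(cfg.anchors_per_level()))  # 65472
```

Under these assumptions the RPN scores about 65 thousand anchors per image, which is why only 320 of them are sampled per image for RPN training and at most 2048 proposals survive non-max suppression.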
We sweep over a range of IoU thresholds to obtain a vector of match results for each mask comparison. The threshold values range from 0.5 to 0.95 in steps of 0.05: (0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95). In other words, at a threshold of 0.5, a predicted object is considered a 'hit' if its IoU with a ground-truth object is greater than 0.5.
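The threshold sweep can be turned into a per-image score as sketched below. This is one possible reading, and its precision formula is an assumption: it uses TP/(TP + FP + FN) at each threshold, the form used in the 2018 Kaggle Data Science Bowl nuclei challenge, which the text does not spell out. It also assumes that masks within each set do not overlap, so that at thresholds of 0.5 and above each ground-truth nucleus can match at most one prediction. Function names are hypothetical.

```python
import numpy as np

# IoU thresholds 0.5, 0.55, ..., 0.95 as listed above.
THRESHOLDS = np.linspace(0.5, 0.95, 10)

def iou_matrix(preds, truths):
    """IoU of every ground-truth mask (rows) against every predicted mask (columns)."""
    P = np.stack([np.asarray(p, dtype=bool).ravel() for p in preds]).astype(np.int64)
    T = np.stack([np.asarray(t, dtype=bool).ravel() for t in truths]).astype(np.int64)
    inter = T @ P.T  # pixel counts of each pairwise intersection
    union = T.sum(axis=1)[:, None] + P.sum(axis=1)[None, :] - inter
    return inter / np.maximum(union, 1)

def image_average_precision(preds, truths, thresholds=THRESHOLDS):
    """Mean over thresholds of TP / (TP + FP + FN) for one image (assumed formula)."""
    if len(preds) == 0 or len(truths) == 0:
        return 0.0  # nothing can be matched; convention assumed here
    iou = iou_matrix(preds, truths)
    scores = []
    for t in thresholds:
        hits = iou > t
        tp = int(np.count_nonzero(hits.any(axis=1)))   # matched ground-truth nuclei
        fn = len(truths) - tp                          # missed ground-truth nuclei
        fp = int(np.count_nonzero(~hits.any(axis=0)))  # predictions matching nothing
        scores.append(tp / (tp + fp + fn))
    return float(np.mean(scores))

def dataset_map(per_image_pairs):
    """Mean of the per-image average precisions over the whole dataset."""
    return float(np.mean([image_average_precision(p, t) for p, t in per_image_pairs]))

truth = np.zeros(848, dtype=bool)
truth[:700] = True
pred = np.zeros(848, dtype=bool)
pred[135:] = True  # IoU with truth = 565 / 848
print(image_average_precision([pred], [truth]))  # 0.4
```

With an IoU of about 0.666, the prediction counts as a hit only at the four thresholds 0.5 to 0.65, so the image scores 4/10 even though the overlap looks reasonable; the sweep rewards tight masks.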
3 Conclusion
This task involved first exploring the characteristics of the cell scans (e.g., the number of nuclei per image and their modality), then applying feature engineering and segmentation methods, and finally building deep learning models to automatically identify which parts of each scan were background and which were nuclei. One particularly interesting part of the project was understanding the diversity in the characteristics of the cell scans that scientists have to annotate manually, and gaining a greater appreciation of the amount of work researchers put into annotating these scans by hand. Automating nuclei identification with AI techniques could save researchers a great deal of time, allowing them to focus their efforts on researching cures for diseases and other more valuable tasks. Figure 4 shows an example of a prediction made by the model. The model achieves a precision of 0.63 on the IoU-based segmentation evaluation.
Fig. 4 Prediction
During the last few decades, many state-of-the-art methods have been proposed for cell or nucleus detection and segmentation in digital pathology and microscopy images, but not all of them have been applied to the same dataset. Instead, many of them are evaluated on their own datasets, which makes it difficult to determine whether one approach is better than another. In addition, because different detection and segmentation metrics are used in the literature, it is not straightforward to compare current methods quantitatively. Common benchmark image datasets are therefore required for a comparative evaluation of different detection and segmentation approaches. Currently, there are several public benchmarks for nucleus/cell detection and segmentation: the ICPR 2012 mitosis detection challenge [22, 23], the AMIDA13 dataset [24, 25], the UCSB Bio-Segmentation benchmark [26], the hand-segmented U2OS/3T3 cell image dataset [27], and the ISBI 2013 cell tracking challenge [28].
References
1. Händchen V et al (2012) Observation of one-way Einstein–Podolsky–Rosen steering. Nat Photon 6(8):596
2. Rojo MG (2012) State of the art and trends for digital pathology. Stud Health Technol Inform 179:15–28
3. May M (2010) A better lens on disease. Sci Am 302(5):74–77
4. Katouzian A et al (2012) A state-of-the-art review on segmentation algorithms in intravascular ultrasound (IVUS) images. IEEE Trans Inf Technol Biomed 16(5):823–834
5. Principe JC, Brockmeier AJ (2015) Representing and decomposing neural potential signals. Curr Opin Neurobiol 31:13–17
6. Yang L et al (2013) Parallel content-based sub-image retrieval using hierarchical searching. Bioinformatics 30(7):996–1002
7. López C et al (2012) Digital image analysis in breast cancer: an example of an automated methodology and the effects of image compression. Stud Health Technol Inform 179:155–171
8. Foran DJ et al (2011) Imageminer: a software system for comparative analysis of tissue microarrays using content-based image retrieval, high-performance computing, and grid technology. J Am Med Inf Assoc 18(4):403–415
9. Yang L et al (2009) Virtual microscopy and grid-enabled decision support for large-scale analysis of imaged pathology specimens. IEEE Trans Inf Technol Biomed 13(4):636–644
10. Yang L et al (2007) High throughput analysis of breast cancer specimens on the grid. In: International conference on medical image computing and computer-assisted intervention. Springer, Berlin, Heidelberg
11. Bueno G et al (2012) Emerging trends: grid technology in pathology. Stud Health Technol Inform 179:218–229
12. Liu J et al (2016) Scalable mammogram retrieval using composite anchor graph hashing with iterative quantization. IEEE Trans Circuits Syst Video Technol 27(11):2450–2460
13. Zhang X et al (2015) Fusing heterogeneous features for the image-guided diagnosis of intraductal breast lesions. In: 2015 IEEE 12th international symposium on biomedical imaging (ISBI). IEEE
14. Zhang X et al (2014) Towards large-scale histopathological image analysis: hashing-based image retrieval. IEEE Trans Med Imag 34(2):496–506
15. Gurcan MN et al (2009) Histopathological image analysis: a review. IEEE Rev Biomed Eng 2:147
16. Demir C, Yener B (2005) Automated cancer diagnosis based on histopathological images: a systematic survey. Rensselaer Polytechnic Institute, Tech. Rep
17. Xing F, Yang L (2016) Robust nucleus/cell detection and segmentation in digital pathology and microscopy images: a comprehensive review. IEEE Rev Biomed Eng 9:234–263
18. Veta M et al (2014) Breast cancer histopathology image analysis: a review. IEEE Trans Biomed Eng 61(5):1400–1411
19. Fuchs TJ, Buhmann JM (2011) Computational pathology: challenges and promises for tissue analysis. Comput Med Imaging Graph 35(7–8):515–530
20. Kothari S et al (2013) Pathology imaging informatics for quantitative analysis of whole-slide images. J Am Med Inf Assoc 20(6):1099–1108
21. Irshad H et al (2013) Methods for nuclei detection, segmentation, and classification in digital histopathology: a review—current status and future potential. IEEE Rev Biomed Eng 7:97–114
22. Roux L et al (2013) Mitosis detection in breast cancer histological images: an ICPR 2012 contest. J Pathol Inf 4
23. MITOS (2012) MITOS dataset. Available at https://ludo17.free.fr/mitos_2012/
24. Veta M et al (2015) Assessment of algorithms for mitosis detection in breast cancer histopathology images. Med Image Anal 20(1):237–248
25. AMIDA (2013) MICCAI 2013 Grand Challenge. Available at https://amida13.isi.uu.nl/
26. Gelasca ED et al (2009) A biosegmentation benchmark for evaluation of bioimage analysis methods. BMC Bioinformatics 10(1):368
27. Coelho LP, Shariff A, Murphy RF (2009) Nuclear segmentation in microscope cell images: a hand-segmented dataset and comparison of algorithms. In: 2009 IEEE international symposium on biomedical imaging: from nano to macro. IEEE
28. Maška M et al (2014) A benchmark for comparison of cell tracking algorithms. Bioinformatics 30(11):1609–1617
Chaotic Henry Gas Solubility Optimization Algorithm
Nand Kishor Yadav and Mukesh Saraswat
Abstract Meta-heuristic methods have been successfully applied to many real-world optimization problems. Recently, the Henry gas solubility optimization (HGSO) algorithm, which is based on Henry's law, has been introduced. To improve its solution precision, this paper presents a new chaotic Henry gas solubility optimization method. The proposed variant is validated on 47 benchmark problems of distinct modalities, i.e., unimodal, multi-modal, and fixed-dimension multi-modal, using the mean fitness value and standard deviation. The experimental outcomes show the superiority of the proposed method over the existing ones.
Keywords Henry gas solubility optimization · Meta-heuristic optimization · Chaotic behavior
N. Kishor Yadav · M. Saraswat
Department of Computer Science and Engineering, Jaypee Institute of Information Technology, Noida, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
H. Sharma et al. (eds.), Intelligent Learning for Computer Vision, Lecture Notes on Data Engineering and Communications Technologies 61, https://doi.org/10.1007/978-981-33-4582-9_20

1 Introduction
In an optimization process, the best solution is found by minimizing or maximizing a fitness function subject to the problem constraints. In recent decades, many optimization algorithms have been proposed to address various engineering problems. Because of their slow convergence and extremely time-consuming mathematical calculations, classical optimization algorithms have become unfavorable. Moreover, when dealing with large multi-dimensional problems, these classical algorithms could take several years to reach a solution [1]. To mitigate this, several meta-heuristic algorithms have been developed to overcome limitations of traditional optimization algorithms such as convergence to local optima, single-solution search, and unknown search spaces. Although meta-heuristic algorithms have been modeled for generic problems, they have been applied to search for the optimal solution of complex and high-dimensional problems [2]. The advantages of meta-heuristic algorithms over deterministic algorithms are the avoidance of local optima, overcoming stagnation, a fast convergence rate, and better solution precision [3]. Typically, a meta-heuristic optimization algorithm starts with an initial set of solutions, known as the population, and then iteratively improves it to attain the global optimum of the fitness function [4]. To reach the optimal solution, each meta-heuristic algorithm includes two phases, namely exploration and exploitation. In the exploration phase, different promising regions are searched to cover the solution space better, while in the exploitation phase the algorithm searches the local region for the optimal solution [3]. In general, all meta-heuristic algorithms try to find a balance between these two phases.
In the literature, the era of meta-heuristic algorithms started with the genetic algorithm (GA), developed by Holland et al. [5, 6]. Further, Kirkpatrick et al. [7] developed the simulated annealing optimization technique, and Selim et al. [8] proposed basic simulated annealing optimization. Storn et al. [9] proposed the differential evolution (DE) algorithm based on stochastic search methods; it was later successfully applied by Rogalsky et al. [10] and Joshi et al. [11] to solve diverse mechanical engineering problems. Particle swarm optimization (PSO) [12] models bird flocking and fish schooling. Yoshida et al. [13] reported that it is difficult to balance the exploration and exploitation capabilities of PSO when solving different scientific problems. Dan Simon [14] proposed the biogeography-based optimization (BBO) algorithm, based on the geographical distribution of biological species. Moreover, Karaboga et al. [15, 16] proposed the artificial bee colony (ABC) algorithm, based on the foraging behavior of honey bees, for numerical optimization problems, while Rashedi et al. [17] proposed the gravitational search algorithm (GSA), based on the Newtonian concepts of gravity and motion. Kumar et al. [18] later applied this work to various engineering problems, achieving fast convergence at low computational cost. Meanwhile, Zhang et al. [19] reported that this approach tends to get stuck in local optima and has low precision. To address this, Mittal et al. [20, 21] proposed a new chaotic kbest gravitational search algorithm.
The Henry gas solubility optimization (HGSO) method is inspired by a physical process described by Henry's law. To demonstrate its ability to solve complex engineering problems, HGSO has been evaluated on 47 benchmark functions of different modalities and on the CEC'17 test suite. The algorithm achieves a good balance between the exploration and exploitation phases, giving a better convergence rate and low computational cost. However, lacking chaotic maps and multi-objective capabilities, HGSO faces challenges in real-world optimization problems [3]. To mitigate these issues, this paper introduces a new variant of the algorithm that incorporates chaotic behavior, termed the chaotic Henry gas solubility optimization (CHGSO) algorithm. The proposed variant shows better convergence behavior and global search ability. The rest of the paper is organized as follows. The basic HGSO algorithm is presented in Sect. 2, and the proposed CHGSO is explained in Sect. 3. Section 4 presents and analyzes the experimental results, followed by the conclusion in Sect. 5.
2 Henry Gas Solubility Optimization Algorithm
Fatma et al. [3, 22] proposed the HGSO algorithm, which mimics the physical process described by Henry's law of gases. According to this law, "Solubility of the gases in a liquid is directly proportional to the partial pressure of respective gases, where temperature, type and volume of liquid are unchanged" [23]. Mathematically, the algorithm consists of the following steps:
Step 1 Initialization of gas population: The positions of the N gas particles are initialized using Eq. (1):
$$X_i(t+1) = X_{\min} + r \times (X_{\max} - X_{\min}) \qquad (1)$$
where $X_i$ is the position of the ith gas particle, r is a random number in [0, 1], $X_{\min}$ and $X_{\max}$ are the bounds of the search space, and t is the iteration counter. Each gas particle i in the jth cluster has a Henry's constant $H_j(t)$, a partial pressure $P_{i,j}$, and a $\nabla_{\mathrm{sol}}E/R$ constant value of type j, $C_j$, which are initialized by Eqs. (2) to (4), respectively:
$$H_j(t) = l_1 \times \mathrm{rand}(0, 1) \qquad (2)$$
$$P_{i,j}(t) = l_2 \times \mathrm{rand}(0, 1) \qquad (3)$$
$$C_j(t) = l_3 \times \mathrm{rand}(0, 1) \qquad (4)$$
where $l_1 = 5 \times 10^{-2}$, $l_2 = 100$, and $l_3 = 10^{-2}$.
Step 2 Clustering according to gas types: The population is divided into k clusters, each containing a different type of gas. All gases of the same type share the same Henry's constant $H_j$.
Step 3 Fitness evaluation with ranking: Each gas agent i in cluster j is evaluated with the objective function, and the agents are ranked by fitness value to identify the best gas agent in each cluster.
Step 4 Renew Henry's coefficient: After each iteration, the Henry's coefficient $H_j$ of cluster j is updated using Eq. (5):
$$H_j(t+1) = H_j(t) \times \exp\left(-C_j \times \left(\frac{1}{T(t)} - \frac{1}{T^{\theta}}\right)\right), \qquad T(t) = \exp\left(-\frac{t}{t_{\max}}\right) \qquad (5)$$
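Steps 1 to 4 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: gas agents are assigned to clusters round-robin (the text does not say how clusters are formed), the objective is minimized, a sphere function stands in for the objective, and $T^{\theta} = 298.15$ and $t_{\max}$ are the constants defined in the sentence that follows Eq. (5). The update multiplies the current coefficient $H_j(t)$ by the exponential factor. All function names are hypothetical.

```python
import numpy as np

L1, L2, L3 = 5e-2, 100.0, 1e-2  # l1, l2, l3 from the text
T_THETA = 298.15                 # constant temperature T^theta

def init_population(n, dim, x_min, x_max, n_clusters, rng):
    """Step 1, Eqs. (1)-(4), plus the Step 2 cluster assignment."""
    X = x_min + rng.random((n, dim)) * (x_max - x_min)  # Eq. (1)
    H = L1 * rng.random(n_clusters)                     # Eq. (2): one H_j per gas type
    P = L2 * rng.random(n)                              # Eq. (3): P_ij for each agent
    C = L3 * rng.random(n_clusters)                     # Eq. (4): one C_j per gas type
    cluster = np.arange(n) % n_clusters                 # round-robin clusters (assumption)
    return X, H, P, C, cluster

def best_per_cluster(fitness, cluster, n_clusters):
    """Step 3: index of the lowest-fitness agent in each (non-empty) cluster."""
    best = []
    for j in range(n_clusters):
        members = np.flatnonzero(cluster == j)
        best.append(members[np.argmin(fitness[members])])
    return np.array(best)

def update_henry(H, C, t, t_max):
    """Step 4, Eq. (5): H_j(t+1) = H_j(t) * exp(-C_j * (1/T(t) - 1/T_theta))."""
    T = np.exp(-t / t_max)
    return H * np.exp(-C * (1.0 / T - 1.0 / T_THETA))

rng = np.random.default_rng(0)
X, H, P, C, cluster = init_population(20, 3, -5.0, 5.0, 4, rng)
fitness = np.sum(X ** 2, axis=1)  # sphere function as a stand-in objective
best = best_per_cluster(fitness, cluster, 4)
H = update_henry(H, C, t=1, t_max=100)
```

Because $T(t)$ falls from 1 towards $e^{-1}$ over the run, $1/T(t)$ grows while $1/T^{\theta}$ stays tiny, so with $C_j \ge 0$ the Henry coefficients shrink monotonically as the iterations proceed.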
Here, $T^{\theta} = 298.15$ is a constant temperature, $T$ is the temperature, and $t_{\max}$ is the maximum number of iterations.
Step 5 Renew solubility: The solubility $S_{i,j}$ of each gas agent i in cluster j is renewed at each iteration t using Eq. (6):
$$S_{i,j}(t) = K \times H_j(t+1) \times P_{i,j}(t) \qquad (6)$$
where $P_{i,j}$ is the partial pressure and K is a user-defined constant with default value 1.
Step 6 Renew position: After the attributes of each particle i in cluster j are renewed, its position $X_{i,j}$ at iteration t + 1 is updated using Eq. (7):
$$X_{i,j}(t+1) = X_{i,j}(t) + F \times r_1 \times \gamma \times \big(X_{j,\mathrm{best}}(t) - X_{i,j}(t)\big) + F \times r_2 \times \alpha \times \big(S_{i,j}(t) \times X_{\mathrm{best}}(t) - X_{i,j}(t)\big),$$
$$\gamma = \beta \times \exp\left(-\frac{F_{\mathrm{best}}(t) + \varepsilon}{F_{i,j}(t) + \varepsilon}\right), \qquad \varepsilon = 0.05 \qquad (7)$$
where F is a flag that changes the search direction and provides search diversity, and $r_1$ and $r_2$ are random constants (0